[x265] [PATCH v2 2/7] AArch64: Add Neon implementation of 4x4 IDST
chen
chenm003 at 163.com
Fri Dec 6 04:44:47 UTC 2024
At 2024-12-04 23:38:12, "Micro Daryl Robles" <microdaryl.robles at arm.com> wrote:
>+template<int shift>
>+static inline void inverseDst4_neon(const int16_t *src, int16_t *dst, intptr_t dstStride)
>+{
>+ int16x4_t s0 = vld1_s16(src + 0);
>+ int16x4_t s1 = vld1_s16(src + 4);
s0 and s1 can be loaded together with a single 128-bit load instruction.
>+ int16x4_t s2 = vld1_s16(src + 8);
>+ int16x4_t s3 = vld1_s16(src + 12);
>+
>+ int32x4_t c0 = vaddl_s16(s0, s2);
>+ int32x4_t c1 = vaddl_s16(s2, s3);
>+ int32x4_t c2 = vsubl_s16(s0, s3);
>+ int32x4_t c3 = vmull_n_s16(s1, 74);
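If s0 and s1 are loaded together as suggested above, s1 sits in the upper half of the 128-bit register, so this multiply can use the smull2 instruction.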
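
A rough sketch of what this suggestion could look like, not taken from the patch. The helper name loadPairAndWiden and its output arrays are hypothetical and only there for illustration. vld1q_s16, vget_low_s16 and vmull_high_n_s16 are standard AArch64 Neon intrinsics. The scalar branch is an assumption added so the sketch also builds on non-AArch64 hosts.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical helper (not part of the patch). For a 4x4 block of int16_t
// coefficients in src it computes, per column i:
//   c0[i] = src[i] + src[8 + i]   (s0 + s2, widened to 32 bits)
//   c3[i] = 74 * src[4 + i]       (s1 * 74, widened to 32 bits)
static inline void loadPairAndWiden(const int16_t *src, int32_t c0[4], int32_t c3[4])
{
#if defined(__aarch64__)
    // One 128-bit load: lanes 0-3 hold s0, lanes 4-7 hold s1.
    int16x8_t s01 = vld1q_s16(src);
    int16x4_t s2 = vld1_s16(src + 8);

    // s0 is the low half of the combined register.
    int32x4_t v0 = vaddl_s16(vget_low_s16(s01), s2);
    // Widening multiply of the high half (s1); this maps to SMULL2.
    int32x4_t v3 = vmull_high_n_s16(s01, 74);

    vst1q_s32(c0, v0);
    vst1q_s32(c3, v3);
#else
    // Scalar fallback (assumption, for non-AArch64 builds only).
    for (int i = 0; i < 4; i++)
    {
        c0[i] = static_cast<int32_t>(src[i]) + static_cast<int32_t>(src[8 + i]);
        c3[i] = 74 * static_cast<int32_t>(src[4 + i]);
    }
#endif
}
```

Whether this is actually faster than two 64-bit loads plus smull would need to be measured on the target cores.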