<div data-ntes="ntes_mail_body_root" style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><div id="spnEditorContent"><p style="margin: 0;">Hi Li,</p><p style="margin: 0;"><br></p><p style="margin: 0;">Thanks for the improved patches.</p><p style="margin: 0;">They look good to me; just one small comment below.</p><p style="margin: 0;"><br></p><p style="margin: 0;">In most of the functions:<br><span style="font-family: arial; white-space-collapse: preserve;">+            int16x8_t a0 = vld1q_s16(src + w + 0);
</span><span style="font-family: arial; white-space-collapse: preserve;">+            int16x8_t a1 = vld1q_s16(src + w + 8);</span></p></div><div>How does the performance of these two loads compare with a single vld1q_s16_x2?</div><div><br></div><pre><div>Regards,
Chen

</div>At 2025-05-20 00:41:39, "Li Zhang" &lt;li.zhang2@arm.com&gt; wrote:
>Hello,
>
>This patch series optimizes and implements several AArch64 block copy
>primitives using Neon intrinsics. It also cleans up and removes the Neon
>and SVE assembly implementations that are either slower or offer no
>performance benefit.
>
>Many thanks,
>Li
>
>Li Zhang (8):
>  AArch64: Optimize blockcopy_pp_neon intrinsics implementation
>  AArch64: Optimize blockcopy_ps Neon intrinsics implementation
>  AArch64: Implement blockcopy_ss primitives using Neon intrinsics
>  AArch64: Implement blockcopy_sp primitives using Neon intrinsics
>  AArch64: Optimize cpy1Dto2D_shl Neon intrinsics implementation
>  AArch64: Optimize cpy2Dto1D_shl Neon intrinsics implementation
>  AArch64: Implement cpy2Dto1D_shr using Neon intrinsics
>  AArch64: Implement cpy1Dto2D_shr using Neon intrinsics
>
> source/common/CMakeLists.txt              |    2 +-
> source/common/aarch64/asm-primitives.cpp  |  180 ---
> source/common/aarch64/blockcopy8-common.S |   54 -
> source/common/aarch64/blockcopy8-sve.S    | 1346 ---------------------
> source/common/aarch64/blockcopy8.S        | 1049 ----------------
> source/common/aarch64/pixel-prim.cpp      |  358 +++++-
> 6 files changed, 305 insertions(+), 2684 deletions(-)
> delete mode 100644 source/common/aarch64/blockcopy8-common.S
>
>--
>2.39.5 (Apple Git-154)
</pre></div>