[x265] [PATCH 0/8] AArch64: Clean up and optimize block copy primitives
Li Zhang
li.zhang2 at arm.com
Mon May 19 16:41:39 UTC 2025
Hello,
This patch series optimizes the existing Neon intrinsics implementations
of several AArch64 block copy primitives and adds Neon intrinsics
implementations for others. It also removes the Neon and SVE assembly
implementations that are either slower or offer no performance benefit.
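
For readers unfamiliar with these primitives, here is a minimal sketch of
what a blockcopy_ps-style primitive (pixel block to int16_t block) could
look like with Neon intrinsics. It is not taken from the patches. The name
blockcopy_ps_sketch, the template parameters, the ptrdiff_t strides and the
self-check helper are hypothetical, and the sketch assumes an 8-bit build
(pixel is uint8_t) and block widths that are multiples of 8. A scalar
fallback is included so the sketch also builds on non-AArch64 hosts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical sketch: copy a W x H block of 8-bit pixels into a block of
// int16_t, widening each element. Strides are counted in elements.
template <int W, int H>
void blockcopy_ps_sketch(int16_t* dst, ptrdiff_t dstStride,
                         const uint8_t* src, ptrdiff_t srcStride)
{
    static_assert(W % 8 == 0, "sketch only handles widths that are multiples of 8");
    assert(dstStride >= W && srcStride >= W);

    for (int y = 0; y < H; y++)
    {
#if defined(__aarch64__)
        for (int x = 0; x < W; x += 8)
        {
            // Load 8 pixels, zero-extend to 16 bits, store 8 int16_t values.
            uint8x8_t s = vld1_u8(src + x);
            int16x8_t d = vreinterpretq_s16_u16(vmovl_u8(s));
            vst1q_s16(dst + x, d);
        }
#else
        // Scalar fallback with the same element-wise behaviour.
        for (int x = 0; x < W; x++)
            dst[x] = static_cast<int16_t>(src[x]);
#endif
        src += srcStride;
        dst += dstStride;
    }
}

// Hypothetical self-check: returns true when every element inside the block
// was widened correctly and nothing outside the block was written.
inline bool blockcopy_ps_sketch_selfcheck()
{
    constexpr int W = 8, H = 8, stride = 16;
    uint8_t src[H * stride];
    int16_t dst[H * stride];
    for (int i = 0; i < H * stride; i++)
    {
        src[i] = static_cast<uint8_t>(i * 7);
        dst[i] = -1;
    }

    blockcopy_ps_sketch<W, H>(dst, stride, src, stride);

    for (int y = 0; y < H; y++)
        for (int x = 0; x < stride; x++)
        {
            int16_t expected = (x < W) ? static_cast<int16_t>(src[y * stride + x])
                                       : static_cast<int16_t>(-1);
            if (dst[y * stride + x] != expected)
                return false;
        }
    return true;
}
```

Fixing the block size at compile time lets the compiler fully unroll the
inner loop for small blocks, which is one reason an intrinsics version can
match hand-written assembly for simple copies like this.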
Many thanks,
Li
Li Zhang (8):
  AArch64: Optimize blockcopy_pp_neon intrinsics implementation
  AArch64: Optimize blockcopy_ps Neon intrinsics implementation
  AArch64: Implement blockcopy_ss primitives using Neon intrinsics
  AArch64: Implement blockcopy_sp primitives using Neon intrinsics
  AArch64: Optimize cpy1Dto2D_shl Neon intrinsics implementation
  AArch64: Optimize cpy2Dto1D_shl Neon intrinsics implementation
  AArch64: Implement cpy2Dto1D_shr using Neon intrinsics
  AArch64: Implement cpy1Dto2D_shr using Neon intrinsics

 source/common/CMakeLists.txt              |    2 +-
 source/common/aarch64/asm-primitives.cpp  |  180 ---
 source/common/aarch64/blockcopy8-common.S |   54 -
 source/common/aarch64/blockcopy8-sve.S    | 1346 ---------------------
 source/common/aarch64/blockcopy8.S        | 1049 ----------------
 source/common/aarch64/pixel-prim.cpp      |  358 +++++-
 6 files changed, 305 insertions(+), 2684 deletions(-)
 delete mode 100644 source/common/aarch64/blockcopy8-common.S
--
2.39.5 (Apple Git-154)