[x265] [PATCH 00/12] AArch64: Optimise low bitdepth ipfilter primitives
Hari Limaye
hari.limaye at arm.com
Fri Aug 30 19:18:48 UTC 2024
This patch series optimises the existing Neon intrinsics implementations of the ipfilter primitives and removes the assembly implementations in favour of these new ones.
The relative performance of the new Neon intrinsics implementations, compared with the existing assembly implementations, is reported in the respective commit messages.
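For readers less familiar with the ipfilter primitives, the sketch below shows roughly what a low bitdepth horizontal "pixel to pixel" luma filter (the job of interp_horiz_pp) computes, and what any vectorised version has to reproduce bit-exactly. It assumes 8-bit samples and the standard HEVC 8-tap luma coefficients. The function name interp_horiz_pp_ref and its signature are hypothetical and only loosely modelled on x265's C reference; this is not the code in this series.

```c
#include <assert.h>
#include <stdint.h>

/* HEVC 8-tap luma interpolation coefficients, indexed by quarter-pel
 * fractional position (0 = integer position, 2 = half-pel). */
static const int16_t luma_filter[4][8] = {
    {  0, 0,   0, 64,  0,   0, 0,  0 },
    { -1, 4, -10, 58, 17,  -5, 1,  0 },
    { -1, 4, -11, 40, 40, -11, 4, -1 },
    {  0, 1,  -5, 17, 58, -10, 4, -1 }
};

#define FILTER_PREC 6 /* each coefficient set sums to 1 << 6 = 64 */

static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical scalar reference: 8-tap horizontal filter, 8-bit in,
 * 8-bit out, with rounding and clipping back to the pixel range. */
static void interp_horiz_pp_ref(const uint8_t *src, intptr_t srcStride,
                                uint8_t *dst, intptr_t dstStride,
                                int width, int height, int coeffIdx)
{
    assert(coeffIdx >= 0 && coeffIdx < 4);
    const int16_t *c = luma_filter[coeffIdx];

    src -= 3; /* the 8 taps for output x cover input x-3 .. x+4 */
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int sum = 0;
            for (int k = 0; k < 8; k++)
                sum += src[x + k] * c[k];
            /* round to nearest, drop the 6 fractional bits, clip to 0..255 */
            dst[x] = clip_u8((sum + (1 << (FILTER_PREC - 1))) >> FILTER_PREC);
        }
        src += srcStride;
        dst += dstStride;
    }
}
```

For a linear ramp such as 30, 40, 50, ..., the half-pel filter (coeffIdx 2) produces the midpoints 35, 45, 55, ..., while coeffIdx 0 simply copies the input. A Neon version typically computes the same multiply-accumulate over many output pixels per instruction, widening the 8-bit inputs to 16 bits before narrowing the rounded result.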
Many thanks,
Hari
Hari Limaye (12):
Test: Remove check for unused coeffIdx in ipfilter tests
Move ipfilter primitives into X265_NS
AArch64: Move ipfilter primitives into X265_NS
AArch64: Support all block sizes in p2s Neon
AArch64: Optimise low bitdepth interp_horiz_pp_neon
AArch64: Optimise low bitdepth interp_horiz_ps_neon
AArch64: Optimise low bitdepth interp_vert_ss_neon
AArch64: Optimise low bitdepth interp_vert_pp_neon
AArch64: Optimise low bitdepth interp_vert_ps_neon
AArch64: Optimise low bitdepth interp_vert_sp_neon
AArch64: Define all low bitdepth Neon ipfilter primitives
AArch64: Remove Assembly ipfilter primitives
source/common/CMakeLists.txt | 4 +-
source/common/aarch64/asm-primitives.cpp | 186 --
source/common/aarch64/filter-prim.cpp | 2877 ++++++++++++++++++----
source/common/aarch64/fun-decls.h | 15 -
source/common/aarch64/ipfilter-common.S | 1436 -----------
source/common/aarch64/ipfilter-sve2.S | 1282 ----------
source/common/aarch64/ipfilter.S | 1054 --------
source/common/aarch64/mem-neon.h | 193 ++
source/common/ipfilter.cpp | 8 +-
source/test/ipfilterharness.cpp | 24 +-
10 files changed, 2580 insertions(+), 4499 deletions(-)
delete mode 100644 source/common/aarch64/ipfilter-common.S
delete mode 100644 source/common/aarch64/ipfilter-sve2.S
delete mode 100644 source/common/aarch64/ipfilter.S
--
2.42.1