[x265] [PATCH 00/12] AArch64: Optimise low bitdepth ipfilter primitives

Hari Limaye hari.limaye at arm.com
Fri Aug 30 19:18:48 UTC 2024


This patch series optimises the existing Neon intrinsics implementations of the low bitdepth ipfilter primitives and removes the assembly implementations in favour of them.

The performance of each new Neon intrinsics implementation, relative to the existing assembly implementation, is reported in the respective commit message.

Many thanks,
Hari

Hari Limaye (12):
  Test: Remove check for unused coeffIdx in ipfilter tests
  Move ipfilter primitives into X265_NS
  AArch64: Move ipfilter primitives into X265_NS
  AArch64: Support all block sizes in p2s Neon
  AArch64: Optimise low bitdepth interp_horiz_pp_neon
  AArch64: Optimise low bitdepth interp_horiz_ps_neon
  AArch64: Optimise low bitdepth interp_vert_ss_neon
  AArch64: Optimise low bitdepth interp_vert_pp_neon
  AArch64: Optimise low bitdepth interp_vert_ps_neon
  AArch64: Optimise low bitdepth interp_vert_sp_neon
  AArch64: Define all low bitdepth Neon ipfilter primitives
  AArch64: Remove Assembly ipfilter primitives

 source/common/CMakeLists.txt             |    4 +-
 source/common/aarch64/asm-primitives.cpp |  186 --
 source/common/aarch64/filter-prim.cpp    | 2877 ++++++++++++++++++----
 source/common/aarch64/fun-decls.h        |   15 -
 source/common/aarch64/ipfilter-common.S  | 1436 -----------
 source/common/aarch64/ipfilter-sve2.S    | 1282 ----------
 source/common/aarch64/ipfilter.S         | 1054 --------
 source/common/aarch64/mem-neon.h         |  193 ++
 source/common/ipfilter.cpp               |    8 +-
 source/test/ipfilterharness.cpp          |   24 +-
 10 files changed, 2580 insertions(+), 4499 deletions(-)
 delete mode 100644 source/common/aarch64/ipfilter-common.S
 delete mode 100644 source/common/aarch64/ipfilter-sve2.S
 delete mode 100644 source/common/aarch64/ipfilter.S

-- 
2.42.1


