[x265] [PATCH 0/8] AArch64 SAD/SADxN Optimisations

Hari Limaye hari.limaye at arm.com
Thu May 23 17:12:04 UTC 2024


Hi,

This patch series optimises the Neon implementations of the SAD/SADxN primitives, adds new Armv8.4 Neon DotProd implementations, and refactors parts of the AArch64 code.
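For readers less familiar with these primitives, here is a minimal scalar sketch in C++ of what SAD and SADxN compute. It is not the code in the patches, which are written in AArch64 assembly. The names sad_ref, sad_x4_ref and sad_dot_style, as well as their signatures, are hypothetical and chosen for illustration only. The last function emulates one plausible way a dot-product instruction can be used for SAD (take the absolute differences as bytes, then sum each group of four bytes into a 32-bit lane, as a UDOT against an all-ones vector would); that this matches the approach taken in the series is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Scalar reference: sum of absolute differences over a width x height block.
inline int sad_ref(const uint8_t* pix1, std::ptrdiff_t stride1,
                   const uint8_t* pix2, std::ptrdiff_t stride2,
                   int width, int height)
{
    int sum = 0;
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
            sum += std::abs(pix1[x] - pix2[x]);
        pix1 += stride1;
        pix2 += stride2;
    }
    return sum;
}

// SADxN-style sketch (N = 4): one source block is compared against four
// candidate reference blocks that share a stride; one result per candidate.
inline void sad_x4_ref(const uint8_t* src, std::ptrdiff_t srcStride,
                       const uint8_t* const refs[4], std::ptrdiff_t refStride,
                       int width, int height, int32_t res[4])
{
    for (int i = 0; i < 4; i++)
        res[i] = sad_ref(src, srcStride, refs[i], refStride, width, height);
}

// Scalar emulation of dot-product-style accumulation (assumed approach).
// Each 16-byte chunk is split into four groups of four absolute differences;
// each group is multiplied by 1 and summed into its own 32-bit lane.
// Width must be a multiple of 16 in this sketch.
inline int sad_dot_style(const uint8_t* pix1, std::ptrdiff_t stride1,
                         const uint8_t* pix2, std::ptrdiff_t stride2,
                         int width, int height)
{
    uint32_t lanes[4] = { 0, 0, 0, 0 };
    const uint8_t one = 1;
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x += 16)
            for (int lane = 0; lane < 4; lane++)
                for (int k = 0; k < 4; k++)
                {
                    int idx = x + 4 * lane + k;
                    uint8_t absDiff = (uint8_t)std::abs(pix1[idx] - pix2[idx]);
                    lanes[lane] += (uint32_t)absDiff * one;
                }
        pix1 += stride1;
        pix2 += stride2;
    }
    // Final horizontal reduction across the four lanes.
    return (int)(lanes[0] + lanes[1] + lanes[2] + lanes[3]);
}
```

In this sketch the widening from 8-bit differences to 32-bit sums happens in a single step per group of four bytes, which is the property that makes a dot-product instruction attractive for this kind of reduction.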

This series is based on the previously submitted refactoring patch-series (AArch64 saoCuStats Optimisations).

Geometric mean of performance uplift when compiled with LLVM 17 on a Neoverse V1 machine (higher is better):

Existing Neon  -> Optimised Neon:       1.45x
Optimised Neon -> Armv8.4 Neon DotProd: 1.03x
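As a reminder of how a figure like the ones above is typically derived, here is a small C++ sketch of a geometric mean over per-primitive speedups. The function name geometric_mean is hypothetical, and no per-primitive measurements from this series are reproduced here.

```cpp
#include <cmath>
#include <vector>

// Geometric mean of a set of speedup ratios (each ratio = old time / new time).
// Computed via logarithms to avoid overflow when multiplying many ratios.
// Returns 0.0 for an empty input; all ratios are assumed to be positive.
inline double geometric_mean(const std::vector<double>& speedups)
{
    if (speedups.empty())
        return 0.0;
    double logSum = 0.0;
    for (double s : speedups)
        logSum += std::log(s);
    return std::exp(logSum / static_cast<double>(speedups.size()));
}
```

Unlike an arithmetic mean, the geometric mean treats a 2x speedup and a 0.5x slowdown as cancelling out, which is why it is the usual choice for summarising ratios across many block sizes.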

Many thanks,

Hari

Hari Limaye (8):
  AArch64: Optimise Neon assembly implementations of SAD
  AArch64: Optimise Neon assembly implementations of SADxN
  AArch64: Remove SVE2 SAD/SADxN primitives
  AArch64: Clean up CMake feature detection
  AArch64: Add Armv8.4 Neon DotProd feature detection
  AArch64: Refactor setup of optimised assembly primitives
  AArch64: Add Armv8.4 Neon DotProd implementations of SAD
  AArch64: Add Armv8.4 Neon DotProd implementations of SADxN

 build/README.txt                         |   8 +
 source/CMakeLists.txt                    |  89 ++-
 source/cmake/FindNEON_DOTPROD.cmake      |  21 +
 source/common/CMakeLists.txt             |   6 +-
 source/common/aarch64/asm-primitives.cpp | 832 ++---------------------
 source/common/aarch64/fun-decls.h        |  21 +
 source/common/aarch64/sad-a-common.S     | 514 --------------
 source/common/aarch64/sad-a-sve2.S       | 511 --------------
 source/common/aarch64/sad-a.S            | 506 +++++++++++++-
 source/common/aarch64/sad-neon-dotprod.S | 302 ++++++++
 source/common/cpu.cpp                    |  19 +-
 source/test/testbench.cpp                |   3 +-
 source/x265.h                            |  11 +-
 13 files changed, 958 insertions(+), 1885 deletions(-)
 create mode 100644 source/cmake/FindNEON_DOTPROD.cmake
 delete mode 100644 source/common/aarch64/sad-a-common.S
 delete mode 100644 source/common/aarch64/sad-a-sve2.S
 create mode 100644 source/common/aarch64/sad-neon-dotprod.S

-- 
2.42.1


