[x265] [PATCH 0/8] AArch64 SAD/SADxN Optimisations
Hari Limaye
hari.limaye at arm.com
Thu May 23 17:12:04 UTC 2024
Hi,
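This patch series optimises the Neon implementations of the SAD/SADxN primitives, adds new Armv8.4 Neon DotProd implementations of them, removes the SVE2 SAD/SADxN primitives, and refactors the AArch64 CMake feature detection and the setup of optimised assembly primitives.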
This series is based on the previously submitted refactoring patch-series (AArch64 saoCuStats Optimisations).
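For readers less familiar with the primitives named above, here is a minimal scalar sketch of what SAD and SADxN compute. The function names, signatures and the 8-bit pixel type are assumptions made for illustration; they are not the x265 primitive API and are not taken from the patches.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical scalar reference (illustrative names, not x265's API).
// SAD: sum of absolute differences between a source block and one
// reference block, each addressed with its own row stride.
inline int sad_ref(const uint8_t* src, std::ptrdiff_t srcStride,
                   const uint8_t* ref, std::ptrdiff_t refStride,
                   int width, int height)
{
    int sum = 0;
    for (int y = 0; y < height; y++)
    {
        for (int x = 0; x < width; x++)
            sum += std::abs(int(src[x]) - int(ref[x]));
        src += srcStride;
        ref += refStride;
    }
    return sum;
}

// SADxN: the same source block compared against n candidate reference
// blocks that share one stride; one result is written per candidate.
inline void sad_xN_ref(const uint8_t* src, std::ptrdiff_t srcStride,
                       const uint8_t* const* refs, std::ptrdiff_t refStride,
                       int n, int width, int height, int32_t* res)
{
    for (int i = 0; i < n; i++)
        res[i] = sad_ref(src, srcStride, refs[i], refStride, width, height);
}
```

A vectorised SADxN would typically load each source row once and reuse it for all candidates, rather than calling the single-block routine n times as this sketch does.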
Geometric mean of performance uplift when compiled with LLVM 17 on a Neoverse V1 machine (higher is better):
Existing Neon -> Optimised Neon: 1.45x
Optimised Neon -> Armv8.4 Neon DotProd: 1.03x
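As background on why a dot-product instruction can help with SAD at all, here is a scalar model of one common approach: take per-byte absolute differences, then use UDOT against a vector of ones so that each 32-bit lane accumulates four byte differences directly. This is an assumption about the general technique, sketched in portable C++; this cover letter does not state that the patches do exactly this, and all names below are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Scalar model of a 128-bit UDOT: each of the four 32-bit accumulator
// lanes gains the sum of four unsigned 8-bit products.
inline void udot_model(uint32_t acc[4], const uint8_t a[16], const uint8_t b[16])
{
    for (int lane = 0; lane < 4; lane++)
        for (int j = 0; j < 4; j++)
            acc[lane] += uint32_t(a[4 * lane + j]) * uint32_t(b[4 * lane + j]);
}

// Scalar model of a 16-byte unsigned absolute difference (UABD).
inline void uabd_model(uint8_t out[16], const uint8_t a[16], const uint8_t b[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = static_cast<uint8_t>(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
}

// Hypothetical SAD for a block 16 pixels wide: absolute differences per
// row, then a dot product with all-ones to sum them into 32-bit lanes.
inline uint32_t sad16xh_dotprod_model(const uint8_t* src, std::ptrdiff_t srcStride,
                                      const uint8_t* ref, std::ptrdiff_t refStride,
                                      int height)
{
    uint8_t ones[16];
    for (int i = 0; i < 16; i++)
        ones[i] = 1;

    uint32_t acc[4] = {0, 0, 0, 0};
    for (int y = 0; y < height; y++)
    {
        uint8_t diff[16];
        uabd_model(diff, src, ref);
        udot_model(acc, diff, ones);   // acc lanes += sums of 4 differences
        src += srcStride;
        ref += refStride;
    }
    // Final horizontal reduction across the four lanes.
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

Because the accumulation lands in 32-bit lanes, this form needs no intermediate widening of 8-bit or 16-bit partial sums.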
Many thanks,
Hari
Hari Limaye (8):
  AArch64: Optimise Neon assembly implementations of SAD
  AArch64: Optimise Neon assembly implementations of SADxN
  AArch64: Remove SVE2 SAD/SADxN primitives
  AArch64: Clean up CMake feature detection
  AArch64: Add Armv8.4 Neon DotProd feature detection
  AArch64: Refactor setup of optimised assembly primitives
  AArch64: Add Armv8.4 Neon DotProd implementations of SAD
  AArch64: Add Armv8.4 Neon DotProd implementations of SADxN

 build/README.txt                         |   8 +
 source/CMakeLists.txt                    |  89 ++-
 source/cmake/FindNEON_DOTPROD.cmake      |  21 +
 source/common/CMakeLists.txt             |   6 +-
 source/common/aarch64/asm-primitives.cpp | 832 ++---------------------
 source/common/aarch64/fun-decls.h        |  21 +
 source/common/aarch64/sad-a-common.S     | 514 --------------
 source/common/aarch64/sad-a-sve2.S       | 511 --------------
 source/common/aarch64/sad-a.S            | 506 +++++++++++++-
 source/common/aarch64/sad-neon-dotprod.S | 302 ++++++++
 source/common/cpu.cpp                    |  19 +-
 source/test/testbench.cpp                |   3 +-
 source/x265.h                            |  11 +-
 13 files changed, 958 insertions(+), 1885 deletions(-)
 create mode 100644 source/cmake/FindNEON_DOTPROD.cmake
 delete mode 100644 source/common/aarch64/sad-a-common.S
 delete mode 100644 source/common/aarch64/sad-a-sve2.S
 create mode 100644 source/common/aarch64/sad-neon-dotprod.S
--
2.42.1