[x265] [PATCH 0/3] AArch64 sse_pp Optimisations
Hari Limaye
hari.limaye at arm.com
Tue Jun 25 12:49:00 UTC 2024
Hi,
This series is based on the previously submitted patch-sets (AArch64 saoCuStats Optimisations, AArch64 SAD/SADxN Optimisations), and depends on the CMake refactoring performed in those patch-sets.
Geometric mean of performance speedup on a Neoverse V1 machine (higher is better):
Existing Neon -> Optimised Neon: 1.60x
Optimised Neon -> Armv8.4 Neon DotProd: 1.73x
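
For readers unfamiliar with the primitive, here is a minimal scalar sketch of the idea, not the code in the patches. It assumes sse_pp is the sum of squared errors between two blocks of 8-bit pixels, and that a dot-product version can be built by taking the absolute difference of each pixel pair and accumulating the dot product of that difference vector with itself. All function names and signatures below are hypothetical and do not match the x265 prototypes; udot4_model is only a scalar stand-in for one 32-bit lane of a UDOT-style instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: sum of squared errors between two 8-bit pixel blocks.
 * Hypothetical name and signature, used for illustration only. */
static uint64_t sse_pp_ref(const uint8_t *pix1, ptrdiff_t stride1,
                           const uint8_t *pix2, ptrdiff_t stride2,
                           int width, int height)
{
    uint64_t sum = 0;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int d = pix1[x] - pix2[x];      /* signed difference */
            sum += (uint64_t)(d * d);       /* squared error */
        }
        pix1 += stride1;
        pix2 += stride2;
    }
    return sum;
}

/* Scalar model of one 32-bit lane of a 4-way unsigned dot product:
 * acc += a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3]. */
static uint32_t udot4_model(uint32_t acc, const uint8_t a[4],
                            const uint8_t b[4])
{
    for (int i = 0; i < 4; i++)
        acc += (uint32_t)a[i] * (uint32_t)b[i];
    return acc;
}

/* Same result as sse_pp_ref, expressed as absolute difference followed by
 * a dot product of the difference with itself.
 * Assumption: width is a multiple of 4 and small enough that one row fits
 * in a 32-bit accumulator (true for widths up to 64). */
static uint64_t sse_pp_dot_model(const uint8_t *pix1, ptrdiff_t stride1,
                                 const uint8_t *pix2, ptrdiff_t stride2,
                                 int width, int height)
{
    uint64_t sum = 0;
    for (int y = 0; y < height; y++) {
        uint32_t acc = 0;
        for (int x = 0; x < width; x += 4) {
            uint8_t ad[4];
            for (int i = 0; i < 4; i++) {
                uint8_t p = pix1[x + i], q = pix2[x + i];
                ad[i] = (uint8_t)(p > q ? p - q : q - p); /* |p - q| fits in 8 bits */
            }
            acc = udot4_model(acc, ad, ad); /* adds the four squared differences */
        }
        sum += acc;
        pix1 += stride1;
        pix2 += stride2;
    }
    return sum;
}
```

Because the absolute difference of two 8-bit values still fits in 8 bits, the squaring and the accumulation can be folded into a single widening multiply-accumulate step, which is presumably where much of the DotProd gain comes from.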
Many thanks,
Hari
Hari Limaye (3):
  AArch64: Optimise Neon assembly implementations of sse_pp
  AArch64: Remove SVE and SVE2 sse_pp primitives
  AArch64: Add Armv8.4 Neon DotProd implementations of sse_pp

 source/common/CMakeLists.txt             |   4 +-
 source/common/aarch64/asm-primitives.cpp |  24 +--
 source/common/aarch64/fun-decls.h        |   1 +
 source/common/aarch64/ssd-a-common.S     |   4 +-
 source/common/aarch64/ssd-a-sve.S        |  78 -------
 source/common/aarch64/ssd-a-sve2.S       | 261 -----------------------
 source/common/aarch64/ssd-a.S            | 259 ++++++++--------------
 source/common/aarch64/ssd-neon-dotprod.S | 165 ++++++++++++++
 8 files changed, 272 insertions(+), 524 deletions(-)
 delete mode 100644 source/common/aarch64/ssd-a-sve.S
 create mode 100644 source/common/aarch64/ssd-neon-dotprod.S
--
2.42.1