[x265] [PATCH 0/3] AArch64 sse_pp Optimisations

Hari Limaye hari.limaye at arm.com
Tue Jun 25 12:49:00 UTC 2024


Hi,

This series is based on the previously submitted patch-sets (AArch64 saoCuStats Optimisations and AArch64 SAD/SADxN Optimisations), and depends on the CMake refactoring performed in those patch-sets.

Geometric mean of performance speedup on a Neoverse V1 machine (higher is better):

Existing Neon  -> Optimised Neon:       1.60x
Optimised Neon -> Armv8.4 Neon DotProd: 1.73x
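
For readers unfamiliar with the primitive, the sketch below models in plain C what sse_pp computes (the sum of squared differences between two 8-bit pixel blocks) and why a dot-product instruction suits it. It is an illustration only, not code from the patches: the function names, the argument order, and the assumption that the assembly takes the absolute difference and then dots it with itself are all hypothetical, and the block width is assumed to be a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: sum of squared pixel differences over a w x h block. */
static uint32_t sse_pp_ref(const uint8_t *a, ptrdiff_t stride_a,
                           const uint8_t *b, ptrdiff_t stride_b,
                           int w, int h)
{
    uint32_t sum = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int d = a[x] - b[x];
            sum += (uint32_t)(d * d);
        }
        a += stride_a;
        b += stride_b;
    }
    return sum;
}

/* Scalar model of one 32-bit lane of a UDOT-style operation:
 * four u8 x u8 products are summed into a 32-bit accumulator. */
static uint32_t udot_lane(uint32_t acc, const uint8_t x[4], const uint8_t y[4])
{
    for (int i = 0; i < 4; i++)
        acc += (uint32_t)x[i] * y[i];
    return acc;
}

/* Dot-product formulation (hypothetical helper, w must be a multiple of 4):
 * |a - b| always fits in 8 bits, so dotting the absolute difference with
 * itself yields the squared differences without widening to 16 bits first. */
static uint32_t sse_pp_dot_model(const uint8_t *a, ptrdiff_t stride_a,
                                 const uint8_t *b, ptrdiff_t stride_b,
                                 int w, int h)
{
    uint32_t acc = 0;
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x += 4) {
            uint8_t d[4];
            for (int i = 0; i < 4; i++) {
                uint8_t p = a[x + i], q = b[x + i];
                d[i] = (uint8_t)(p > q ? p - q : q - p);
            }
            acc = udot_lane(acc, d, d);
        }
        a += stride_a;
        b += stride_b;
    }
    return acc;
}
```

Both functions return the same value for any block whose width is a multiple of 4. The second shows the shape of the computation that a dot-product instruction can perform on many lanes at once.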

Many thanks,

Hari

Hari Limaye (3):
  AArch64: Optimise Neon assembly implementations of sse_pp
  AArch64: Remove SVE and SVE2 sse_pp primitives
  AArch64: Add Armv8.4 Neon DotProd implementations of sse_pp

 source/common/CMakeLists.txt             |   4 +-
 source/common/aarch64/asm-primitives.cpp |  24 +--
 source/common/aarch64/fun-decls.h        |   1 +
 source/common/aarch64/ssd-a-common.S     |   4 +-
 source/common/aarch64/ssd-a-sve.S        |  78 -------
 source/common/aarch64/ssd-a-sve2.S       | 261 -----------------------
 source/common/aarch64/ssd-a.S            | 259 ++++++++--------------
 source/common/aarch64/ssd-neon-dotprod.S | 165 ++++++++++++++
 8 files changed, 272 insertions(+), 524 deletions(-)
 delete mode 100644 source/common/aarch64/ssd-a-sve.S
 create mode 100644 source/common/aarch64/ssd-neon-dotprod.S

-- 
2.42.1