[x265] [PATCH 0/4] Optimise AArch64 quant/nquant primitives

Mon Aug 12 21:14:28 UTC 2024

Hi,

This series optimises the AArch64 Neon implementations of the quant and
nquant primitives.

Only one patch depends on another on-list patch set:
    [PATCH 1/4] AArch64: Remove SVE assembly implementation of quant
It is based on the refactoring changes to aarch64/asm-primitives.cpp made
in the SAD patch series:
    https://mailman.videolan.org/pipermail/x265-devel/2024-May/013707.html

Performance relative to the existing Neon implementations:

quant_neon:

  Neoverse N1: 1.57x
  Neoverse V1: 1.59x
  Neoverse N2: 1.54x
  Neoverse V2: 1.59x

nquant_neon:

  Neoverse N1: 1.79x
  Neoverse V1: 1.77x
  Neoverse N2: 1.70x
  Neoverse V2: 1.73x

Many thanks,

Hari

Hari Limaye (4):
  AArch64: Remove SVE assembly implementation of quant
  AArch64: Optimise quant_neon
  Test: Update values used in check_nquant_primitive
  AArch64: Optimise nquant_neon

 source/common/aarch64/asm-primitives.cpp |   3 -
 source/common/aarch64/fun-decls.h        |   2 -
 source/common/aarch64/pixel-util-sve.S   |  57 -------------
 source/common/aarch64/pixel-util.S       | 103 ++++++++++++-----------
 source/test/mbdstharness.cpp             |  10 ++-
 5 files changed, 61 insertions(+), 114 deletions(-)

-- 
2.42.1