[x265] [PATCH 0/9] AArch64: Optimise DCT primitives

Hari Limaye hari.limaye at arm.com
Thu Aug 22 15:17:39 UTC 2024


This patch series optimises the existing Neon implementations of the DCT primitives and adds new SVE implementations of the same functions.

Relative performance observed for the new Neon implementations compared to the existing Neon implementations:

dct8_neon:
  Neoverse N1: 2.69x
  Neoverse V1: 4.57x
  Neoverse N2: 2.87x
  Neoverse V2: 4.87x

dct16_neon:
  Neoverse N1: 1.31x
  Neoverse V1: 1.69x
  Neoverse N2: 1.29x
  Neoverse V2: 1.78x

dct32_neon:
  Neoverse N1: 1.37x
  Neoverse V1: 1.59x
  Neoverse N2: 1.30x
  Neoverse V2: 1.28x

Relative performance observed for the SVE implementations compared to the new Neon implementations:

dct8_sve:
  Neoverse V1: 1.00x
  Neoverse V2: 1.23x
  Neoverse N2: 1.27x

dct16_sve:
  Neoverse V1: 1.04x
  Neoverse V2: 1.35x
  Neoverse N2: 1.42x

dct32_sve:
  Neoverse V1: 1.13x
  Neoverse V2: 1.44x
  Neoverse N2: 1.55x
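The two sets of figures can be chained. If the first ratio is t_old / t_neon and the second is t_neon / t_sve, their product estimates t_old / t_sve, i.e. the SVE code against the original Neon code. This only holds if both sets were measured with the same benchmark on the same machines, which is an assumption (the letter does not say so), so the result is a rough estimate rather than a measurement. Neoverse N1 does not implement SVE, which is why it has no entry in the SVE results. A minimal Python sketch of the arithmetic, with the figures copied from above:

```python
# Speedups copied from this cover letter.
# NEON_GAIN: original Neon -> new Neon.  SVE_GAIN: new Neon -> SVE.
# Neoverse N1 is omitted because it does not implement SVE.
NEON_GAIN = {
    "dct8":  {"V1": 4.57, "V2": 4.87, "N2": 2.87},
    "dct16": {"V1": 1.69, "V2": 1.78, "N2": 1.29},
    "dct32": {"V1": 1.59, "V2": 1.28, "N2": 1.30},
}
SVE_GAIN = {
    "dct8":  {"V1": 1.00, "V2": 1.23, "N2": 1.27},
    "dct16": {"V1": 1.04, "V2": 1.35, "N2": 1.42},
    "dct32": {"V1": 1.13, "V2": 1.44, "N2": 1.55},
}


def combined_speedup(prim: str, core: str) -> float:
    """Estimated original-Neon -> SVE speedup.

    Assumes the two ratios compose multiplicatively:
    if t_old / t_neon = a and t_neon / t_sve = b, then t_old / t_sve = a * b.
    """
    return NEON_GAIN[prim][core] * SVE_GAIN[prim][core]


if __name__ == "__main__":
    for prim in NEON_GAIN:
        row = ", ".join(
            f"Neoverse {core}: {combined_speedup(prim, core):.2f}x"
            for core in NEON_GAIN[prim]
        )
        print(f"{prim}_sve vs. original {prim}_neon (estimate): {row}")
```

For example, dct8 on Neoverse V2 comes out at about 5.99x (4.87 × 1.23); this is derived from the numbers above, not a new benchmark result.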

The patches are based on the preceding refactoring patch sets:
- AArch64: Enable building with -flax-vector-conversions=none
- AArch64: Enable compilation of intrinsics files with -Werror

Many thanks,
Hari

Hari Limaye (5):
  Move C DCT implementations into X265_NS
  AArch64: Move Neon DCT implementations into X265_NS
  AArch64: Optimise partialButterfly8_neon
  AArch64: Optimise partialButterfly16_neon
  AArch64: Optimise partialButterfly32_neon

Jonathan Wright (4):
  AArch64: Move Neon-SVE bridge helpers into dedicated header
  AArch64: Add SVE implementation of 8x8 DCT
  AArch64: Add SVE implementation of 16x16 DCT
  AArch64: Add SVE implementation of 32x32 DCT
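For context, the "partial butterfly" in the partialButterfly*_neon names is the even/odd decomposition of the HEVC integer DCT that the scalar C code in dct.cpp also uses: each row is folded into sums and differences so that the even output coefficients need only 2 multiplies each and the odd ones 4, instead of 8. The Python sketch below illustrates that technique for the 8-point case. It is not code from these patches: the function names are hypothetical, the transposed output layout is assumed to mirror the C reference, the int16 narrowing of the C version is omitted, and the sketch checks itself against a direct matrix product.

```python
import random

# 8-point HEVC forward DCT matrix (row k is basis function k).
G_T8 = [
    [64, 64, 64, 64, 64, 64, 64, 64],
    [89, 75, 50, 18, -18, -50, -75, -89],
    [83, 36, -36, -83, -83, -36, 36, 83],
    [75, -18, -89, -50, 50, 89, 18, -75],
    [64, -64, -64, 64, 64, -64, -64, 64],
    [50, -89, 18, 75, -75, -18, 89, -50],
    [36, -83, 83, -36, -36, 83, -83, 36],
    [18, -50, 75, -89, 89, -75, 50, -18],
]


def partial_butterfly8(src, shift, line):
    """One 8-point DCT pass over `line` rows of 8 samples (row-major).

    Output is transposed: coefficient k of row j is stored at dst[k * line + j]
    (assumed to mirror the C reference layout).
    """
    add = 1 << (shift - 1)  # rounding offset for the right shift
    dst = [0] * (8 * line)
    for j in range(line):
        row = src[8 * j: 8 * j + 8]
        # Stage 1: fold the row into even (E) and odd (O) halves.
        e = [row[k] + row[7 - k] for k in range(4)]
        o = [row[k] - row[7 - k] for k in range(4)]
        # Stage 2: fold the even half again.
        ee = [e[0] + e[3], e[1] + e[2]]
        eo = [e[0] - e[3], e[1] - e[2]]
        # Coefficients 0 and 4 use EE, 2 and 6 use EO: 2 taps each.
        for k in (0, 4):
            dst[k * line + j] = (G_T8[k][0] * ee[0] + G_T8[k][1] * ee[1] + add) >> shift
        for k in (2, 6):
            dst[k * line + j] = (G_T8[k][0] * eo[0] + G_T8[k][1] * eo[1] + add) >> shift
        # Odd coefficients 1, 3, 5, 7 use O: 4 taps each.
        for k in (1, 3, 5, 7):
            acc = sum(G_T8[k][i] * o[i] for i in range(4))
            dst[k * line + j] = (acc + add) >> shift
    return dst


def dct8_direct(src, shift, line):
    """Reference: full 8-tap matrix product per coefficient, same output layout."""
    add = 1 << (shift - 1)
    return [
        (sum(G_T8[k][i] * src[8 * j + i] for i in range(8)) + add) >> shift
        for k in range(8)
        for j in range(line)
    ]


if __name__ == "__main__":
    block = [random.randint(-255, 255) for _ in range(8 * 8)]
    assert partial_butterfly8(block, 2, 8) == dct8_direct(block, 2, 8)
    print("partial butterfly matches the direct 8x8 matrix product")
```

The 16- and 32-point versions apply the same folding recursively (E/O, then EE/EO, then EEE/EEO, ...), which is where vectorised implementations can save the most multiplies.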

 source/common/CMakeLists.txt             |   2 +-
 source/common/aarch64/asm-primitives.cpp |   1 +
 source/common/aarch64/dct-prim-sve.cpp   | 501 ++++++++++++++++++++
 source/common/aarch64/dct-prim.cpp       | 563 +++++++++++++++--------
 source/common/aarch64/dct-prim.h         |  31 ++
 source/common/aarch64/neon-sve-bridge.h  |  67 +++
 source/common/aarch64/sao-prim.h         |  32 +-
 source/common/dct.cpp                    | 340 +++++++-------
 8 files changed, 1147 insertions(+), 390 deletions(-)
 create mode 100644 source/common/aarch64/dct-prim-sve.cpp
 create mode 100644 source/common/aarch64/neon-sve-bridge.h

-- 
2.42.1


