[x265] [PATCH 0/9] AArch64: Optimise DCT primitives
Hari Limaye
hari.limaye at arm.com
Thu Aug 22 15:17:39 UTC 2024
This patch series optimises the existing Neon implementations of the DCT primitives, and also adds new SVE implementations of these functions.
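The kernels named in the shortlog (partialButterfly8/16/32) are the core of these DCT primitives. Below is a minimal scalar C sketch of an 8-point forward partial butterfly. It assumes the standard HEVC 8x8 integer DCT coefficients and the even/odd decomposition used by reference HEVC encoders. The function name `partial_butterfly8_ref` is hypothetical. This is not the code from the patches, which vectorise this kind of computation with Neon and SVE.

```c
#include <stdint.h>

/* HEVC 8-point integer DCT basis (row k = frequency k). */
static const int16_t t8[8][8] = {
    {64, 64, 64, 64, 64, 64, 64, 64},
    {89, 75, 50, 18, -18, -50, -75, -89},
    {83, 36, -36, -83, -83, -36, 36, 83},
    {75, -18, -89, -50, 50, 89, 18, -75},
    {64, -64, -64, 64, 64, -64, -64, 64},
    {50, -89, 18, 75, -75, -18, 89, -50},
    {36, -83, 83, -36, -36, 83, -83, 36},
    {18, -50, 75, -89, 89, -75, 50, -18},
};

/* Hypothetical scalar reference: transforms `line` rows of 8 samples.
   Output is transposed: coefficient k of row j goes to dst[k * line + j]. */
static void partial_butterfly8_ref(const int16_t *src, int16_t *dst, int shift, int line)
{
    const int add = 1 << (shift - 1); /* rounding offset */
    for (int j = 0; j < line; j++) {
        int E[4], O[4];
        /* Stage 1: even/odd split, exploiting the (anti)symmetry of the basis. */
        for (int k = 0; k < 4; k++) {
            E[k] = src[k] + src[7 - k];
            O[k] = src[k] - src[7 - k];
        }
        /* Stage 2: split the even part again. */
        int EE0 = E[0] + E[3], EO0 = E[0] - E[3];
        int EE1 = E[1] + E[2], EO1 = E[1] - E[2];

        dst[0]        = (int16_t)((t8[0][0] * EE0 + t8[0][1] * EE1 + add) >> shift);
        dst[4 * line] = (int16_t)((t8[4][0] * EE0 + t8[4][1] * EE1 + add) >> shift);
        dst[2 * line] = (int16_t)((t8[2][0] * EO0 + t8[2][1] * EO1 + add) >> shift);
        dst[6 * line] = (int16_t)((t8[6][0] * EO0 + t8[6][1] * EO1 + add) >> shift);

        /* Odd coefficients combine all four odd terms. */
        for (int k = 1; k < 8; k += 2) {
            int sum = 0;
            for (int n = 0; n < 4; n++)
                sum += t8[k][n] * O[n];
            dst[k * line] = (int16_t)((sum + add) >> shift);
        }
        src += 8;
        dst++;
    }
}
```

With `shift = 2` (the first-pass shift for an 8x8 block at 8-bit depth), a row of all ones yields a DC coefficient of 128 and zeros elsewhere. Because the output is written transposed, the same routine can be applied again for the second (column) pass.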
Relative performance (speedup, higher is better) observed for the new Neon implementations compared to the existing Neon implementations:
dct8_neon:
Neoverse N1: 2.69x
Neoverse V1: 4.57x
Neoverse N2: 2.87x
Neoverse V2: 4.87x
dct16_neon:
Neoverse N1: 1.31x
Neoverse V1: 1.69x
Neoverse N2: 1.29x
Neoverse V2: 1.78x
dct32_neon:
Neoverse N1: 1.37x
Neoverse V1: 1.59x
Neoverse N2: 1.30x
Neoverse V2: 1.28x
Relative performance (speedup, higher is better) observed for the SVE implementations compared to the new Neon implementations. Neoverse N1 does not implement SVE, so it is not listed:
dct8_sve:
Neoverse V1: 1.00x
Neoverse V2: 1.23x
Neoverse N2: 1.27x
dct16_sve:
Neoverse V1: 1.04x
Neoverse V2: 1.35x
Neoverse N2: 1.42x
dct32_sve:
Neoverse V1: 1.13x
Neoverse V2: 1.44x
Neoverse N2: 1.55x
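The two sets of figures use different baselines: the Neon figures compare against the old Neon code, and the SVE figures compare against the new Neon code. Assuming both were measured under the same benchmark conditions, multiplying them gives a rough, unmeasured estimate of SVE performance relative to the original Neon code. Below is a small C sketch of that arithmetic. The helper names are hypothetical, and the printed values are estimates, not benchmark results.

```c
#include <stdio.h>

/* Relative-performance ratios compose multiplicatively when each one is
   measured against the previous implementation (old -> new -> newer). */
static double compose_speedup(double first, double second)
{
    return first * second;
}

/* Figures copied from the lists above, columns in the order V1, V2, N2. */
static const char *const cores[3] = {"Neoverse V1", "Neoverse V2", "Neoverse N2"};
static const char *const kernels[3] = {"dct8", "dct16", "dct32"};
static const double neon_new_vs_old[3][3] = {
    {4.57, 4.87, 2.87}, /* dct8  */
    {1.69, 1.78, 1.29}, /* dct16 */
    {1.59, 1.28, 1.30}, /* dct32 */
};
static const double sve_vs_neon_new[3][3] = {
    {1.00, 1.23, 1.27}, /* dct8  */
    {1.04, 1.35, 1.42}, /* dct16 */
    {1.13, 1.44, 1.55}, /* dct32 */
};

/* Hypothetical helper: print the estimated SVE vs original-Neon ratios. */
static void print_estimates(void)
{
    for (int k = 0; k < 3; k++)
        for (int c = 0; c < 3; c++)
            printf("%s %s: ~%.2fx (estimate)\n", kernels[k], cores[c],
                   compose_speedup(neon_new_vs_old[k][c], sve_vs_neon_new[k][c]));
}
```

Calling `print_estimates()` from `main` reports, for example, about 5.99x for dct8 on Neoverse V2 (4.87 * 1.23).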
The patches are based on the preceding refactoring patch sets:
- AArch64: Enable building with -flax-vector-conversions=none
- AArch64: Enable compilation of intrinsics files with -Werror
Many thanks,
Hari
Hari Limaye (5):
Move C DCT implementations into X265_NS
AArch64: Move Neon DCT implementations into X265_NS
AArch64: Optimise partialButterfly8_neon
AArch64: Optimise partialButterfly16_neon
AArch64: Optimise partialButterfly32_neon
Jonathan Wright (4):
AArch64: Move Neon-SVE bridge helpers into dedicated header
AArch64: Add SVE implementation of 8x8 DCT
AArch64: Add SVE implementation of 16x16 DCT
AArch64: Add SVE implementation of 32x32 DCT
source/common/CMakeLists.txt | 2 +-
source/common/aarch64/asm-primitives.cpp | 1 +
source/common/aarch64/dct-prim-sve.cpp | 501 ++++++++++++++++++++
source/common/aarch64/dct-prim.cpp | 563 +++++++++++++++--------
source/common/aarch64/dct-prim.h | 31 ++
source/common/aarch64/neon-sve-bridge.h | 67 +++
source/common/aarch64/sao-prim.h | 32 +-
source/common/dct.cpp | 340 +++++++-------
8 files changed, 1147 insertions(+), 390 deletions(-)
create mode 100644 source/common/aarch64/dct-prim-sve.cpp
create mode 100644 source/common/aarch64/neon-sve-bridge.h
More information about the x265-devel mailing list