[x265] [PATCH 0/7] AArch64 saoCuStats Optimisations

Hari Limaye hari.limaye at arm.com
Mon May 20 16:14:35 UTC 2024


Hi,

This patch series adds AArch64 Neon, SVE, and SVE2 implementations of
the saoCuStats function primitives for low and high bitdepth.

This series is based on the previously submitted refactoring patch
series.
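
For readers unfamiliar with these primitives, the kind of work they do
can be sketched in scalar C++ as below. This is a simplified
illustration, not the code in the patches: the function name
saoStatsE0Sketch, the single shared stride, the 8-bit pixel type and
the skipping of the first and last column are all assumptions made to
keep the example short. It classifies each pixel by comparing it with
its left and right neighbours (the horizontal edge-offset case) and
accumulates the source/reconstruction difference per class.

```cpp
#include <cassert>
#include <cstdint>

// Sign of x as -1, 0 or +1. Written in the obvious form here; the shared
// helper in x265 may be implemented differently.
static inline int signOf(int x) { return (x > 0) - (x < 0); }

// Hypothetical scalar sketch of horizontal edge-offset statistics.
//   diff  : source minus reconstructed pixel, one value per pixel
//   rec   : reconstructed pixels (8-bit assumed for this sketch)
//   stride: row pitch, assumed identical for diff and rec
//   stats : 5 accumulators, one per edge class
//   count : 5 counters, one per edge class
static void saoStatsE0Sketch(const int16_t* diff, const uint8_t* rec,
                             intptr_t stride, int width, int height,
                             int32_t* stats, int32_t* count)
{
    assert(stride >= width);

    for (int y = 0; y < height; y++)
    {
        // Border columns are skipped so that both neighbours always exist.
        for (int x = 1; x < width - 1; x++)
        {
            int signLeft  = signOf(rec[x] - rec[x - 1]);
            int signRight = signOf(rec[x] - rec[x + 1]);
            int edgeType  = signLeft + signRight + 2; // class index in 0..4

            stats[edgeType] += diff[x];
            count[edgeType]++;
        }
        diff += stride;
        rec  += stride;
    }
}
```

The inner loop consists of independent per-pixel comparisons followed
by a small reduction per class, which is the kind of pattern that
vector instruction sets are generally well suited to.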

Performance numbers (speedup of the second implementation named in
each heading relative to the first):

C -> Neon on Neoverse V1:
    Low bitdepth:
        saoCuStatsBO | 1.09x
        saoCuStatsE0 | 2.67x
        saoCuStatsE1 | 2.82x
        saoCuStatsE2 | 2.93x
        saoCuStatsE3 | 3.26x

    High bitdepth:
        saoCuStatsBO | 1.09x
        saoCuStatsE0 | 2.39x
        saoCuStatsE1 | 2.67x
        saoCuStatsE2 | 2.47x
        saoCuStatsE3 | 2.86x

Neon -> SVE on Neoverse V1:
    Low bitdepth:
        saoCuStatsE0 | 1.12x
        saoCuStatsE1 | 1.15x
        saoCuStatsE2 | 1.21x
        saoCuStatsE3 | 1.14x

    High bitdepth:
        saoCuStatsE0 | 1.19x
        saoCuStatsE1 | 1.28x
        saoCuStatsE2 | 1.19x
        saoCuStatsE3 | 1.12x

SVE -> SVE2 on Neoverse V2:
    Low bitdepth:
        saoCuStatsE0 | 1.08x
        saoCuStatsE1 | 1.06x
        saoCuStatsE2 | 1.06x
        saoCuStatsE3 | 1.09x

    High bitdepth:
        saoCuStatsE0 | 1.03x
        saoCuStatsE1 | 1.10x
        saoCuStatsE2 | 1.08x
        saoCuStatsE3 | 1.09x

Many thanks,

Hari

Hari Limaye (7):
  Test: Relax constraints of check_saoCuStatsE*
  Move duplicated signOf function to common header
  AArch64: Add Neon saoCuStats primitives for low bitdepth
  AArch64: Add Neon saoCuStats primitives for high bitdepth
  AArch64: Add check for arm_neon_sve_bridge.h
  AArch64: Add SVE saoCuStats primitives
  AArch64: Add SVE2 saoCuStats primitives

 source/CMakeLists.txt                     |  35 +-
 source/common/CMakeLists.txt              |  19 +-
 source/common/aarch64/asm-primitives.cpp  |  14 +
 source/common/aarch64/loopfilter-prim.cpp |  19 +-
 source/common/aarch64/sao-prim-sve.cpp    | 271 +++++++++++++++
 source/common/aarch64/sao-prim-sve2.cpp   | 317 ++++++++++++++++++
 source/common/aarch64/sao-prim.cpp        | 380 ++++++++++++++++++++++
 source/common/aarch64/sao-prim.h          | 100 ++++++
 source/common/common.h                    |   6 +
 source/common/loopfilter.cpp              |  16 +-
 source/encoder/sao.cpp                    |  74 ++---
 source/test/pixelharness.cpp              |  11 +-
 12 files changed, 1187 insertions(+), 75 deletions(-)
 create mode 100644 source/common/aarch64/sao-prim-sve.cpp
 create mode 100644 source/common/aarch64/sao-prim-sve2.cpp
 create mode 100644 source/common/aarch64/sao-prim.cpp
 create mode 100644 source/common/aarch64/sao-prim.h

-- 
2.42.1