[x265] [PATCH 0/7] AArch64 saoCuStats Optimisations

chen chenm003 at 163.com
Tue May 21 06:38:36 UTC 2024


Hi Hari,


Thanks for the new ARM patches.

In signOf_neon:
>+ // signOf(a - b) = -(a > b) | (b > a)
The comment is not clear. I suggest:
-(a > b ? -1 : 0) | (a < b)
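
To show why the comment benefits from spelling out the mask values, here is a minimal scalar sketch. It is not the patch code. It assumes signOf_neon builds its result from two lane-wise compares whose true lanes are all ones (-1 as a signed value), as Neon compares such as vcgtq_u8 produce. The names cmpgt_mask and signOf_model are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// Scalar model of one lane of a Neon compare (for example vcgtq_u8):
// a true lane is all ones (-1 when read as signed), a false lane is 0.
// Hypothetical helper, not part of x265.
static inline int8_t cmpgt_mask(uint8_t a, uint8_t b)
{
    return a > b ? int8_t(-1) : int8_t(0);
}

// signOf(a - b) built from the two compare masks:
//   a >  b : -(-1) | 0    =  1
//   a <  b : -(0)  | (-1) = -1
//   a == b :  0    | 0    =  0
// Hypothetical helper, not part of x265.
static inline int8_t signOf_model(uint8_t a, uint8_t b)
{
    int8_t gt = cmpgt_mask(a, b); // -1 where a > b
    int8_t lt = cmpgt_mask(b, a); // -1 where a < b
    return int8_t(-gt | lt);
}
```

With plain C semantics, where (a > b) is 0 or 1, the expression in the original comment would give -1 for a > b, which is why stating the -1/0 mask explicitly helps the reader.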

In saoCuStatsBO_neon:
This only optimises memory bandwidth. The benefit of interleaved memory access depends strongly on the CPU pipeline design and on the compiler, so it is not generic. I am not sure how it behaves on other kinds of CPU.

In saoCuStatsE*_neon:
No comments. It looks like vmulq_s16 + vmlaq_s16 needs one instruction fewer than vandq_s16 + vandq_s16 + vaddq_s16 or tbl/tbx, and it is mostly faster on modern CPUs.
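
For readers unfamiliar with this trade-off, the following is a generic scalar sketch of the two instruction sequences named above. It is not taken from the patch, and how the patch actually uses them is an assumption on my part. It only shows that selecting two values with all-ones masks (two ANDs plus an add) gives the same per-lane result as multiplying by 0/1 weights with a multiply followed by a multiply-accumulate. The function names are hypothetical.

```cpp
#include <cstdint>

// Three operations per lane: models vandq_s16 + vandq_s16 + vaddq_s16.
// m0 and m1 are compare-style masks: 0 or -1 (all ones) per lane.
// Hypothetical helper, not part of x265.
static inline int16_t select_and_add(int16_t a, int16_t m0,
                                     int16_t b, int16_t m1)
{
    return int16_t((a & m0) + (b & m1));
}

// Two operations per lane: models vmulq_s16 followed by vmlaq_s16.
// w0 and w1 are 0/1 weights instead of 0/-1 masks.
// Hypothetical helper, not part of x265.
static inline int16_t select_mul_mla(int16_t a, int16_t w0,
                                     int16_t b, int16_t w1)
{
    int16_t acc = int16_t(a * w0);   // vmulq_s16(a, w0)
    return int16_t(acc + b * w1);    // vmlaq_s16(acc, b, w1)
}
```

Whether the shorter sequence is actually faster depends on the multiplier latency and throughput of the target core, so it is worth measuring on each CPU of interest.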

In saoCuStats*_sve, saoCuStats*_sve2:
No comments, since the algorithm is similar to the Neon version.

Regards,
Chen

At 2024-05-21 00:14:35, "Hari Limaye" <hari.limaye at arm.com> wrote:

>Hi,
>
>This patch-series adds AArch64 Neon, SVE, and SVE2 implementations of
>the saoCuStats function primitives for low and high bitdepth.
>
>This series is based on the previously submitted refactoring patch
>series.
>
>Performance numbers:
>
>C -> Neon on Neoverse V1:
>    Low bitdepth:
>        saoCuStatsBO | 1.09x
>        saoCuStatsE0 | 2.67x
>        saoCuStatsE1 | 2.82x
>        saoCuStatsE2 | 2.93x
>        saoCuStatsE3 | 3.26x
>
>    High bitdepth:
>        saoCuStatsBO | 1.09x
>        saoCuStatsE0 | 2.39x
>        saoCuStatsE1 | 2.67x
>        saoCuStatsE2 | 2.47x
>        saoCuStatsE3 | 2.86x
>
>Neon -> SVE on Neoverse V1:
>    Low bitdepth:
>        saoCuStatsE0 | 1.12x
>        saoCuStatsE1 | 1.15x
>        saoCuStatsE2 | 1.21x
>        saoCuStatsE3 | 1.14x
>
>    High bitdepth:
>        saoCuStatsE0 | 1.19x
>        saoCuStatsE1 | 1.28x
>        saoCuStatsE2 | 1.19x
>        saoCuStatsE3 | 1.12x
>
>SVE -> SVE2 on Neoverse V2:
>    Low bitdepth:
>        saoCuStatsE0 | 1.08x
>        saoCuStatsE1 | 1.06x
>        saoCuStatsE2 | 1.06x
>        saoCuStatsE3 | 1.09x
>
>    High bitdepth:
>        saoCuStatsE0 | 1.03x
>        saoCuStatsE1 | 1.10x
>        saoCuStatsE2 | 1.08x
>        saoCuStatsE3 | 1.09x
>
>Many thanks,
>
>Hari
>
>Hari Limaye (7):
>  Test: Relax constraints of check_saoCuStatsE*
>  Move duplicated signOf function to common header
>  AArch64: Add Neon saoCuStats primitives for low bitdepth
>  AArch64: Add Neon saoCuStats primitives for high bitdepth
>  AArch64: Add check for arm_neon_sve_bridge.h
>  AArch64: Add SVE saoCuStats primitives
>  AArch64: Add SVE2 saoCuStats primitives
>
> source/CMakeLists.txt                     |  35 +-
> source/common/CMakeLists.txt              |  19 +-
> source/common/aarch64/asm-primitives.cpp  |  14 +
> source/common/aarch64/loopfilter-prim.cpp |  19 +-
> source/common/aarch64/sao-prim-sve.cpp    | 271 +++++++++++++++
> source/common/aarch64/sao-prim-sve2.cpp   | 317 ++++++++++++++++++
> source/common/aarch64/sao-prim.cpp        | 380 ++++++++++++++++++++++
> source/common/aarch64/sao-prim.h          | 100 ++++++
> source/common/common.h                    |   6 +
> source/common/loopfilter.cpp              |  16 +-
> source/encoder/sao.cpp                    |  74 ++---
> source/test/pixelharness.cpp              |  11 +-
> 12 files changed, 1187 insertions(+), 75 deletions(-)
> create mode 100644 source/common/aarch64/sao-prim-sve.cpp
> create mode 100644 source/common/aarch64/sao-prim-sve2.cpp
> create mode 100644 source/common/aarch64/sao-prim.cpp
> create mode 100644 source/common/aarch64/sao-prim.h
>
>-- 
>2.42.1
>