[x265] [PATCH 0/8] AArch64 SAD/SADxN Optimisations

chen chenm003 at 163.com
Sat May 25 01:04:47 UTC 2024


Hi Hari,

These 8 patches look good. My only comment is on the code below:

=================================

 .macro SAD_START_4 f
-    ld1             {v0.s}[0], [x0], x1
+    ldr             s0, [x0]
+    ldr             s1, [x2]
+    add             x0, x0, x1
+    add             x2, x2, x3
     ld1             {v0.s}[1], [x0], x1
-    ld1             {v1.s}[0], [x2], x3
     ld1             {v1.s}[1], [x2], x3
     \f              v16.8h, v0.8b, v1.8b
 .endm
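A scalar C model of what SAD_START_4 computes may help when comparing the two load sequences. It is a sketch under stated assumptions: 8-bit pixels, and `\f` being a widening absolute-difference instruction such as UABDL (the actual macro argument is not shown in the quoted code). The function name `sad_start_4_model` is hypothetical. Both the old LD1 sequence and the new LDR+ADD sequence should fill the same lanes (row 0 in `.s[0]`, row 1 in `.s[1]`), so only the instruction mix differs, not the result.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar model of SAD_START_4, assuming the macro argument
 * is a widening absolute difference (e.g. UABDL). Lanes 0-3 model row 0
 * (v0.s[0] vs v1.s[0]); lanes 4-7 model row 1 (v0.s[1] vs v1.s[1]). */
static void sad_start_4_model(const uint8_t *pix1, intptr_t stride1,
                              const uint8_t *pix2, intptr_t stride2,
                              uint16_t out[8])
{
    for (int row = 0; row < 2; row++) {
        for (int i = 0; i < 4; i++) {
            int a = pix1[row * stride1 + i];
            int b = pix2[row * stride2 + i];
            /* Widen to 16 bits, as the .8h destination does. */
            out[row * 4 + i] = (uint16_t)abs(a - b);
        }
    }
}
```

Summing the eight lanes of `out` gives the SAD of the first two rows.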

According to the documentation:

  Instruction   Latency   Throughput
  LDR           5/-       2
  ADD           2         2
  LD1           7         2   (latency may be optimized to 5)

In this case, replacing LD1 with LDR+ADD does not bring any benefit.

BTW, the same comment applies to SAD_X_START_4.
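SAD_X_START_4 is presumably the corresponding start macro for the SADxN primitives, which compare one source block against several reference candidates in a single call. As a hypothetical scalar reference, assuming 8-bit pixels, a 4-pixel-wide block, and three candidates sharing one stride (the name and parameter list below are illustrative, not x265's actual primitive signature):

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 4-wide SADx3 reference: one source block (fenc) is compared
 * against three reference candidates that share the same stride. */
static void sad_x3_4xh_ref(const uint8_t *fenc, intptr_t fenc_stride,
                           const uint8_t *ref0, const uint8_t *ref1,
                           const uint8_t *ref2, intptr_t ref_stride,
                           int height, int32_t res[3])
{
    const uint8_t *refs[3] = { ref0, ref1, ref2 };
    for (int k = 0; k < 3; k++) {
        int32_t sum = 0;
        for (int y = 0; y < height; y++)
            for (int x = 0; x < 4; x++)
                sum += abs(fenc[y * fenc_stride + x] - refs[k][y * ref_stride + x]);
        res[k] = sum; /* SAD of fenc against candidate k */
    }
}
```

Because every candidate re-reads the same source rows, the per-row load sequence is repeated once per reference, which is why the same LD1 vs LDR+ADD question carries over.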

=================================

At 2024-05-24 01:12:04, "Hari Limaye" <hari.limaye at arm.com> wrote:
>Hi,
>
>This patch-series optimises the Neon implementations of SAD/SADxN primitives, adds new Armv8.4 Neon DotProd implementations, and performs some refactoring to AArch64 code.
>
>This series is based on the previously submitted refactoring patch-series (AArch64 saoCuStats Optimisations).
>
>Geometric mean of performance uplift when compiled with LLVM 17 on a Neoverse V1 machine (higher is better):
>
>Existing Neon -> Optimised Neon: 1.45x
>Optimised Neon -> Armv8.4 Neon DotProd: 1.03x
>
>Many thanks,
>
>Hari
>
>Hari Limaye (8):
>  AArch64: Optimise Neon assembly implementations of SAD
>  AArch64: Optimise Neon assembly implementations of SADxN
>  AArch64: Remove SVE2 SAD/SADxN primitives
>  AArch64: Clean up CMake feature detection
>  AArch64: Add Armv8.4 Neon DotProd feature detection
>  AArch64: Refactor setup of optimised assembly primitives
>  AArch64: Add Armv8.4 Neon DotProd implementations of SAD
>  AArch64: Add Armv8.4 Neon DotProd implementations of SADxN
>
> build/README.txt                         |   8 +
> source/CMakeLists.txt                    |  89 ++-
> source/cmake/FindNEON_DOTPROD.cmake      |  21 +
> source/common/CMakeLists.txt             |   6 +-
> source/common/aarch64/asm-primitives.cpp | 832 ++---------------------
> source/common/aarch64/fun-decls.h        |  21 +
> source/common/aarch64/sad-a-common.S     | 514 --------------
> source/common/aarch64/sad-a-sve2.S       | 511 --------------
> source/common/aarch64/sad-a.S            | 506 +++++++++++++-
> source/common/aarch64/sad-neon-dotprod.S | 302 ++++++++
> source/common/cpu.cpp                    |  19 +-
> source/test/testbench.cpp                |   3 +-
> source/x265.h                            |  11 +-
> 13 files changed, 958 insertions(+), 1885 deletions(-)
> create mode 100644 source/cmake/FindNEON_DOTPROD.cmake
> delete mode 100644 source/common/aarch64/sad-a-common.S
> delete mode 100644 source/common/aarch64/sad-a-sve2.S
> create mode 100644 source/common/aarch64/sad-neon-dotprod.S
>
>--
>2.42.1
>
>_______________________________________________
>x265-devel mailing list
>x265-devel at videolan.org
>https://mailman.videolan.org/listinfo/x265-devel