<div data-ntes="ntes_mail_body_root" style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><div id="spnEditorContent"><p style="margin: 0;">Hi Hari,</p><p style="margin: 0;"><br></p><p style="margin: 0;">Thanks for the new patches. Most of them look good to me; I have just one comment.</p><p style="margin: 0;"><br></p><p style="margin: 0;">[PATCH 04/14] AArch64: Add Armv8.4 Neon DotProd implementations of filter_hpp</p><p style="margin: 0;">The performance result does not look good enough. Also, why is there a shortcut branch for the case <span style="font-family: arial; white-space: pre-wrap;">(coeffIdx == 4)?</span></p><p style="margin: 0;"><br></p></div><div style="position:relative;zoom:1"></div><div id="divNeteaseMailCard"></div><div style="margin: 0;">Regards,</div><div style="margin: 0;">Chen</div><pre><br>At 2024-09-06 21:32:25, "Hari Limaye" <hari.limaye@arm.com> wrote:
>Hi,
>
>This patch series adds further optimised implementations of the ipfilter primitives, using Armv8.4 Neon DotProd and Armv8.6 Neon I8MM instructions.
>
>Relative performance numbers are in the individual commit messages.
>
>The series is based on the x265_git master branch.
>
>Many thanks,
>Hari
>
>George Steed (1):
>  testbench.cpp: Guard extensions based on architecture
>
>Hari Limaye (13):
>  AArch64: Add Armv8.4 Neon DotProd implementations of luma_hpp
>  AArch64: Add Armv8.4 Neon DotProd implementations of luma_hps
>  AArch64: Add Armv8.4 Neon DotProd implementations of filter_hpp
>  AArch64: Add Armv8.4 Neon DotProd implementations of filter_hps
>  AArch64: Add Armv8.4 Neon DotProd implementation of interp_hv_pp
>  AArch64: Add Armv8.6 Neon I8MM feature detection
>  AArch64: Add Armv8.6 Neon I8MM implementations of luma_hpp
>  AArch64: Add Armv8.6 Neon I8MM implementations of luma_hps
>  AArch64: Add Armv8.6 Neon I8MM implementations of chroma_hpp
>  AArch64: Add Armv8.6 Neon I8MM implementation of interp_hv_pp
>  AArch64: Add Armv8.4 Neon DotProd implementations of luma_vps
>  AArch64: Add Armv8.6 Neon I8MM implementations of luma_vps
>  AArch64: Add Armv8.6 Neon I8MM implementations of luma_vpp
>
> build/README.txt                              |   23 +-
> source/CMakeLists.txt                         |   32 +-
> source/cmake/FindNEON_I8MM.cmake              |   21 +
> source/common/CMakeLists.txt                  |   14 +
> source/common/aarch64/asm-primitives.cpp      |   14 +
> source/common/aarch64/filter-neon-dotprod.cpp | 1131 +++++++++++++
> source/common/aarch64/filter-neon-dotprod.h   |   37 +
> source/common/aarch64/filter-neon-i8mm.cpp    | 1412 +++++++++++++++++
> source/common/aarch64/filter-neon-i8mm.h      |   37 +
> source/common/aarch64/mem-neon.h              |   16 +
> source/common/cpu.cpp                         |   18 +-
> source/test/testbench.cpp                     |    4 +
> source/x265.h                                 |    1 +
> 13 files changed, 2742 insertions(+), 18 deletions(-)
> create mode 100644 source/cmake/FindNEON_I8MM.cmake
> create mode 100644 source/common/aarch64/filter-neon-dotprod.cpp
> create mode 100644 source/common/aarch64/filter-neon-dotprod.h
> create mode 100644 source/common/aarch64/filter-neon-i8mm.cpp
> create mode 100644 source/common/aarch64/filter-neon-i8mm.h
>
>-- 
>2.42.1
>
</pre></div>