[x265] [PATCH 00/14] AArch64: Add Armv8.4 Neon DotProd and Armv8.6 Neon I8MM implementations of ipfilter primitives

chen chenm003 at 163.com
Sun Sep 8 01:05:36 UTC 2024


Hi Hari,

Thanks for the new patches. Most of them look good to me; I have just one comment.

[PATCH 04/14] AArch64: Add Armv8.4 Neon DotProd implementations of filter_hpp

The performance result does not look good enough. Also, why is there a shortcut branch for the case (coeffIdx == 4)?
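
For context on the coeffIdx == 4 question, here is a minimal scalar sketch of one reason such a case could be treated separately. It is not taken from the patch, and the shortcut in the patch may work differently. The function names filterHppGeneric and filterHppHalfPel are hypothetical; the table holds the standard HEVC 4-tap chroma coefficients. For coeffIdx == 4 the taps are {-4, 36, 36, -4}, which is 4 * {-1, 9, 9, -1}, so the filter is symmetric and the rounding shift can drop from 6 to 4 without changing the result.

```cpp
#include <algorithm>
#include <cstdint>

// HEVC 4-tap chroma interpolation coefficients, one row per fractional
// position (coeffIdx). Each row sums to 64.
static const int8_t kChromaFilter[8][4] = {
    {  0, 64,  0,  0 }, { -2, 58, 10, -2 }, { -4, 54, 16, -2 }, { -6, 46, 28, -4 },
    { -4, 36, 36, -4 }, { -4, 28, 46, -6 }, { -2, 16, 54, -2 }, { -2, 10, 58, -2 }
};

// Hypothetical scalar reference for an 8-bit horizontal pixel-to-pixel filter.
// The caller must make src[-1] .. src[width + 1] readable.
// Assumes arithmetic right shift of negative values, as on common compilers.
static void filterHppGeneric(const uint8_t *src, uint8_t *dst, int width, int coeffIdx)
{
    const int8_t *c = kChromaFilter[coeffIdx];
    for (int x = 0; x < width; x++)
    {
        int sum = 0;
        for (int i = 0; i < 4; i++)
            sum += c[i] * src[x + i - 1];
        // Round, shift by 6 (coefficients sum to 64), clamp to 8 bits.
        dst[x] = (uint8_t)std::min(std::max((sum + 32) >> 6, 0), 255);
    }
}

// Hypothetical shortcut for coeffIdx == 4 only: the taps are 4 * {-1, 9, 9, -1},
// so (4 * s + 32) >> 6 equals (s + 8) >> 4 and the symmetric taps can be paired.
static void filterHppHalfPel(const uint8_t *src, uint8_t *dst, int width)
{
    for (int x = 0; x < width; x++)
    {
        int sum = 9 * (src[x] + src[x + 1]) - (src[x - 1] + src[x + 2]);
        dst[x] = (uint8_t)std::min(std::max((sum + 8) >> 4, 0), 255);
    }
}
```

Both functions produce identical output for coeffIdx == 4. Whether the extra branch is worth it in the Neon DotProd code is what I would like to understand.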




Regards,
Chen

At 2024-09-06 21:32:25, "Hari Limaye" <hari.limaye at arm.com> wrote:
>Hi,
>
>This patch series adds further optimised implementations of the ipfilter primitives, using Armv8.4 Neon DotProd and Armv8.6 Neon I8MM instructions.
>
>Relative performance numbers are in the individual commit messages.
>
>The series is based on the x265_git master branch.
>
>Many thanks,
>Hari
>
>George Steed (1):
>  testbench.cpp: Guard extensions based on architecture
>
>Hari Limaye (13):
>  AArch64: Add Armv8.4 Neon DotProd implementations of luma_hpp
>  AArch64: Add Armv8.4 Neon DotProd implementations of luma_hps
>  AArch64: Add Armv8.4 Neon DotProd implementations of filter_hpp
>  AArch64: Add Armv8.4 Neon DotProd implementations of filter_hps
>  AArch64: Add Armv8.4 Neon DotProd implementation of interp_hv_pp
>  AArch64: Add Armv8.6 Neon I8MM feature detection
>  AArch64: Add Armv8.6 Neon I8MM implementations of luma_hpp
>  AArch64: Add Armv8.6 Neon I8MM implementations of luma_hps
>  AArch64: Add Armv8.6 Neon I8MM implementations of chroma_hpp
>  AArch64: Add Armv8.6 Neon I8MM implementation of interp_hv_pp
>  AArch64: Add Armv8.4 Neon DotProd implementations of luma_vps
>  AArch64: Add Armv8.6 Neon I8MM implementations of luma_vps
>  AArch64: Add Armv8.6 Neon I8MM implementations of luma_vpp
>
> build/README.txt                              |   23 +-
> source/CMakeLists.txt                         |   32 +-
> source/cmake/FindNEON_I8MM.cmake              |   21 +
> source/common/CMakeLists.txt                  |   14 +
> source/common/aarch64/asm-primitives.cpp      |   14 +
> source/common/aarch64/filter-neon-dotprod.cpp | 1131 +++++++++++++
> source/common/aarch64/filter-neon-dotprod.h   |   37 +
> source/common/aarch64/filter-neon-i8mm.cpp    | 1412 +++++++++++++++++
> source/common/aarch64/filter-neon-i8mm.h      |   37 +
> source/common/aarch64/mem-neon.h              |   16 +
> source/common/cpu.cpp                         |   18 +-
> source/test/testbench.cpp                     |    4 +
> source/x265.h                                 |    1 +
> 13 files changed, 2742 insertions(+), 18 deletions(-)
> create mode 100644 source/cmake/FindNEON_I8MM.cmake
> create mode 100644 source/common/aarch64/filter-neon-dotprod.cpp
> create mode 100644 source/common/aarch64/filter-neon-dotprod.h
> create mode 100644 source/common/aarch64/filter-neon-i8mm.cpp
> create mode 100644 source/common/aarch64/filter-neon-i8mm.h
>
>-- 
>2.42.1