[x265] [PATCH 00/14] AArch64: Add Armv8.4 Neon DotProd and Armv8.6 Neon I8MM implementations of ipfilter primitives
chen
chenm003 at 163.com
Sun Sep 8 01:05:36 UTC 2024
Hi Hari,
Thanks for the new patches. Most look good to me; I have just one comment.
[PATCH 04/14] AArch64: Add Armv8.4 Neon DotProd implementations of filter_hpp
The performance results do not look good enough. Also, why is there a shortcut branch for the case (coeffIdx == 4)?
Regards,
Chen
At 2024-09-06 21:32:25, "Hari Limaye" <hari.limaye at arm.com> wrote:
>Hi,
>
>This patch series adds further optimised implementations of the ipfilter primitives, using Armv8.4 Neon DotProd and Armv8.6 Neon I8MM instructions.
>
>Relative performance numbers are in the individual commit messages.
>
>The series is based on the x265_git master branch.
>
>Many thanks,
>Hari
>
>George Steed (1):
> testbench.cpp: Guard extensions based on architecture
>
>Hari Limaye (13):
> AArch64: Add Armv8.4 Neon DotProd implementations of luma_hpp
> AArch64: Add Armv8.4 Neon DotProd implementations of luma_hps
> AArch64: Add Armv8.4 Neon DotProd implementations of filter_hpp
> AArch64: Add Armv8.4 Neon DotProd implementations of filter_hps
> AArch64: Add Armv8.4 Neon DotProd implementation of interp_hv_pp
> AArch64: Add Armv8.6 Neon I8MM feature detection
> AArch64: Add Armv8.6 Neon I8MM implementations of luma_hpp
> AArch64: Add Armv8.6 Neon I8MM implementations of luma_hps
> AArch64: Add Armv8.6 Neon I8MM implementations of chroma_hpp
> AArch64: Add Armv8.6 Neon I8MM implementation of interp_hv_pp
> AArch64: Add Armv8.4 Neon DotProd implementations of luma_vps
> AArch64: Add Armv8.6 Neon I8MM implementations of luma_vps
> AArch64: Add Armv8.6 Neon I8MM implementations of luma_vpp
>
> build/README.txt | 23 +-
> source/CMakeLists.txt | 32 +-
> source/cmake/FindNEON_I8MM.cmake | 21 +
> source/common/CMakeLists.txt | 14 +
> source/common/aarch64/asm-primitives.cpp | 14 +
> source/common/aarch64/filter-neon-dotprod.cpp | 1131 +++++++++++++
> source/common/aarch64/filter-neon-dotprod.h | 37 +
> source/common/aarch64/filter-neon-i8mm.cpp | 1412 +++++++++++++++++
> source/common/aarch64/filter-neon-i8mm.h | 37 +
> source/common/aarch64/mem-neon.h | 16 +
> source/common/cpu.cpp | 18 +-
> source/test/testbench.cpp | 4 +
> source/x265.h | 1 +
> 13 files changed, 2742 insertions(+), 18 deletions(-)
> create mode 100644 source/cmake/FindNEON_I8MM.cmake
> create mode 100644 source/common/aarch64/filter-neon-dotprod.cpp
> create mode 100644 source/common/aarch64/filter-neon-dotprod.h
> create mode 100644 source/common/aarch64/filter-neon-i8mm.cpp
> create mode 100644 source/common/aarch64/filter-neon-i8mm.h
>
>--
>2.42.1
>
>_______________________________________________
>x265-devel mailing list
>x265-devel at videolan.org
>https://mailman.videolan.org/listinfo/x265-devel