[x265] [PATCH 0/4] AArch64: Optimize and add pixel_var Implementations
Li Zhang
Li.Zhang2 at arm.com
Wed Jun 18 08:13:37 UTC 2025
Hi Chen,
Thanks for the feedback.
The intrinsics implementation is 1.07-1.18x faster than the SVE asm and
1.03-1.18x faster than the Neon asm, depending on the block size.
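
For context, here is a minimal scalar sketch of what a pixel_var-style kernel computes. It is not the code from the patches. The name pixel_var_ref, the 8-bit pixel type, and the packing of the result (sum in the low 32 bits, sum of squares in the high 32 bits) are assumptions made for illustration. The Neon, DotProd and SVE versions would vectorize the same two accumulations.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference, not taken from the patch series.
// Assumptions: standard bit-depth (8-bit) pixels, a square size x size block,
// and a packed return value: sum in the low 32 bits, sum of squares in the
// high 32 bits.
template<int size>
uint64_t pixel_var_ref(const uint8_t *pix, std::ptrdiff_t stride)
{
    uint32_t sum = 0; // running sum of pixel values
    uint32_t sqr = 0; // running sum of squared pixel values

    for (int y = 0; y < size; y++)
    {
        for (int x = 0; x < size; x++)
        {
            sum += pix[x];
            sqr += static_cast<uint32_t>(pix[x]) * pix[x];
        }
        pix += stride; // advance to the next row of the block
    }

    return sum + (static_cast<uint64_t>(sqr) << 32);
}
```

A caller would unpack the two halves with a cast and a shift, for example (uint32_t)r for the sum and (uint32_t)(r >> 32) for the sum of squares.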
Regards,
Li
From: chen <chenm003 at 163.com>
Date: Wednesday, 18 June 2025 at 6:37
To: Development for x265 <x265-devel at videolan.org>
Cc: nd <nd at arm.com>, Li Zhang <Li.Zhang2 at arm.com>
Subject: Re:[x265] [PATCH 0/4] AArch64: Optimize and add pixel_var Implementations
Hi Li,
Thanks for the patches; they look good to me. My only question is how much
performance improves after changing the asm to intrinsics.
Regards,
Chen
At 2025-06-18 02:22:25, "Li Zhang" <li.zhang2 at arm.com> wrote:
>Hi,
>
>This patch series optimizes the existing standard bit-depth pixel_var
>Neon intrinsics implementation and deletes the slower assembly
>implementation. It also adds a Neon DotProd intrinsics implementation
>for standard bit-depth, and Neon and SVE intrinsics implementations for
>high bit-depth, of the pixel_var function.
>
>Many thanks,
>Li
>
>Li Zhang (4):
> AArch64: Optimize and clean up SBD pixel_var functions
> AArch64: Add HBD pixel_var Neon intrinsics implementations
> AArch64: Add SBD pixel_var Neon DotProd intrinsics implementations
> AArch64: Add HBD pixel_var SVE intrinsics implementations
>
> source/common/CMakeLists.txt | 4 +-
> source/common/aarch64/asm-primitives.cpp | 14 +-
> source/common/aarch64/fun-decls.h | 10 -
> source/common/aarch64/neon-sve-bridge.h | 7 +
> .../aarch64/pixel-prim-neon-dotprod.cpp | 111 ++++++++++
> source/common/aarch64/pixel-prim-sve.cpp | 137 ++++++++++++
> source/common/aarch64/pixel-prim.cpp | 197 +++++++++++++++---
> source/common/aarch64/pixel-prim.h | 6 +
> source/common/aarch64/pixel-util-common.S | 27 ---
> source/common/aarch64/pixel-util-sve2.S | 195 -----------------
> source/common/aarch64/pixel-util.S | 61 ------
> 11 files changed, 434 insertions(+), 335 deletions(-)
> create mode 100644 source/common/aarch64/pixel-prim-neon-dotprod.cpp
> create mode 100644 source/common/aarch64/pixel-prim-sve.cpp
>
>--
>2.39.5 (Apple Git-154)
>
>_______________________________________________
>x265-devel mailing list
>x265-devel at videolan.org
>https://mailman.videolan.org/listinfo/x265-devel