[x265] [PATCH 00/11] AArch64: Add Neon and SVE asm impl. of HBD SSE/SSD
chen
chenm003 at 163.com
Mon Dec 16 03:26:11 UTC 2024
Hi Gerda,
Thanks for the explanation.
LDP gives more load bandwidth on most ARM CPUs, and the extra ADD instruction can execute in parallel in the pipeline, so it may be faster.
However, in this function the effect is small, so we can keep your code.
Regards,
Chen
At 2024-12-14 00:06:12, "Gerda Zsejke More" <GerdaZsejke.More at arm.com> wrote:
Hi Chen,
LD1 was used here because LDP can't post-increment the x0 and x2 registers by a stride held in a register (we are loading into two registers, but the same applies to LDR as well).
We would need a separate ADD instruction after the load, and this performs the same as the existing code.
Thanks,
Gerda
> Thanks for the patches. I have some comments:
> * In the current version we support pixels of up to 12 bits, so sse_pp is equivalent to sse_ss. Of course, a separate 16-bit version is not a bad idea.
> * In the code below, LD1 vs. LDR: which one is better?
> + ld1 {v16.8h-v17.8h}, [x0], x1
> + ld1 {v18.8h-v19.
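
The first quoted point, that sse_pp and sse_ss coincide when pixels are at most 12 bits, can be illustrated with a scalar C sketch. The function names and signatures below are hypothetical and are not x265's actual primitives; the sketch only assumes that HBD pixels are stored in 16-bit containers and that strides are given in elements.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference: sum of squared errors between two
 * blocks of unsigned 16-bit pixels. Strides are in elements. */
static uint64_t sse_pp_ref(const uint16_t *a, ptrdiff_t stride_a,
                           const uint16_t *b, ptrdiff_t stride_b,
                           int width, int height)
{
    uint64_t sum = 0;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            /* Widen before subtracting so the difference keeps its sign. */
            int64_t d = (int64_t)a[x] - (int64_t)b[x];
            sum += (uint64_t)(d * d);
        }
        a += stride_a;
        b += stride_b;
    }
    return sum;
}

/* Same computation on signed 16-bit samples (hypothetical name). */
static uint64_t sse_ss_ref(const int16_t *a, ptrdiff_t stride_a,
                           const int16_t *b, ptrdiff_t stride_b,
                           int width, int height)
{
    uint64_t sum = 0;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int64_t d = (int64_t)a[x] - (int64_t)b[x];
            sum += (uint64_t)(d * d);
        }
        a += stride_a;
        b += stride_b;
    }
    return sum;
}
```

For inputs limited to 12 bits (0 to 4095), the same bit patterns represent the same values as uint16_t and as int16_t, so both functions return the same sum. The results can diverge only once sample values reach 32768 or more, which is where a separate 16-bit version would matter.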
At 2024-12-10 23:59:15, "Gerda Zsejke More" <gerdazsejke.more at arm.com> wrote:
>Hi,
>
>This patch series adds Neon and SVE asm implementation of HBD SSE_PP, SSE_SS and SSD_S functions.
>The added HBD SSE_SS and SSD_S SVE implementation is suitable for SBD as well, so enable it for that.
>Delete unused Neon intrinsics functions for SSE and SSD_S.
>
>This series is based on the master branch.
>
>Many thanks,
>Gerda
>
>Gerda Zsejke More (11):
> Avoid aliasing HBD SSE_PP functions for AArch64 platforms
> AArch64: Add Neon asm implementation of HBD SSE_PP
> AArch64: Add SVE asm implementation of HBD SSE_PP
> AArch64: Add Neon asm implementation of HBD SSE_SS
> AArch64: Add SVE asm implementation of HBD SSE_SS
> AArch64: Enable existing SSE_SS SVE impl for SBD
> AArch64: Delete sse_neon implementation
> AArch64: Add Neon asm implementation of HBD SSD_S
> AArch64: Add SVE asm implementation of HBD SSD_S
> AArch64: Enable existing SSD_S SVE impl for SBD
> AArch64: Delete pixel_ssd_s_neon implementation
>
> source/common/CMakeLists.txt | 4 +-
> source/common/aarch64/asm-primitives.cpp | 84 +--
> source/common/aarch64/pixel-prim.cpp | 89 ----
> source/common/aarch64/ssd-a-sve.S | 483 +++++++++++++++++
> source/common/aarch64/ssd-a-sve2.S | 626 -----------------------
> source/common/aarch64/ssd-a.S | 525 +++++++++++++++++++
> source/common/primitives.cpp | 2 +
> 7 files changed, 1063 insertions(+), 750 deletions(-)
> create mode 100644 source/common/aarch64/ssd-a-sve.S
> delete mode 100644 source/common/aarch64/ssd-a-sve2.S
>
>--
>2.39.5 (Apple Git-154)