[x265] [PATCH] AArch64: Optimize pixel_avg_pp_12x16_neon
chen
chenm003 at 163.com
Wed May 7 15:13:01 UTC 2025
Hi,
Thanks for improving the instructions; it looks good to me.
Regards,
Chen
At 2025-05-07 14:49:51, "Gerda Zsejke More" <gerdazsejke.more at arm.com> wrote:
>Optimize pixel_avg_pp_12x16_neon by using more suitable load and
>store instructions. Using LD1 for the 32-bit lane is a constructive
>operation - needing to merge the new value for lane 0 with the
>existing top half of the vector. Using LDR turns this into a wholly
>destructive operation since LDR zeros the rest of the vector -
>removing the false dependency.
>---
> source/common/aarch64/mc-a.S | 6 +++---
> 1 file changed, 3 insertions(+), 3 deletions(-)
>
>diff --git a/source/common/aarch64/mc-a.S b/source/common/aarch64/mc-a.S
>index 8c2878b3e..130bf1a4a 100644
>--- a/source/common/aarch64/mc-a.S
>+++ b/source/common/aarch64/mc-a.S
>@@ -73,13 +73,13 @@ function PFX(pixel_avg_pp_12x16_neon)
> sub x3, x3, #4
> sub x5, x5, #4
> .rept 16
>- ld1 {v0.s}[0], [x2], #4
>+ ldr s0, [x2], #4
> ld1 {v1.8b}, [x2], x3
>- ld1 {v2.s}[0], [x4], #4
>+ ldr s2, [x4], #4
> ld1 {v3.8b}, [x4], x5
> urhadd v4.8b, v0.8b, v2.8b
> urhadd v5.8b, v1.8b, v3.8b
>- st1 {v4.s}[0], [x0], #4
>+ str s4, [x0], #4
> st1 {v5.8b}, [x0], x1
> .endr
> ret
>--
>2.39.5 (Apple Git-154)
>
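
For anyone checking the arithmetic, here is a minimal scalar sketch in C of what the 16 unrolled rows compute. It is not x265's own C reference: the name pixel_avg_12x16_ref is hypothetical, and the 8-bit pixel type, the strides counted in pixels, and the argument order (taken from the register usage x0..x5 in the diff) are assumptions. URHADD is an unsigned rounding halving add, i.e. (a + b + 1) >> 1 per byte lane. The assembly handles each row as 4 + 8 bytes; the model below simply loops over all 12.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the 12x16 rounding average.
 * Assumes 8-bit pixels and strides expressed in pixels. */
static void pixel_avg_12x16_ref(uint8_t *dst, ptrdiff_t dst_stride,
                                const uint8_t *src0, ptrdiff_t src0_stride,
                                const uint8_t *src1, ptrdiff_t src1_stride)
{
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 12; x++) {
            /* Same result as URHADD on one byte lane: the operands are
             * promoted to int, so the sum cannot overflow. */
            dst[x] = (uint8_t)((src0[x] + src1[x] + 1) >> 1);
        }
        dst += dst_stride;
        src0 += src0_stride;
        src1 += src1_stride;
    }
}
```

Only the low 4 bytes of v0, v2 and v4 ever reach memory, so whether the upper lanes hold stale data (LD1 to a lane) or zeros (LDR) does not affect the stored result; the patch changes only the dependency on the previous register contents.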