[x265] [PATCH] AArch64: Optimize pixel_avg_pp_4xh
chen
chenm003 at 163.com
Fri Jun 20 05:08:35 UTC 2025
The code looks good to me.
btw: LDR supports register-offset addressing ([Xn, Xm]), so how about unrolling by 2 to reduce the number of ADD instructions?
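The unroll(2) suggestion can be illustrated with a scalar C model. This is a hypothetical sketch: `avg_4xh_ref` and `avg_4xh_unroll2` are invented names, not x265 functions. It assumes each output byte is the rounding average `(a + b + 1) >> 1`, which is what URHADD computes per lane, and that `h` is even. The unrolled version reaches the second row as base + stride, much as the register-offset form `ldr s0, [x2, x3]` would in assembly, so each pointer is advanced once per two rows instead of once per row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a 4-pixel-wide average: each byte is
 * (a + b + 1) >> 1, the per-lane result of URHADD. One pointer update
 * per row, mirroring the three ADDs per row in the patch. */
static void avg_4xh_ref(uint8_t *dst, ptrdiff_t dstride,
                        const uint8_t *src0, ptrdiff_t sstride0,
                        const uint8_t *src1, ptrdiff_t sstride1, int h)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 4; x++)
            dst[x] = (uint8_t)((src0[x] + src1[x] + 1) >> 1);
        dst += dstride;
        src0 += sstride0;
        src1 += sstride1;
    }
}

/* Unroll-by-2 variant: row y+1 is addressed as base + stride (like
 * register-offset addressing), and each pointer is advanced once per
 * pair of rows, by 2 * stride. Assumes h is even. */
static void avg_4xh_unroll2(uint8_t *dst, ptrdiff_t dstride,
                            const uint8_t *src0, ptrdiff_t sstride0,
                            const uint8_t *src1, ptrdiff_t sstride1, int h)
{
    assert((h & 1) == 0);
    for (int y = 0; y < h; y += 2) {
        for (int x = 0; x < 4; x++) {
            dst[x] = (uint8_t)((src0[x] + src1[x] + 1) >> 1);
            dst[dstride + x] =
                (uint8_t)((src0[sstride0 + x] + src1[sstride1 + x] + 1) >> 1);
        }
        dst += 2 * dstride;
        src0 += 2 * sstride0;
        src1 += 2 * sstride1;
    }
}
```

In assembly terms, each pair of rows would then need three ADDs instead of six, with the increment written as `add x2, x2, x3, lsl #1`; whether that saving is measurable is something only benchmarking on the target cores can show.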
At 2025-06-19 22:58:53, "Li Zhang" <li.zhang2 at arm.com> wrote:
>Use LDR and STR instead of single-lane LD1 and ST1 in the pixel_avg_pp_4xh
>assembly implementation. The new loads are a wholly destructive operation
>(they overwrite the whole register), which removes a false dependency on
>the existing register contents.
>
>The change provides up to a 2.5x speedup.
>---
> source/common/aarch64/mc-a.S | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
>diff --git a/source/common/aarch64/mc-a.S b/source/common/aarch64/mc-a.S
>index 130bf1a4a..ff18713fa 100644
>--- a/source/common/aarch64/mc-a.S
>+++ b/source/common/aarch64/mc-a.S
>@@ -38,10 +38,13 @@
> .macro pixel_avg_pp_4xN_neon h
> function PFX(pixel_avg_pp_4x\h\()_neon)
> .rept \h
>- ld1 {v0.s}[0], [x2], x3
>- ld1 {v1.s}[0], [x4], x5
>+ ldr s0, [x2]
>+ ldr s1, [x4]
>+ add x2, x2, x3
>+ add x4, x4, x5
> urhadd v2.8b, v0.8b, v1.8b
>- st1 {v2.s}[0], [x0], x1
>+ str s2, [x0]
>+ add x0, x0, x1
> .endr
> ret
> endfunc
>--
>2.39.5 (Apple Git-154)
>
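The "false dependency" mentioned in the commit message can be shown with a small byte-array model of a 128-bit vector register. This is a hypothetical sketch (`vreg`, `ld1_lane0` and `ldr_s` are invented names). It assumes the architectural behaviour that a single-lane LD1 leaves the other lanes of the register unchanged, while a scalar LDR into an S register zeroes the upper 96 bits of the vector register.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical byte-array model of a 128-bit NEON register Vn. */
typedef struct { uint8_t b[16]; } vreg;

/* Models ld1 {vN.s}[0], [ptr]: only lane 0 (bytes 0..3) is written.
 * Bytes 4..15 keep their previous values, so the result depends on the
 * old register contents (a merge). */
static vreg ld1_lane0(vreg old, const uint8_t *ptr)
{
    memcpy(old.b, ptr, 4);
    return old;
}

/* Models ldr sN, [ptr]: writing the 32-bit scalar view zeroes the rest
 * of the register, so the result never depends on the old contents. */
static vreg ldr_s(vreg old, const uint8_t *ptr)
{
    vreg r;
    (void)old; /* ignored: the write is wholly destructive */
    memset(r.b, 0, sizeof r.b);
    memcpy(r.b, ptr, 4);
    return r;
}
```

Because the lane-insert result depends on the previous value of `v0`, each `.rept` iteration can be forced to wait for the previous write to that register; the scalar load has no such input, so consecutive rows can proceed independently.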
More information about the x265-devel mailing list