[x265] [PATCH] AArch64: Optimize pixel_avg_pp_12x16_neon

Gerda Zsejke More gerdazsejke.more at arm.com
Wed May 7 06:49:51 UTC 2025


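Optimize pixel_avg_pp_12x16_neon by using more suitable load and
store instructions. Using LD1 for a single 32-bit lane is a merging
(constructive) operation: the new value for lane 0 has to be combined
with the existing contents of lanes 1-3 of the destination register.
Using LDR of an S register turns this into a wholly destructive
operation, since LDR writes the whole vector register and zeros every
bit above the loaded 32 bits. The result therefore no longer depends
on the register's previous value, which removes the false dependency.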
---
 source/common/aarch64/mc-a.S | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/source/common/aarch64/mc-a.S b/source/common/aarch64/mc-a.S
index 8c2878b3e..130bf1a4a 100644
--- a/source/common/aarch64/mc-a.S
+++ b/source/common/aarch64/mc-a.S
@@ -73,13 +73,13 @@ function PFX(pixel_avg_pp_12x16_neon)
     sub             x3, x3, #4
     sub             x5, x5, #4
 .rept 16
-    ld1             {v0.s}[0], [x2], #4
+    ldr             s0, [x2], #4
     ld1             {v1.8b}, [x2], x3
-    ld1             {v2.s}[0], [x4], #4
+    ldr             s2, [x4], #4
     ld1             {v3.8b}, [x4], x5
     urhadd          v4.8b, v0.8b, v2.8b
     urhadd          v5.8b, v1.8b, v3.8b
-    st1             {v4.s}[0], [x0], #4
+    str             s4, [x0], #4
     st1             {v5.8b}, [x0], x1
 .endr
     ret
-- 
2.39.5 (Apple Git-154)

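To illustrate the dependency argument in the commit message, here is a minimal C model, not part of the patch, that treats each 128-bit vector register as 16 bytes. The type `vreg` and the functions `ld1_lane0_s`, `ldr_s`, `urhadd_8b` and `low_lane_matches` are hypothetical names invented for this sketch. It shows that the LD1 lane form's result takes the old register value as an input while LDR's does not, and that the four bytes later written by `str s4` (previously `st1 {v4.s}[0]`) are identical under both load forms.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: one 128-bit AdvSIMD register as 16 bytes. */
typedef struct { uint8_t b[16]; } vreg;

/* Models LD1 {vN.s}[0]: writes bytes 0-3 and keeps bytes 4-15 of the
   old value, so the result depends on the previous register contents. */
static vreg ld1_lane0_s(vreg old, const uint8_t *src)
{
    vreg r = old;
    memcpy(r.b, src, 4);
    return r;
}

/* Models LDR sN: writes bytes 0-3 and zeros bytes 4-15, so there is
   no input from the previous register contents. */
static vreg ldr_s(const uint8_t *src)
{
    vreg r;
    memset(r.b, 0, sizeof r.b);
    memcpy(r.b, src, 4);
    return r;
}

/* Models URHADD vD.8b: unsigned rounding halving add, (a + b + 1) >> 1,
   on bytes 0-7; the 64-bit form zeros bytes 8-15 of the destination. */
static vreg urhadd_8b(vreg a, vreg c)
{
    vreg r;
    memset(r.b, 0, sizeof r.b);
    for (int i = 0; i < 8; i++)
        r.b[i] = (uint8_t)(((unsigned)a.b[i] + c.b[i] + 1) >> 1);
    return r;
}

/* Returns 1 if the 4 bytes stored from lane 0 of v4 are the same with
   the old (LD1 lane) and new (LDR) loads, whatever stale values v0 and
   v2 held before the loads. */
static int low_lane_matches(vreg stale0, vreg stale2,
                            const uint8_t *a, const uint8_t *c)
{
    assert(a != NULL && c != NULL);
    vreg v4_old = urhadd_8b(ld1_lane0_s(stale0, a), ld1_lane0_s(stale2, c));
    vreg v4_new = urhadd_8b(ldr_s(a), ldr_s(c));
    return memcmp(v4_old.b, v4_new.b, 4) == 0;
}
```

Because `ldr_s` takes no `old` argument, the model makes the removed input explicit. Whether that translates into a measurable speedup depends on how a given core renames and merges vector registers, which a functional model like this cannot show.
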
-------------- next part --------------
From 56a22d5ea62fe1d86f4032c0858832bb80d88972 Mon Sep 17 00:00:00 2001
From: Gerda Zsejke More <gerdazsejke.more at arm.com>
Date: Sun, 27 Apr 2025 10:32:45 +0200
Subject: [PATCH] AArch64: Optimize pixel_avg_pp_12x16_neon


More information about the x265-devel mailing list