[x265] [PATCH] aarch64/pixel-util.S: Optimize scanPosLast_neon

chen chenm003 at 163.com
Fri Mar 7 21:17:36 UTC 2025


Hi George,




Thanks for the improved patch.

I have just a few small comments below.




At 2025-03-08 00:41:05, "George Steed" <george.steed at arm.com> wrote:
> source/common/aarch64/pixel-util.S | 94 +++++++++++++-----------------
> 1 file changed, 42 insertions(+), 52 deletions(-)
>
>diff --git a/source/common/aarch64/pixel-util.S b/source/common/aarch64/pixel-util.S
>index d8b3f4365..6635e52b1 100644
>--- a/source/common/aarch64/pixel-util.S
>+++ b/source/common/aarch64/pixel-util.S
>@@ -2213,27 +2213,25 @@ endfunc
> //     const uint16_t* scanCG4x4, // x6
> //     const int trSize)          // x7
> function PFX(scanPosLast_neon)
>-.Loop_spl:
>-    // position of current CG
>+    ldr             q28, [x10]              // v28 = mask for pmovmskb
>+    add             x10, x7, x7             // 2*x7
>+    add             x11, x7, x7, lsl #1     // 3*x7
>+    add             x9, x4, #1              // CG count
>+

>+1:
This is a GCC-style numeric local label; please keep the generic local label style (such as the existing .Loop_spl).




>     // coeffFlag = reverse_bit(w15) in 16-bit
>-    rbit            w12, w15
>-    lsr             w12, w12, #16
>-    fmov            s30, w12
>+    rbit            w12, w13

>+    and             w12, w12, #0xffff
Is this necessary?
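
To make the question concrete, here is a minimal C sketch of the reasoning. It assumes the only consumer of w12 after the mask is the halfword store; the quoted hunk does not show whether w12 is read again later in the loop, in which case the mask could still matter. The helper names rbit32, store_halfword and mask_is_redundant_for_store are made up for illustration and are not part of x265.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for AArch64 `rbit wN, wM`: reverse all 32 bits. */
uint32_t rbit32(uint32_t v)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (v & 1u);
        v >>= 1;
    }
    return r;
}

/* Stand-in for `strh wN, [ptr]`: only the low 16 bits reach memory. */
uint16_t store_halfword(uint32_t w)
{
    return (uint16_t)w;
}

/* Returns 1 when masking with 0xffff before the store changes nothing.
 * Assumption: the stored halfword is the only use of the masked value. */
int mask_is_redundant_for_store(uint32_t w13)
{
    uint32_t w12 = rbit32(w13);
    return store_halfword(w12) == store_halfword(w12 & 0xffffu);
}

/* Small self-check over a few arbitrary inputs. */
void self_check(void)
{
    assert(mask_is_redundant_for_store(0x00000000u));
    assert(mask_is_redundant_for_store(0xffffffffu));
    assert(mask_is_redundant_for_store(0x12345678u));
}
```

Under that assumption the stored value is identical with or without the `and`, since the store truncates to 16 bits anyway.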


>     strh            w12, [x3], #2

> 
>-    // compute coeffNum = popcount(coeffFlag)
>-    cnt             v30.8b, v30.8b
>-    addp            v30.8b, v30.8b, v30.8b
>-    fmov            w6, s30

>-    sub             x5, x5, x6
We do not need the 64-bit x5 here.


>-    strb            w6, [x4], #1
>-
>-    cbnz            x5, .Loop_spl

>+    cbnz            x5, 1b
Same comment about x5 here.
