[x265] [PATCH] aarch64/pixel-util.S: Optimize scanPosLast_neon
chen
chenm003 at 163.com
Fri Mar 7 21:17:36 UTC 2025
Hi George,
Thanks for the improved patch.
I have just a few comments below.
At 2025-03-08 00:41:05, "George Steed" <george.steed at arm.com> wrote:
> source/common/aarch64/pixel-util.S | 94 +++++++++++++-----------------
> 1 file changed, 42 insertions(+), 52 deletions(-)
>
>diff --git a/source/common/aarch64/pixel-util.S b/source/common/aarch64/pixel-util.S
>index d8b3f4365..6635e52b1 100644
>--- a/source/common/aarch64/pixel-util.S
>+++ b/source/common/aarch64/pixel-util.S
>@@ -2213,27 +2213,25 @@ endfunc
> // const uint16_t* scanCG4x4, // x6
> // const int trSize) // x7
> function PFX(scanPosLast_neon)
>-.Loop_spl:
>- // position of current CG
>+ ldr q28, [x10] // v28 = mask for pmovmskb
>+ add x10, x7, x7 // 2*x7
>+ add x11, x7, x7, lsl #1 // 3*x7
>+ add x9, x4, #1 // CG count
>+
>+1:
This is a GCC-style numeric local label; please keep the generic local label style (as in .Loop_spl).
> // coeffFlag = reverse_bit(w15) in 16-bit
>- rbit w12, w15
>- lsr w12, w12, #16
>- fmov s30, w12
>+ rbit w12, w13
>+ and w12, w12, #0xffff
Is this necessary?
> strh w12, [x3], #2
>
>- // compute coeffNum = popcount(coeffFlag)
>- cnt v30.8b, v30.8b
>- addp v30.8b, v30.8b, v30.8b
>- fmov w6, s30
>- sub x5, x5, x6
We do not need the full 64-bit x5 here.
>- strb w6, [x4], #1
>-
>- cbnz x5, .Loop_spl
>+ cbnz x5, 1b
Same comment about x5 here.
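
To make the two register-width questions above concrete, here is a rough C model of the quoted instructions. It is only a sketch: rbit32, store_u16, popcount16 and remaining_after_cg are illustrative names, not x265 functions, and it assumes that w12 is read only by the strh and that the count held in x5 is a 32-bit int.

```c
#include <stdint.h>

/* Models AArch64 `rbit wD, wS`: reverse all 32 bits of a word. */
uint32_t rbit32(uint32_t x)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (x & 1u);
        x >>= 1;
    }
    return r;
}

/* Models `strh`: only the low 16 bits of the register reach memory. */
void store_u16(uint16_t *dst, uint32_t reg)
{
    *dst = (uint16_t)reg;
}

/* Illustrative helper: number of set bits in a 16-bit coeffFlag. */
int popcount16(uint16_t v)
{
    int n = 0;
    while (v) {
        v &= (uint16_t)(v - 1); /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Illustrative bookkeeping, assuming the remaining count is a plain int,
 * so a 32-bit register would be wide enough to hold it. */
int remaining_after_cg(int remaining, uint16_t coeffFlag)
{
    return remaining - popcount16(coeffFlag);
}
```

For any 32-bit value r, store_u16(&a, r) and store_u16(&b, r & 0xffff) leave the same halfword in memory, so under these assumptions the mask only matters if w12 is read again after the store.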