[x265] Some more Arm64 patches to bring performance up on Graviton processors

chen chenm003 at 163.com
Fri Mar 25 07:20:48 UTC 2022


Hello,


A few comments:


+function PFX(cpy2Dto1D_shl_64x64_neon)
+    cpy2Dto1D_shl_start
+    mov             w12, #32
+.loop_cpy2Dto1D_shl_64:
+    sub             w12, w12, #1
+.rept 2
+    ldp             q2, q3, [x1]
+    ldp             q4, q5, [x1, #32]
[MC] Why not use LD1 here? The same question applies to the STP stores.





-#if X86_64

+#if X86_64 || defined(__aarch64__)

[MC] This is correct, but to be more generic we could check sizeof(long*) == 8 instead.
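
A minimal sketch of what such a pointer-width check could look like. Note that the preprocessor does not evaluate sizeof, so `#if sizeof(long*) == 8` would not compile; the sketch assumes <stdint.h> provides UINTPTR_MAX and uses that as the `#if`-friendly equivalent. The names HYPOTHETICAL_ARCH_64BIT and pointers_are_64bit are made up for illustration and do not come from the x265 sources.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>   /* UINTPTR_MAX */

/* sizeof cannot appear in an #if expression, but UINTPTR_MAX is a macro,
 * so the pointer width can be tested at preprocessing time through it.
 * HYPOTHETICAL_ARCH_64BIT is an illustrative name, not an x265 macro. */
#if UINTPTR_MAX == 0xFFFFFFFFFFFFFFFFu
#define HYPOTHETICAL_ARCH_64BIT 1
#else
#define HYPOTHETICAL_ARCH_64BIT 0
#endif

/* The sizeof form suggested in the review, as an ordinary constant
 * expression that is usable in normal code (but not in #if). */
static int pointers_are_64bit(void)
{
    return sizeof(long *) == 8;
}

/* Compile-time cross-check that both forms agree on this target. */
static_assert(HYPOTHETICAL_ARCH_64BIT == (sizeof(long *) == 8),
              "macro and sizeof disagree about pointer width");
```

With a macro like this, the guard in the patch could be written as `#if HYPOTHETICAL_ARCH_64BIT` rather than listing each 64-bit architecture by name.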




The others are fine.


Regards,
Min Chen







On 2022-03-25 00:24:01, "Pop, Sebastian" <spop at amazon.com> wrote:

Hi,





Please find attached a few more changes that bring up the performance of x265 on Arm64 processors.


Patches tested on Graviton2 aarch64-linux.


Ok to commit?





Thanks,


Sebastian