[x265] [arm64] port LUMA_VPP_4xN
chen
chenm003 at 163.com
Fri Jul 2 00:16:57 UTC 2021
Hello,
Thanks for your patch. I have some comments.
+#ifdef __MACH__
+# define MACH
+#else
+# define MACH #
It is not a good idea to bypass .const_data this way.
+ ld1 {v0.s}[0], [x0], x1
+ ld1 {v0.s}[1], [x0], x1
+ ushll v0.8h, v0.8b, #0
...
+ // row[0-1]
+ mul v16.8h, v0.8h, v24.8h
Why not MULL?
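To illustrate the MULL suggestion, here is a minimal scalar sketch in C (not NEON code, and not part of the patch). It models the quoted USHLL + MUL sequence next to a widening 8-bit multiply such as UMULL. The function names and the tap values are assumptions chosen for illustration. Because UMULL treats both operands as unsigned, the sketch multiplies by the tap magnitude and applies the sign afterwards; in real assembly that would probably be done by accumulating with UMLAL/UMLSL.

```c
#include <assert.h>
#include <stdint.h>

#define LANES 8

/* Models "ushll vd.8h, vn.8b, #0" followed by "mul vd.8h, vn.8h, vm.8h":
 * zero-extend each pixel to 16 bits, then do a 16-bit multiply. */
static void widen_then_mul(const uint8_t px[LANES], int16_t tap, int16_t out[LANES])
{
    for (int i = 0; i < LANES; i++) {
        int16_t wide = (int16_t)px[i];
        out[i] = (int16_t)(wide * tap);
    }
}

/* Models "umull vd.8h, vn.8b, vm.8b" on the tap magnitude: an 8x8 -> 16-bit
 * unsigned product with no separate widening step. The sign of the tap is
 * applied afterwards (hypothetical handling, see the note above). */
static void widening_mul(const uint8_t px[LANES], int16_t tap, int16_t out[LANES])
{
    uint8_t mag = (uint8_t)(tap < 0 ? -tap : tap);
    for (int i = 0; i < LANES; i++) {
        uint16_t prod = (uint16_t)(px[i] * mag);
        out[i] = (int16_t)(tap < 0 ? -(int)prod : (int)prod);
    }
}

/* Checks that both models agree for every 8-bit pixel value and every tap. */
static void check_models_agree(void)
{
    /* Illustrative 8-tap coefficients; an assumption, not taken from the patch. */
    static const int16_t taps[8] = { -1, 4, -11, 40, 40, -11, 4, -1 };

    for (int t = 0; t < 8; t++) {
        for (int base = 0; base < 256; base += LANES) {
            uint8_t px[LANES];
            int16_t a[LANES], b[LANES];
            for (int i = 0; i < LANES; i++)
                px[i] = (uint8_t)(base + i);
            widen_then_mul(px, taps[t], a);
            widening_mul(px, taps[t], b);
            for (int i = 0; i < LANES; i++)
                assert(a[i] == b[i]);
        }
    }
}
```

Calling check_models_agree() exercises all 256 pixel values against each tap. In this model the widening multiply saves the explicit USHLL step.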
+ ext v21.16b, v0.16b, v1.16b, #8
+ mul v17.8h, v21.8h, v24.8h
+ orr v0.16b, v1.16b, v1.16b
This is equivalent to MOV; I guess the compiler will substitute the right instruction on ARM64.
+ // sum row[0-7]
+ dup v18.2d, v16.d[1]
+ dup v19.2d, v17.d[1]
+ add v16.4h, v16.4h, v18.4h
+ add v17.4h, v17.4h, v19.4h
How about ADDP?
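As a reference for the ADDP suggestion, the scalar C sketch below (function names are hypothetical) models which lanes each approach combines. The quoted DUP + ADD sequence adds lane i of the low half to lane i of the high half, whereas ADDP sums adjacent lanes. A drop-in replacement would therefore presumably need the operands arranged so that the values to be summed sit in neighbouring lanes. That is an observation about the instruction semantics, not something stated in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Models "dup v18.2d, v16.d[1]" followed by "add v16.4h, v16.4h, v18.4h":
 * lane i of the low half is added to lane i of the high half. */
static void fold_halves(const int16_t v[8], int16_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (int16_t)(v[i] + v[i + 4]);
}

/* Models "addp vd.8h, vn.8h, vm.8h": adjacent lanes are summed; the pairs
 * from vn fill the low half of the result, the pairs from vm the high half. */
static void addp_8h(const int16_t vn[8], const int16_t vm[8], int16_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[i]     = (int16_t)(vn[2 * i] + vn[2 * i + 1]);
        out[i + 4] = (int16_t)(vm[2 * i] + vm[2 * i + 1]);
    }
}
```

For the input {1, 2, 3, 4, 10, 20, 30, 40}, fold_halves produces {11, 22, 33, 44}, while addp_8h with the same vector as both operands produces {3, 7, 30, 70} in each half. A single ADDP does process two source registers at once, which is where the potential saving over two DUP + ADD pairs would come from.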
At 2021-07-02 01:18:42, "Pop, Sebastian" <spop at amazon.com> wrote:
Hi,
the attached patch ports to arm64 the following kernels:
luma_vpp[ 4x4] 18.77x 27.66 519.22
luma_vpp[ 4x8] 22.73x 45.35 1030.72
luma_vpp[ 4x16] 25.10x 82.32 2066.41
Ok to commit?
Thanks,
Sebastian