[x265] [arm64] port LUMA_VPP_4xN

Fri Jul 2 00:16:57 UTC 2021

Hello,

Thanks for your patch. I have some comments.

+#ifdef __MACH__

+#   define MACH

+#else

+#   define MACH #

It is not a good idea to bypass .const_data this way.

+    ld1             {v0.s}[0], [x0], x1
+    ld1             {v0.s}[1], [x0], x1
+    ushll           v0.8h, v0.8b, #0
...
+    // row[0-1]
+    mul             v16.8h, v0.8h, v24.8h
Why not use MULL (a widening multiply) instead of USHLL followed by MUL?
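To illustrate the idea, here is a rough scalar C model, not NEON code. It assumes the unsigned variant (UMULL) and a coefficient that fits in an unsigned 8-bit lane; the function names are made up for this sketch. The patch keeps its coefficients in 16-bit lanes (v24.8h), and negative filter taps would need separate handling that this sketch does not cover.

```c
#include <stdint.h>

/* Model of the quoted path: USHLL #0 zero-extends each byte to 16 bits,
 * then MUL on .8h lanes keeps the low 16 bits of the product. */
static uint16_t widen_then_mul(uint8_t pixel, uint8_t coeff)
{
    uint16_t wide = pixel;             /* ushll ..., #0 */
    return (uint16_t)(wide * coeff);   /* mul on 16-bit lanes */
}

/* Model of a widening multiply (UMULL assumed): 8-bit inputs,
 * 16-bit result, no separate widening step. */
static uint16_t widening_mul(uint8_t pixel, uint8_t coeff)
{
    return (uint16_t)((unsigned)pixel * (unsigned)coeff);
}

/* Returns 1 if both models agree for every pair of 8-bit inputs. */
static int mul_models_agree(void)
{
    for (unsigned p = 0; p < 256; p++)
        for (unsigned c = 0; c < 256; c++)
            if (widen_then_mul((uint8_t)p, (uint8_t)c) !=
                widening_mul((uint8_t)p, (uint8_t)c))
                return 0;
    return 1;
}
```

Under these assumptions the two paths give the same 16-bit result, because the largest product, 255 * 255, still fits in 16 bits.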

+    ext             v21.16b, v0.16b, v1.16b, #8

+    mul             v17.8h, v21.8h, v24.8h

+    orr             v0.16b, v1.16b, v1.16b

This is equivalent to MOV. I guess the assembler will pick the right instruction on ARM64.
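For reference, in the A64 instruction set the vector MOV is an alias of ORR with both source registers the same. A small scalar C model of why the two agree (the vec128 type and function name are made up for this sketch):

```c
#include <stdint.h>

/* A 128-bit vector register modeled as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } vec128;

/* Model of: orr vd.16b, vn.16b, vm.16b (bitwise OR of two registers). */
static vec128 orr_16b(vec128 n, vec128 m)
{
    vec128 d = { n.lo | m.lo, n.hi | m.hi };
    return d;
}
```

Calling orr_16b(x, x) returns x unchanged, which is exactly a register copy.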

+    // sum row[0-7]

+    dup             v18.2d, v16.d[1]

+    dup             v19.2d, v17.d[1]

+    add             v16.4h, v16.4h, v18.4h

+    add             v17.4h, v17.4h, v19.4h

How about ADDP?
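One caveat, sketched as a scalar C model (not NEON code, function names made up here): the quoted DUP + ADD sequence sums lane i with lane i+4, while ADDP sums adjacent lanes. So, as far as I can tell, the rows would need a different layout before ADDP becomes a drop-in replacement.

```c
#include <stdint.h>

/* Model of: dup v18.2d, v16.d[1] ; add v16.4h, v16.4h, v18.4h
 * Each of the four low lanes gets lane i + lane i+4. */
static void fold_halves(const uint16_t v[8], uint16_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint16_t)(v[i] + v[i + 4]);
}

/* Model of: addp vd.8h, vn.8h, vm.8h
 * Low half holds adjacent-pair sums of vn, high half those of vm. */
static void addp_8h(const uint16_t n[8], const uint16_t m[8],
                    uint16_t out[8])
{
    for (int i = 0; i < 4; i++) {
        out[i]     = (uint16_t)(n[2 * i] + n[2 * i + 1]);
        out[i + 4] = (uint16_t)(m[2 * i] + m[2 * i + 1]);
    }
}
```

For example, with lanes {1, 2, 3, 4, 10, 20, 30, 40}, fold_halves gives {11, 22, 33, 44}, while the low half of addp_8h gives {3, 7, 30, 70}. The advantage of ADDP is that it reduces two registers in one instruction.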

At 2021-07-02 01:18:42, "Pop, Sebastian" <spop at amazon.com> wrote:

Hi,

The attached patch ports the following kernels to arm64:

luma_vpp[  4x4]         18.77x   27.66           519.22
luma_vpp[  4x8]         22.73x   45.35           1030.72
luma_vpp[ 4x16]         25.10x   82.32           2066.41

Ok to commit?

Thanks,

Sebastian