[x265] [arm64] port sad
chen
chenm003 at 163.com
Sat Jul 17 08:56:41 UTC 2021
Hi Sebastian,
Thank you for your code.
First, sorry for the delay. I was very busy with my family and my toy hardware codec last week, and I only had a little spare time during the weekend.
Next, I haven't looked at all of the functions yet, but I have some comments on the 64x64 one.
In this function, unrolling by 8 (4*2) gives good performance on out-of-order (OOO) CPUs, but it may hurt performance on low-end CPUs such as the Cortex-A53 because of cache misses and related issues. Of course, this is not a problem for this version of the patch.
In the 64x64 function, the final sum is calculated by the code below:
==========
+.macro SAD_END_64
+ add v16.8h, v16.8h, v17.8h
+ add v17.8h, v18.8h, v19.8h
+ add v16.8h, v16.8h, v17.8h
+ uaddlv s0, v16.8h
+ fmov w0, s0
+ add v18.8h, v20.8h, v21.8h
+ add v19.8h, v22.8h, v23.8h
+ add v17.8h, v18.8h, v19.8h
+ uaddlv s1, v17.8h
+ fmov w1, s1
+ add w0, w0, w1
+ ret
+.endm
==========
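A scalar C model makes the overflow argument concrete. It assumes (the rest of the patch is not quoted here) that each of v16..v23 holds eight 16-bit lanes and that each lane accumulates |a - b| for one pixel column over all 64 rows, so a lane reaches at most 64 * 255 = 16320. Adding four registers gives at most 65280, which still fits in 16 bits; adding all eight would reach 130560 and wrap. The names vec8h, add8h, uaddlv8h, sad_end_64, sad_end_64_single and fill_worst_case are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NREG  8   /* v16..v23 */
#define NLANE 8   /* .8h: eight 16-bit lanes per register */

/* Hypothetical model of one NEON register viewed as .8h */
typedef struct { uint16_t h[NLANE]; } vec8h;

/* ADD vD.8h, vA.8h, vB.8h: lane-wise add, wrapping modulo 2^16 */
vec8h add8h(vec8h a, vec8h b) {
    vec8h r;
    for (int i = 0; i < NLANE; i++)
        r.h[i] = (uint16_t)(a.h[i] + b.h[i]);
    return r;
}

/* UADDLV s, vN.8h: widening sum of all eight lanes into 32 bits */
uint32_t uaddlv8h(vec8h a) {
    uint32_t s = 0;
    for (int i = 0; i < NLANE; i++)
        s += a.h[i];
    return s;
}

/* Mirrors SAD_END_64: two 4-register partial sums, one UADDLV each */
uint32_t sad_end_64(const vec8h v[NREG]) {
    vec8h lo = add8h(add8h(v[0], v[1]), add8h(v[2], v[3]));
    vec8h hi = add8h(add8h(v[4], v[5]), add8h(v[6], v[7]));
    return uaddlv8h(lo) + uaddlv8h(hi);
}

/* Variant that folds all eight registers in 16 bits before one UADDLV */
uint32_t sad_end_64_single(const vec8h v[NREG]) {
    vec8h lo = add8h(add8h(v[0], v[1]), add8h(v[2], v[3]));
    vec8h hi = add8h(add8h(v[4], v[5]), add8h(v[6], v[7]));
    return uaddlv8h(add8h(lo, hi));
}

/* Worst case: every |a - b| is 255, so each lane holds rows * 255 */
void fill_worst_case(vec8h v[NREG], int rows) {
    assert(rows * 255 <= UINT16_MAX); /* a single lane must not wrap */
    for (int r = 0; r < NREG; r++)
        for (int i = 0; i < NLANE; i++)
            v[r].h[i] = (uint16_t)(rows * 255);
}
```

With fill_worst_case(v, 64), sad_end_64 returns 1044480 (64 * 64 * 255), while sad_end_64_single returns 520192 because each combined lane wraps from 130560 to 65024.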
You use two UADDLV instructions to avoid overflow. How about summing these partial registers in the NEON domain to save one UADDLV?
e.g.
UADDLP v16.4S, v16.8H
UADDLP v17.4S, v17.8H
ADD    v16.4S, v16.4S, v17.4S
UADDLV d0, v16.4S
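The suggested sequence can be modelled the same way. In this hypothetical sketch, lo and hi stand for v16 and v17 after the 16-bit ADDs of SAD_END_64 (each lane at most 65280). UADDLP widens adjacent lane pairs to 32 bits before the two vectors are combined, so nothing wraps, and a single widening reduction produces the full sum. The type and function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register models: .8h (eight u16 lanes) and .4s (four u32 lanes) */
typedef struct { uint16_t h[8]; } vec8h;
typedef struct { uint32_t s[4]; } vec4s;

/* UADDLP vD.4s, vN.8h: add adjacent 16-bit pairs into 32-bit lanes */
vec4s uaddlp_4s(vec8h a) {
    vec4s r;
    for (int i = 0; i < 4; i++)
        r.s[i] = (uint32_t)a.h[2 * i] + a.h[2 * i + 1];
    return r;
}

/* ADD vD.4s, vA.4s, vB.4s: lane-wise 32-bit add */
vec4s add4s(vec4s a, vec4s b) {
    vec4s r;
    for (int i = 0; i < 4; i++)
        r.s[i] = a.s[i] + b.s[i];
    return r;
}

/* UADDLV d0, vN.4s: widening sum of the four 32-bit lanes into 64 bits */
uint64_t uaddlv4s(vec4s a) {
    uint64_t d = 0;
    for (int i = 0; i < 4; i++)
        d += a.s[i];
    return d;
}

/* lo/hi: the two 4-register partial sums (v16 and v17 in the suggestion) */
uint32_t sad_end_64_uaddlp(vec8h lo, vec8h hi) {
    vec4s sum = add4s(uaddlp_4s(lo), uaddlp_4s(hi));
    uint64_t total = uaddlv4s(sum);
    /* total <= 64 * 64 * 255, so returning the low 32 bits (w0) is enough */
    assert(total <= UINT32_MAX);
    return (uint32_t)total;
}
```

For worst-case inputs (every lane of lo and hi equal to 65280), sad_end_64_uaddlp returns 1044480, matching the two-UADDLV version. Because the total never exceeds 64 * 64 * 255 = 1044480, an ADDV s0, v16.4S would serve equally well as the final reduction.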
Regards,
Min Chen
At 2021-07-17 04:44:05, "Pop, Sebastian" <spop at amazon.com> wrote:
Hi,
The attached patch ports the following kernels to arm64:
kernel      speedup      asm          C
sad[ 4x4]    10.11x     6.50      65.72
sad[ 8x8]    28.95x     8.50     246.00
sad[ 8x4]    23.03x     5.45     125.43
sad[ 4x8]    12.09x    10.64     128.68
sad[16x16]   53.37x    19.19    1024.05
sad[ 16x8]   43.09x    11.62     500.84
sad[ 8x16]   31.03x    16.87     523.44
sad[ 16x4]   39.73x     6.27     249.10
sad[16x12]   50.55x    15.10     763.44
sad[ 4x16]   14.23x    19.39     275.91
sad[12x16]   33.68x    22.95     772.81
sad[32x32]   62.10x    64.84    4026.97
sad[32x16]   59.82x    33.74    2018.56
sad[16x32]   57.94x    35.01    2028.17
sad[ 32x8]   53.98x    18.77    1013.48
sad[32x24]   61.29x    49.36    3024.90
sad[ 8x32]   31.84x    32.49    1034.56
sad[24x32]   53.61x    56.39    3022.97
sad[64x64]   65.24x   255.86   16692.29
sad[64x32]   61.77x   131.16    8100.90
sad[32x64]   62.31x   128.90    8031.79
sad[64x16]   60.28x    67.35    4060.31
sad[64x48]   62.53x   193.59   12104.64
sad[16x64]   61.10x    66.13    4040.26
sad[48x64]   61.75x   194.68   12022.14
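Each sad[WxH] entry above measures a sum of absolute differences over a W x H block; the speedup column is the C time divided by the asm time (e.g. 65.72 / 6.50 = 10.11). A minimal scalar reference sketch, assuming 8-bit pixels and a separate stride for each block, is shown below; sad_ref and its parameter order are hypothetical and are not x265's actual C primitive signature.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference for sad[WxH] on 8-bit pixels:
   sum of |pix1 - pix2| over a width x height block. */
int sad_ref(int width, int height,
            const uint8_t *pix1, intptr_t stride1,
            const uint8_t *pix2, intptr_t stride2) {
    assert(width > 0 && height > 0);
    int sum = 0; /* at most 64 * 64 * 255 = 1044480 for the sizes above */
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            sum += abs(pix1[x] - pix2[x]); /* uint8_t operands promote to int */
        pix1 += stride1; /* advance each block to its next row */
        pix2 += stride2;
    }
    return sum;
}
```

For example, a 16x16 block of 255s compared against a block of zeros gives 16 * 16 * 255 = 65280.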
Ok to commit?
Thanks,
Sebastian
More information about the x265-devel mailing list