[x265] [arm64] port count_nonzero, blkfill, and copy_{ss, sp, ps}

Pop, Sebastian spop at amazon.com
Mon Jul 26 15:14:17 UTC 2021


> @@ -508,19 +508,17 @@ function x265_copy_cnt_4_neon
> ......
> +    uaddlv          s4, v4.4h
> Unsigned?
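For readers following the review: `copy_cnt` copies an NxN block of residuals into a packed coefficient buffer and returns how many of them are nonzero. The Python model below is a sketch, not the patch. The kernel body is elided in the quote, so it hypothetically assumes that the NEON code counts nonzero elements by accumulating all-ones compare masks, i.e. adding -1 per nonzero element into 16-bit lanes, so each lane ends up holding a negative count. Under that assumption, an unsigned across-lane add such as `uaddlv` zero-extends a lane holding -1 to 65535. A signed add (`saddlv`) followed by a negation recovers the count. The helper names `copy_cnt_ref`, `to_int16` and `lane_sums` are made up for illustration.

```python
def copy_cnt_ref(residual, stride, n):
    """Reference semantics (illustrative): copy an n x n block into a packed
    list and return (coeff, number of nonzero coefficients)."""
    coeff = [residual[y * stride + x] for y in range(n) for x in range(n)]
    return coeff, sum(1 for v in coeff if v != 0)


def to_int16(v):
    """Wrap an integer to a signed 16-bit lane value."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v


def lane_sums(coeff, lanes=4):
    """Model a 4 x 16-bit accumulator (v.4h), ASSUMING each nonzero element
    adds an all-ones compare mask (-1) to its lane, then reduce it both ways."""
    acc = [0] * lanes
    for i, v in enumerate(coeff):
        if v != 0:
            acc[i % lanes] = to_int16(acc[i % lanes] - 1)
    signed = sum(acc)                          # saddlv: sign-extend each lane
    unsigned = sum(a & 0xFFFF for a in acc)    # uaddlv: zero-extend each lane
    return signed, unsigned


# 4x4 residual block with five nonzero entries, row stride 4.
residual = [3, 0, 0, -1,
            0, 0, 7,  0,
            0, 2, 0,  0,
            0, 0, 0, -4]
coeff, count = copy_cnt_ref(residual, stride=4, n=4)
signed, unsigned = lane_sums(coeff)
print(count, -signed, unsigned)  # 5 5 262139
```

With five nonzero residuals, the negated signed reduction matches the reference count. The unsigned reduction of the same lanes gives 262139.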

Thanks for catching this.
The attached patch fixes the problem and also makes the 4x4 and 8x8 kernels slightly faster (16x16 and 32x32 are essentially unchanged):

Before (columns: speedup, optimized, C reference):
         copy_cnt[4x4]  14.76x   7.12            105.12
         copy_cnt[8x8]  37.56x   10.60           398.25
       copy_cnt[16x16]  52.57x   29.74           1563.60
       copy_cnt[32x32]  62.22x   98.37           6120.29

After (columns: speedup, optimized, C reference):
         copy_cnt[4x4]  15.28x   6.87            104.93
         copy_cnt[8x8]  38.20x   10.32           394.19
       copy_cnt[16x16]  52.40x   29.90           1566.65
       copy_cnt[32x32]  62.67x   97.81           6130.08
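In these tables the speedup column is the C reference time divided by the optimized time (for example 105.12 / 7.12 is about 14.76), so the other two numeric columns are the optimized and C timings. The report does not state units. Below is a short Python check of that relationship and of how much the optimized column moved between the two runs. The function names `speedup_matches` and `percent_faster` are illustrative only.

```python
# (speedup, optimized, C reference) for each size, copied from the tables above.
before = {
    "copy_cnt[4x4]":   (14.76,  7.12,  105.12),
    "copy_cnt[8x8]":   (37.56, 10.60,  398.25),
    "copy_cnt[16x16]": (52.57, 29.74, 1563.60),
    "copy_cnt[32x32]": (62.22, 98.37, 6120.29),
}
after = {
    "copy_cnt[4x4]":   (15.28,  6.87,  104.93),
    "copy_cnt[8x8]":   (38.20, 10.32,  394.19),
    "copy_cnt[16x16]": (52.40, 29.90, 1566.65),
    "copy_cnt[32x32]": (62.67, 97.81, 6130.08),
}


def speedup_matches(speedup, optimized, reference, tol=0.05):
    """True if speedup == reference / optimized, allowing for the
    two-decimal rounding of each printed column."""
    return abs(reference / optimized - speedup) < tol


def percent_faster(old, new):
    """Relative reduction of the optimized time; negative means slower."""
    return 100.0 * (old - new) / old


for name in before:
    assert speedup_matches(*before[name]) and speedup_matches(*after[name])
    change = percent_faster(before[name][1], after[name][1])
    print(f"{name:>16}  percent faster: {change:+.1f}")
```

On these numbers, the 4x4 and 8x8 kernels are about 3.5% and 2.6% faster after the fix. The 16x16 (-0.5%) and 32x32 (+0.6%) kernels barely move.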
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-arm64-port-count_nonzero-blkfill-and-copy_-ss-sp-ps.patch
Type: application/octet-stream
Size: 38846 bytes
Desc: 0001-arm64-port-count_nonzero-blkfill-and-copy_-ss-sp-ps.patch
URL: <http://mailman.videolan.org/pipermail/x265-devel/attachments/20210726/0856e6af/attachment-0001.obj>

