[x265] [arm64] port count_nonzero, blkfill, and copy_{ss, sp, ps}
Pop, Sebastian
spop at amazon.com
Mon Jul 26 15:14:17 UTC 2021
> @@ -508,19 +508,17 @@ function x265_copy_cnt_4_neon
> ......
> + uaddlv s4, v4.4h
> Unsigned?
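The quoted hunk is elided, so the following is an assumption, not the patch's actual code. Suppose the 4x4 kernel marks each zero coefficient with a `cmeq`-style compare, which yields all ones (-1 as an int16) in that lane. It then sums those masks per lane and adds 16 at the end. In that case the per-lane sums are negative, and the across-lane reduction has to be signed (`saddlv`). `uaddlv` would read every lane as a large unsigned value. The C model below uses hypothetical function names. It compares both reductions against a scalar reference modeled on x265's C `copy_count` primitive.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference (modeled on x265's C copy_count): copy a trSize x trSize
 * residual block with row stride resiStride (in elements) into contiguous
 * coeff[], and return the number of nonzero coefficients. */
static int copy_count_ref(int16_t *coeff, const int16_t *residual,
                          intptr_t resiStride, int trSize)
{
    int numSig = 0;
    assert(trSize > 0);
    for (int k = 0; k < trSize; k++) {
        for (int j = 0; j < trSize; j++) {
            int16_t v = residual[k * resiStride + j];
            coeff[k * trSize + j] = v;
            numSig += (v != 0);
        }
    }
    return numSig;
}

/* Hypothetical scalar model of a 4x4 NEON count: each row is one 4-lane
 * vector, a zero coefficient adds the mask 0xFFFF (-1 as int16) to its lane,
 * the four 16-bit lanes are reduced across, and 16 is added.
 * use_signed != 0 models saddlv; use_signed == 0 models uaddlv. */
static int32_t count_4x4_masked(const int16_t *residual, intptr_t resiStride,
                                int use_signed)
{
    uint16_t acc[4] = {0, 0, 0, 0};               /* models v4.4h */
    for (int k = 0; k < 4; k++) {
        for (int j = 0; j < 4; j++) {
            uint16_t mask = (residual[k * resiStride + j] == 0) ? 0xFFFF : 0;
            acc[j] = (uint16_t)(acc[j] + mask);   /* lane add wraps mod 2^16 */
        }
    }
    int32_t sum = 0;
    for (int j = 0; j < 4; j++) {
        /* Read the lane as two's-complement signed, or as unsigned. */
        int32_t lane = acc[j];
        if (use_signed && lane >= 0x8000)
            lane -= 0x10000;
        sum += lane;
    }
    /* With the signed reduction, sum == -(number of zeros), so this is
     * the nonzero count; with the unsigned one it is not. */
    return sum + 16;
}
```

For an all-zero 4x4 block, the signed reduction returns 0. The unsigned one returns 4 * 65532 + 16 = 262144.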
Thanks for catching this.
The attached patch fixes the problem and is also slightly faster for most block sizes:
Before:
primitive        speedup    NEON        C
copy_cnt[4x4]     14.76x    7.12   105.12
copy_cnt[8x8]     37.56x   10.60   398.25
copy_cnt[16x16]   52.57x   29.74  1563.60
copy_cnt[32x32]   62.22x   98.37  6120.29
After:
primitive        speedup    NEON        C
copy_cnt[4x4]     15.28x    6.87   104.93
copy_cnt[8x8]     38.20x   10.32   394.19
copy_cnt[16x16]   52.40x   29.90  1566.65
copy_cnt[32x32]   62.67x   97.81  6130.08
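The benchmark columns are not labeled in the original output. The recomputation below suggests they are speedup, NEON time and C time, because the speedup matches C time divided by NEON time (105.12 / 7.12 is about 14.76). Here is a small C sketch with hypothetical names. It checks that relation within rounding and computes the per-size relative change in NEON time.

```c
#include <assert.h>

/* One row of the copy_cnt results: block size, reported speedup, NEON time
 * and C time, as printed in the benchmark output. */
struct bench_row {
    int size;
    double speedup, neon, c;
};

static const struct bench_row before[4] = {
    {4, 14.76, 7.12, 105.12},    {8, 37.56, 10.60, 398.25},
    {16, 52.57, 29.74, 1563.60}, {32, 62.22, 98.37, 6120.29},
};

static const struct bench_row after[4] = {
    {4, 15.28, 6.87, 104.93},    {8, 38.20, 10.32, 394.19},
    {16, 52.40, 29.90, 1566.65}, {32, 62.67, 97.81, 6130.08},
};

/* Speedup recomputed from the two timing columns (C time / NEON time). */
static double speedup_of(const struct bench_row *r)
{
    assert(r->neon > 0.0);
    return r->c / r->neon;
}

/* Relative change in NEON time from b to a; negative means a is faster. */
static double neon_time_change(const struct bench_row *b,
                               const struct bench_row *a)
{
    assert(b->size == a->size && b->neon > 0.0);
    return (a->neon - b->neon) / b->neon;
}
```

With these numbers the NEON time drops by about 3.5% (4x4), 2.6% (8x8) and 0.6% (32x32), and rises by about 0.5% for 16x16.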
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-arm64-port-count_nonzero-blkfill-and-copy_-ss-sp-ps.patch
Type: application/octet-stream
Size: 38846 bytes
Desc: 0001-arm64-port-count_nonzero-blkfill-and-copy_-ss-sp-ps.patch
URL: <http://mailman.videolan.org/pipermail/x265-devel/attachments/20210726/0856e6af/attachment-0001.obj>
More information about the x265-devel mailing list