[x265] [arm64] port count_nonzero, blkfill, and copy_{ss,sp,ps}
Pop, Sebastian
spop at amazon.com
Thu Jul 22 20:13:41 UTC 2021
Hi,
the attached patch ports to arm64 the following kernels:
count_nonzero[4x4] 19.23x 2.95 56.77
count_nonzero[8x8] 32.07x 7.11 228.15
count_nonzero[16x16] 35.16x 26.09 917.23
count_nonzero[32x32] 37.31x 98.07 3658.49
blkfill[4x4] 31.39x 3.72 116.84
blkfill[8x8] 85.97x 5.78 497.26
blkfill[16x16] 102.63x 16.28 1670.56
blkfill[32x32] 100.07x 62.89 6293.62
copy_ss[ 4x4] 16.87x 6.21 104.78
copy_sp[ 4x4] 16.21x 6.34 102.69
copy_ps[ 4x4] 18.06x 5.91 106.69
copy_ss[ 8x8] 51.50x 8.30 427.52
copy_sp[ 8x8] 43.34x 9.32 403.79
copy_ps[ 8x8] 49.00x 8.50 416.36
[i420] copy_ss[ 4x4] 15.40x 6.62 101.98
[i420] copy_ps[ 4x4] 16.50x 6.26 103.28
[i420] copy_sp[ 4x4] 14.14x 6.82 96.48
[i422] copy_ss[ 4x8] 25.79x 8.28 213.57
[i422] copy_ps[ 4x8] 24.74x 8.62 213.35
[i422] copy_sp[ 4x8] 22.01x 9.27 204.03
copy_ss[16x16] 82.20x 19.79 1626.69
copy_sp[16x16] 85.13x 18.78 1599.19
copy_ps[16x16] 72.51x 22.28 1615.58
[i420] copy_ss[ 8x8] 49.16x 8.49 417.24
[i420] copy_ps[ 8x8] 46.52x 8.71 405.34
[i420] copy_sp[ 8x8] 42.68x 9.47 404.13
[i422] copy_ss[ 8x16] 56.55x 14.98 847.42
[i422] copy_ps[ 8x16] 57.71x 15.12 872.39
[i422] copy_sp[ 8x16] 49.76x 16.83 837.44
copy_ss[32x32] 98.60x 67.47 6652.77
copy_sp[32x32] 96.31x 65.07 6266.88
copy_ps[32x32] 77.71x 81.02 6295.59
[i420] copy_ss[16x16] 83.93x 20.52 1722.55
[i420] copy_ps[16x16] 72.66x 22.13 1608.30
[i420] copy_sp[16x16] 85.67x 18.73 1604.77
[i422] copy_ss[16x32] 91.45x 36.56 3343.09
[i422] copy_ps[16x32] 75.73x 42.40 3211.16
[i422] copy_sp[16x32] 91.93x 34.32 3154.89
copy_ss[64x64] 104.11x 254.52 26498.82
copy_sp[64x64] 98.81x 252.38 24937.40
copy_ps[64x64] 80.97x 308.55 24983.04
[i420] copy_ss[32x32] 99.49x 67.40 6706.31
[i420] copy_ps[32x32] 76.50x 81.51 6235.63
[i420] copy_sp[32x32] 97.43x 65.84 6414.64
[i422] copy_ss[32x64] 102.57x 129.82 13315.36
[i422] copy_ps[32x64] 78.95x 159.47 12590.31
[i422] copy_sp[32x64] 99.54x 128.29 12769.10
copy_cnt[4x4] 13.91x 7.48 104.10
copy_cnt[8x8] 31.01x 12.69 393.40
copy_cnt[16x16] 42.88x 36.23 1553.66
copy_cnt[32x32] 47.43x 129.19 6127.58
Ok to commit?
Thanks,
Sebastian
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.videolan.org/pipermail/x265-devel/attachments/20210722/1ec340c2/attachment.html>
More information about the x265-devel
mailing list