[x265] [arm64] port scale1D_128to64 and scale2D_64to32

Pop, Sebastian spop at amazon.com
Fri Jul 30 18:19:26 UTC 2021


Thanks Min Chen for your very useful reviews!

> LD2+UADDL equal to LD1+ADDLP

You are right!
On Neoverse-N1, LD2 costs 7 cycles and LD1 costs 5 cycles.
With your suggested change I see a big speedup: the measured time for
scale2D_64to32 drops from 220.95 to 158.61, about 28% lower.
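
For readers following along without the patch open, here is a small scalar
model in portable C of why the two sequences give the same result. It is only
a sketch and is not taken from the patch: the function names are made up for
illustration, one 16-byte load is assumed, and I am reading "ADDLP" as the
unsigned pairwise form (UADDLP). LD2 splits the bytes into even and odd lanes
and UADDL adds them with widening; LD1 keeps the bytes in order and the
pairwise add sums each adjacent pair with widening.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { PAIRS = 8 };  /* 16 input bytes -> 8 widened 16-bit sums */

/* Hypothetical model of LD2 {v0.8b, v1.8b} followed by UADDL. */
static void sum_ld2_uaddl(const uint8_t *src, uint16_t *dst)
{
    uint8_t even[PAIRS], odd[PAIRS];

    /* LD2 de-interleaves: even-indexed bytes in one register, odd in the other. */
    for (int i = 0; i < PAIRS; i++) {
        even[i] = src[2 * i];
        odd[i]  = src[2 * i + 1];
    }
    /* UADDL widens each lane to 16 bits before adding, so nothing overflows. */
    for (int i = 0; i < PAIRS; i++)
        dst[i] = (uint16_t)((uint16_t)even[i] + (uint16_t)odd[i]);
}

/* Hypothetical model of LD1 {v0.16b} followed by a pairwise widening add. */
static void sum_ld1_uaddlp(const uint8_t *src, uint16_t *dst)
{
    uint8_t v[2 * PAIRS];

    /* LD1 is a plain contiguous load with no de-interleave step. */
    memcpy(v, src, sizeof v);
    /* The pairwise add sums each adjacent byte pair into one 16-bit lane. */
    for (int i = 0; i < PAIRS; i++)
        dst[i] = (uint16_t)((uint16_t)v[2 * i] + (uint16_t)v[2 * i + 1]);
}

/* Returns 1 when both models produce identical sums for the given 16 bytes. */
static int paths_agree(const uint8_t *src)
{
    uint16_t a[PAIRS], b[PAIRS];

    sum_ld2_uaddl(src, a);
    sum_ld1_uaddlp(src, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

Calling paths_agree() on any 16-byte buffer should return 1. The results are
the same either way, so the gain comes only from the cheaper load.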

Before:
        scale2D_64to32  62.21x   220.95          13744.77

After, see attached patch:
        scale2D_64to32  86.66x   158.61          13746.23

Thanks,
Sebastian

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-arm64-port-scale1D_128to64-and-scale2D_64to32.patch
Type: application/octet-stream
Size: 3346 bytes
Desc: 0001-arm64-port-scale1D_128to64-and-scale2D_64to32.patch
URL: <http://mailman.videolan.org/pipermail/x265-devel/attachments/20210730/219ce569/attachment.obj>