[x265] [arm64] port scale1D_128to64 and scale2D_64to32
Pop, Sebastian
spop at amazon.com
Fri Jul 30 18:19:26 UTC 2021
Thanks Min Chen for your very useful reviews!
> LD2+UADDL equal to LD1+ADDLP
You are right!
On Neoverse-N1, LD2 costs 7 cycles while LD1 costs 5 cycles.
With your suggested change I see a big speedup: the scale2D_64to32 timing drops by about 28%.
Before:
scale2D_64to32 62.21x 220.95 13744.77
After, see attached patch:
scale2D_64to32 86.66x 158.61 13746.23
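
For readers following the thread, the equivalence quoted above can be checked with a small scalar model in C. This is only a sketch, not the code from the attached patch: it assumes 8-bit pixels, reads "ADDLP" as the unsigned pairwise form (UADDLP), and models each instruction as a plain loop. The function names and the LANES constant are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

#define LANES 16  /* number of 16-bit sums produced from 32 input bytes */

/* Model of LD2 {v0.16b, v1.16b} followed by UADDL/UADDL2:
 * LD2 de-interleaves even and odd bytes into two registers,
 * then the widening add sums them lane by lane. */
static void pairs_ld2_uaddl(const uint8_t *src, uint16_t *out)
{
    uint8_t even[LANES], odd[LANES];
    for (int i = 0; i < LANES; i++) {
        even[i] = src[2 * i];
        odd[i]  = src[2 * i + 1];
    }
    for (int i = 0; i < LANES; i++)
        out[i] = (uint16_t)(even[i] + odd[i]);
}

/* Model of two contiguous 16-byte LD1 loads, each followed by UADDLP:
 * the pairwise widening add sums adjacent bytes inside one register. */
static void pairs_ld1_uaddlp(const uint8_t *src, uint16_t *out)
{
    for (int v = 0; v < 2; v++) {
        uint8_t reg[16];
        memcpy(reg, src + 16 * v, sizeof reg);
        for (int i = 0; i < 8; i++)
            out[8 * v + i] = (uint16_t)(reg[2 * i] + reg[2 * i + 1]);
    }
}

/* Returns 1 when both models produce identical sums for a test pattern
 * whose pair sums exceed 255, so the widening step is exercised. */
int sequences_agree(void)
{
    uint8_t src[2 * LANES];
    uint16_t a[LANES], b[LANES];

    for (int i = 0; i < 2 * LANES; i++)
        src[i] = (uint8_t)(255 - 7 * i);

    pairs_ld2_uaddl(src, a);
    pairs_ld1_uaddlp(src, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

Both models yield the same adjacent-pair sums, so the choice between them comes down to the cost of the load instruction.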
Thanks,
Sebastian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-arm64-port-scale1D_128to64-and-scale2D_64to32.patch
Type: application/octet-stream
Size: 3346 bytes
Desc: 0001-arm64-port-scale1D_128to64-and-scale2D_64to32.patch
URL: <http://mailman.videolan.org/pipermail/x265-devel/attachments/20210730/219ce569/attachment.obj>