[x265] [PATCH] asm: code for scale1D_128to64 routine
chen
chenm003 at 163.com
Thu Nov 14 07:31:11 CET 2013
>+pand m0, [pw_00ff]
>+pand m2, [pw_00ff]
>+pand m4, [pw_00ff]
>+pand m6, [pw_00ff]
>+
>+packuswb m0, m1
>+packuswb m2, m3
>+packuswb m4, m5
>+packuswb m6, m7
1. If you don't keep [pw_00ff] in a register, you can merge pand+packuswb into a single pshufb. In most cases, keeping a constant in a register is faster.
2. packuswb m0, m0 is better, since it depends on only one register.
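A portable C model can make both review points concrete. It is a sketch, not x265 code: it emulates the SSE2/SSSE3 byte semantics of pand, packuswb and pshufb on 16-byte arrays, and the names xmm, pand_00ff, shuf_even and variants_agree are hypothetical. It assumes the purpose of the quoted sequence is to keep the low byte of every 16-bit word of two registers.

```c
#include <assert.h> /* assert() for quick checks of variants_agree() */
#include <stdint.h>
#include <string.h>

/* Scalar model of a 128-bit XMM register: 16 bytes, little-endian words. */
typedef struct { uint8_t b[16]; } xmm;

/* pand reg, [pw_00ff]: keep the low byte of each 16-bit word, zero the high byte. */
static xmm pand_00ff(xmm a) {
    for (int i = 1; i < 16; i += 2)
        a.b[i] = 0;
    return a;
}

/* packuswb a, b: each signed 16-bit word of a, then of b, saturated to 0..255. */
static xmm packuswb(xmm a, xmm b) {
    xmm r;
    for (int i = 0; i < 8; i++) {
        int16_t wa = (int16_t)(uint16_t)(a.b[2 * i] | (a.b[2 * i + 1] << 8));
        int16_t wb = (int16_t)(uint16_t)(b.b[2 * i] | (b.b[2 * i + 1] << 8));
        r.b[i]     = wa < 0 ? 0 : wa > 255 ? 255 : (uint8_t)wa;
        r.b[8 + i] = wb < 0 ? 0 : wb > 255 ? 255 : (uint8_t)wb;
    }
    return r;
}

/* pshufb a, mask: byte i = (mask[i] & 0x80) ? 0 : a[mask[i] & 0x0f]. */
static xmm pshufb(xmm a, xmm mask) {
    xmm r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (mask.b[i] & 0x80) ? 0 : a.b[mask.b[i] & 0x0f];
    return r;
}

/* Hypothetical shuffle constant: even (low) bytes to the low half, zeros above. */
static const xmm shuf_even = {{0, 2, 4, 6, 8, 10, 12, 14,
                               0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80}};

/* Returns 1 when all three variants produce the same bytes for m0 and m1. */
static int variants_agree(xmm m0, xmm m1) {
    /* Patch style: mask both registers, then packuswb m0, m1 (16 output bytes). */
    xmm packed = packuswb(pand_00ff(m0), pand_00ff(m1));

    /* Review point 1: one pshufb per register replaces pand + packuswb. */
    xmm s0 = pshufb(m0, shuf_even);
    xmm s1 = pshufb(m1, shuf_even);
    if (memcmp(packed.b, s0.b, 8) != 0 || memcmp(packed.b + 8, s1.b, 8) != 0)
        return 0;

    /* Review point 2: packuswb m0, m0 reads only m0; its low 8 bytes
       hold the same result as pshufb does for that register. */
    xmm self = packuswb(pand_00ff(m0), pand_00ff(m0));
    return memcmp(self.b, s0.b, 8) == 0;
}
```

For arbitrary input bytes, variants_agree(m0, m1) should return 1: after the pw_00ff mask every word lies in 0..255, so packuswb never saturates and simply gathers the even bytes, which is exactly what the pshufb mask selects. In this model pshufb leaves each register's result in its low 8 bytes only, so filling one 16-byte store from two registers would still need an extra step such as punpcklqdq or two 8-byte stores.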
More information about the x265-devel mailing list