[x265] [PATCH] asm: code for scale1D_128to64 routine

chen chenm003 at 163.com
Thu Nov 14 07:31:11 CET 2013


>+pand        m0,      [pw_00ff]
>+pand        m2,      [pw_00ff]
>+pand        m4,      [pw_00ff]
>+pand        m6,      [pw_00ff]
>+
>+packuswb    m0,      m1
>+packuswb    m2,      m3
>+packuswb    m4,      m5
>+packuswb    m6,      m7
1. If you are not going to load [pw_00ff] into a register, you can merge pand+packuswb into a single pshufb. In most cases, loading the constant into a register once is faster than using it as a memory operand each time.
2. packuswb m0, m0 is better, since it depends on only one register.
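
To illustrate point 1, here is a rough scalar C model of the two sequences. It is only a sketch, not code from the patch: the xmm_t type, the helper names, and the pshufb mask layout are all made up for illustration, and the helpers just mimic what pand, packuswb and pshufb (SSSE3) do on a 16-byte register. With packuswb m0, m0 both halves of the result hold the even bytes, so the assumed mask selects bytes 0, 2, ..., 14 twice.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one 128-bit register as 16 bytes (little-endian). */
typedef struct { uint8_t b[16]; } xmm_t;

/* pand: bitwise AND of two registers. */
static xmm_t pand(xmm_t a, xmm_t b)
{
    xmm_t r;
    for (int i = 0; i < 16; i++)
        r.b[i] = a.b[i] & b.b[i];
    return r;
}

/* Read 16-bit word i of a register as a signed value. */
static int16_t word_at(const xmm_t *x, int i)
{
    return (int16_t)(uint16_t)(x->b[2 * i] | (x->b[2 * i + 1] << 8));
}

/* Saturate a signed 16-bit word to an unsigned byte. */
static uint8_t sat_u8(int16_t w)
{
    return w < 0 ? 0 : (w > 255 ? 255 : (uint8_t)w);
}

/* packuswb dst, src: dst words go to the low 8 bytes, src words to the high 8. */
static xmm_t packuswb(xmm_t dst, xmm_t src)
{
    xmm_t r;
    for (int i = 0; i < 8; i++) {
        r.b[i]     = sat_u8(word_at(&dst, i));
        r.b[i + 8] = sat_u8(word_at(&src, i));
    }
    return r;
}

/* pshufb: each result byte is src[mask & 15], or 0 if the mask byte has bit 7 set. */
static xmm_t pshufb(xmm_t src, xmm_t mask)
{
    xmm_t r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (mask.b[i] & 0x80) ? 0 : src.b[mask.b[i] & 0x0f];
    return r;
}

/* Two-instruction form: pand m0, [pw_00ff] then packuswb m0, m0. */
static xmm_t even_bytes_pand_pack(xmm_t m0)
{
    xmm_t pw_00ff;
    for (int i = 0; i < 16; i++)
        pw_00ff.b[i] = (i & 1) ? 0x00 : 0xff;
    m0 = pand(m0, pw_00ff);
    return packuswb(m0, m0);
}

/* One-instruction form: pshufb m0, [mask] with an assumed even-byte mask. */
static xmm_t even_bytes_pshufb(xmm_t m0)
{
    xmm_t mask;
    for (int i = 0; i < 16; i++)
        mask.b[i] = (uint8_t)(2 * (i & 7));
    return pshufb(m0, mask);
}
```

In this model both functions return the same 16 bytes for any input, because after the pand every word is in 0..255 and the saturation in packuswb never triggers.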
 