<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial"><DIV>>--- a/source/common/x86/intrapred8.asm Thu Jan 30 06:46:23 2014 +0530<BR>>+++ b/source/common/x86/intrapred8.asm Thu Jan 30 19:05:45 2014 +0530<BR>>@@ -34,6 +34,9 @@<BR>> c_mode32_17_0: db 15, 14, 12, 11, 10, 9, 7, 6, 5, 4, 2, 1, 0, 0, 0, 0<BR>> c_shuf8_0: db 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8<BR>> c_deinterval8: db 0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15<BR>>+tab_S0: db 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8<BR>same as c_shuf8_0</DIV>
<DIV> </DIV>
<DIV>>+tab_S1: db 15, 14, 12, 11, 10, 9, 7, 6, 5, 4, 2, 1, 0, 0, 0, 0<BR>>+tab_S2: db 15, 14, 12, 11, 10, 9, 7, 6, 5, 4, 2, 1, 0, 0, 0, 0<BR>Are tab_S1 and tab_S2 the same? Both also match c_mode32_17_0.</DIV>
<DIV> </DIV>
<DIV>>+psrldq m0, 1<BR>>+pinsrb m0, [r4 + 19], 15<BR>psrldq + pinsrb(15) can be replaced by movd + palignr.</DIV>
<DIV>pinsrb needs 2 uops; it is an expensive instruction.</DIV>
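<DIV>A minimal sketch of that replacement, in the patch's x86inc syntax. It assumes m1 is free as a scratch register and that the 4 bytes at [r4 + 19] are readable, since movd loads a dword; neither is confirmed by the patch. The result ends up in m1 rather than m0.</DIV>
<PRE>
```nasm
; before: shift m0 right by one byte, then insert a byte at position 15
;   psrldq  m0, 1
;   pinsrb  m0, [r4 + 19], 15
; after (assumption: m1 is a free scratch register; result is in m1):
movd    m1, [r4 + 19]   ; byte 0 of m1 = byte at [r4 + 19] (loads 4 bytes)
palignr m1, m0, 1       ; m1 = bytes 1..15 of m0, with old m1 byte 0 in byte 15
```
</PRE>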
<DIV> </DIV>
<DIV>>+; mode 4 [row 0]<BR>>+movu m6, [r5 + 21 * 16]<BR>>+pmaddubsw m1, m0, m6<BR>>+pmulhrsw m1, m3<BR>>+pmaddubsw m2, m7, m6<BR>>+pmulhrsw m2, m3<BR>>+packuswb m1, m2<BR>>+movu [r0 + 32 * 16], m1<BR>r5 points to ang_table, which is aligned data.</DIV>
<DIV>You save one constant-table load but introduce two hidden register copies. That is slower, since the constant table is most likely already in the CPU cache here.</DIV>
<DIV> </DIV></div>