[x265] [PATCH] all_angs_pred_16x16, asm code

chen chenm003 at 163.com
Thu Jan 30 15:15:11 CET 2014


>--- a/source/common/x86/intrapred8.asm Thu Jan 30 06:46:23 2014 +0530
>+++ b/source/common/x86/intrapred8.asm Thu Jan 30 19:05:45 2014 +0530
>@@ -34,6 +34,9 @@
> c_mode32_17_0:  db 15, 14, 12, 11, 10, 9, 7, 6, 5, 4, 2, 1, 0, 0, 0, 0
> c_shuf8_0:      db 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8
> c_deinterval8:  db 0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15
>+tab_S0: db  0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8
This is the same as c_shuf8_0; reuse the existing table.
 
>+tab_S1: db 15, 14, 12, 11, 10,  9,  7,  6,  5,  4,  2,  1, 0, 0, 0, 0
>+tab_S2: db 15, 14, 12, 11, 10,  9,  7,  6,  5,  4,  2,  1, 0, 0, 0, 0
tab_S1 and tab_S2 are identical (and both match c_mode32_17_0). Are both needed?
 
>+psrldq   m0,               1
>+pinsrb   m0,               [r4 + 19],   15
psrldq + pinsrb(15) can be replaced by movd + palignr.
pinsrb needs 2 uops, so it is an expensive instruction.
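
To illustrate why the two sequences are interchangeable, here is a minimal scalar sketch in C that models each instruction byte by byte. The vec128 type and the emu_* / shift_in_* helpers are hypothetical names used only for this illustration; they are not part of x265. One caveat the model makes visible: movd reads 4 bytes where pinsrb reads 1, so the suggested form assumes the 3 bytes after [r4 + 19] are readable.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar stand-in for one XMM register (hypothetical helper type). */
typedef struct { uint8_t b[16]; } vec128;

/* psrldq x, n: shift the register right by n bytes, zero-filling the top. */
static vec128 emu_psrldq(vec128 x, int n) {
    vec128 r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (i + n < 16) ? x.b[i + n] : 0;
    return r;
}

/* pinsrb x, v, idx: overwrite a single byte lane. */
static vec128 emu_pinsrb(vec128 x, uint8_t v, int idx) {
    assert(idx >= 0 && idx < 16);
    x.b[idx] = v;
    return x;
}

/* movd x, [p]: load 4 bytes into the low dword and zero the rest. */
static vec128 emu_movd(const uint8_t *p) {
    vec128 r;
    memset(r.b, 0, sizeof r.b);
    memcpy(r.b, p, 4);
    return r;
}

/* palignr hi, lo, n: concatenate hi:lo, shift right n bytes, keep low 16. */
static vec128 emu_palignr(vec128 hi, vec128 lo, int n) {
    uint8_t cat[32];
    vec128 r;
    memcpy(cat, lo.b, 16);
    memcpy(cat + 16, hi.b, 16);
    for (int i = 0; i < 16; i++)
        r.b[i] = (i + n < 32) ? cat[i + n] : 0;
    return r;
}

/* Sequence from the patch: psrldq m0, 1 then pinsrb m0, [next], 15. */
static vec128 shift_in_original(vec128 m0, const uint8_t *next) {
    return emu_pinsrb(emu_psrldq(m0, 1), next[0], 15);
}

/* Suggested sequence: movd tmp, [next] then palignr tmp, m0, 1. */
static vec128 shift_in_suggested(vec128 m0, const uint8_t *next) {
    return emu_palignr(emu_movd(next), m0, 1);
}
```

Both helpers drop the lowest byte of m0 and append the next source byte at lane 15. In the suggested form, only byte 0 of the movd result survives the palignr shift.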
 
>+; mode 4 [row 0]
>+movu          m6,             [r5 + 21 * 16]
>+pmaddubsw     m1,             m0,         m6
>+pmulhrsw      m1,             m3
>+pmaddubsw     m2,             m7,         m6
>+pmulhrsw      m2,             m3
>+packuswb      m1,             m2
>+movu          [r0 + 32 * 16], m1
r5 points to ang_table, which is aligned data.
You save one constant-table load, but introduce two hidden copy instructions; that is slower, since the constant table is most likely already in the CPU cache here.
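
For readers following the quoted "mode 4 [row 0]" block, here is a scalar sketch in C of what pmaddubsw, pmulhrsw and packuswb compute for one output pixel. It rests on two assumptions that the quoted hunk does not show: that m3 holds the word constant 1024, and that the table row at [r5 + 21 * 16] holds the interleaved byte weights (32 - 21, 21). All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* pmaddubsw on one lane: unsigned pixels times signed weights,
 * pair-summed with signed 16-bit saturation. */
static int16_t emu_pmaddubsw(uint8_t p0, uint8_t p1, int8_t w0, int8_t w1) {
    int32_t s = (int32_t)p0 * w0 + (int32_t)p1 * w1;
    if (s > 32767) s = 32767;
    if (s < -32768) s = -32768;
    return (int16_t)s;
}

/* pmulhrsw on one lane: (((a * b) >> 14) + 1) >> 1.
 * Only non-negative products occur in this sketch. */
static int16_t emu_pmulhrsw(int16_t a, int16_t b) {
    int32_t t = ((int32_t)a * b) >> 14;
    return (int16_t)((t + 1) >> 1);
}

/* packuswb on one lane: clamp a signed word to an unsigned byte. */
static uint8_t emu_packuswb(int16_t v) {
    if (v < 0) return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* One pixel as the quoted instructions would compute it
 * (ASSUMPTION: weights are (32 - frac, frac) and m3 == 1024). */
static uint8_t predict_pixel(uint8_t a, uint8_t b, int frac) {
    assert(frac >= 0 && frac <= 32);
    int16_t sum = emu_pmaddubsw(a, b, (int8_t)(32 - frac), (int8_t)frac);
    return emu_packuswb(emu_pmulhrsw(sum, 1024));
}

/* Reference form of the same interpolation with rounding. */
static uint8_t predict_pixel_ref(uint8_t a, uint8_t b, int frac) {
    return (uint8_t)(((32 - frac) * a + frac * b + 16) >> 5);
}
```

Under those assumptions, multiplying by 1024 in pmulhrsw acts as a rounded shift right by 5, so the three instructions reduce to ((32 - f) * a + f * b + 16) >> 5 per pixel. The sketch says nothing about the cost of the hidden register copies; it only shows the arithmetic the block performs.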
 