[x265] [PATCH] all_angs_pred_16x16, asm code
chen
chenm003 at 163.com
Thu Jan 30 15:15:11 CET 2014
>--- a/source/common/x86/intrapred8.asm Thu Jan 30 06:46:23 2014 +0530
>+++ b/source/common/x86/intrapred8.asm Thu Jan 30 19:05:45 2014 +0530
>@@ -34,6 +34,9 @@
> c_mode32_17_0: db 15, 14, 12, 11, 10, 9, 7, 6, 5, 4, 2, 1, 0, 0, 0, 0
> c_shuf8_0: db 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8
> c_deinterval8: db 0, 8, 1, 9, 2, 10, 3, 11, 4, 12, 5, 13, 6, 14, 7, 15
>+tab_S0: db 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8
This is the same as c_shuf8_0.
>+tab_S1: db 15, 14, 12, 11, 10, 9, 7, 6, 5, 4, 2, 1, 0, 0, 0, 0
>+tab_S2: db 15, 14, 12, 11, 10, 9, 7, 6, 5, 4, 2, 1, 0, 0, 0, 0
tab_S1 and tab_S2 look identical. Are they meant to be the same?
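The duplication noted above can be confirmed mechanically. The following is a minimal Python sketch, not part of the patch: the table values are copied from the quoted lines, and `find_duplicates` is a hypothetical helper written only for this illustration.

```python
# Table contents copied from the quoted diff lines above.
TABLES = {
    "c_mode32_17_0": [15, 14, 12, 11, 10, 9, 7, 6, 5, 4, 2, 1, 0, 0, 0, 0],
    "c_shuf8_0":     [0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8],
    "tab_S0":        [0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8],
    "tab_S1":        [15, 14, 12, 11, 10, 9, 7, 6, 5, 4, 2, 1, 0, 0, 0, 0],
    "tab_S2":        [15, 14, 12, 11, 10, 9, 7, 6, 5, 4, 2, 1, 0, 0, 0, 0],
}

def find_duplicates(tables):
    """Group table names by content; return only groups with more than one name.

    Hypothetical helper for illustration, not an x265 tool.
    """
    groups = {}
    for name, values in tables.items():
        groups.setdefault(tuple(values), []).append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)

if __name__ == "__main__":
    for names in find_duplicates(TABLES):
        print(" == ".join(names))
```

As quoted here, all three new tables repeat constants that already exist in the file, so the existing labels could be used instead.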
>+psrldq m0, 1
>+pinsrb m0, [r4 + 19], 15
psrldq + pinsrb(15) can be replaced by movd + palignr.
pinsrb needs 2 uops; it is an expensive instruction.
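The equivalence behind this suggestion can be checked with a small byte-level model. The sketch below is Python rather than assembly, and every function in it is a hypothetical model of the named instruction, written only for this illustration (index 0 is the lowest byte of the register). It assumes movd loads four bytes starting at the address, of which only the lowest one ends up in the result.

```python
def psrldq(x, n):
    """Byte shift right: drop the n lowest bytes, zero-fill at the top."""
    return x[n:] + bytes(n)

def pinsrb(x, byte, idx):
    """Replace the byte at lane idx with the given value."""
    return x[:idx] + bytes([byte]) + x[idx + 1:]

def movd(mem, offset):
    """Load 4 bytes into the low dword and zero the upper 12 bytes."""
    return mem[offset:offset + 4] + bytes(12)

def palignr(dst, src, n):
    """Concatenate dst (high half) with src (low half), shift right by
    n bytes, and keep the low 16 bytes. The result replaces dst."""
    return (src + dst)[n:n + 16]

def shift_in_with_pinsrb(m0, mem, offset):
    # Models: psrldq m0, 1 / pinsrb m0, [mem + offset], 15
    return pinsrb(psrldq(m0, 1), mem[offset], 15)

def shift_in_with_palignr(m0, mem, offset):
    # Models: movd m1, [mem + offset] / palignr m1, m0, 1
    m1 = movd(mem, offset)
    return palignr(m1, m0, 1)

if __name__ == "__main__":
    m0 = bytes(range(16))
    mem = bytes(range(100, 132))
    assert shift_in_with_pinsrb(m0, mem, 19) == shift_in_with_palignr(m0, mem, 19)
    print(list(shift_in_with_palignr(m0, mem, 19)))
```

Two things differ from the original sequence in this model: the result lands in the register that movd loaded, not in m0, and movd reads three bytes past the one that is needed, so those bytes must be readable.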
>+; mode 4 [row 0]
>+movu m6, [r5 + 21 * 16]
>+pmaddubsw m1, m0, m6
>+pmulhrsw m1, m3
>+pmaddubsw m2, m7, m6
>+pmulhrsw m2, m3
>+packuswb m1, m2
>+movu [r0 + 32 * 16], m1
r5 points to ang_table, which is aligned data.
Keeping the table row in m6 saves one constant-table load, but it costs two hidden register copies (one in each three-operand pmaddubsw). That is slower, because the constant table is most likely already in the CPU cache here.
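For readers following the quoted sequence, here is a minimal Python model of what the pmaddubsw / pmulhrsw / packuswb chain computes. It is a sketch under assumptions that the quoted lines do not confirm: that m0 and m7 hold interleaved neighbouring pixel pairs, that a row of ang_table holds the pair (32 - f, f) repeated eight times, and that m3 holds the word constant 1024. All function names are hypothetical models of the instructions.

```python
def to_signed8(b):
    """Reinterpret an unsigned byte as a signed byte."""
    return b - 256 if b >= 128 else b

def saturate(v, lo, hi):
    return max(lo, min(hi, v))

def pmaddubsw(a, b):
    """a: 16 unsigned bytes, b: 16 bytes read as signed.
    Multiplies byte pairs and adds neighbours into 8 saturated words."""
    return [saturate(a[2 * i] * to_signed8(b[2 * i]) +
                     a[2 * i + 1] * to_signed8(b[2 * i + 1]), -32768, 32767)
            for i in range(8)]

def pmulhrsw(a, b):
    """Rounded high multiply of signed words (inputs here stay in range)."""
    return [(((x * y) >> 14) + 1) >> 1 for x, y in zip(a, b)]

def packuswb(lo, hi):
    """Pack 8 + 8 signed words into 16 bytes with unsigned saturation."""
    return [saturate(v, 0, 255) for v in lo + hi]

def interleave_pairs(ref, start):
    """Assumed register layout: (p[i], p[i+1]) pairs for 8 outputs."""
    out = []
    for i in range(start, start + 8):
        out += [ref[i], ref[i + 1]]
    return out

def predict_row(ref, frac):
    coeff = [32 - frac, frac] * 8  # assumed layout of one ang_table row
    rnd = [1024] * 8               # assumed content of m3
    m1 = pmulhrsw(pmaddubsw(interleave_pairs(ref, 0), coeff), rnd)
    m2 = pmulhrsw(pmaddubsw(interleave_pairs(ref, 8), coeff), rnd)
    return packuswb(m1, m2)

if __name__ == "__main__":
    ref = [(i * 37) % 256 for i in range(17)]
    print(predict_row(ref, 21))
```

Under these assumptions each output byte equals ((32 - f) * a + f * b + 16) >> 5 for neighbouring pixels a and b. The model also shows that pmaddubsw treats its two sources differently: the pixel operand is read as unsigned and the table operand as signed, so the two cannot simply be swapped to avoid a copy.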