[x265] [PATCH] assembly code for pixel_sad_x3_24x32
chen
chenm003 at 163.com
Wed Oct 30 16:51:42 CET 2013
>+ psadbw m5, m3
>+ psadbw m6, m4
>+ pshufd m6, m6, 84
You want to clear the high 96 bits to zero, so why not use pand? Of course, we can avoid this altogether, see below.
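As a rough illustration, here is a minimal C sketch using SSE2 intrinsics that correspond to the instructions discussed (`_mm_sad_epu8` for psadbw, `_mm_shuffle_epi32` for pshufd, `_mm_and_si128` for pand). The function names and the mask constant are hypothetical and not part of the patch. After psadbw, dword 1 is always zero, so the immediate 84 (0b01010100) keeps dword 0 and copies that zero into dwords 1 to 3, which should give the same result as a pand with a low-dword mask.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Patch version: psadbw followed by pshufd with immediate 84. */
static void clear_high_pshufd(const uint8_t *a, const uint8_t *b, uint32_t out[4])
{
    __m128i s = _mm_sad_epu8(_mm_loadu_si128((const __m128i *)a),
                             _mm_loadu_si128((const __m128i *)b));
    /* psadbw leaves one 16-bit sum in bits 0-15 and one in bits 64-79. */
    s = _mm_shuffle_epi32(s, 84);   /* dword 0 kept, dwords 1-3 <- dword 1 (zero) */
    _mm_storeu_si128((__m128i *)out, s);
}

/* Alternative (assumption): psadbw followed by pand with a constant mask. */
static void clear_high_pand(const uint8_t *a, const uint8_t *b, uint32_t out[4])
{
    /* Hypothetical mask: all ones in dword 0, zero elsewhere. */
    const __m128i keep_low_dword = _mm_set_epi32(0, 0, 0, -1);
    __m128i s = _mm_sad_epu8(_mm_loadu_si128((const __m128i *)a),
                             _mm_loadu_si128((const __m128i *)b));
    s = _mm_and_si128(s, keep_low_dword);   /* pand: clear the high 96 bits */
    _mm_storeu_si128((__m128i *)out, s);
}
```

Both functions read 16 bytes from each pointer and write four dwords, so the two results can be compared directly.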
>+ paddd m5, m6
>+ paddd m0, m5
We can sum as 32xN and drop the high 64 bits in the last step.
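One possible reading of this suggestion, sketched in C with SSE2 intrinsics: accumulate the psadbw results for all 32 loaded columns on every row, and clear the unwanted half only once after the loop. This sketch assumes a single reference block (the patch handles three), assumes 32 readable bytes per row, and keeps columns 16 to 31 in their own accumulator so that dropping the high 64 bits removes only columns 24 to 31. The function name and parameters are hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* SAD of a 24-pixel-wide block; each row is read as 2 x 16 bytes. */
static uint32_t sad_24xN_deferred(const uint8_t *pix, int pix_stride,
                                  const uint8_t *ref, int ref_stride, int rows)
{
    __m128i acc_lo = _mm_setzero_si128();  /* columns 0..15 */
    __m128i acc_hi = _mm_setzero_si128();  /* columns 16..31, high qword unwanted */

    for (int y = 0; y < rows; y++) {
        __m128i p0 = _mm_loadu_si128((const __m128i *)pix);
        __m128i p1 = _mm_loadu_si128((const __m128i *)(pix + 16));
        __m128i r0 = _mm_loadu_si128((const __m128i *)ref);
        __m128i r1 = _mm_loadu_si128((const __m128i *)(ref + 16));
        acc_lo = _mm_add_epi32(acc_lo, _mm_sad_epu8(p0, r0));  /* psadbw + paddd */
        acc_hi = _mm_add_epi32(acc_hi, _mm_sad_epu8(p1, r1));  /* no per-row masking */
        pix += pix_stride;
        ref += ref_stride;
    }

    /* Last step: drop the high 64 bits (columns 24..31) exactly once. */
    acc_hi = _mm_move_epi64(acc_hi);
    __m128i sum = _mm_add_epi32(acc_lo, acc_hi);
    sum = _mm_add_epi32(sum, _mm_srli_si128(sum, 8));  /* fold high qword into low */
    return (uint32_t)_mm_cvtsi128_si32(sum);
}
```

The trade-off in this sketch is one extra accumulator register per reference in exchange for removing a shuffle from every row.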
>+%macro SAD_X3_W24 0
>+cglobal pixel_sad_x3_24x32, 5, 6, 8
>+ pxor m0, m0
>+ pxor m1, m1
>+ pxor m2, m2
>+ mov r6, 32
>+
>+.loop
>+ SAD_X3_24x4
>+ SAD_X3_24x4
>+ SAD_X3_24x4
>+ SAD_X3_24x4
>+
>+ sub r6, 16
>+ cmp r6, 0
>+jnz .loop
Same loop problem as in my previous mail. Also, the SUB instruction already affects the flags, so I think you don't need "cmp r6, 0".
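To show the flag behaviour in isolation, here is a small C sketch with GCC/Clang extended inline assembly (AT&T syntax, x86 only). It is an assumption-laden stand-in for the loop above, not the patch code: the `incl` takes the place of the four SAD_X3_24x4 expansions, and the function name is hypothetical. The loop ends on the zero flag set by `sub`, with no `cmp`.

```c
/* Counts how often the loop body runs when the counter steps down by 16. */
static int count_loop_iterations(unsigned long rows)
{
    int iterations = 0;
    __asm__ volatile(
        "1:\n\t"
        "incl %1\n\t"       /* stand-in for the loop body */
        "sub $16, %0\n\t"   /* SUB updates ZF along with the other flags */
        "jnz 1b\n\t"        /* so JNZ can branch on it directly */
        : "+r"(rows), "+r"(iterations)
        :
        : "cc");
    return iterations;
}
```

With a starting value of 32, the body runs twice, which matches 2 x 16 rows for a 24x32 block.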