[x265] [PATCH] assembly code for pixel_sad_x3_24x32

chen chenm003 at 163.com
Wed Oct 30 16:51:42 CET 2013


>+    psadbw  m5,  m3
>+    psadbw  m6,  m4
>+    pshufd  m6,  m6, 84
You want to clear the high 96 bits to zero, so why not use pand? Of course, we can avoid this step altogether; see below.
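
To make the point concrete, here is a minimal scalar C model of the three instructions involved. It is only an illustration, not code from the patch: the type xmm_t and the helpers psadbw, pshufd, pand_low32 and xmm_equal are hypothetical names that mimic the instruction semantics on four 32-bit lanes. It shows that, on a psadbw result, pshufd with immediate 84 and a pand with a low-32-bit mask leave the same value, because lane 1 of a psadbw result is already zero.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of a 128-bit register viewed as four 32-bit lanes. */
typedef struct { uint32_t d[4]; } xmm_t;

/* Model of PSADBW: each 64-bit half receives the sum of absolute
   differences of its eight byte pairs, so only lanes 0 and 2 are nonzero. */
static xmm_t psadbw(const uint8_t a[16], const uint8_t b[16])
{
    xmm_t r = {{0, 0, 0, 0}};
    for (int i = 0; i < 16; i++) {
        int diff = a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
        r.d[(i / 8) * 2] += (uint32_t)diff;
    }
    return r;
}

/* Model of PSHUFD: destination lane i takes source lane (imm >> 2*i) & 3. */
static xmm_t pshufd(xmm_t s, unsigned imm)
{
    xmm_t r;
    for (int i = 0; i < 4; i++)
        r.d[i] = s.d[(imm >> (2 * i)) & 3];
    return r;
}

/* Model of PAND with a constant mask that keeps only the low 32 bits. */
static xmm_t pand_low32(xmm_t s)
{
    xmm_t r = {{s.d[0], 0, 0, 0}};
    return r;
}

/* Lane-by-lane comparison of two modelled registers. */
static int xmm_equal(xmm_t a, xmm_t b)
{
    return memcmp(a.d, b.d, sizeof a.d) == 0;
}
```

Immediate 84 is 01 01 01 00 in binary, so the destination lanes are {d0, d1, d1, d1}; the equivalence with pand holds only because d1 is zero after psadbw.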
 
>+    paddd   m5,  m6
>+    paddd   m0,  m5
We can accumulate the sums as 32xN and drop the high 64 bits in the last step.
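
A rough scalar C sketch of one way to read this suggestion follows. It is an assumption, not the layout used in the patch: the results for columns 16..31 go into their own accumulator (acc_hi, a hypothetical name), so that the unwanted columns 24..31 can be discarded once after the loop instead of once per row. The helpers psadbw, paddd, sad24_ref and sad24_deferred are likewise illustrative, and the sketch assumes each row can be read safely up to column 31. Whether an extra accumulator per reference fits the register budget of the x3 kernel is not addressed here.

```c
#include <stdint.h>

/* Scalar model of a 128-bit register viewed as four 32-bit lanes. */
typedef struct { uint32_t d[4]; } xmm_t;

/* Model of PSADBW: lane 0 sums bytes 0..7, lane 2 sums bytes 8..15. */
static xmm_t psadbw(const uint8_t *a, const uint8_t *b)
{
    xmm_t r = {{0, 0, 0, 0}};
    for (int i = 0; i < 16; i++) {
        int diff = a[i] > b[i] ? a[i] - b[i] : b[i] - a[i];
        r.d[(i / 8) * 2] += (uint32_t)diff;
    }
    return r;
}

/* Model of PADDD: independent 32-bit lane additions. */
static xmm_t paddd(xmm_t x, xmm_t y)
{
    for (int i = 0; i < 4; i++)
        x.d[i] += y.d[i];
    return x;
}

/* Plain reference: SAD over exactly 24 columns per row. */
static uint32_t sad24_ref(const uint8_t *src, const uint8_t *ref,
                          int stride, int rows)
{
    uint32_t sum = 0;
    for (int y = 0; y < rows; y++)
        for (int x = 0; x < 24; x++) {
            int s = src[y * stride + x], r = ref[y * stride + x];
            sum += (uint32_t)(s > r ? s - r : r - s);
        }
    return sum;
}

/* Deferred cleanup: accumulate every row as if it were 32 bytes wide,
   then drop the part that covers columns 24..31 once, after the loop. */
static uint32_t sad24_deferred(const uint8_t *src, const uint8_t *ref,
                               int stride, int rows)
{
    xmm_t acc_lo = {{0, 0, 0, 0}};   /* columns 0..15  */
    xmm_t acc_hi = {{0, 0, 0, 0}};   /* columns 16..31 */
    for (int y = 0; y < rows; y++) {
        acc_lo = paddd(acc_lo, psadbw(src, ref));
        acc_hi = paddd(acc_hi, psadbw(src + 16, ref + 16));
        src += stride;
        ref += stride;
    }
    /* Lane 2 of acc_hi holds columns 24..31 and is simply ignored. */
    return acc_lo.d[0] + acc_lo.d[2] + acc_hi.d[0];
}
```

Note that adding the two psadbw results into a single register first would mix columns 8..15 with columns 24..31 in the high half, so the separate accumulator is what makes a single final cleanup possible in this sketch.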
 
>+%macro SAD_X3_W24 0
>+cglobal pixel_sad_x3_24x32, 5, 6, 8
>+    pxor  m0, m0
>+    pxor  m1, m1
>+    pxor  m2, m2
>+    mov   r6, 32
>+
>+.loop
>+    SAD_X3_24x4
>+    SAD_X3_24x4
>+    SAD_X3_24x4
>+    SAD_X3_24x4
>+
>+    sub r6,  16
>+    cmp r6,  0
>+jnz .loop
Same loop problem as in my previous mail. Also, the SUB instruction already sets the flags, so I don't think you need "cmp r6, 0".
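
As a small illustration of the flags point only (it does not cover the loop problem from the previous mail), the following C model mimics a loop that ends on the zero flag produced by the subtraction itself. The helper names sub_sets_zf and count_passes are hypothetical, and the model assumes the row count is a positive multiple of the step; otherwise the counter never reaches zero.

```c
/* Model of SUB on a register: stores the result and returns the zero flag. */
static int sub_sets_zf(int *reg, int imm)
{
    *reg -= imm;
    return *reg == 0;
}

/* Counts passes through a loop shaped like the quoted one:
   body, then "sub r6, step", then "jnz .loop" with no cmp in between. */
static int count_passes(int rows, int step)
{
    int r6 = rows, passes = 0, zf;
    do {
        passes++;                      /* stands in for the SAD_X3_24x4 calls */
        zf = sub_sets_zf(&r6, step);   /* sub r6, step */
    } while (!zf);                     /* jnz .loop */
    return passes;
}
```

With the values from the quoted code (32 rows, 16 rows per pass), count_passes(32, 16) gives 2 passes, the same count the version with the extra cmp would produce.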
 