<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial"><DIV>>diff -r 5bb46ef28bc5 -r 02b888130ed4 source/common/x86/pixeladd8.asm<BR>>--- a/source/common/x86/pixeladd8.asm Mon Dec 09 10:59:45 2013 +0800<BR>>+++ b/source/common/x86/pixeladd8.asm Mon Dec 09 12:13:29 2013 +0530<BR>>@@ -364,6 +364,75 @@<BR>> ; void pixel_add_ps_%1x%2(pixel *dest, intptr_t destride, pixel *src0, int16_t *scr1, intptr_t srcStride0, intptr_t srcStride1)<BR>> ;-----------------------------------------------------------------------------<BR>> %macro PIXEL_ADD_PS_W6_H4 2<BR>>+%if HIGH_BIT_DEPTH<BR>>+INIT_XMM sse2<BR>>+cglobal pixel_add_ps_%1x%2, 6, 7, 6, dest, destride, src0, scr1, srcStride0, srcStride1<BR>>+ mov r6d, %2/4<BR>>+ add r1, r1<BR>>+ add r4, r4<BR>>+ add r5, r5<BR>>+ pxor m4, m4<BR>>+ mova m5, [pw_pixel_max]<BR>>+.loop<BR>>+ movu m0, [r2]<BR>>+ movu m1, [r3]<BR>>+ mova m2, m0<BR>>+ mova m3, m1<BR>>+ punpckhqdq m2, m2<BR>>+ punpckhqdq m3, m3<BR>punpckhqdq m2, m0, m0</DIV>
<DIV>Writing it in this three-operand form gives better performance on AVX, since it makes the separate mova unnecessary. Of course, you don't need this instruction at all here; see below.</DIV>
<DIV> </DIV>
<DIV>>+ paddw m0, m1<BR>>+ paddw m2, m3</DIV>
<DIV>The first paddw already processes 8 pixels (eight 16-bit words), which covers all 6 pixels of the row, so you don't need m2 and m3.</DIV>
<DIV><BR>>+ CLIPW m0, m4, m5<BR>>+ CLIPW m2, m4, m5<BR>>+<BR>>+ movh [r0], m0<BR>>+ movd [r0 + 8], m2</DIV>
<DIV>SSE4: pextrd [r0+8],m0,2 </DIV>
<DIV>SSE2: pshufd m0, m0, 2 followed by movd [r0 + 8], m0 (after the movh store, since the shuffle overwrites m0)</DIV>
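<DIV>To make the suggestion concrete, here is a rough C sketch of one row using SSE2 intrinsics. It is only an illustration, not the patch itself: add_ps_row6 and its parameters are hypothetical names, the clamp assumes CLIPW expands to pmaxsw + pminsw, and, like the movu loads in the patch, it reads 16 bytes from each source even though only 12 are used.</DIV>

```c
#include <assert.h>
#include <emmintrin.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: add one row of six 16-bit pixels (src0) and six
 * 16-bit residuals (src1), clamp to [0, pixel_max], store six pixels.
 * Both sources must be readable for 8 elements (16 bytes). */
static void add_ps_row6(uint16_t *dst, const uint16_t *src0,
                        const int16_t *src1, int16_t pixel_max)
{
    assert(dst && src0 && src1);

    __m128i a   = _mm_loadu_si128((const __m128i *)src0);  /* movu m0, [r2] */
    __m128i b   = _mm_loadu_si128((const __m128i *)src1);  /* movu m1, [r3] */
    __m128i sum = _mm_add_epi16(a, b);                     /* one paddw covers all 6 pixels */

    /* Clamp to [0, pixel_max] (assumed equivalent of CLIPW m0, m4, m5). */
    sum = _mm_max_epi16(sum, _mm_setzero_si128());
    sum = _mm_min_epi16(sum, _mm_set1_epi16(pixel_max));

    /* movh [r0], m0: store pixels 0..3 (low 8 bytes). */
    _mm_storel_epi64((__m128i *)dst, sum);

    /* SSE2 path: pshufd m0, m0, 2 moves dword 2 (pixels 4..5) into the
     * low dword, then movd stores it at byte offset 8. */
    int32_t tail = _mm_cvtsi128_si32(_mm_shuffle_epi32(sum, 2));
    memcpy(dst + 4, &tail, sizeof tail);
}
```

<DIV>With SSE4.1 available, the last two steps could instead be a single _mm_extract_epi32(sum, 2), which corresponds to the pextrd form above.</DIV>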
<DIV><BR> </DIV></div>