<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial"><DIV>Here are some details of the algorithm:<BR>In:<BR>A B<BR>C D</DIV>
<DIV>Out:<BR>(A + B + C + D + 2) / 4</DIV>
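<DIV>For reference, here is the In/Out rule above as a plain C loop over the whole 64x64 block. This is a sketch, not x265's own C primitive. The name scale2D_64to32_ref and the pixel typedef are assumptions (8-bit pixels). The destination stride is fixed at 32 bytes to match the patch's lea r0, [r0 + 32]. The +2 before the shift makes the result round to nearest (halves up) instead of truncating.</DIV>

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t pixel; /* assumption: 8-bit pixel build */

/* Hypothetical scalar reference for scale2D_64to32. Each output pixel is the
 * rounded mean of a 2x2 source block:
 *     A B  = src[x],          src[x + 1]
 *     C D  = src[x + stride], src[x + stride + 1]
 *     out  = (A + B + C + D + 2) >> 2
 * The destination stride is fixed at 32, as in the patch's lea r0, [r0 + 32]. */
void scale2D_64to32_ref(pixel *dst, const pixel *src, intptr_t stride)
{
    assert(stride >= 64);
    for (int y = 0; y < 64; y += 2) {
        for (int x = 0; x < 64; x += 2) {
            unsigned sum = src[x] + src[x + 1] + src[x + stride] + src[x + stride + 1];
            dst[x >> 1] = (pixel)((sum + 2) >> 2); /* +2 rounds to nearest, halves up */
        }
        src += 2 * stride; /* advance two source rows */
        dst += 32;         /* packed 32x32 output */
    }
}
```

<DIV>Any SIMD version should match this loop byte for byte on random input.</DIV>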
<DIV>This is the standard MPEG4 interpolateHV filter; you may refer to Xvid's code,<BR>or use pmaddubsw + pmulhrsw.<BR></DIV>
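<DIV>Below is a minimal sketch of the pmaddubsw + pmulhrsw route. It is written with C SSSE3 intrinsics instead of x86inc assembly so it can be compiled and checked on its own. The function name scale2D_64to32_ssse3 is hypothetical. It assumes 8-bit pixels, a source stride of at least 64, and the packed 32-byte destination stride used by the patch. The steps are:<BR>
1. pmaddubsw against a vector of 1s sums each horizontal pair (A + B, C + D) into a 16-bit lane.<BR>
2. paddw adds the two rows.<BR>
3. pmulhrsw by 8192 computes (x * 8192 + 0x4000) >> 15, which is exactly (x + 2) >> 2.<BR>
This removes the need for the pw_00ff masks, the palignr, the pw_2 add and the psrlw.</DIV>

```c
#include <assert.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <tmmintrin.h> /* SSSE3: _mm_maddubs_epi16 (pmaddubsw), _mm_mulhrs_epi16 (pmulhrsw) */

/* Hypothetical SSSE3 version of scale2D_64to32 (8-bit pixels).
 * target("ssse3") lets GCC/Clang emit SSSE3 code without -mssse3;
 * the caller must check that the CPU supports SSSE3. */
__attribute__((target("ssse3")))
void scale2D_64to32_ssse3(uint8_t *dst, const uint8_t *src, intptr_t stride)
{
    const __m128i ones = _mm_set1_epi8(1);        /* pmaddubsw weights: A*1 + B*1 */
    const __m128i pw_8192 = _mm_set1_epi16(8192); /* pmulhrsw by 1 << 13 */

    assert(stride >= 64);
    for (int y = 0; y < 32; y++) {
        const uint8_t *top = src + 2 * y * stride; /* row holding A B */
        const uint8_t *bot = top + stride;         /* row holding C D */
        __m128i avg[4];

        for (int i = 0; i < 4; i++) {
            __m128i t = _mm_loadu_si128((const __m128i *)(top + 16 * i));
            __m128i b = _mm_loadu_si128((const __m128i *)(bot + 16 * i));
            /* pmaddubsw: each 16-bit lane gets A + B (resp. C + D), at most 510 */
            __m128i sum = _mm_add_epi16(_mm_maddubs_epi16(t, ones),
                                        _mm_maddubs_epi16(b, ones));
            /* pmulhrsw: (sum * 8192 + 0x4000) >> 15 == (sum + 2) >> 2 */
            avg[i] = _mm_mulhrs_epi16(sum, pw_8192);
        }
        /* packuswb: two registers of 8 words -> 16 output pixels */
        _mm_storeu_si128((__m128i *)(dst + 32 * y), _mm_packus_epi16(avg[0], avg[1]));
        _mm_storeu_si128((__m128i *)(dst + 32 * y + 16), _mm_packus_epi16(avg[2], avg[3]));
    }
}
#endif
```

<DIV>In assembly, each 16 source bytes per row pair would need two pmaddubsw, one paddw and one pmulhrsw. The quoted patch uses two palignr, four pand, four paddusw and one psrlw for the same work. The 1s vector and pw_8192 can stay in registers for the whole loop.</DIV>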
<DIV>>+;-----------------------------------------------------------------<BR>>+; void scale2D_64to32(pixel *dst, pixel *src, intptr_t stride)<BR>>+;-----------------------------------------------------------------<BR>>+INIT_XMM ssse3<BR>>+cglobal scale2D_64to32, 3, 7, 8, dest, src, stride<BR>>+<BR>>+ mova m7, [pw_00ff]<BR>>+ mova m6, [pw_2]<BR>>+ xor r3, r3<BR>>+ mov r6d, 32<BR>>+.loop<BR>>+<BR>>+ mov r4, r3<BR>>+ imul r4, r2<BR>>+<BR>>+ mov r5, r3<BR>>+ inc r5<BR>>+ imul r5, r2<BR>>+<BR>>+ movu m0, [r1 + r4]<BR>>+ palignr m1, m0, 1<BR>>+ movu m2, [r1 + r5]<BR>>+ palignr m3, m2, 1<BR>>+<BR>>+ pand m0, m7<BR>>+ pand m1, m7<BR>>+ pand m2, m7<BR>>+ pand m3, m7<BR>>+<BR>>+ paddusw m0, m1<BR>>+ paddusw m0, m2<BR>>+ paddusw m0, m3<BR>>+ paddusw m0, m6<BR>>+<BR>>+ psrlw m0, 2<BR>>+<BR>>+ movu m4, [r1 + r4 + 16]<BR>>+ palignr m5, m4, 1<BR>>+ movu m1, [r1 + r5 + 16]<BR>>+ palignr m2, m1, 1<BR>>+<BR>>+ pand m4, m7<BR>>+ pand m5, m7<BR>>+ pand m1, m7<BR>>+ pand m2, m7<BR>>+<BR>>+ paddusw m4, m5<BR>>+ paddusw m4, m1<BR>>+ paddusw m4, m2<BR>>+ paddusw m4, m6<BR>>+ psrlw m4, 2<BR>>+<BR>>+ packuswb m0, m4<BR>>+ movu [r0], m0<BR>>+<BR>>+ movu m0, [r1 + r4 + 32]<BR>>+ palignr m1, m0, 1<BR>>+ movu m2, [r1 + r5 + 32]<BR>>+ palignr m3, m2, 1<BR>>+<BR>>+ pand m0, m7<BR>>+ pand m1, m7<BR>>+ pand m2, m7<BR>>+ pand m3, m7<BR>>+<BR>>+ paddusw m0, m1<BR>>+ paddusw m0, m2<BR>>+ paddusw m0, m3<BR>>+ paddusw m0, m6<BR>>+<BR>>+ psrlw m0, 2<BR>>+<BR>>+ movu m4, [r1 + r4 + 48]<BR>>+ palignr m5, m4, 1<BR>>+ movu m1, [r1 + r5 + 48]<BR>>+ palignr m2, m1, 1<BR>>+<BR>>+ pand m4, m7<BR>>+ pand m5, m7<BR>>+ pand m1, m7<BR>>+ pand m2, m7<BR>>+<BR>>+ paddusw m4, m5<BR>>+ paddusw m4, m1<BR>>+ paddusw m4, m2<BR>>+ paddusw m4, m6<BR>>+ psrlw m4, 2<BR>>+<BR>>+ packuswb m0, m4<BR>>+ movu [r0 + 16], m0<BR>>+<BR>>+ lea r0, [r0 + 32]<BR>>+ add r3, 2<BR>>+ dec r6d<BR>>+<BR>>+ jnz .loop<BR>>+<BR>>+RET<BR></DIV></div>