<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial"><DIV>>+%macro transpose_8x8 0</DIV>
<DIV>Please use an uppercase macro name (for example TRANSPOSE_8x8).</DIV>
<DIV><BR>>+<BR>>+ movh m0, [r1]<BR>>+ movh m1, [r1 + r2]<BR>>+ movh m2, [r1 + 2 * r2]<BR>>+ lea r1, [r1 + 2 * r2]<BR>>+ movh m3, [r1 + r2]<BR>>+ movh m4, [r1 + 2 * r2]<BR>>+ lea r1, [r1 + 2 * r2]<BR>>+ movh m5, [r1 + r2]<BR>>+ movh m6, [r1 + 2 * r2]<BR>>+ lea r1, [r1 + 2 * r2]<BR>>+ movh m7, [r1 + r2]<BR>>+<BR>>+ punpcklbw m0, m1<BR>>+ punpcklbw m2, m3<BR>>+ punpcklbw m4, m5<BR>>+ punpcklbw m6, m7<BR>>+<BR>>+ punpckhwd m1, m0, m2<BR>>+ punpcklwd m0, m2<BR>>+ punpckhwd m5, m4, m6<BR>>+ punpcklwd m4, m6<BR>>+ punpckhdq m2, m0, m4<BR>>+ punpckldq m0, m4<BR>>+ punpckhdq m3, m1, m5<BR>>+ punpckldq m1, m5<BR>>+<BR>>+ movlps [r0], m0<BR>>+ movhps [r0 + r3], m0<BR>>+ movlps [r0 + 2 * r3], m2<BR>>+ lea r0, [r0 + 2 * r3]<BR>>+ movhps [r0 + r3], m2<BR>>+ movlps [r0 + 2 * r3], m1<BR>>+ lea r0, [r0 + 2 * r3]<BR>>+ movhps [r0 + r3], m1<BR>>+ movlps [r0 + 2 * r3], m3<BR>>+ lea r0, [r0 + 2 * r3]<BR>>+ movhps [r0 + r3], m3<BR>>+<BR>>+%endmacro<BR>This macro is correct, but it needs some changes; see below.</DIV>
<DIV> </DIV>
<DIV><BR>>+<BR>>+;-----------------------------------------------------------------<BR>>+; void transpose_16x16(pixel *dst, pixel *src, intptr_t stride)<BR>>+;-----------------------------------------------------------------<BR>>+INIT_XMM sse2<BR>>+cglobal transpose16, 3, 5, 8, dest, src, stride<BR>>+<BR>>+ mov r4, r0<BR>>+ mov r5, r1</DIV>
<DIV>The cglobal line declares only five registers (r0-r4), but the code also uses r5. Either declare six registers or avoid r5.</DIV>
<DIV><BR>>+ mov r3, 16</DIV>
<DIV>When the stride is a constant, it is better to inline it as an immediate than to hold it in r3, so the 8x8 macro and the code below need to change accordingly.</DIV>
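<DIV>As a cross-check of the block offsets used by the four macro invocations in the quoted patch, here is a minimal scalar sketch in C. It assumes 8-bit pixels and the fixed destination stride of 16 from the patch. The names transpose8_block, transpose16_by_quadrants and transpose16_self_check are hypothetical and do not come from x265, and the stride is typed as long rather than intptr_t only to keep the sketch free of includes.</DIV>
<PRE>
```c
/* Scalar reference sketch. All names are hypothetical, not x265 functions. */
typedef unsigned char pixel;   /* assumption: 8-bit pixels */

enum { DST_STRIDE = 16 };      /* the patch hard-codes the destination stride */

/* Transpose one 8x8 block: dst[row][col] = src[col][row]. */
void transpose8_block(pixel *dst, const pixel *src, long src_stride)
{
    for (int row = 0; row < 8; row++)
        for (int col = 0; col < 8; col++)
            dst[row * DST_STRIDE + col] = src[col * src_stride + row];
}

/* Same quadrant order as the four macro invocations in the patch. */
void transpose16_by_quadrants(pixel *dst, const pixel *src, long src_stride)
{
    /* src rows 0-7,  cols 0-7  go to dst rows 0-7,  cols 0-7  */
    transpose8_block(dst, src, src_stride);
    /* src rows 8-15, cols 0-7  go to dst rows 0-7,  cols 8-15 */
    transpose8_block(dst + 8, src + 8 * src_stride, src_stride);
    /* src rows 0-7,  cols 8-15 go to dst rows 8-15, cols 0-7  */
    transpose8_block(dst + 8 * DST_STRIDE, src + 8, src_stride);
    /* src rows 8-15, cols 8-15 go to dst rows 8-15, cols 8-15 */
    transpose8_block(dst + 8 * DST_STRIDE + 8, src + 8 * src_stride + 8, src_stride);
}

/* Returns 1 if the quadrant version matches a direct 16x16 transpose. */
int transpose16_self_check(void)
{
    enum { SRC_STRIDE = 20 };  /* arbitrary source stride wider than 16 */
    pixel src[16 * SRC_STRIDE];
    pixel dst[16 * DST_STRIDE];

    for (int i = 0; i < 16 * SRC_STRIDE; i++)
        src[i] = (pixel)(i * 7 + 3);

    transpose16_by_quadrants(dst, src, SRC_STRIDE);

    for (int r = 0; r < 16; r++)
        for (int c = 0; c < 16; c++)
            if (dst[r * DST_STRIDE + c] != src[c * SRC_STRIDE + r])
                return 0;
    return 1;
}
```
</PRE>
<DIV>With the destination stride fixed at 16, every destination offset (8, 8 * 16, 8 * 16 + 8) is a compile-time constant, which is why it can be written as an immediate instead of being kept in r3.</DIV>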
<DIV><BR>>+ transpose_8x8<BR>>+ lea r1, [r1 + 2 * r2]<BR>>+ lea r0, [r4 + 8]<BR>>+ transpose_8x8<BR>>+ lea r1, [r5 + 8]<BR>>+ lea r0, [r4 + r3 * 8]<BR>>+ transpose_8x8<BR>>+ lea r1, [r1 + 2 * r2]<BR>>+ lea r0, [r4 + r3 * 8 +8]<BR>>+ transpose_8x8<BR>>+<BR>>+ RET<BR>>diff -r 3a94cc365533 -r 435c48eb30e1 source/common/x86/pixel.h<BR>>--- a/source/common/x86/pixel.h Tue Nov 19 11:53:09 2013 +0530<BR>>+++ b/source/common/x86/pixel.h Tue Nov 19 19:19:30 2013 +0530<BR>>@@ -367,5 +367,6 @@<BR>> void x265_getResidual32_sse4(pixel *fenc, pixel *pred, int16_t *residual, intptr_t stride);<BR>> void x265_transpose4_sse2(pixel *dest, pixel *src, intptr_t stride);<BR>> void x265_transpose8_sse2(pixel *dest, pixel *src, intptr_t stride);<BR>>+void x265_transpose16_sse2(pixel *dest, pixel *src, intptr_t stride);<BR>> <BR>> #endif // ifndef X265_I386_PIXEL_H<BR>>_______________________________________________<BR>>x265-devel mailing list<BR>>x265-devel@videolan.org<BR>>https://mailman.videolan.org/listinfo/x265-devel<BR></DIV></div>