<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial"><DIV>At 2013-11-19 14:23:41,murugan@multicorewareinc.com wrote:<BR>># HG changeset patch<BR>># User Murugan Vairavel <murugan@multicorewareinc.com><BR>># Date 1384842189 -19800<BR>># Tue Nov 19 11:53:09 2013 +0530<BR>># Node ID 3a94cc365533bf7def255dc5b28e6a6a1d1bfa50<BR>># Parent f6a050b79cfa400aa432f49ee8a4c2b9f20cf930<BR>>asm: code for transpose_8x8 routine<BR>><BR>>diff -r f6a050b79cfa -r 3a94cc365533 source/common/x86/asm-primitives.cpp<BR>>--- a/source/common/x86/asm-primitives.cpp Tue Nov 19 11:25:00 2013 +0530<BR>>+++ b/source/common/x86/asm-primitives.cpp Tue Nov 19 11:53:09 2013 +0530<BR>>@@ -546,6 +546,7 @@<BR>> p.calcresidual[BLOCK_4x4] = x265_getResidual4_sse2;<BR>> p.calcresidual[BLOCK_8x8] = x265_getResidual8_sse2;<BR>> p.transpose[BLOCK_4x4] = x265_transpose4_sse2;<BR>>+ p.transpose[BLOCK_8x8] = x265_transpose8_sse2;<BR>> }<BR>> if (cpuMask & X265_CPU_SSSE3)<BR>> {<BR>>diff -r f6a050b79cfa -r 3a94cc365533 source/common/x86/pixel-a.asm<BR>>--- a/source/common/x86/pixel-a.asm Tue Nov 19 11:25:00 2013 +0530<BR>>+++ b/source/common/x86/pixel-a.asm Tue Nov 19 11:53:09 2013 +0530<BR>>@@ -8359,3 +8359,45 @@<BR>> movu [r0], m0<BR>> <BR>> RET<BR>>+<BR>>+;-----------------------------------------------------------------<BR>>+; void transpose_8x8(pixel *dst, pixel *src, intptr_t stride)<BR>>+;-----------------------------------------------------------------<BR>>+INIT_XMM sse2<BR>>+cglobal transpose8, 3, 3, 8, dest, src, stride<BR>>+<BR>>+ movh m0, [r1]<BR>>+ movh m1, [r1 + r2]<BR>>+ movh m2, [r1 + 2 * r2]<BR>>+ lea r1, [r1 + 2 * r2]<BR>>+ movh m3, [r1 + r2]<BR>>+ movh m4, [r1 + 2 * r2]<BR>>+ lea r1, [r1 + 2 * r2]<BR>>+ movh m5, [r1 + r2]<BR>>+ movh m6, [r1 + 2 * r2]<BR>>+ lea r1, [r1 + 2 * r2]<BR>>+ movh m7, [r1 + r2]<BR>>+<BR>>+ punpcklbw m0, m1<BR>>+ punpcklbw m2, m3<BR>>+ punpcklbw m4, m5<BR>>+ punpcklbw m6, m7<BR>>+ movu m1, m0</DIV>
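<DIV>For a register-to-register copy, mova is better than movu. Better still, drop the copy and use the three-operand form "punpckhwd m1, m0, m2", placed before "punpcklwd m0, m2" so that it still reads the unmodified m0.</DIV>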
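<DIV>A minimal, untested sketch of the unpack stage with the same change applied to this copy and the three that follow it. It assumes the x86inc.asm in this tree emulates three-operand syntax for SSE2, emitting the register copy itself when the destination differs from the first source:<BR><BR>; word interleave: take the high half first, while m0 and m4 are still unmodified<BR>punpckhwd m1, m0, m2<BR>punpcklwd m0, m2<BR>punpckhwd m5, m4, m6<BR>punpcklwd m4, m6<BR><BR>; dword interleave: same ordering, high half before the in-place low half<BR>punpckhdq m2, m0, m4<BR>punpckldq m0, m4<BR>punpckhdq m3, m1, m5<BR>punpckldq m1, m5<BR><BR>The registers end up holding the same values as in the patch, so the four stores to [r0], [r0 + 16], [r0 + 32] and [r0 + 48] stay as they are.</DIV>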
<DIV><BR>>+ punpcklwd m0, m2<BR>>+ punpckhwd m1, m2<BR>>+ movu m5, m4<BR>>+ punpcklwd m4, m6<BR>>+ punpckhwd m5, m6<BR>>+ movu m2, m0<BR>>+ punpckldq m0, m4<BR>>+ punpckhdq m2, m4<BR>>+ movu m3, m1<BR>>+ punpckldq m1, m5<BR>>+ punpckhdq m3, m5<BR>>+<BR>>+ movu [r0], m0<BR>>+ movu [r0 + 16], m2<BR>>+ movu [r0 + 32], m1<BR>>+ movu [r0 + 48], m3<BR>>+<BR>>+ RET<BR>>diff -r f6a050b79cfa -r 3a94cc365533 source/common/x86/pixel.h<BR>>--- a/source/common/x86/pixel.h Tue Nov 19 11:25:00 2013 +0530<BR>>+++ b/source/common/x86/pixel.h Tue Nov 19 11:53:09 2013 +0530<BR>>@@ -366,5 +366,6 @@<BR>> void x265_getResidual16_sse4(pixel *fenc, pixel *pred, int16_t *residual, intptr_t stride);<BR>> void x265_getResidual32_sse4(pixel *fenc, pixel *pred, int16_t *residual, intptr_t stride);<BR>> void x265_transpose4_sse2(pixel *dest, pixel *src, intptr_t stride);<BR>>+void x265_transpose8_sse2(pixel *dest, pixel *src, intptr_t stride);<BR>> <BR>> #endif // ifndef X265_I386_PIXEL_H<BR>>_______________________________________________<BR>>x265-devel mailing list<BR>>x265-devel@videolan.org<BR>>https://mailman.videolan.org/listinfo/x265-devel<BR></DIV></div>