<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial"><DIV>At 2013-11-26 18:45:25, yuvaraj@multicorewareinc.com wrote:<BR>># HG changeset patch<BR>># User Yuvaraj Venkatesh &lt;yuvaraj@multicorewareinc.com&gt;<BR>># Date 1385462702 -19800<BR>># Tue Nov 26 16:15:02 2013 +0530<BR>># Node ID 52738c22dce02e8d59cc4b09f1e1b23a0a8360c5<BR>># Parent 116d91f08fcb123d4b088df5c1400e599306b6f8<BR>>asm: assembly code for pixel_sse_ss_24x32<BR>><BR>>diff -r 116d91f08fcb -r 52738c22dce0 source/common/x86/asm-primitives.cpp<BR>>--- a/source/common/x86/asm-primitives.cpp Tue Nov 26 14:19:27 2013 +0800<BR>>+++ b/source/common/x86/asm-primitives.cpp Tue Nov 26 16:15:02 2013 +0530<BR>>@@ -103,6 +103,7 @@<BR>> p.sse_ss[LUMA_16x16] = x265_pixel_ssd_ss_16x16_ ## cpu; \<BR>> p.sse_ss[LUMA_16x32] = x265_pixel_ssd_ss_16x32_ ## cpu; \<BR>> p.sse_ss[LUMA_16x64] = x265_pixel_ssd_ss_16x64_ ## cpu; \<BR>>+ p.sse_ss[LUMA_24x32] = x265_pixel_ssd_ss_24x32_ ## cpu; \<BR>> p.sse_ss[LUMA_32x8] = x265_pixel_ssd_ss_32x8_ ## cpu; \<BR>> p.sse_ss[LUMA_32x16] = x265_pixel_ssd_ss_32x16_ ## cpu; \<BR>> p.sse_ss[LUMA_32x24] = x265_pixel_ssd_ss_32x24_ ## cpu; \<BR>>diff -r 116d91f08fcb -r 52738c22dce0 source/common/x86/pixel-a.asm<BR>>--- a/source/common/x86/pixel-a.asm Tue Nov 26 14:19:27 2013 +0800<BR>>+++ b/source/common/x86/pixel-a.asm Tue Nov 26 16:15:02 2013 +0530<BR>>@@ -469,17 +469,62 @@<BR>> SSD_SS_32 64<BR>> %endmacro<BR>> <BR>>+%macro SSD_SS_24 0<BR>>+cglobal pixel_ssd_ss_24x32, 4,7,6<BR>>+ FIX_STRIDES r1, r3<BR>>+ mov r4d, 16<BR>>+ pxor m0, m0<BR>>+.loop<BR>>+ mova m1, [r0]<BR>>+ psubw m1, [r2]<BR>This is right, but it is unsafe: I am not sure the input pointers are aligned. mova (and the memory operand of psubw in the SSE2/SSE4 builds) requires a 16-byte-aligned address, so an unaligned r0 or r2 would fault.</DIV>
<DIV> </DIV>
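<DIV>For illustration only, a minimal C sketch of an alignment-safe version of the same computation, assuming (as the quoted asm does) that strides are given in int16_t elements and that the differences fit in 16 bits. It uses SSE2 intrinsics with _mm_loadu_si128 (movdqu) instead of aligned loads, and a shuffle-based horizontal sum instead of phaddd, which is an SSSE3 instruction. The function name ssd_ss_24x32_unaligned is hypothetical and not an x265 API; a plain C loop is used when SSE2 is unavailable.</DIV>

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#endif

/* Hypothetical helper: sum of squared differences of two 24x32 blocks of
 * int16_t samples. Strides are in int16_t elements. Assumes every a[i] - b[i]
 * fits in int16_t, so the wrapping psubw path and the scalar path agree. */
int ssd_ss_24x32_unaligned(const int16_t *a, intptr_t stride_a,
                           const int16_t *b, intptr_t stride_b)
{
    assert(stride_a >= 24 && stride_b >= 24);
#if defined(__SSE2__) || defined(_M_X64)
    __m128i acc = _mm_setzero_si128();                    /* pxor m0, m0 */
    for (int y = 0; y < 32; y++) {
        for (int x = 0; x < 24; x += 8) {                 /* 3 x 8 samples */
            /* movdqu-style loads: no 16-byte alignment requirement */
            __m128i va = _mm_loadu_si128((const __m128i *)(a + x));
            __m128i vb = _mm_loadu_si128((const __m128i *)(b + x));
            __m128i d = _mm_sub_epi16(va, vb);            /* psubw */
            acc = _mm_add_epi32(acc, _mm_madd_epi16(d, d)); /* pmaddwd, paddd */
        }
        a += stride_a;
        b += stride_b;
    }
    /* SSE2-only horizontal add of the four dword lanes */
    acc = _mm_add_epi32(acc, _mm_shuffle_epi32(acc, _MM_SHUFFLE(1, 0, 3, 2)));
    acc = _mm_add_epi32(acc, _mm_shuffle_epi32(acc, _MM_SHUFFLE(2, 3, 0, 1)));
    return _mm_cvtsi128_si32(acc);
#else
    int sum = 0;
    for (int y = 0; y < 32; y++, a += stride_a, b += stride_b)
        for (int x = 0; x < 24; x++) {
            int d = a[x] - b[x];
            sum += d * d;
        }
    return sum;
#endif
}
```

<DIV>Calling it with pointers at arbitrary int16_t offsets (for example a + 1) is fine, since no load assumes 16-byte alignment.</DIV>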
<DIV>>+ pmaddwd m1, m1<BR>>+ paddd m0, m1<BR>>+ mova m1, [r0 + 16]<BR>>+ psubw m1, [r2 + 16]<BR>>+ pmaddwd m1, m1<BR>>+ paddd m0, m1<BR>>+ mova m1, [r0 + 32]<BR>>+ psubw m1, [r2 + 32]<BR>>+ pmaddwd m1, m1<BR>>+ paddd m0, m1<BR>>+ lea r0, [r0 + 2*r1]<BR>>+ lea r2, [r2 + 2*r3]<BR>>+ mova m1, [r0]<BR>>+ psubw m1, [r2]<BR>>+ pmaddwd m1, m1<BR>>+ paddd m0, m1<BR>>+ mova m1, [r0 + 16]<BR>>+ psubw m1, [r2 + 16]<BR>>+ pmaddwd m1, m1<BR>>+ paddd m0, m1<BR>>+ mova m1, [r0 + 32]<BR>>+ psubw m1, [r2 + 32]<BR>>+ pmaddwd m1, m1<BR>>+ paddd m0, m1<BR>>+ lea r0, [r0 + 2*r1]<BR>>+ lea r2, [r2 + 2*r3]<BR>>+ dec r4d<BR>>+ jnz .loop<BR>>+ phaddd m0, m0<BR>>+ phaddd m0, m0<BR>>+ movd eax, m0<BR>>+ RET<BR>>+%endmacro<BR>>+<BR>> INIT_XMM sse2<BR>> SSD_SS_ONE<BR>> SSD_SS_12x16<BR>>+SSD_SS_24<BR>> SSD_SS_32xN<BR>> INIT_XMM sse4<BR>> SSD_SS_ONE<BR>> SSD_SS_12x16<BR>>+SSD_SS_24<BR>> SSD_SS_32xN<BR>> INIT_XMM avx<BR>> SSD_SS_ONE<BR>> SSD_SS_12x16<BR>>+SSD_SS_24<BR>> SSD_SS_32xN<BR>> %endif ; !HIGH_BIT_DEPTH<BR>> <BR>>@@ -7696,9 +7741,6 @@<BR>> %endif ; !ARCH_X86_64<BR>> %endmacro ; SA8D<BR>> <BR>>-;=============================================================================<BR>>-; INTRA SATD<BR>>-;=============================================================================<BR>> %define TRANS TRANS_SSE2<BR>> %define DIFFOP DIFF_UNPACK_SSE2<BR>> %define LOAD_SUMSUB_8x4P LOAD_DIFF_8x4P<BR>>_______________________________________________<BR>>x265-devel mailing list<BR>>x265-devel@videolan.org<BR>>https://mailman.videolan.org/listinfo/x265-devel<BR></DIV></div>
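<DIV>To make the quoted loop easier to check, here is a hypothetical scalar C model of it (not part of the patch; parameters are named after the asm registers). In the model, 16 iterations of two rows cover the 32 rows, each 24-sample row is read as three 16-byte chunks at byte offsets 0, 16 and 32, pmaddwd folds each pair of squared 16-bit differences into one of four dword lanes, and the two phaddd steps add the four lanes together. It assumes strides in int16_t elements, as the lea r0, [r0 + 2*r1] scaling implies.</DIV>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of the quoted pixel_ssd_ss_24x32 asm.
 * Parameter names follow the asm registers; strides are in int16_t units. */
int ssd_ss_24x32_model(const int16_t *r0, intptr_t r1,
                       const int16_t *r2, intptr_t r3)
{
    assert(r1 >= 24 && r3 >= 24);
    uint32_t m0[4] = {0, 0, 0, 0};                 /* pxor m0, m0: 4 dword lanes */
    for (int r4 = 16; r4 != 0; r4--) {            /* mov r4d, 16 ... dec; jnz */
        for (int row = 0; row < 2; row++) {        /* two rows per iteration */
            for (int off = 0; off < 24; off += 8) { /* bytes 0, 16, 32 of the row */
                for (int lane = 0; lane < 4; lane++) {
                    const int16_t *p = r0 + off + 2 * lane;
                    const int16_t *q = r2 + off + 2 * lane;
                    /* psubw: 16-bit wrapping differences */
                    int16_t d0 = (int16_t)(uint16_t)(p[0] - q[0]);
                    int16_t d1 = (int16_t)(uint16_t)(p[1] - q[1]);
                    /* pmaddwd m1, m1 followed by paddd m0, m1 */
                    m0[lane] += (uint32_t)(d0 * d0) + (uint32_t)(d1 * d1);
                }
            }
            r0 += r1;                               /* lea r0, [r0 + 2*r1] */
            r2 += r3;                               /* lea r2, [r2 + 2*r3] */
        }
    }
    /* two phaddd m0, m0 leave the sum of all lanes in lane 0; movd eax, m0 */
    return (int)(m0[0] + m0[1] + m0[2] + m0[3]);
}
```

<DIV>The uint32_t lanes mirror the wrap-around behaviour of paddd, so the model does not rely on signed integer overflow.</DIV>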