<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial"><div> </div><pre><br>At 2014-09-17 19:33:16,praveen@multicorewareinc.com wrote:
># HG changeset patch
># User Praveen Tiwari
># Date 1410953432 -19800
># Node ID e919c3dde6bd9a3b74177e48a14e8b151556caee
># Parent de0b737ed7165b4739128ee430f259ea0f8a5e81
>denoiseDct: SSE version of asm code
>
>+;-----------------------------------------------------------------------------
>+; void denoise_dct(int32_t *dct, uint32_t *sum, uint16_t *offset, int size)
>+;-----------------------------------------------------------------------------
>+INIT_XMM sse4
>+cglobal denoise_dct, 4, 4, 6
>+ pxor m5, m5
>+ shr r3d, 2
>+.loop:
>+ mova m0, [r0]
>+ pabsd m1, m0
>+ mova m2, [r1]
>+ paddd m2, m1
>+ mova [r1], m2
>+ movh m2, [r2]
>+ pmovzxwd m3, m2
</pre><pre>pmovzxwd does not require an aligned address, so it can load from [r2] directly.</pre><pre>>+ psubd m1, m3
>+ pcmpgtd m4, m1, m5
>+ pand m1, m4
>+ psignd m1, m0
>+ mova [r0], m1
>+ add r0, 16
>+ add r1, 16
>+ add r2, 8
>+ dec r3d
>+ jg .loop
</pre><pre>Use jnz here.</pre><pre> </pre><pre> </pre><pre>This version is similar to the original x264 version; the only differences are ABSD vs. pabsd and PSIGND vs. psignd, so maybe our macros have some issue.</pre><pre> </pre></div>
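<pre>For reference when checking the asm against the C path, here is a scalar sketch of what the quoted loop appears to compute per coefficient. The name denoise_dct_ref is hypothetical and not taken from the patch. The sketch assumes size is a positive multiple of 4, because the asm shifts the count right by 2 and processes four coefficients per iteration.</pre>

```c
#include <assert.h>
#include <stdint.h>

/* Scalar sketch of the quoted SSE4 loop.
 * denoise_dct_ref is a hypothetical name used only for illustration. */
void denoise_dct_ref(int32_t *dct, uint32_t *sum,
                     const uint16_t *offset, int size)
{
    /* Assumption: the asm does "shr r3d, 2" and handles 4 coefficients
     * per iteration, so size must be a positive multiple of 4. */
    assert(size > 0 && size % 4 == 0);

    for (int i = 0; i < size; i++) {
        int32_t coef  = dct[i];
        int32_t level = coef < 0 ? -coef : coef;  /* pabsd */

        sum[i] += (uint32_t)level;                /* paddd, store to sum */
        level  -= offset[i];                      /* pmovzxwd + psubd */
        if (level < 0)                            /* pcmpgtd + pand */
            level = 0;
        dct[i] = coef < 0 ? -level : level;       /* psignd */
    }
}
```

<pre>Comparing the asm output against a scalar version like this on random input should show whether the ABSD/PSIGND macros or the new code are at fault.</pre>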