<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial"><div> </div><pre><br>At 2014-09-02 22:13:31,praveen@multicorewareinc.com wrote:
># HG changeset patch
># User Praveen Tiwari
># Date 1409660231 -19800
># Node ID 61f7c056cd6e01e5a24a51b40c20c53bf4593ec7
># Parent 2667a0e3afdc2b95ff73c962b3e25366162d8e8d
>added copy_shl primitive
>
>diff -r 2667a0e3afdc -r 61f7c056cd6e source/common/x86/blockcopy8.asm
>--- a/source/common/x86/blockcopy8.asm Tue Sep 02 15:31:10 2014 +0530
>+++ b/source/common/x86/blockcopy8.asm Tue Sep 02 17:47:11 2014 +0530
>@@ -4476,3 +4476,152 @@
> jg .loop_row
>
> RET
>+
>+;--------------------------------------------------------------------------------------
>+; void copy_shl(int16_t *dst, int16_t *src, intptr_t stride, int shift)
>+;--------------------------------------------------------------------------------------
>+INIT_XMM sse2
>+cglobal copy_shl_4, 3,3,3
>+ add r2d, r2d
>+ movd m0, r3m
>+
>+ ; Row 0-3
>+ movu m1, [r1 + 0 * mmsize]
>+ movu m2, [r1 + 1 * mmsize]
>+ psllw m1, m0
>+ psllw m2, m0
>+ movh [r0], m1
>+ movhps [r0 + r2], m1
>+ movh [r0 + r2 * 2], m2
>+ lea r2, [r2 * 3]
>+ movhps [r0 + r2], m2
</pre><pre>Reorder the movh and lea instructions; we may get the same speed with a smaller code size.</pre></div>