[x265] [PATCH] added copy_shl primitive
chen
chenm003 at 163.com
Wed Sep 3 17:53:18 CEST 2014
At 2014-09-02 22:13:31,praveen at multicorewareinc.com wrote:
># HG changeset patch
># User Praveen Tiwari
># Date 1409660231 -19800
># Node ID 61f7c056cd6e01e5a24a51b40c20c53bf4593ec7
># Parent 2667a0e3afdc2b95ff73c962b3e25366162d8e8d
>added copy_shl primitive
>
>diff -r 2667a0e3afdc -r 61f7c056cd6e source/common/x86/blockcopy8.asm
>--- a/source/common/x86/blockcopy8.asm Tue Sep 02 15:31:10 2014 +0530
>+++ b/source/common/x86/blockcopy8.asm Tue Sep 02 17:47:11 2014 +0530
>@@ -4476,3 +4476,152 @@
> jg .loop_row
>
> RET
>+
>+;--------------------------------------------------------------------------------------
>+; void copy_shl(int16_t *dst, int16_t *src, intptr_t stride, int shift)
>+;--------------------------------------------------------------------------------------
>+INIT_XMM sse2
>+cglobal copy_shl_4, 3,3,3
>+ add r2d, r2d
>+ movd m0, r3m
>+
>+ ; Row 0-3
>+ movu m1, [r1 + 0 * mmsize]
>+ movu m2, [r1 + 1 * mmsize]
>+ psllw m1, m0
>+ psllw m2, m0
>+ movh [r0], m1
>+ movhps [r0 + r2], m1
>+ movh [r0 + r2 * 2], m2
>+ lea r2, [r2 * 3]
>+ movhps [r0 + r2], m2
Reorder the movh and lea instructions; we may get the same speed with a smaller code size.
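
For reference while reviewing, the behaviour of the quoted copy_shl_4 routine, as I read it from the assembly, can be sketched in scalar C. This is only a sketch under assumptions: the name copy_shl_4_ref is hypothetical, src is taken to be a packed 4x4 block of int16_t (the asm loads two consecutive XMM registers), stride is counted in int16_t elements of dst (the asm doubles it to get bytes), and shift is assumed to be in the range 0..15.

```c
#include <stdint.h>

/* Hypothetical scalar reference for the 4x4 case; not part of the patch.
 * src: 16 contiguous int16_t values (4 rows of 4).
 * dst: 4 rows of 4 values, rows separated by `stride` elements.
 * shift: assumed to be 0..15. */
static void copy_shl_4_ref(int16_t *dst, const int16_t *src,
                           intptr_t stride, int shift)
{
    for (int row = 0; row < 4; row++)
    {
        for (int col = 0; col < 4; col++)
        {
            /* Shift the 16-bit pattern as unsigned, as psllw does, and
             * keep only the low 16 bits of the result. */
            uint16_t v = (uint16_t)src[row * 4 + col];
            dst[row * stride + col] = (int16_t)(uint16_t)(v << shift);
        }
    }
}
```

A scalar version like this could serve as the baseline when checking the output of any reordered assembly variant.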