[x265] [PATCH] added copy_shl primitive

chen chenm003 at 163.com
Wed Sep 3 17:53:18 CEST 2014

At 2014-09-02 22:13:31, praveen at multicorewareinc.com wrote:
># HG changeset patch
># User Praveen Tiwari
># Date 1409660231 -19800
># Node ID 61f7c056cd6e01e5a24a51b40c20c53bf4593ec7
># Parent  2667a0e3afdc2b95ff73c962b3e25366162d8e8d
>added copy_shl primitive
>
>diff -r 2667a0e3afdc -r 61f7c056cd6e source/common/x86/blockcopy8.asm
>--- a/source/common/x86/blockcopy8.asm	Tue Sep 02 15:31:10 2014 +0530
>+++ b/source/common/x86/blockcopy8.asm	Tue Sep 02 17:47:11 2014 +0530
>@@ -4476,3 +4476,152 @@
>     jg         .loop_row
> 
>     RET
>+
>+;--------------------------------------------------------------------------------------
>+; void copy_shl(int16_t *dst, int16_t *src, intptr_t stride, int shift)
>+;--------------------------------------------------------------------------------------
>+INIT_XMM sse2
>+cglobal copy_shl_4, 3,3,3
>+    add         r2d, r2d
>+    movd        m0, r3m
>+
>+    ; Row 0-3
>+    movu        m1, [r1 + 0 * mmsize]
>+    movu        m2, [r1 + 1 * mmsize]
>+    psllw       m1, m0
>+    psllw       m2, m0
>+    movh        [r0], m1
>+    movhps      [r0 + r2], m1
>+    movh        [r0 + r2 * 2], m2
>+    lea         r2, [r2 * 3]
>+    movhps      [r0 + r2], m2

If we reorder the movh and lea instructions, we may get the same speed with a smaller code size.
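
For reference while reviewing the instruction order, here is a rough scalar C model of what the quoted copy_shl_4 rows appear to compute. It is only a sketch read off the asm above, not code from the x265 tree: the name copy_shl_4_ref is hypothetical, and it assumes src is a contiguous 4x4 block of int16_t while stride counts int16_t elements of dst (the asm doubles it to bytes with add r2d, r2d).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the quoted copy_shl_4 (name is illustrative).
 * Assumptions taken from the asm: src holds 16 contiguous int16_t values
 * (two 16-byte loads), dst rows are `stride` elements apart. */
static void copy_shl_4_ref(int16_t *dst, const int16_t *src,
                           intptr_t stride, int shift)
{
    for (int row = 0; row < 4; row++) {
        for (int col = 0; col < 4; col++) {
            /* Shift as unsigned to avoid undefined behaviour on negatives. */
            uint16_t v = (uint16_t)src[row * 4 + col];
            /* psllw yields 0 for counts above 15; a negative int count is
             * seen as a large unsigned value, so it is treated the same. */
            uint16_t r = (shift < 0 || shift > 15) ? 0 : (uint16_t)(v << shift);
            dst[row * stride + col] = (int16_t)r;
        }
    }
}
```

Any reordering of the stores and the lea should leave the output of this model unchanged, so it can serve as a quick check against the asm.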