[x265] [PATCH] asm: avx2 asm code for dct4

chen chenm003 at 163.com
Wed Aug 27 18:32:14 CEST 2014



At 2014-08-27 21:32:32, dnyaneshwar at multicorewareinc.com wrote:
># HG changeset patch
># User Dnyaneshwar G <dnyaneshwar at multicorewareinc.com>
># Date 1409145968 -19800
>#      Wed Aug 27 18:56:08 2014 +0530
># Node ID 27193515d4417c142fff97a1d96a3d7111b9d6d5
># Parent  77fe0cc583e8ec10275bc1b3c4bb116d5ceb51ac
>asm: avx2 asm code for dct4
>previous perf: 4.3x, with avx2: 5.4x
>
>diff -r 77fe0cc583e8 -r 27193515d441 source/common/x86/dct8.asm
>--- a/source/common/x86/dct8.asm	Wed Aug 27 14:25:17 2014 +0530
>+++ b/source/common/x86/dct8.asm	Wed Aug 27 18:56:08 2014 +0530

>+    vinserti128     m0, m3, xm2, 1
>+    vpermq          m3, m3, 11101110b
>+    vinserti128     m2, m2, xm3, 0
>+    movu            [r1], m0
>+    movu            [r1 + mmsize], m2

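    movu            [r1], xm3                         ; bytes  0..15 = low lane of m3
    movu            [r1 + mmsize/2], m2               ; bytes 16..47 = m2 (32..47 overwritten next)
    vextracti128    [r1 + mmsize], m3, 1              ; bytes 32..47 = high lane of m3
    vextracti128    [r1 + mmsize + mmsize/2], m2, 1   ; bytes 48..63 = high lane of m2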
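Replacing the vinserti128/vpermq shuffles with vextracti128 stores may free port 5 on Haswell (register-form vinserti128 and vpermq both execute on port 5, while vextracti128 with a memory destination does not) and save about 6 cycles per call.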
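That the two store sequences leave identical data at r1 can be checked with a small portable C model. This is a sketch, not x265 code: the `ymm` type and the helpers `vinserti128`, `vpermq`, `store_patch`, `store_review` and `stores_match` are hypothetical and emulate only the byte movement of the instructions (assuming mmsize = 32 under AVX2), not their timing or port usage.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical byte-level model of a 256-bit ymm register (two 128-bit lanes).
 * Only data movement is modelled; timing and port usage are not. */
typedef struct { uint8_t b[32]; } ymm;

/* vinserti128 dst, src, xm, imm: copy src, then overwrite lane (imm & 1)
 * with the low 128 bits of xm. */
static ymm vinserti128(ymm src, ymm xm, int imm)
{
    ymm r = src;
    memcpy(r.b + 16 * (imm & 1), xm.b, 16);
    return r;
}

/* vpermq dst, src, imm: qword i of dst = qword ((imm >> 2*i) & 3) of src. */
static ymm vpermq(ymm src, int imm)
{
    ymm r;
    for (int i = 0; i < 4; i++)
        memcpy(r.b + 8 * i, src.b + 8 * ((imm >> (2 * i)) & 3), 8);
    return r;
}

/* Sequence from the patch: three lane shuffles, then two 32-byte stores. */
static void store_patch(uint8_t out[64], ymm m2, ymm m3)
{
    ymm m0 = vinserti128(m3, m2, 1);  /* vinserti128 m0, m3, xm2, 1 */
    m3 = vpermq(m3, 0xEE);            /* vpermq m3, m3, 11101110b   */
    m2 = vinserti128(m2, m3, 0);      /* vinserti128 m2, m2, xm3, 0 */
    memcpy(out, m0.b, 32);            /* movu [r1], m0              */
    memcpy(out + 32, m2.b, 32);       /* movu [r1 + mmsize], m2     */
}

/* Suggested sequence: no shuffles, four stores. */
static void store_review(uint8_t out[64], ymm m2, ymm m3)
{
    memcpy(out, m3.b, 16);            /* movu [r1], xm3 */
    memcpy(out + 16, m2.b, 32);       /* movu [r1 + mmsize/2], m2 */
    memcpy(out + 32, m3.b + 16, 16);  /* vextracti128 [r1 + mmsize], m3, 1 */
    memcpy(out + 48, m2.b + 16, 16);  /* vextracti128 [r1 + mmsize + mmsize/2], m2, 1 */
}

/* Returns 1 if both sequences leave the same 64 bytes at r1. */
int stores_match(void)
{
    ymm m2, m3;
    uint8_t a[64], b[64];
    for (int i = 0; i < 32; i++) {
        m2.b[i] = (uint8_t)i;          /* arbitrary, distinct test patterns */
        m3.b[i] = (uint8_t)(100 + i);
    }
    store_patch(a, m2, m3);
    store_review(b, m2, m3);
    return memcmp(a, b, sizeof a) == 0;
}
```

Calling `stores_match()` returns 1: both versions write the low lane of m3, the low lane of m2, the high lane of m3 and the high lane of m2, in that order, to the four 16-byte slots at r1. The suggested version trades the three shuffles for two extra stores.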


More information about the x265-devel mailing list