[x265] [PATCH] avx2: 'integral4v' asm code -> 7.48x faster than 'C' version

chen chenm003 at 163.com
Mon May 8 16:38:49 CEST 2017


Hi Guillaume,


Our development platform is Visual Studio, and its compiler can't auto-vectorize this code.
We also can't assume that users have an advanced compiler on their computers.
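
For reference, here is a minimal C sketch of the loop being discussed. It is inferred from the asm in the quoted patch, not copied from the x265 source, and the name integral4v_ref is hypothetical. It assumes stride counts uint32_t elements and that the buffer holds at least 5 * stride elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference for the 'integral4v' primitive.
 * Each element of the first row is replaced by the difference between
 * the element four rows below it and itself. Unsigned arithmetic wraps
 * modulo 2^32, which matches what psubd does in the asm version. */
static void integral4v_ref(uint32_t *sum4, intptr_t stride)
{
    assert(sum4 != NULL && stride >= 0);

    for (intptr_t x = 0; x < stride; x++)
        sum4[x] = sum4[x + 4 * stride] - sum4[x];
}
```

The AVX2 version in the patch processes 8 of these elements per iteration, so it appears to rely on stride being a multiple of 8. Whether a compiler vectorizes the scalar loop on its own depends on the compiler and its flags.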


Regards,
Min


At 2017-05-08 19:36:24,"Guillaume POIRIER" <poirierg at gmail.com> wrote:
>Hello Praveen Tiwari,
>
>Just out of curiosity, when comparing your code's performance with the
>plain C version, did you give the compiler a chance to vectorize
>the code itself?
>Such a trivial loop should not be difficult to handle for the compiler
>I think...
>
>Cheers,
>
>Guillaume
>
>
>On Mon, May 8, 2017 at 6:31 AM,  <praveen at multicorewareinc.com> wrote:
>> # HG changeset patch
>> # User Praveen Tiwari <praveen at multicorewareinc.com>
>> # Date 1493905428 -19800
>> #      Thu May 04 19:13:48 2017 +0530
>> # Node ID 41611825c2f4661536500e1306db7d8c4bf7fd07
>> # Parent  48502979a4b21f6982dcdacbf7796bf5d9fb395c
>> avx2: 'integral4v' asm code -> 7.48x faster than 'C' version
>>
>>    integral_init4v  7.48x    202.53          1515.14
>>
>> diff -r 48502979a4b2 -r 41611825c2f4 source/common/x86/seaintegral.asm
>> --- a/source/common/x86/seaintegral.asm Wed May 03 11:26:26 2017 +0530
>> +++ b/source/common/x86/seaintegral.asm Thu May 04 19:13:48 2017 +0530
>> @@ -32,8 +32,19 @@
>>  ;void integral_init4v_c(uint32_t *sum4, intptr_t stride)
>>  ;-----------------------------------------------------------------------------
>>  INIT_YMM avx2
>> -cglobal integral4v, 2, 2, 0
>> -
>> +cglobal integral4v, 2, 3, 2
>> +    mov r2, r1
>> +    shl r2, 4
>> +
>> +.loop
>> +    movu    m0, [r0]
>> +    movu    m1, [r0 + r2]
>> +    psubd   m1, m0
>> +    movu    [r0], m1
>> +    add     r0, 32
>> +    sub     r1, 8
>> +    cmp     r1, 0
>> +    jnz     .loop
>>      RET
>>
>>  ;-----------------------------------------------------------------------------
>> _______________________________________________
>> x265-devel mailing list
>> x265-devel at videolan.org
>> https://mailman.videolan.org/listinfo/x265-devel
>
>
>
>-- 
>Wearing a Rolex is like driving an Audi: It says you've got some
>money, but nothing to say.
>John Lefèvre