[x265] [PATCH] avx2: 'integral4v' asm code -> 7.48x faster than 'C' version
chen
chenm003 at 163.com
Mon May 8 16:38:49 CEST 2017
Hi Guillaume,
Our development platform is Visual Studio, and its compiler can't auto-vectorize this loop.
We also can't assume that users have an advanced compiler on their computers.
Regards,
Min
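For context, the "plain C version" under discussion overwrites each of the first `stride` entries of `sum4` with the entry four rows below minus itself. The sketch below assumes semantics inferred from the asm in the quoted patch; it is not copied from x265's source. `integral4v_ref` is a hypothetical scalar reference. `integral4v_sse2` is a hypothetical variant that uses SSE2 intrinsics, which GCC, Clang and MSVC all accept, so it does not depend on the compiler's auto-vectorizer. `HAVE_SSE2`, `MAX_STRIDE` and `integral4v_sse2_matches_ref` are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>   /* SSE2 intrinsics; also available in MSVC */
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* Scalar model of integral_init4v, inferred from the patch's asm:
 * row 0 of sum4 becomes (row 4 - row 0), element by element. */
static void integral4v_ref(uint32_t *sum4, intptr_t stride)
{
    for (intptr_t x = 0; x < stride; x++)
        sum4[x] = sum4[x + 4 * stride] - sum4[x];
}

/* Hypothetical SSE2 variant: 4 dwords per iteration plus a scalar tail,
 * so it works for any stride without relying on auto-vectorization. */
static void integral4v_sse2(uint32_t *sum4, intptr_t stride)
{
    intptr_t x = 0;
#if HAVE_SSE2
    for (; x + 4 <= stride; x += 4) {
        __m128i cur  = _mm_loadu_si128((const __m128i *)(sum4 + x));
        __m128i down = _mm_loadu_si128((const __m128i *)(sum4 + x + 4 * stride));
        _mm_storeu_si128((__m128i *)(sum4 + x), _mm_sub_epi32(down, cur));
    }
#endif
    for (; x < stride; x++)   /* scalar tail (or the whole row without SSE2) */
        sum4[x] = sum4[x + 4 * stride] - sum4[x];
}

/* Runs both versions on identical pseudo-random input; returns 1 if they agree. */
enum { MAX_STRIDE = 64 };
static int integral4v_sse2_matches_ref(intptr_t stride)
{
    uint32_t a[5 * MAX_STRIDE], b[5 * MAX_STRIDE];
    assert(stride >= 0 && stride <= MAX_STRIDE);
    for (intptr_t i = 0; i < 5 * stride; i++)
        a[i] = b[i] = (uint32_t)i * 2654435761u;
    integral4v_ref(a, stride);
    integral4v_sse2(b, stride);
    for (intptr_t i = 0; i < 5 * stride; i++)
        if (a[i] != b[i])
            return 0;
    return 1;
}
```

Unaligned loads and stores (`_mm_loadu_si128`/`_mm_storeu_si128`) avoid any alignment assumption about `sum4`, and both the intrinsic and the C subtraction wrap modulo 2^32, so the results match bit for bit.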
At 2017-05-08 19:36:24, "Guillaume POIRIER" <poirierg at gmail.com> wrote:
>Hello Praveen Tiwari,
>
>Just out of curiosity, when comparing your code's performance with the
>plain C version, did you give the compiler a chance to vectorize
>the code itself?
>Such a trivial loop should not be difficult for the compiler to handle,
>I think...
>
>Cheers,
>
>Guillaume
>
>
>On Mon, May 8, 2017 at 6:31 AM, <praveen at multicorewareinc.com> wrote:
>> # HG changeset patch
>> # User Praveen Tiwari <praveen at multicorewareinc.com>
>> # Date 1493905428 -19800
>> # Thu May 04 19:13:48 2017 +0530
>> # Node ID 41611825c2f4661536500e1306db7d8c4bf7fd07
>> # Parent 48502979a4b21f6982dcdacbf7796bf5d9fb395c
>> avx2: 'integral4v' asm code -> 7.48x faster than 'C' version
>>
>> integral_init4v 7.48x 202.53 1515.14
>>
>> diff -r 48502979a4b2 -r 41611825c2f4 source/common/x86/seaintegral.asm
>> --- a/source/common/x86/seaintegral.asm Wed May 03 11:26:26 2017 +0530
>> +++ b/source/common/x86/seaintegral.asm Thu May 04 19:13:48 2017 +0530
>> @@ -32,8 +32,19 @@
>> ;void integral_init4v_c(uint32_t *sum4, intptr_t stride)
>> ;-----------------------------------------------------------------------------
>> INIT_YMM avx2
>> -cglobal integral4v, 2, 2, 0
>> -
>> +cglobal integral4v, 2, 3, 2
>> + mov r2, r1
>> + shl r2, 4
>> +
>> +.loop
>> + movu m0, [r0]
>> + movu m1, [r0 + r2]
>> + psubd m1, m0
>> + movu [r0], m1
>> + add r0, 32
>> + sub r1, 8
>> + cmp r1, 0
>> + jnz .loop
>> RET
>>
>> ;-----------------------------------------------------------------------------
>> _______________________________________________
>> x265-devel mailing list
>> x265-devel at videolan.org
>> https://mailman.videolan.org/listinfo/x265-devel
>
>
>
>--
>Wearing a Rolex is like driving an Audi: It says you've got some
>money, but nothing to say.
>John Lefèvre
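To make the register usage in the patch easier to follow, here is a hypothetical scalar C model of the AVX2 loop, written line for line against the asm. `mov r2, r1; shl r2, 4` yields a byte offset of 16·stride, i.e. four rows of `uint32_t`, and each iteration handles one 32-byte ymm register (8 dwords). This is an illustrative sketch, not x265 code: the name `integral4v_avx2_model` is made up, and the precondition it asserts (a positive `stride` that is a multiple of 8) is inferred from the `sub r1, 8` / `jnz .loop` exit test rather than stated in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of the patch's AVX2 loop, mirroring its registers:
 * r0 = sum4 pointer, r1 = remaining element count, r2 = byte offset to row 4. */
static void integral4v_avx2_model(uint32_t *sum4, intptr_t stride)
{
    unsigned char *r0 = (unsigned char *)sum4;
    intptr_t r1 = stride;
    intptr_t r2 = stride << 4;          /* mov r2, r1; shl r2, 4: 16*stride bytes */

    /* The asm has no guard: since the body runs before the sub/jnz test, it
     * only terminates cleanly if stride is a positive multiple of 8. */
    assert(stride > 0 && stride % 8 == 0);
    do {
        uint32_t m0[8], m1[8];
        memcpy(m0, r0, sizeof m0);      /* movu m0, [r0] */
        memcpy(m1, r0 + r2, sizeof m1); /* movu m1, [r0 + r2] */
        for (int i = 0; i < 8; i++)
            m1[i] -= m0[i];             /* psubd m1, m0 (wraps mod 2^32) */
        memcpy(r0, m1, sizeof m1);      /* movu [r0], m1 */
        r0 += 32;                       /* add r0, 32 */
        r1 -= 8;                        /* sub r1, 8 (sets ZF) */
    } while (r1 != 0);                  /* cmp r1, 0 / jnz .loop */
}
```

Because `sub` already sets the zero flag, the `cmp r1, 0` before `jnz` is redundant in the asm; the model folds both into the `while (r1 != 0)` test. As a sanity check on the commit message, 1515.14 / 202.53 ≈ 7.48, matching the quoted speedup.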