[x264-devel] x86: Fix integral_init4/8h_avx2

James Almer jamrial at gmail.com
Sun Oct 11 20:43:18 CEST 2015


On 10/11/2015 2:01 PM, Henrik Gramner wrote:
> x264 | branch: master | Henrik Gramner <henrik at gramner.com> | Thu Aug 27 19:53:00 2015 +0200| [67076513267907b5601828ae6864cc063c8c7548] | committer: Henrik Gramner
> 
> x86: Fix integral_init4/8h_avx2
> 
> The AVX2 implementation was using the wrong offsets. It went undetected due to
> the checkasm test being incorrect.
> 
>> http://git.videolan.org/gitweb.cgi/x264.git/?a=commit;h=67076513267907b5601828ae6864cc063c8c7548
> ---
> 
>  common/x86/mc-a2.asm |   14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/common/x86/mc-a2.asm b/common/x86/mc-a2.asm
> index 7fa72fc..727e9c8 100644
> --- a/common/x86/mc-a2.asm
> +++ b/common/x86/mc-a2.asm
> @@ -1511,11 +1511,12 @@ cglobal integral_init4h, 3,4
>      neg     r2
>      pxor    m4, m4
>  .loop:
> -    mova    m0, [r1+r2]
> +    mova   xm0, [r1+r2]
> +    mova   xm1, [r1+r2+16]
>  %if mmsize==32
> -    movu    m1, [r1+r2+8]
> +    vinserti128 m0, m0, [r1+r2+ 8], 1
> +    vinserti128 m1, m1, [r1+r2+24], 1

No sure if it will be faster, but you could try

vpermq m0, [r1+r2], q2110
vpermq m1, [r1+r2+8], q3221

Instead of mova + vinserti128, here and below.

>  %else
> -    mova    m1, [r1+r2+16]
>      palignr m1, m0, 8
>  %endif
>      mpsadbw m0, m4, 0
> @@ -1541,13 +1542,14 @@ cglobal integral_init8h, 3,4
>      neg     r2
>      pxor    m4, m4
>  .loop:
> -    mova    m0, [r1+r2]
> +    mova   xm0, [r1+r2]
> +    mova   xm1, [r1+r2+16]
>  %if mmsize==32
> -    movu    m1, [r1+r2+8]
> +    vinserti128 m0, m0, [r1+r2+ 8], 1
> +    vinserti128 m1, m1, [r1+r2+24], 1
>      mpsadbw m2, m0, m4, 100100b
>      mpsadbw m3, m1, m4, 100100b
>  %else
> -    mova    m1, [r1+r2+16]
>      palignr m1, m0, 8
>      mpsadbw m2, m0, m4, 100b
>      mpsadbw m3, m1, m4, 100b
> 
> _______________________________________________
> x264-devel mailing list
> x264-devel at videolan.org
> https://mailman.videolan.org/listinfo/x264-devel
> 



More information about the x264-devel mailing list