[x265] Fwd: [PATCH] weight_pp avx2 asm code, improved from 8608.65 cycles to 5138.09 cycles over sse version of asm code

chen chenm003 at 163.com
Fri Oct 17 18:11:10 CEST 2014


 
At 2014-10-17 13:03:42,"Praveen Tiwari" <praveen at multicorewareinc.com> wrote:



---------- Forwarded message ----------
From: chen<chenm003 at 163.com>
Date: Fri, Oct 17, 2014 at 3:11 AM
Subject: Re: [x265] [PATCH] weight_pp avx2 asm code, improved from 8608.65 cycles to 5138.09 cycles over sse version of asm code
To: Development for x265 <x265-devel at videolan.org>



 

At 2014-10-16 17:20:13,praveen at multicorewareinc.com wrote:
># HG changeset patch
># User Praveen Tiwari
># Date 1413451199 -19800
># Node ID 858be8d7d7176ab6c6d01cf92d00c8478fe99b34
># Parent  79702581ec824a2a375aebe228d69c3930aeea96
>weight_pp avx2 asm code, improved from 8608.65 cycles to 5138.09 cycles over sse version of asm code
>
>diff -r 79702581ec82 -r 858be8d7d717 source/common/x86/pixel-util8.asm
>--- a/source/common/x86/pixel-util8.asm	Wed Oct 15 17:49:35 2014 -0500
>+++ b/source/common/x86/pixel-util8.asm	Thu Oct 16 14:49:59 2014 +0530
>@@ -1375,6 +1375,60 @@
> 
>     RET
> 
>+INIT_YMM avx2
>+cglobal weight_pp, 6, 7, 6
>+
>+    mov          r6d, r6m
>+    shl          r6d, 6           ; m0 = [w0<<6] 
>+    movd         xm0, r6d
>+
>+    movd         xm1, r7m         ; m1 = [round]
>+    punpcklwd    xm0, xm1
>+    pshufd       xm0, xm0, 0 
>+    vinserti128  m0, m0, xm0, 1   ; assuming both (w0<<6) and round are using maximum of 16 bits each, m0 = [w0<<6 round]

>>vpbroadcastd is better
Yeah, exactly. I tried to replace (pshufd xm0, xm0, 0) + (vinserti128 m0, m0, xm0, 1) with vpbroadcastd m0, xm0 (following the documented intrinsic syntax, __m256i _mm256_broadcastd_epi32 (__m128i a)), but it throws a build error: invalid combination of opcode and operands.
In the Intel documentation you can see "VPBROADCASTD ymm1, xmm2/m32". That means you can simply write "vpbroadcastd m0, xm0" or "vpbroadcastd m0, [r0]".
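
To illustrate why the two sequences should produce the same register contents, here is a minimal scalar sketch in C. It only models the lanes of a 256-bit register as eight 32-bit values; it is not the x265 code and uses no real SIMD instructions. The names ymm_model, pack_weight_round, splat_via_pshufd_insert, splat_via_vpbroadcastd and check_equivalence are hypothetical, introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a 256-bit register: eight 32-bit lanes. */
typedef struct { uint32_t lane[8]; } ymm_model;

/* Models punpcklwd xm0, xm1 for the lowest dword only:
 * low word comes from xm0 (w0 << 6), high word from xm1 (round). */
static uint32_t pack_weight_round(uint16_t w0_shifted, uint16_t round)
{
    return (uint32_t)w0_shifted | ((uint32_t)round << 16);
}

/* Models: pshufd xm0, xm0, 0  followed by  vinserti128 m0, m0, xm0, 1 */
static ymm_model splat_via_pshufd_insert(uint32_t low_dword)
{
    ymm_model r;
    memset(&r, 0, sizeof r);
    /* pshufd with imm 0: dword 0 copied to all four dwords of the low half */
    for (int i = 0; i < 4; i++)
        r.lane[i] = low_dword;
    /* vinserti128 with imm 1: low 128 bits copied into the high 128 bits */
    for (int i = 0; i < 4; i++)
        r.lane[4 + i] = r.lane[i];
    return r;
}

/* Models: vpbroadcastd m0, xm0 (dword 0 copied to all eight dwords) */
static ymm_model splat_via_vpbroadcastd(uint32_t low_dword)
{
    ymm_model r;
    for (int i = 0; i < 8; i++)
        r.lane[i] = low_dword;
    return r;
}

/* Asserts that both modelled sequences fill the register identically. */
static void check_equivalence(uint16_t w0_shifted, uint16_t round)
{
    uint32_t d = pack_weight_round(w0_shifted, round);
    ymm_model a = splat_via_pshufd_insert(d);
    ymm_model b = splat_via_vpbroadcastd(d);
    assert(memcmp(&a, &b, sizeof a) == 0);
}
```

Under this model a single broadcast gives the same result as the shuffle plus insert pair, which is the reason the one-instruction form is preferable when the assembler accepts it.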
 
 
 