[x265] Fwd: [PATCH] weight_pp avx2 asm code, improved from 8608.65 cycles to 5138.09 cycles over sse version of asm code
chen
chenm003 at 163.com
Fri Oct 17 18:11:10 CEST 2014
At 2014-10-17 13:03:42,"Praveen Tiwari" <praveen at multicorewareinc.com> wrote:
---------- Forwarded message ----------
From: chen <chenm003 at 163.com>
Date: Fri, Oct 17, 2014 at 3:11 AM
Subject: Re: [x265] [PATCH] weight_pp avx2 asm code, improved from 8608.65 cycles to 5138.09 cycles over sse version of asm code
To: Development for x265 <x265-devel at videolan.org>
At 2014-10-16 17:20:13,praveen at multicorewareinc.com wrote:
># HG changeset patch
># User Praveen Tiwari
># Date 1413451199 -19800
># Node ID 858be8d7d7176ab6c6d01cf92d00c8478fe99b34
># Parent 79702581ec824a2a375aebe228d69c3930aeea96
>weight_pp avx2 asm code, improved from 8608.65 cycles to 5138.09 cycles over sse version of asm code
>
>diff -r 79702581ec82 -r 858be8d7d717 source/common/x86/pixel-util8.asm
>--- a/source/common/x86/pixel-util8.asm Wed Oct 15 17:49:35 2014 -0500
>+++ b/source/common/x86/pixel-util8.asm Thu Oct 16 14:49:59 2014 +0530
>@@ -1375,6 +1375,60 @@
>
> RET
>
>+INIT_YMM avx2
>+cglobal weight_pp, 6, 7, 6
>+
>+ mov r6d, r6m
>+ shl r6d, 6 ; m0 = [w0<<6]
>+ movd xm0, r6d
>+
>+ movd xm1, r7m ; m1 = [round]
>+ punpcklwd xm0, xm1
>+ pshufd xm0, xm0, 0
>+ vinserti128 m0, m0, xm0, 1 ; assuming both (w0<<6) and round are using maximum of 16 bits each, m0 = [w0<<6 round]
>>vpbroadcastd is better
Yeah, exactly. I tried to replace (pshufd xm0, xm0, 0) + (vinserti128 m0, m0, xm0, 1) with vpbroadcastd m0, xm0 (as per the documented syntax, __m256i _mm256_broadcastd_epi32(__m128i a)), but it throws a build error: invalid combination of opcode and operands.
In the Intel documentation you can see "VPBROADCASTD ymm1, xmm2/m32". It means you just write "vpbroadcastd m0, xm0" or "vpbroadcastd m0, [r0]".