<div dir="ltr"><br><div class="gmail_quote">---------- Forwarded message ----------<br>From: <b class="gmail_sendername">chen</b> <span dir="ltr">&lt;<a href="mailto:chenm003@163.com">chenm003@163.com</a>&gt;</span><br>Date: Tue, Sep 9, 2014 at 10:17 AM<br>Subject: Re: [x265] [PATCH] copy_cnt_4: faster AVX2 code<br>To: Development for x265 &lt;<a href="mailto:x265-devel@videolan.org">x265-devel@videolan.org</a>&gt;<br><br><br><div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7">>>Most of the instructions are SSE2 and there is just one movu, so why do we need an AVX2 version for 4x4?</div><div><font face="arial, sans-serif">What about</font><font color="#000000"><span style="font-size:14px;line-height:1.7"> </span></font><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:12.8000001907349px;line-height:normal">vinserti128?</span><br><br><font color="#000000"><span style="font-size:14px;line-height:1.7">At 2014-09-09 16:37:23, </span></font><a href="mailto:praveen@multicorewareinc.com" target="_blank" style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7">praveen@multicorewareinc.com</a><font color="#000000"><span style="font-size:14px;line-height:1.7"> wrote:
># HG changeset patch
># User Praveen Tiwari
># Date 1410251834 -19800
># Node ID d011073f35258cb2f0ad95db6038c2d9fb840b27
># Parent  ebb84e9dbb0fa0e8c4c9304b2efd57f8ac3d0c05
>copy_cnt_4: faster AVX2 code
>
>diff -r ebb84e9dbb0f -r d011073f3525 source/common/x86/blockcopy8.asm
>--- a/source/common/x86/blockcopy8.asm        Tue Sep 09 11:36:58 2014 +0530
>+++ b/source/common/x86/blockcopy8.asm        Tue Sep 09 14:07:14 2014 +0530
>@@ -3990,7 +3990,7 @@
> INIT_YMM avx2
> cglobal copy_cnt_4, 3,3,3
>     add         r2d, r2d
>-    xorpd       xm2, xm2
>+    xorpd       m2,  m2

>     ; row 0 & 1
>     movq        xm0, [r1]
>@@ -4004,11 +4004,9 @@
>     vinserti128 m0, m0, xm1, 1
>     movu    [r0], m0

>-    vextractf128 xm1, m0, 1
>-    packsswb     xm0, xm1
>-    pcmpeqb      xm0, xm2
>-
>     ; get count
>+    packsswb    xm0, xm1
>+    pcmpeqb     xm0, xm2
>     pmovmskb    eax, xm0
>     not         ax
>     popcnt      ax, ax
>_______________________________________________
>x265-devel mailing list
></span></font><a href="mailto:x265-devel@videolan.org" target="_blank" style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7">x265-devel@videolan.org</a><font color="#000000"><span style="font-size:14px;line-height:1.7">
></span></font><a href="https://mailman.videolan.org/listinfo/x265-devel" target="_blank" style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7">https://mailman.videolan.org/listinfo/x265-devel</a>
</div></div><br></div><br></div>
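<div>For readers following the thread, the work done by copy_cnt_4 can be sketched in scalar C. This is only an illustration inferred from the quoted assembly (four rows of four 16-bit values copied into a packed 16-element destination, followed by a count of the nonzero entries), not the x265 source. The name copy_cnt_4_ref and its signature are hypothetical, and the stride is assumed to be given in int16_t elements, since the assembly doubles r2 to obtain a byte offset.</div>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference for the behaviour suggested by the quoted
 * assembly; the name and signature are assumptions, not the x265 API.
 * Copies a 4x4 block of int16_t from a strided source into a packed
 * destination and returns how many of the 16 values are nonzero. */
static uint32_t copy_cnt_4_ref(int16_t *dst, const int16_t *src, ptrdiff_t stride)
{
    uint32_t count = 0;

    for (int y = 0; y < 4; y++) {
        for (int x = 0; x < 4; x++) {
            /* stride is assumed to be in elements, not bytes */
            int16_t v = src[y * stride + x];

            dst[y * 4 + x] = v;   /* destination rows are contiguous */
            count += (v != 0);    /* comparison yields 0 or 1 */
        }
    }
    return count;
}
```

<div>In the assembly the same count is obtained with packsswb, pcmpeqb against zero, pmovmskb, not and popcnt. Signed saturation in packsswb keeps every nonzero 16-bit value nonzero after narrowing to bytes, so both approaches should agree.</div>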