<div dir="ltr"><br><div class="gmail_quote">---------- Forwarded message ----------<br>From: <b class="gmail_sendername">chen</b> <span dir="ltr"><<a href="mailto:chenm003@163.com">chenm003@163.com</a>></span><br>Date: Wed, Sep 10, 2014 at 12:14 PM<br>Subject: Re: [x265] Fwd:  [PATCH] copy_cnt_4: faster AVX2 code<br>To: Development for x265 <<a href="mailto:x265-devel@videolan.org">x265-devel@videolan.org</a>><br><br><br><div><span class="" style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><div><br> </div>At 2014-09-10 09:34:31,"Praveen Tiwari" <<a href="mailto:praveen@multicorewareinc.com" target="_blank">praveen@multicorewareinc.com</a>> wrote:<br>
</span><blockquote style="padding-left:1ex;margin:0px 0px 0px 0.8ex;border-left-color:rgb(204,204,204);border-left-width:1px;border-left-style:solid">
<div dir="ltr"><br>
<div class="gmail_quote"><span class="" style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7">---------- Forwarded message ----------<br>From: <b class="gmail_sendername">chen</b> <span dir="ltr"><<a href="mailto:chenm003@163.com" target="_blank">chenm003@163.com</a>></span><br>Date: Tue, Sep 9, 2014 at 10:17 AM<br>Subject: Re: [x265] [PATCH] copy_cnt_4: faster AVX2 code<br>To: Development for x265 <<a href="mailto:x265-devel@videolan.org" target="_blank">x265-devel@videolan.org</a>><br><br><br>
</span><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="">
<div style="font-size:14px;color:rgb(0,0,0);line-height:1.7;font-family:arial">>>Most of the instructions are SSE2, with just one movu, so why do we need an AVX2 version for 4x4?</div>
</span><div><font face="arial, sans-serif">What about</font><font color="#000000"><span style="font-size:14px;line-height:1.7"> </span></font><span style="font-size:12px;color:rgb(34,34,34);line-height:normal;font-family:arial,sans-serif">vinserti128?</span><br><br>>>You want to use vinserti128 to combine two 128-bit halves into 256 bits; is that more costly than two movu? </div></div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><br></div><div><font color="#000000"><span style="font-size:14px;line-height:1.7">I tested both the SSE and AVX2 code on a </span></font><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:13px;line-height:normal">Haswell i5 </span><font color="#000000"><span style="font-size:14px;line-height:23.7999992370605px">machine</span></font><span style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:13px;line-height:normal">; </span><span style="color:rgb(0,0,0);font-size:14px;line-height:23.7999992370605px">the AVX2 code seems a bit faster, so I think we should keep both versions. 
Here are the results of 3 runs:</span></div><div><span style="color:rgb(0,0,0);font-size:14px;line-height:23.7999992370605px"><br></span></div><div><span style="color:rgb(0,0,0);font-size:14px;line-height:23.7999992370605px"><b>SSE version:</b></span></div><div><span style="color:rgb(0,0,0);font-size:14px;line-height:23.7999992370605px"> </span><font color="#000000"><span style="font-size:14px;line-height:23.7999992370605px">        copy_cnt[4x4]  4.21x    110.16          463.86<br></span></font></div><div><font color="#000000"><span style="font-size:14px;line-height:23.7999992370605px">        copy_cnt[4x4]  4.18x    104.64          437.08</span></font></div><div><font color="#000000"><span style="font-size:14px;line-height:23.7999992370605px">        copy_cnt[4x4]  4.17x    110.23          460.02</span></font></div><div><font color="#000000"><span style="font-size:14px;line-height:23.7999992370605px"><br></span></font></div><div><font color="#000000"><span style="font-size:14px;line-height:23.7999992370605px"><b>AVX2 version:</b></span></font></div><div><font color="#000000"><span style="font-size:14px;line-height:23.7999992370605px"><div>        copy_cnt[4x4]  4.71x    99.23           467.63</div><div>        copy_cnt[4x4]  4.39x    104.46          458.58</div><div>        copy_cnt[4x4]  4.71x    99.27           467.91</div></span></font></div></div></div></blockquote></div><br>_______________________________________________<br>
x265-devel mailing list<br>
<a href="mailto:x265-devel@videolan.org">x265-devel@videolan.org</a><br>
<a href="https://mailman.videolan.org/listinfo/x265-devel" target="_blank">https://mailman.videolan.org/listinfo/x265-devel</a><br>
<br></div><br></div>
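To put the "a bit faster" claim in numbers, here is a minimal sketch that averages the three runs quoted in the message. It assumes the second numeric column of each testbench row (110.16, 99.23, and so on) is the cost of the optimized primitive and the last column is the C reference; the message itself does not label the columns. The names `sse_runs`, `avx2_runs`, and `relative_saving` are made up for this illustration.

```python
from statistics import mean

# Figures copied from the message. Assumption: this column is the
# optimized primitive's cost; the message does not label the columns.
sse_runs = [110.16, 104.64, 110.23]
avx2_runs = [99.23, 104.46, 99.27]


def relative_saving(baseline, candidate):
    """Fraction by which the candidate's mean is lower than the baseline's mean."""
    return 1.0 - mean(candidate) / mean(baseline)


sse_mean = mean(sse_runs)
avx2_mean = mean(avx2_runs)
saving = relative_saving(sse_runs, avx2_runs)

print(f"SSE mean:    {sse_mean:.2f}")
print(f"AVX2 mean:   {avx2_mean:.2f}")
print(f"AVX2 saving: {saving:.1%}")
```

Under that column assumption the AVX2 mean comes out roughly 7% lower than the SSE mean. The slowest AVX2 run (104.46) is almost identical to the fastest SSE run (104.64), so three runs are probably too few to separate the difference from run-to-run noise.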