<div dir="ltr"><br><div class="gmail_quote">---------- Forwarded message ----------<br>From: <b class="gmail_sendername">chen</b> <span dir="ltr">&lt;<a href="mailto:chenm003@163.com">chenm003@163.com</a>&gt;</span><br>Date: Thu, Feb 5, 2015 at 5:55 PM<br>Subject: Re: [x265] [PATCH] blockcopy_pp_12x32: SSE2 asm code optimization<br>To: Development for x265 &lt;<a href="mailto:x265-devel@videolan.org">x265-devel@videolan.org</a>&gt;<br><br><br><div style><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"></div>
<div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7">>>this code is right</div>
<div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7">>>but could you try use general register move (rN, rNd) in x64 mode?</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><br></div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7">I applied your idea of using general-purpose registers as the buffer in x64 mode to the 4x8 block (an easy case to test), but surprisingly the version that uses SIMD registers is faster. Here are the performance numbers and the code for both versions:</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7">copy_pp[  4x8]  2.67x    <b>139.98 </b>         374.18          [using <span style="line-height:23.7999992370605px">general register move (rN, rNd)</span><span style="line-height:1.7">] </span><span style="line-height:1.7"> </span></div><div style><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">copy_pp[  4x8]  3.34x    <b>109.60 </b>         366.35          [SIMD registers as buffer]</span></font><br></div><div style><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"><br></span></font></div><div style><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">code </span></font><span style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:23.7999992370605px">[using </span><span style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:23.7999992370605px">general register move (rN, rNd)</span><span style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7">] </span><span style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"> </span></div><div style><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7">        ;-----------------------------------------------------------------------------</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span 
class="" style="white-space:pre">     </span> ; void blockcopy_pp_4x8(pixel* dst, intptr_t dstStride, const pixel* src, intptr_t srcStride)</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> ;-----------------------------------------------------------------------------</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre">        </span> INIT_XMM sse2</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> cglobal blockcopy_pp_4x8, 4, 10, 0</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre">    </span> </div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre">     </span>     lea     r4,    [3 * r1]</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre">     </span>     lea     r5,    [3 * r3]</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre">     </span> </div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre">     </span>     mov     r6d,     [r2]</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre">       </span>     mov     r7d,     [r2 + r3]</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre">  </span>     mov     r8d,     [r2 + 2 * r3]</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre">      </span>     mov     r9d,     [r2 + r5]</div><div 
style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre">  </span> </div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre">     </span>     mov     [r0],          r6d</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre">       </span>     mov     [r0 + r1],     r7d</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre">  </span>     mov     [r0 + 2 * r1], r8d</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre">    </span>     mov     [r0 + r4],     r9d</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre">  </span> </div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre">     </span>     lea      r2,     [r2 + 4 * r3]</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre">     </span>     mov     r6d,     [r2]</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre">       </span>     mov     r7d,     [r2 + r3]</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre">  </span>     mov     r8d,     [r2 + 2 * r3]</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre">      </span>     mov     r9d,     [r2 + r5]</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre">  </span> </div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" 
style="white-space:pre">     </span>     lea      r0,            [r0 + 4 * r1]</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre">  </span>     mov     [r0],          r6d</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre">       </span>     mov     [r0 + r1],     r7d</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre">  </span>     mov     [r0 + 2 * r1], r8d</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre">    </span>     mov     [r0 + r4],     r9d</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7">    RET</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7">code <span style="line-height:23.7999992370605px">[SIMD registers as buffer]</span></div><div style><span style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:23.7999992370605px"> </span><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">INIT_XMM sse2</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">cglobal blockcopy_pp_4x8, 4, 6, 4</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"><br></span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">    lea     r4,    [3 * r1]</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">    lea     r5,    [3 * r3]</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"><br></span></font></div><div><font color="#000000" face="arial"><span 
style="font-size:14px;line-height:23.7999992370605px">    movd     m0,     [r2]</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">    movd     m1,     [r2 + r3]</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">    movd     m2,     [r2 + 2 * r3]</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">    movd     m3,     [r2 + r5]</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"><br></span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">    movd     [r0],          m0</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">    movd     [r0 + r1],     m1</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">    movd     [r0 + 2 * r1], m2</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">    movd     [r0 + r4],     m3</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"><br></span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">    lea      r2,     [r2 + 4 * r3]</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">    movd     m0,     [r2]</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">    movd     m1,     [r2 + r3]</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">    movd     m2,     [r2 + 2 * 
r3]</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">    movd     m3,     [r2 + r5]</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"><br></span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">    lea      r0,            [r0 + 4 * r1]</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">    movd     [r0],          m0</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">    movd     [r0 + r1],     m1</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">    movd     [r0 + 2 * r1], m2</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">    movd     [r0 + r4],     m3</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">    RET</span></font></div></div><pre style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><br></pre></div></div><br></div>