<div dir="ltr"><br><div class="gmail_quote">---------- Forwarded message ----------<br>From: <b class="gmail_sendername">chen</b> <span dir="ltr">&lt;<a href="mailto:chenm003@163.com">chenm003@163.com</a>&gt;</span><br>Date: Thu, Feb 5, 2015 at 5:55 PM<br>Subject: Re: [x265] [PATCH] blockcopy_pp_12x32: SSE2 asm code optimization<br>To: Development for x265 &lt;<a href="mailto:x265-devel@videolan.org">x265-devel@videolan.org</a>&gt;<br><br><br><div style><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"></div>
<div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7">>>this code is right</div>
<div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7">>>but could you try using general register moves (rN, rNd) in x64 mode?</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><br></div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7">I applied your idea of using general registers as the buffer in x64 mode to the 4x8 case (easy to test with), but surprisingly the version that uses SIMD registers is faster. Here are the performance numbers and the code:</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7">copy_pp[ 4x8] 2.67x <b>139.98 </b> 374.18 [using <span style="line-height:23.7999992370605px">general register move (rN, rNd)</span><span style="line-height:1.7">] </span><span style="line-height:1.7"> </span></div><div style><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">copy_pp[ 4x8] 3.34x <b>109.60 </b> 366.35 [SIMD registers as buffer]</span></font><br></div><div style><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"><br></span></font></div><div style><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">code </span></font><span style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:23.7999992370605px">[using </span><span style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:23.7999992370605px">general register move (rN, rNd)</span><span style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7">] </span><span style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"> </span></div><div style><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"> ;-----------------------------------------------------------------------------</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> ; void 
blockcopy_pp_4x8(pixel* dst, intptr_t dstStride, const pixel* src, intptr_t srcStride)</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> ;-----------------------------------------------------------------------------</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> INIT_XMM sse2</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> cglobal blockcopy_pp_4x8, 4, 10, 0</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> </div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> lea r4, [3 * r1]</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> lea r5, [3 * r3]</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> </div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> mov r6d, [r2]</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> mov r7d, [r2 + r3]</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> mov r8d, [r2 + 2 * r3]</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> mov r9d, [r2 + r5]</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> </div><div 
style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> mov [r0], r6d</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> mov [r0 + r1], r7d</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> mov [r0 + 2 * r1], r8d</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> mov [r0 + r4], r9d</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> </div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> lea r2, [r2 + 4 * r3]</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> mov r6d, [r2]</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> mov r7d, [r2 + r3]</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> mov r8d, [r2 + 2 * r3]</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> mov r9d, [r2 + r5]</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> </div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> lea r0, [r0 + 4 * r1]</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> mov [r0], r6d</div><div 
style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> mov [r0 + r1], r7d</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> mov [r0 + 2 * r1], r8d</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><span class="" style="white-space:pre"> </span> mov [r0 + r4], r9d</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"> RET</div><div style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7">code <span style="line-height:23.7999992370605px">[SIMD registers as buffer]</span></div><div style><span style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:23.7999992370605px"> </span><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">INIT_XMM sse2</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px">cglobal blockcopy_pp_4x8, 4, 6, 4</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"><br></span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"> lea r4, [3 * r1]</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"> lea r5, [3 * r3]</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"><br></span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"> movd m0, [r2]</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"> movd m1, [r2 + r3]</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"> 
movd m2, [r2 + 2 * r3]</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"> movd m3, [r2 + r5]</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"><br></span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"> movd [r0], m0</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"> movd [r0 + r1], m1</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"> movd [r0 + 2 * r1], m2</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"> movd [r0 + r4], m3</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"><br></span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"> lea r2, [r2 + 4 * r3]</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"> movd m0, [r2]</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"> movd m1, [r2 + r3]</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"> movd m2, [r2 + 2 * r3]</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"> movd m3, [r2 + r5]</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"><br></span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"> lea r0, [r0 + 4 * r1]</span></font></div><div><font color="#000000" face="arial"><span 
style="font-size:14px;line-height:23.7999992370605px"> movd [r0], m0</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"> movd [r0 + r1], m1</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"> movd [r0 + 2 * r1], m2</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"> movd [r0 + r4], m3</span></font></div><div><font color="#000000" face="arial"><span style="font-size:14px;line-height:23.7999992370605px"> RET</span></font></div></div><pre style="color:rgb(0,0,0);font-family:arial;font-size:14px;line-height:1.7"><br></pre></div></div><br></div>
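<div>For reference, both listings perform the same operation: they copy eight rows of four pixels and differ only in which register file holds the four bytes of each row. A minimal C sketch of that operation follows. It assumes an 8-bit build (one byte per pixel); the name blockcopy_pp_4x8_ref and the use of long for the strides are illustrative choices, not taken from x265.</div>

```c
typedef unsigned char pixel; /* assumption: 8-bit build, one byte per pixel */

/* Hypothetical reference: copy a 4-pixel-wide, 8-row block one row at a time.
 * Strides are in pixels and may differ between source and destination. */
static void blockcopy_pp_4x8_ref(pixel *dst, long dstStride,
                                 const pixel *src, long srcStride)
{
    int x, y;
    for (y = 0; y != 8; y++) {
        /* One row is 4 bytes: a single 32-bit mov or a single movd in asm. */
        for (x = 0; x != 4; x++)
            dst[x] = src[x];
        dst += dstStride;
        src += srcStride;
    }
}
```

<div>Each row is exactly one 32-bit load and one 32-bit store, so the two assembly versions issue the same number of memory operations; the measured difference comes from the register file used, not from the amount of data moved.</div>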