<div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><p style="margin: 0;">Hi <span style="font-family: Calibri, sans-serif; font-size: 14.6667px;">Sebastian,</span></p><div style="margin: 0;"><br></div><div style="margin: 0;">Thank you for your code.</div><div style="margin: 0;"><br></div><div style="margin: 0;">First, sorry for the delay. I was very busy last week with my family and my toy hardware codec, and I only have a little spare time on weekends.</div><div style="margin: 0;">Second, I have not looked at all of the functions yet, but I have some comments on 64x64.</div><div style="margin: 0;"><br></div><div style="margin: 0;">In this function, unroll=8 (4*2) performs well on out-of-order (OOO) CPUs, but it may hurt performance on low-end in-order CPUs such as the Cortex-A53 because of cache misses and related issues. Of course, this is not a problem for this version of the patch.</div><div style="margin: 0;"><br></div><div style="margin: 0;">In the 64x64 case, the sum is calculated by the code below.</div><div style="margin: 0;">==========</div><p style="margin: 0;">+.macro SAD_END_64</p><p style="margin: 0;">+ add v16.8h, v16.8h, v17.8h</p><p style="margin: 0;">+ add v17.8h, v18.8h, v19.8h</p><p style="margin: 0;">+ add v16.8h, v16.8h, v17.8h</p><p style="margin: 0;">+ uaddlv s0, v16.8h</p><p style="margin: 0;">+ fmov w0, s0</p><p style="margin: 0;">+ add v18.8h, v20.8h, v21.8h</p><p style="margin: 0;">+ add v19.8h, v22.8h, v23.8h</p><p style="margin: 0;">+ add v17.8h, v18.8h, v19.8h</p><p style="margin: 0;">+ uaddlv s1, v17.8h</p><p style="margin: 0;">+ fmov w1, s1</p><p style="margin: 0;">+ add w0, w0, w1</p><p style="margin: 0;">+ ret</p><p style="margin: 0;">+.endm</p><div><div style="margin: 0px;">==========</div></div><div><br></div><div><div>You use two UADDLV instructions to avoid overflow. How about summing these partial registers in the NEON domain first, to reduce the number of UADDLV instructions?</div></div><div>e.g.</div><div>UADDLP v16,v16</div><div>UADDLP v17,v17</div><div>ADD v16,v17</div><div
style="position:relative;zoom:1"></div><div id="divNeteaseMailCard"></div><div style="margin: 0;">UADDLV s0,v16</div><div style="margin: 0;"><br></div><div style="margin: 0;">Regards,</div><div style="margin: 0;">Min Chen</div><p>2021-07-17 04:44:05, "Pop, Sebastian" &lt;spop@amazon.com&gt; </p><blockquote id="isReplyContent" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">
<style><!--
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:12.0pt;
font-family:"Calibri",sans-serif;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:12.0pt;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">Hi,</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">the attached patch ports to arm64 the following kernels:</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad[ 4x4] 10.11x 6.50 65.72<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad[ 8x8] 28.95x 8.50 246.00<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad[ 8x4] 23.03x 5.45 125.43<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad[ 4x8] 12.09x 10.64 128.68<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad[16x16] 53.37x 19.19 1024.05<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad[ 16x8] 43.09x 11.62 500.84<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad[ 8x16] 31.03x 16.87 523.44<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad[ 16x4] 39.73x 6.27 249.10<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad[16x12] 50.55x 15.10 763.44<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad[ 4x16] 14.23x 19.39 275.91<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad[12x16] 33.68x 22.95 772.81<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad[32x32] 62.10x 64.84 4026.97<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad[32x16] 59.82x 33.74 2018.56<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad[16x32] 57.94x 35.01 2028.17<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad[ 32x8] 53.98x 18.77 1013.48<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad[32x24] 61.29x 49.36 3024.90<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad[ 8x32] 31.84x 32.49 1034.56<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad[24x32] 53.61x 56.39 3022.97<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad[64x64] 65.24x 255.86 16692.29<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad[64x32] 61.77x 131.16 8100.90<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad[32x64] 62.31x 128.90 8031.79<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad[64x16] 60.28x 67.35 4060.31<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad[64x48] 62.53x 193.59 12104.64<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad[16x64] 61.10x 66.13 4040.26<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad[48x64] 61.75x 194.68 12022.14<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">Ok to commit?</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black"> </span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">Thanks,</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">Sebastian</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
</div>
</blockquote></div>