<div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><p style="margin: 0;">Hi,</p><p style="margin: 0;"><br></p><p style="margin: 0;">Some comments:</p><p style="margin: 0;"><br></p>
<p style="margin: 0;">+.macro SAD_X_END_64 x</p>
<div style="margin: 0;">+ uaddlp v16.4s, v16.8h</div>
<div style="margin: 0;">The dynamic range is 64*255 = 16320, which fits in 14 bits, so we do not need to extend to 32 bits here.</div>
<div style="margin: 0;"><br></div>
<p style="margin: 0;">+ uaddlp v17.4s, v17.8h</p>
<p style="margin: 0;">+ uaddlp v18.4s, v18.8h</p>
<p style="margin: 0;">+ uaddlp v20.4s, v20.8h</p>
<p style="margin: 0;">+ uaddlp v21.4s, v21.8h</p>
<p style="margin: 0;">+ uaddlp v22.4s, v22.8h</p>
<p style="margin: 0;">+ add v16.4s, v16.4s, v20.4s</p>
<p style="margin: 0;">+ add v17.4s, v17.4s, v21.4s</p>
<p style="margin: 0;">+ add v18.4s, v18.4s, v22.4s</p>
<p style="margin: 0;">+ trn2 v20.2d, v16.2d, v16.2d</p>
<p style="margin: 0;">+ trn2 v21.2d, v17.2d, v17.2d</p>
<p style="margin: 0;">+ trn2 v22.2d, v18.2d, v18.2d</p>
<p style="margin: 0;">+ add v16.2s, v16.2s, v20.2s</p>
<div style="margin: 0;">+ add v17.2s, v17.2s, v21.2s</div>
<div style="margin: 0;">+ add v18.2s, v18.2s, v22.2s</div>
<div style="margin: 0;">+ uaddlp v16.1d, v16.2s</div>
<div style="margin: 0;"><div style="margin: 0px;">ADD+TRN2+ADD generates the sum of v16+v20 in V.2s, followed by UADDLP into V.1d.</div><div style="margin: 0px;"><br></div><div style="margin: 0px;">Based on the dynamic-range analysis above, we can replace it with:</div><div style="margin: 0px;">ADD v16, v20 ; 15 bits</div><div style="margin: 0px;"> (ignoring the instructions for v17 = v17 + v21, etc.)</div><div style="margin: 0px;">ADD v16, v17 ; 16 bits</div></div>
<div style="margin: 0;"> (ignoring the other registers)</div>
<div style="margin: 0;">UADDLV s0, v16.8h</div>
<div style="margin: 0;"><br></div>
<div style="margin: 0;">+ uaddlp v17.1d, v17.2s</div>
<div style="margin: 0;">+ uaddlp v18.1d, v18.2s</div>
<div style="margin: 0;"><br></div>
<div style="margin: 0;">+ st1 {v16.s}[0], [x6], #4</div>
<div style="margin: 0;">+ st1 {v17.s}[0], [x6], #4</div>
<p style="margin: 0;">+ st1 {v18.s}[0], [x6], #4</p>
<div>I guess STP could store two results in one cycle.</div>
<div><br></div>
<div>Regards,</div>
<div>Min Chen</div>
<div style="position:relative;zoom:1"></div><div id="divNeteaseMailCard"></div><p style="margin: 0;"><br></p>
<p>2021-07-22 14:30:50, "Pop, Sebastian" &lt;spop@amazon.com&gt;</p>
<blockquote id="isReplyContent" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt">Hi,</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">the attached patch ports to arm64 the following kernels:</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x3[ 4x4] 12.23x 13.79 168.68</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x4[ 4x4] 14.12x 15.82 223.43</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x3[ 8x8] 35.05x 17.45 611.47</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x4[ 8x8] 38.48x 21.18 814.95</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x3[ 8x4] 27.19x 11.46 311.48</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x4[ 8x4] 30.40x 13.60 413.37</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x3[ 4x8] 14.16x 22.99 325.37</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x4[ 4x8] 15.82x 27.39 433.23</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x3[16x16] 40.94x 57.94 2371.97</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x4[16x16] 43.63x 72.44 3160.44</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x3[ 16x8] 38.84x 30.54 1186.15</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x4[ 16x8] 39.23x 40.16 1575.43</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x3[ 8x16] 38.74x 31.43 1217.71</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x4[ 8x16] 41.48x 39.01 1618.17</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x3[ 16x4] 31.82x 18.88 600.72</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x4[ 16x4] 36.35x 21.87 795.00</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x3[16x12] 40.27x 43.87 1766.74</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x4[16x12] 42.58x 55.94 2381.75</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x3[ 4x16] 15.34x 42.16 646.67</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x4[ 4x16] 17.08x 51.06 872.12</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x3[12x16] 29.45x 61.06 1798.28</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x4[12x16] 30.39x 78.94 2399.17</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x3[32x32] 42.85x 216.39 9272.65</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x4[32x32] 42.53x 294.98 12544.76</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x3[32x16] 42.09x 110.35 4644.86</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x4[32x16] 41.71x 151.05 6301.01</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x3[16x32] 44.19x 106.99 4728.04</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x4[16x32] 44.72x 139.94 6257.96</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x3[ 32x8] 40.10x 58.16 2332.47</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x4[ 32x8] 41.17x 76.65 3155.96</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x3[32x24] 42.69x 162.76 6947.64</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x4[32x24] 42.08x 223.88 9421.46</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x3[ 8x32] 41.86x 57.89 2423.47</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x4[ 8x32] 45.26x 71.56 3239.07</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x3[24x32] 45.10x 155.22 6999.53</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x4[24x32] 45.30x 205.87 9325.60</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x3[64x64] 39.87x 925.36 36892.50</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x4[64x64] 40.80x 1214.79 49557.66</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x3[64x32] 39.40x 468.08 18444.51</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x4[64x32] 40.71x 609.27 24803.74</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x3[32x64] 43.48x 426.05 18522.95</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x4[32x64] 43.31x 577.80 25024.14</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x3[64x16] 38.67x 238.72 9231.84</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x4[64x16] 40.36x 308.10 12435.08</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x3[64x48] 39.70x 695.95 27628.87</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x4[64x48] 40.74x 912.56 37173.46</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x3[16x64] 44.85x 208.19 9337.52</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x4[16x64] 45.46x 274.68 12487.54</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x3[48x64] 42.68x 653.74 27903.74</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"> sad_x4[48x64] 44.67x 835.79 37336.87</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">Ok to commit?</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">&nbsp;</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">Thanks,</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">Sebastian</span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">&nbsp;</span></p>
</div>
</blockquote></div></div>
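<div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><p style="margin: 0;">The dynamic-range argument in the review comments at the top of this message can be checked with a small scalar model. The sketch below is not the patch's code: it is a plain Python model of one pair of .8h accumulators (v16 and v20), it takes the per-lane bound of 64*255 quoted in the review as an assumption, and the function names are made up for illustration. It compares the widen-first reduction from the patch with the suggested 16-bit ADD followed by a single across-lane widening sum (UADDLV).</p>
<pre>
```python
LANES = 8            # one .8h vector holds eight 16-bit lanes
WRAP16 = 65536       # 16-bit lanes wrap modulo 2**16


def uaddlp(lanes):
    """Model UADDLP: add adjacent pairs, widening so the sums cannot wrap."""
    return [lanes[i] + lanes[i + 1] for i in range(0, len(lanes), 2)]


def reduce_widen_first(acc_a, acc_b):
    """Order used in the patch: widen to 32-bit lanes, then add and fold."""
    a4 = uaddlp(acc_a)                      # uaddlp v16.4s, v16.8h
    b4 = uaddlp(acc_b)                      # uaddlp v20.4s, v20.8h
    s4 = [x + y for x, y in zip(a4, b4)]    # add    v16.4s, v16.4s, v20.4s
    s2 = [s4[0] + s4[2], s4[1] + s4[3]]     # trn2 + add on the .2s halves
    return uaddlp(s2)[0]                    # uaddlp v16.1d, v16.2s


def reduce_narrow_add(acc_a, acc_b):
    """Suggested order: add in 16-bit lanes, then one widening across-lane sum."""
    s8 = [(x + y) % WRAP16 for x, y in zip(acc_a, acc_b)]  # add v16.8h, v16.8h, v20.8h
    return sum(s8)                                         # uaddlv s0, v16.8h


# Per-lane bound quoted in the review (an assumption of this sketch).
bound = 64 * 255
worst_a = [bound] * LANES
worst_b = [bound] * LANES
same_at_bound = reduce_widen_first(worst_a, worst_b) == reduce_narrow_add(worst_a, worst_b)

# Hypothetical larger lane values, to show where the 16-bit add would wrap.
big = [40000] * LANES
same_when_big = reduce_widen_first(big, big) == reduce_narrow_add(big, big)
```
</pre>
<p style="margin: 0;">At the quoted bound both orders give the same total, so <code>same_at_bound</code> is true. With the hypothetical 40000-per-lane inputs the 16-bit add wraps and <code>same_when_big</code> is false, so the shortcut is only safe while the sum of two lanes stays below 65536.</p></div>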