<div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><div style="margin: 0;">Looks good to me.</div><div style="margin: 0;"><br></div><div style="margin: 0;">There are a few small improvements that could go into a future version.</div><div style="margin: 0;">For example,</div><div style="margin: 0;"><br></div><div style="margin: 0;"><div style="margin: 0;">+ mov w12, #32</div><div style="margin: 0;">+ dup v16.4s, w12</div><div>is equivalent to</div><div>MOVI v16.4s, #32</div><div><br></div><div>We may also get better performance by reordering the compares and branches:</div><div><div>+ cmp x4, #0</div><div>+ b.eq 0f</div><div>+ cmp x4, #1</div><div>+ b.eq 1f</div><div>+ cmp x4, #2</div><div>+ b.eq 2f</div><div>+ cmp x4, #3</div><div>+ b.eq 3f</div><div>+0:</div></div><div><br></div></div><p>At 2021-07-07 00:01:17, "Pop, Sebastian" &lt;spop@amazon.com&gt; wrote:</p><blockquote id="isReplyContent" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">
<style><!--
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:12.0pt;
font-family:"Calibri",sans-serif;}
span.EmailStyle21
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt">Thanks for your careful reviews.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">I addressed the problems for eor and rodata.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Please see the attached patch.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Sebastian<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="color:black">From: </span></b><span style="color:black">x265-devel &lt;x265-devel-bounces@videolan.org&gt; on behalf of chen &lt;chenm003@163.com&gt;<br>
<b>Reply-To: </b>Development for x265 &lt;x265-devel@videolan.org&gt;<br>
<b>Date: </b>Friday, July 2, 2021 at 8:11 PM<br>
<b>To: </b>Development for x265 &lt;x265-devel@videolan.org&gt;<br>
<b>Subject: </b>RE: [EXTERNAL] [x265] [arm64] port LUMA_VPP_4xN<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:"Arial",sans-serif;color:black">Hi,<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:"Arial",sans-serif;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:"Arial",sans-serif;color:black">I put my comments inline. Thanks.<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:"Arial",sans-serif;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:"Arial",sans-serif;color:black">By the way, I found one more possible improvement to this patch.<o:p></o:p></span></p>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:"Arial",sans-serif;color:black">+ eor v17.16b, v17.16b, v17.16b<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:"Arial",sans-serif;color:black">This register-clearing operation can be replaced by MOVI.<o:p></o:p></span></p>
</div>
</div>
<p><span style="font-size:10.5pt;font-family:"Arial",sans-serif;color:black">At 2021-07-03 02:43:07, "Pop, Sebastian" &lt;spop@amazon.com&gt; wrote:<o:p></o:p></span></p>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-right:0in" id="isReplyContent">
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">Hi,</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">thanks for your review.</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black"> </span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">> +#ifdef __MACH__</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">> +# define MACH</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">> +#else</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">> +# define MACH #</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">> This is not a good way to bypass .const_data</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black"> </span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">MACH uses ".const_data" directive, which is invalid for ELF.</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">For ELF the directive is ".rodata":</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black"> </span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">> ELF .section .rodata</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">> MACH .const_data</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:black"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">[MC] I mean you could declare MACH_RODATA or a similar macro that is empty on ELF but expands to something on Mach-O. I think that is better than using '#' to bypass the unnecessary statement.</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black"> </span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">> + ushll v0.8h, v0.8b, #0</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">> ...</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">> + mul v16.8h, v0.8h, v24.8h</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">> Why not MULL?</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black"> </span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">That would not work for the rest of the computation.</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">Part of the data in v0 gets used in the next computation,</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">and then I would have to split mla into a mull + add.</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:black"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color:black">[MC] This depends on your algorithm. In your code<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:black">below, you combine row1 and row2 and multiply them by<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:black">coeff[0]; however, it also works as 8b x 8b<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:black">with UMULL.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:black">That algorithm is a little more complex, though,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:black">so we can keep this version and improve it in the<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:black">future.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color:black">*** Code<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">> + mul v16.8h, v0.8h, v24.8h</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">> + ext v21.16b, v0.16b, v1.16b, #8</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">> + mul v17.8h, v21.8h, v24.8h</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">> + mov v0.16b, v1.16b</span><span style="color:black"><o:p></o:p></span></p>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:"Arial",sans-serif;color:black">*** End<o:p></o:p></span></p>
</div>
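<p class="MsoNormal"><span style="color:black">To illustrate the UMULL remark, here is a minimal Python sketch, not taken from the patch. It models each instruction lane by lane and assumes the coefficients are non-negative and fit in 8 bits; the function names are hypothetical. Negative filter taps would need separate handling (for example multiplying by the magnitude and subtracting with UMLSL), which is an assumption about how such a rewrite could look.<o:p></o:p></span></p>
<pre>
```python
# Hypothetical lane-wise models; the names are illustrative only.

def ushll_0_then_mul(pixels_8b, coeffs_8h):
    """ushll vd.8h, vn.8b, #0 followed by mul vd.8h, vd.8h, vm.8h."""
    # Zero-extend each 8-bit pixel to a 16-bit lane.
    widened = [p % 256 for p in pixels_8b]
    # 16-bit multiply keeps only the low 16 bits of each product.
    return [(p * c) % 65536 for p, c in zip(widened, coeffs_8h)]


def umull_8b(pixels_8b, coeffs_8b):
    """umull vd.8h, vn.8b, vm.8b: unsigned 8-bit by 8-bit, 16-bit result."""
    # The largest product is 255 * 255 = 65025, so it always fits in 16 bits.
    return [(p % 256) * (c % 256) for p, c in zip(pixels_8b, coeffs_8b)]
```
</pre>
<p class="MsoNormal"><span style="color:black">For unsigned 8-bit inputs both functions return the same eight lanes, so the widening step can be folded into the multiply in that case.<o:p></o:p></span></p>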
<p class="MsoNormal"><span style="font-size:11.0pt;color:black"><br>
<br>
</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">> + orr v0.16b, v1.16b, v1.16b</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">> This is equivalent to MOV; I guess the compiler will replace it with the right instruction on ARM64</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black"> </span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">I replaced orr with mov instructions.</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black"> </span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">> + // sum row[0-7]</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">> + dup v18.2d, v16.d[1]</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">> + dup v19.2d, v17.d[1]</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">> + add v16.4h, v16.4h, v18.4h</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">> + add v17.4h, v17.4h, v19.4h</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">> + trn1 v16.2d, v16.2d, v17.2d</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">> How about ADDP?</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black"> </span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">I replaced the above 5 instructions with the following 3 and the performance improved.</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black"> </span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black"> trn1 v20.2d, v16.2d, v17.2d</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black"> trn2 v21.2d, v16.2d, v17.2d</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black"> add v16.8h, v20.8h, v21.8h</span><span style="color:black"><o:p></o:p></span></p>
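<p class="MsoNormal"><span style="font-size:11.0pt;color:black">As a sanity check, the equivalence of the two sequences can be modeled lane by lane. This is a minimal Python sketch, not part of the patch: each 128-bit register is modeled as a list of eight 16-bit lanes, and all helper names are made up for illustration.</span><span style="color:black"><o:p></o:p></span></p>
<pre>
```python
# Model a 128-bit NEON register as a list of eight 16-bit lanes (.8h view).
# Lanes 0-3 form d[0] (low half), lanes 4-7 form d[1] (high half).
# All helper names are hypothetical; they only mimic the instructions.

MASK = 1 << 16  # lane arithmetic wraps modulo 2**16


def dup_2d_hi(v):
    """dup vd.2d, vn.d[1]: broadcast the high 64 bits to both halves."""
    return v[4:] + v[4:]


def add_4h(a, b):
    """add vd.4h, va.4h, vb.4h: add the low four lanes, zero the high half."""
    return [(x + y) % MASK for x, y in zip(a[:4], b[:4])] + [0] * 4


def add_8h(a, b):
    """add vd.8h, va.8h, vb.8h: add all eight lanes."""
    return [(x + y) % MASK for x, y in zip(a, b)]


def trn1_2d(a, b):
    """trn1 vd.2d, va.2d, vb.2d: low half of a, then low half of b."""
    return a[:4] + b[:4]


def trn2_2d(a, b):
    """trn2 vd.2d, va.2d, vb.2d: high half of a, then high half of b."""
    return a[4:] + b[4:]


def original_sequence(v16, v17):
    """The five instructions: dup, dup, add, add, trn1."""
    v18 = dup_2d_hi(v16)
    v19 = dup_2d_hi(v17)
    v16 = add_4h(v16, v18)
    v17 = add_4h(v17, v19)
    return trn1_2d(v16, v17)


def new_sequence(v16, v17):
    """The three instructions: trn1, trn2, add."""
    v20 = trn1_2d(v16, v17)
    v21 = trn2_2d(v16, v17)
    return add_8h(v20, v21)
```
</pre>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">Both functions return the low-plus-high sums of v16 in lanes 0-3 and those of v17 in lanes 4-7, so the shorter sequence computes the same result in this model.</span><span style="color:black"><o:p></o:p></span></p>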
<p class="MsoNormal"><span style="font-size:11.0pt;color:black"> </span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">Please see the attached amended patch.</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black"> </span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">Thanks,</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;color:black">Sebastian</span><span style="color:black"><o:p></o:p></span></p>
</blockquote>
</div>
</div>
</div>
</blockquote></div>