<div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><div style="margin: 0;">Hi Sebastian,</div><div style="margin: 0;"><br></div><div style="margin: 0;">Sorry for the delay. This version looks good too.</div><div style="margin: 0;"><br></div><div style="margin: 0;">To optimize the loop further in the future, we first need to modify the algorithm.</div><div style="margin: 0;">For example, we access the array as below:</div><div style="margin: 0;"><div style="margin: 0;">=====</div><div style="margin: 0;">absCoeff[numNonZero] = tmpCoeff[blkPos];</div><div style="margin: 0;">numNonZero += sig;</div><div>=====</div><div>This code prevents parallel execution because the store index depends on the unpredictable value of sig. If we allow the data in absCoeff to be stored sparsely, we can process all 16 elements in parallel.</div><div><br></div><div>Regards,</div><div>Min Chen</div><div><br></div></div><p>At 2022-03-05 04:24:09, "Pop, Sebastian" <spop@amazon.com> wrote:</p><blockquote id="isReplyContent" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">
<style type="text/css" style="display:none"><!-- p { margin-top: 0px; margin-bottom: 0px; }--></style>
<p>Thanks Min Chen for your feedback.<br>
</p>
<p>Please see the attached patch, which avoids one transfer from NEON to a general-purpose register (GPR) by using `str h2, [x13]`.<br>
</p>
<p>I'm not sure how to optimize the loop; however, I see that the x86 AVX2+BMI version has a much shorter loop.<br>
</p>
<p>Do you recommend following the AVX2 implementation?<br>
</p>
<p><br>
</p>
<p>Thanks,<br>
</p>
<p>Sebastian<br>
</p>
<p><br>
</p>
<div style="color: rgb(33, 33, 33);">
<hr tabindex="-1" style="display:inline-block; width:98%">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" color="#000000" style="font-size:11pt"><b>From:</b> x265-devel <x265-devel-bounces@videolan.org> on behalf of chen <chenm003@163.com><br>
<b>Sent:</b> Wednesday, March 2, 2022 10:20 PM<br>
<b>To:</b> Development for x265<br>
<b>Subject:</b> RE: [EXTERNAL] [x265] [arm64] port costCoeffNxN</font>
<div> </div>
</div>
<div>
<br>
<div>
<div style="line-height:1.7; color:#000000; font-size:14px; font-family:Arial">
<div style="margin:0">Hi Sebastian,</div>
<div style="margin:0"><br>
</div>
<div style="margin:0">Thank you for your contribution; the code looks good.</div>
<div style="margin:0"><br>
</div>
<div style="margin:0">Just a small comment for future performance improvement:</div>
<div style="margin:0">"fmov w12, s2" is expensive because it moves data between the NEON and integer register files, especially since it is inside the loop.</div>
<div style="margin:0">There are also some deep-seated data organization and algorithm problems. For example, we spend many instructions on absCoeff[numNonZero]; if we allow sparse zeros inside the array, we can eliminate many of those instructions.</div>
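<div style="margin:0"><br>
</div>
<div style="margin:0">A minimal sketch of the difference, for illustration only: the function names gather_compact and gather_sparse, the fixed block size of 16, and the plain C types are assumptions, not code from x265. The first loop writes through an index that depends on every earlier sig; the second writes element i to slot i and leaves zeros in place.</div>
<pre>
```c
/* Illustrative sketch only: names, types and BLOCK_SIZE are assumptions,
   not taken from the x265 sources. */
enum { BLOCK_SIZE = 16 };

/* Compacted store: the write index numNonZero depends on all earlier
   sig values, so iteration i cannot begin until iteration i-1 is done. */
static int gather_compact(const short *coeff, unsigned short *absCoeff)
{
    int numNonZero = 0;
    for (int i = 0; i < BLOCK_SIZE; i++) {
        int v = coeff[i];
        int sig = (v != 0);
        absCoeff[numNonZero] = (unsigned short)(v < 0 ? -v : v);
        numNonZero += sig;   /* loop-carried dependency on the index */
    }
    return numNonZero;
}

/* Sparse store: element i always lands in slot i, so the iterations are
   independent of each other. Zeros stay in the array, and the non-zero
   count becomes a separate reduction. */
static int gather_sparse(const short *coeff, unsigned short *absCoeff)
{
    int numNonZero = 0;
    for (int i = 0; i < BLOCK_SIZE; i++) {
        int v = coeff[i];
        absCoeff[i] = (unsigned short)(v < 0 ? -v : v);
        numNonZero += (v != 0);
    }
    return numNonZero;
}
```
</pre>
<div style="margin:0">With the sparse layout, whatever consumes absCoeff later has to tolerate the zero entries, so this trades a simpler producer loop for a change in the consumer.</div>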
<div style="margin:0"><br>
</div>
<div style="margin:0">Regards,</div>
<div style="margin:0">Min Chen</div>
<p style="margin:0"><br>
</p>
<p>At 2022-03-02 07:28:15, "Pop, Sebastian" <spop@amazon.com> wrote:</p>
<blockquote id="isReplyContent" style="padding-left:1ex; margin:0px 0px 0px 0.8ex; border-left:#ccc 1px solid">
<style type="text/css" style="">
<!--
p
{margin-top:0px;
margin-bottom:0px}
--></style>
<p>Hi,<br>
</p>
<p><br>
</p>
<p>The attached patch fixes the registration of the costCoeffNxN function hook and removes the early return that I used for testing.<br>
</p>
<div dir="ltr" style="font-size:12pt; color:#000000; background-color:#FFFFFF; font-family:Calibri,Arial,Helvetica,sans-serif">
<div>
<p><br>
</p>
<p>Sebastian<br>
</p>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote><br><br></div>