<div dir="ltr">All the patches of this series have been pushed to the master branch. <br clear="all"><div><div dir="ltr" class="gmail_signature"><div dir="ltr"></div></div></div><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><b>__________________________</b></div><div><b>Karam Singh</b></div><div><b>Ph.D. IIT Guwahati</b></div><div><font size="1">Senior Software (Video Coding) Engineer  </font></div><div><font size="1">Mobile: +91 8011279030</font></div><div><font size="1">Block 9A, 6th floor, DLF Cyber City</font></div><div><font size="1">Manapakkam, Chennai 600 089</font></div></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Sep 9, 2024 at 8:11 PM chen <<a href="mailto:chenm003@163.com">chenm003@163.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div style="line-height:1.7;color:rgb(0,0,0);font-size:14px;font-family:Arial"><div id="m_6826401163858910656spnEditorContent"><p style="margin:0px">Hi Hari,</p><p style="margin:0px"><br></p><p style="margin:0px">Thanks for the details. We can keep your current version; we may rewrite it in assembly in the future to improve performance.</p></div><pre><div>Regards,</div><div>Chen</div><div><br></div>At 2024-09-09 16:27:51, "Hari Limaye" <<a href="mailto:hari.limaye@arm.com" target="_blank">hari.limaye@arm.com</a>> wrote:

>Hi Chen,
>
>Thank you for reviewing the patches.
>
>Regarding the patch that you highlighted:
>    [PATCH 04/14] AArch64: Add Armv8.4 Neon DotProd implementations of filter_hpp
>
>> performance result looks not good enough,
>The key result for this patch is the performance uplift for Neoverse N1 (1.123x), as this machine does not support Neon I8MM instructions.
>The results for the other machines are stated for completeness - however these machines will instead run the Neon I8MM implementation:
>
>  <a href="https://mailman.videolan.org/pipermail/x265-devel/2024-September/013907.html" target="_blank">https://mailman.videolan.org/pipermail/x265-devel/2024-September/013907.html</a>
>
>the uplift from which is copied here:
>
>  Geomean uplift across all block sizes for chroma filters, relative to
>  Armv8.4 Neon DotProd implementations:
>
>      Neoverse N2: 1.402x
>      Neoverse V1: 1.214x
>      Neoverse V2: 1.289x
>
>>and why shortcut branch in case (coeffIdx == 4)?
>As the Armv8.0 Neon implementation can be highly specialized for coeffIdx of 4, the Armv8.4 Neon DotProd implementation is not faster for this filter - so we dispatch to the Armv8.0 Neon implementation in this case.
>The uplift for the other values of coeffIdx from the Armv8.4 Neon DotProd implementation (on Neoverse N1) is significant.
>
>Many thanks,
>Hari
>
</pre></div>_______________________________________________<br>

x265-devel mailing list<br>

<a href="mailto:x265-devel@videolan.org" target="_blank">x265-devel@videolan.org</a><br>

<a href="https://mailman.videolan.org/listinfo/x265-devel" rel="noreferrer" target="_blank">https://mailman.videolan.org/listinfo/x265-devel</a><br>

</blockquote></div>
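The coeffIdx == 4 shortcut discussed in the quoted reply can be illustrated with a small scalar sketch. This is not the code from the patch: the function names below are hypothetical, plain C++ stands in for the Neon and Neon DotProd kernels, and the 8-bit rounding (add 32, shift right by 6) is assumed. The coefficient table is the standard HEVC 4-tap chroma filter. The symmetry of the coeffIdx 4 row, {-4, 36, 36, -4}, is one plausible reason a dedicated path can be made cheaper; the thread itself does not state the reason.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Standard HEVC 4-tap chroma interpolation coefficients, indexed by coeffIdx.
static const int16_t kChromaFilter[8][4] = {
    {  0, 64,  0,  0 }, { -2, 58, 10, -2 }, { -4, 54, 16, -2 }, { -6, 46, 28, -4 },
    { -4, 36, 36, -4 }, { -4, 28, 46, -6 }, { -2, 16, 54, -4 }, { -2, 10, 58, -2 }
};

// Round, shift and clip a filter sum to an 8-bit pixel (assumed 8-bit depth).
static uint8_t round_and_clip(int sum)
{
    return static_cast<uint8_t>(std::min(255, std::max(0, (sum + 32) >> 6)));
}

// Hypothetical stand-in for the general (DotProd-style) kernel:
// a full 4-tap multiply-accumulate for any coeffIdx.
static uint8_t filter_generic(const uint8_t* s, int coeffIdx)
{
    const int16_t* c = kChromaFilter[coeffIdx];
    return round_and_clip(c[0] * s[0] + c[1] * s[1] + c[2] * s[2] + c[3] * s[3]);
}

// Hypothetical stand-in for the specialised kernel: for coeffIdx == 4 the
// taps are symmetric, so two additions replace two of the multiplies.
static uint8_t filter_coeff4(const uint8_t* s)
{
    return round_and_clip(36 * (s[1] + s[2]) - 4 * (s[0] + s[3]));
}

// Dispatch in the spirit of the patch: take the specialised path for
// coeffIdx == 4 and the general path for every other coefficient index.
static uint8_t filter_hpp_pixel(const uint8_t* s, int coeffIdx)
{
    if (coeffIdx == 4)
        return filter_coeff4(s);
    return filter_generic(s, coeffIdx);
}
```

Both paths produce identical output for coeffIdx 4, so the dispatch changes only which routine runs, not the filtered result.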