[x265] [PATCH 00/14] AArch64: Add Armv8.4 Neon DotProd and Armv8.6 Neon I8MM implementations of ipfilter primitives
chen
chenm003 at 163.com
Mon Sep 9 14:40:42 UTC 2024
Hi Hari,
Thanks for the details. We may keep your current version for now and rewrite the assembly to improve performance in the future.
Regards,
Chen
At 2024-09-09 16:27:51, "Hari Limaye" <hari.limaye at arm.com> wrote:
>Hi Chen,
>
>Thank you for reviewing the patches.
>
>Regarding the patch that you highlighted:
> [PATCH 04/14] AArch64: Add Armv8.4 Neon DotProd implementations of filter_hpp
>
>> the performance result does not look good enough,
>The key result for this patch is the performance uplift for Neoverse N1 (1.123x), as this machine does not support Neon I8MM instructions.
>The results for the other machines are stated for completeness - however these machines will instead run the Neon I8MM implementation:
>
> https://mailman.videolan.org/pipermail/x265-devel/2024-September/013907.html
>
>the uplift from which is copied here:
>
> Geomean uplift across all block sizes for chroma filters, relative to
> Armv8.4 Neon DotProd implementations:
>
> Neoverse N2: 1.402x
> Neoverse V1: 1.214x
> Neoverse V2: 1.289x
>
>>and why is there a shortcut branch for the case (coeffIdx == 4)?
>As the Armv8.0 Neon implementation can be highly specialized for coeffIdx of 4, the Armv8.4 Neon DotProd implementation is not faster for this filter - so we dispatch to the Armv8.0 Neon implementation in this case.
>The uplift for the other values of coeffIdx from the Armv8.4 Neon DotProd implementation (on Neoverse N1) is significant.
>
>Many thanks,
>Hari
>
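The shortcut branch Hari describes above can be sketched in scalar C++. This is a minimal illustration under stated assumptions, not the x265 code. The function names (filter_hpp, filter_hpp_coeff4, filter_hpp_generic) are hypothetical. Plain loops stand in for the Armv8.0 Neon and Armv8.4 DotProd kernels. The coefficient table assumes the standard HEVC 4-tap chroma filters, where coeffIdx 4 is the symmetric half-sample filter {-4, 36, 36, -4}. That symmetry is one plausible reason a specialized Armv8.0 path can match a dot-product path: it needs two multiplies per output instead of four.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed: HEVC 4-tap chroma interpolation coefficients, indexed by coeffIdx (0..7).
static const int16_t kChromaFilter[8][4] = {
    {0, 64, 0, 0},    {-2, 58, 10, -2}, {-4, 54, 16, -2}, {-6, 46, 28, -4},
    {-4, 36, 36, -4}, {-4, 28, 46, -6}, {-2, 16, 54, -4}, {-2, 10, 58, -2}};

// Round, shift by 6 (the taps sum to 64) and clamp to the 8-bit pixel range.
static uint8_t round_and_clip(int sum) {
    return static_cast<uint8_t>(std::clamp((sum + 32) >> 6, 0, 255));
}

// Hypothetical stand-in for the Armv8.0 path specialized for coeffIdx == 4:
// the symmetric taps let us add the pixel pairs first, then multiply twice.
static void filter_hpp_coeff4(const uint8_t *src, uint8_t *dst, int width) {
    for (int x = 0; x < width; x++) {
        int outer = src[x - 1] + src[x + 2];  // pixels weighted by -4
        int inner = src[x] + src[x + 1];      // pixels weighted by 36
        dst[x] = round_and_clip(36 * inner - 4 * outer);
    }
}

// Hypothetical stand-in for the DotProd path: each output is a 4-element
// dot product of pixels and coefficients, the shape SDOT/USDOT accelerate.
static void filter_hpp_generic(const uint8_t *src, uint8_t *dst, int width,
                               int coeffIdx) {
    const int16_t *c = kChromaFilter[coeffIdx];
    for (int x = 0; x < width; x++) {
        int sum = 0;
        for (int k = 0; k < 4; k++) sum += c[k] * src[x - 1 + k];
        dst[x] = round_and_clip(sum);
    }
}

// Dispatcher: shortcut to the specialized path for coeffIdx == 4 only.
// src must have one readable pixel before it and two after the last output.
void filter_hpp(const uint8_t *src, uint8_t *dst, int width, int coeffIdx) {
    assert(coeffIdx >= 0 && coeffIdx < 8);
    if (coeffIdx == 4) {
        filter_hpp_coeff4(src, dst, width);
        return;
    }
    filter_hpp_generic(src, dst, width, coeffIdx);
}
```

Both paths produce identical output for coeffIdx 4, so the dispatch choice only affects speed. In a real build, the decision of which kernel is faster for each coeffIdx would come from benchmarking on the target cores, as in the Neoverse results quoted above.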