[x265] [PATCH 00/14] AArch64: Add Armv8.4 Neon DotProd and Armv8.6 Neon I8MM implementations of ipfilter primitives
Karam Singh
karam.singh at multicorewareinc.com
Mon Sep 9 16:32:40 UTC 2024
All the patches of this series have been pushed to the master branch.
On Mon, Sep 9, 2024 at 8:11 PM chen <chenm003 at 163.com> wrote:
> Hi Hari,
>
>
> Thanks for the details. We may keep your current version, and we may rewrite
> the assembly to improve performance in the future.
>
> Regards,
> Chen
>
> At 2024-09-09 16:27:51, "Hari Limaye" <hari.limaye at arm.com> wrote:
> >Hi Chen,
> >
> >Thank you for reviewing the patches.
> >
> >Regarding the patch that you highlighted:
> > [PATCH 04/14] AArch64: Add Armv8.4 Neon DotProd implementations of filter_hpp
> >
> >> the performance results do not look good enough,
> >The key result for this patch is the performance uplift for Neoverse N1 (1.123x), as this machine does not support Neon I8MM instructions.
> >The results for the other machines are stated for completeness - however these machines will instead run the Neon I8MM implementation:
> >
> > https://mailman.videolan.org/pipermail/x265-devel/2024-September/013907.html
> >
> >the uplift from which is copied here:
> >
> > Geomean uplift across all block sizes for chroma filters, relative to
> > Armv8.4 Neon DotProd implementations:
> >
> > Neoverse N2: 1.402x
> > Neoverse V1: 1.214x
> > Neoverse V2: 1.289x
> >
> >> and why is there a shortcut branch for the case (coeffIdx == 4)?
> >As the Armv8.0 Neon implementation can be highly specialized for coeffIdx of 4, the Armv8.4 Neon DotProd implementation is not faster for this filter - so we dispatch to the Armv8.0 Neon implementation in this case.
> >The uplift for the other values of coeffIdx from the Armv8.4 Neon DotProd implementation (on Neoverse N1) is significant.
> >
> >Many thanks,
> >Hari
> >
>
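
The coeffIdx == 4 shortcut described in the quoted reply can be pictured roughly as below. This is only a minimal sketch and not the code from the patch: the names Kernel, FilterHppFn, selectKernel and filterHppDispatch are hypothetical, and the function signature is a simplified assumption rather than the real x265 primitive type. It shows only the selection logic: use the Armv8.0 Neon kernel for coeffIdx == 4, and the Armv8.4 Neon DotProd kernel for every other coefficient index.

```cpp
#include <cstdint>

// Hypothetical, simplified kernel signature (an assumption for illustration,
// not the actual x265 primitive type).
using FilterHppFn = void (*)(const uint8_t* src, intptr_t srcStride,
                             uint8_t* dst, intptr_t dstStride, int coeffIdx);

// Hypothetical labels for the two implementations discussed in the thread.
enum class Kernel { Neon, NeonDotProd };

// Pick the implementation for a given coefficient index. Per the thread, the
// Armv8.0 Neon version is highly specialized for coeffIdx == 4, so the
// DotProd version is not faster there and we fall back to Neon.
inline Kernel selectKernel(int coeffIdx)
{
    return (coeffIdx == 4) ? Kernel::Neon : Kernel::NeonDotProd;
}

// Forward the call to whichever kernel was selected. Both kernels are passed
// in as function pointers so the sketch stays self-contained.
inline void filterHppDispatch(FilterHppFn neonFn, FilterHppFn dotProdFn,
                              const uint8_t* src, intptr_t srcStride,
                              uint8_t* dst, intptr_t dstStride, int coeffIdx)
{
    FilterHppFn fn = (selectKernel(coeffIdx) == Kernel::Neon) ? neonFn
                                                              : dotProdFn;
    fn(src, srcStride, dst, dstStride, coeffIdx);
}
```

Keeping the choice in one small function means the shortcut costs a single well-predicted branch per call, while each kernel body stays free of special cases.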