[x265] [PATCH 0/7] AArch64 saoCuStats Optimisations

Karam Singh karam.singh at multicorewareinc.com
Mon Jul 1 03:46:38 UTC 2024


Patches 1 to 6 of this series are pushed.



On Thu, May 23, 2024 at 7:42 AM chen <chenm003 at 163.com> wrote:

> Hi Hari,
>
>
> The new patches look good to me now, thank you for your patches.
>
>
> Regards,
>
> Chen
>
> At 2024-05-23 03:09:26, "Hari Limaye" <hari.limaye at arm.com> wrote:
> >Hi Chen,
> >
> >Thank you for reviewing the patches.
> >
> >>In signOf_neon
> >>>+ // signOf(a - b) = -(a > b) | (b > a)
> >>The comment is not clear; I suggest:
> >>-(a > b ? -1 : 0) | ( a < b)
> >
> >I have posted updated versions of patches 3, 4 and 6 to make these comments clearer with respect to the possible outputs of Neon comparison instructions.
> >
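For readers following the signOf discussion: Neon comparison instructions set each lane to all ones (-1 as a signed value) when the comparison is true and to 0 otherwise, which is why the comment's formula yields +1, -1 or 0. The sketch below is a scalar model of a single 16-bit lane only; it is not taken from the patch, and the names lane_cgt and sign_of_model are hypothetical.

```c
#include <stdint.h>

/* Scalar model of one lane of a Neon "compare greater than":
 * all ones (-1) when true, 0 when false. Hypothetical helper. */
static int16_t lane_cgt(int16_t a, int16_t b)
{
    return (a > b) ? -1 : 0;
}

/* Models signOf(a - b) = -(a > b) | (b > a) using mask-style compares.
 *   a > b : -(-1) | 0  ->  1
 *   a < b : -(0) | -1  -> -1
 *   a == b: -(0) | 0   ->  0 */
static int16_t sign_of_model(int16_t a, int16_t b)
{
    int16_t gt = lane_cgt(a, b);
    int16_t lt = lane_cgt(b, a);
    return (int16_t)(-gt | lt);
}
```

With C's usual 0/1 comparison results the same expression would not produce -1 for a < b, which is presumably the ambiguity the review comment points at.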
> >>In saoCuStatsBO_neon
> >>It is a memory bandwidth optimisation only. Interval memory access depends strongly on CPU pipeline design and compiler, so it is not generic; I am not sure how it behaves on other kinds of CPUs.
> >
> >Yes, it is primarily a memory bandwidth optimisation - we have tested with recent GCC and Clang on a range of Neoverse CPUs and find it to be faster than the C implementation.
> >
> >>In saoCuStatsE*_neon
> >>No comments. It looks like vmulq_s16+vmlaq_s16 uses one instruction fewer than vandq_s16+vandq_s16+vaddq_s16 or tbl/tbx, so it is mostly faster on modern CPUs.
> >
> >Yes, we found that this instruction sequence was faster than the alternatives, for the Neon implementation.
> >
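The instruction-count comparison above can be illustrated with a scalar model of one lane. This is only a sketch under assumptions: the patch source is not shown in this thread, so the operands (two values x and y, each gated by a selector) and the function names are hypothetical. It shows that a multiply followed by a multiply-accumulate with 0/1 selectors gives the same result as two ANDs with all-ones masks followed by an add.

```c
#include <stdint.h>

/* Two-step form: sel0 and sel1 are 0 or 1.
 * First line models vmulq_s16, second models vmlaq_s16 (acc + sel1 * y). */
static int16_t select_mul_mla(int16_t sel0, int16_t x, int16_t sel1, int16_t y)
{
    int16_t acc = (int16_t)(sel0 * x);
    return (int16_t)(acc + sel1 * y);
}

/* Three-step form: mask0 and mask1 are 0 or -1 (all ones).
 * Models vandq_s16, vandq_s16, then vaddq_s16. */
static int16_t select_and_add(int16_t mask0, int16_t x, int16_t mask1, int16_t y)
{
    int16_t t0 = (int16_t)(mask0 & x);
    int16_t t1 = (int16_t)(mask1 & y);
    return (int16_t)(t0 + t1);
}
```

Whether the shorter sequence is actually faster depends on the multiply latency and throughput of the target core, which is consistent with the hedged wording in the exchange above.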
> >Many thanks,
> >
> >Hari
>