[x265] [arm64] port costCoeffNxN

Fri Mar 4 20:24:09 UTC 2022

Thanks Min Chen for your feedback.

Please find attached a patch that avoids one NEON-to-GPR transfer by storing directly from the NEON register with `str h2, [x13]`.

I'm not sure how to optimize the loop; however, I see that the x86 AVX2+BMI implementation has a much shorter loop.

Do you recommend following the approach of the AVX2 implementation?

Thanks,

Sebastian

________________________________
From: x265-devel <x265-devel-bounces at videolan.org> on behalf of chen <chenm003 at 163.com>
Sent: Wednesday, March 2, 2022 10:20 PM
To: Development for x265
Subject: RE: [EXTERNAL] [x265] [arm64] port costCoeffNxN


Hi Sebastian,

Thank you for your contribution; the code looks good.

Just a small comment for future performance improvement:
"fmov w12, s2" is expensive because the data crosses between the NEON and integer register files, especially since it is inside the loop.
There are also some deep-seated data organization and algorithm problems. For example, we spend many instructions on absCoeff[numNonZero]; if we allowed spare zeros inside the array, we could eliminate many of those instructions.
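
To illustrate the absCoeff[numNonZero] point, here is a minimal C sketch, not the x265 code itself. The helper names abs_compact and abs_dense are hypothetical, and the loops are simplified (no scan order, no sign handling). In the compacted form, every store address depends on the running count; in the dense form, zeros stay in place and the store address depends only on the loop index, which is typically easier to vectorize.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: compacted layout. Each store lands at
 * absCoeff[numNonZero], so the address depends on the running count
 * and zero entries are overwritten by the next coefficient. */
static int abs_compact(const int16_t *coeff, int n, uint16_t *absCoeff)
{
    int numNonZero = 0;
    for (int i = 0; i < n; i++) {
        absCoeff[numNonZero] = (uint16_t)abs(coeff[i]);
        numNonZero += (coeff[i] != 0);
    }
    return numNonZero;
}

/* Hypothetical helper: dense layout with spare zeros left in the array.
 * The store address depends only on i, so the absolute values can be
 * computed and stored independently of the non-zero count. */
static int abs_dense(const int16_t *coeff, int n, uint16_t *absCoeff)
{
    int numNonZero = 0;
    for (int i = 0; i < n; i++) {
        absCoeff[i] = (uint16_t)abs(coeff[i]);
        numNonZero += (coeff[i] != 0);
    }
    return numNonZero;
}
```

The trade-off in the dense form is that later stages reading absCoeff would have to tolerate or skip the zero entries; whether that is a net win in costCoeffNxN is an assumption that would need measuring.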

Regards,
Min Chen

At 2022-03-02 07:28:15, "Pop, Sebastian" <spop at amazon.com> wrote:

Hi,

The attached patch fixes the registration of the costCoeffNxN function hook and removes the early return that I used for testing.

Sebastian
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-arm64-port-costCoeffNxN.patch
Type: text/x-patch
Size: 7453 bytes
Desc: 0002-arm64-port-costCoeffNxN.patch
URL: <http://mailman.videolan.org/pipermail/x265-devel/attachments/20220304/ce1cfdec/attachment.bin>