[x265] [arm64] port costCoeffNxN

chen chenm003 at 163.com
Fri Mar 11 06:45:17 UTC 2022


Hi Sebastian,


Sorry for the delay. This version looks good too.


To optimize the loop further in the future, we first need to modify the algorithm.
For example, we access the array as below:
=====
absCoeff[numNonZero] = tmpCoeff[blkPos];
numNonZero += sig;
=====
This code prevents parallel processing because each store index depends on the unpredictable sig. If we allow the data in absCoeff to be stored sparsely, we can process all 16 elements in parallel.
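
The difference between the two layouts can be sketched in plain C++ as below. This is only an illustration, not code from x265 or from the patch: the function names gatherDense and gatherSparse, the scan table parameter, the sigMask output, and the assumption that tmpCoeff already holds absolute values are all hypothetical.

```cpp
#include <cstdint>

// Hypothetical sketch, not x265 code. tmpCoeff is assumed to hold
// absolute coefficient values; scan maps scan index -> block position.

// Dense layout (as in the snippet above): the store index numNonZero
// depends on every earlier sig, so iterations cannot run independently.
int gatherDense(const uint16_t* tmpCoeff, const uint16_t* scan, uint16_t* absCoeff)
{
    int numNonZero = 0;
    for (int i = 0; i < 16; i++)
    {
        const uint16_t v = tmpCoeff[scan[i]];
        const int sig = (v != 0);
        absCoeff[numNonZero] = v;   // overwritten later if v was zero
        numNonZero += sig;
    }
    return numNonZero;
}

// Sparse layout: element i always goes to slot i, so all 16 stores are
// independent. Significance is kept in a bitmask and the count is
// recovered from the mask afterwards.
int gatherSparse(const uint16_t* tmpCoeff, const uint16_t* scan,
                 uint16_t* absCoeff, uint32_t* sigMask)
{
    uint32_t mask = 0;
    for (int i = 0; i < 16; i++)
    {
        const uint16_t v = tmpCoeff[scan[i]];
        absCoeff[i] = v;                          // zeros stay in place
        mask |= static_cast<uint32_t>(v != 0) << i;
    }
    *sigMask = mask;

    int numNonZero = 0;
    for (uint32_t m = mask; m != 0; m &= m - 1)   // clear lowest set bit
        numNonZero++;
    return numNonZero;
}
```

In the sparse form the later cost loop would have to skip the zero slots, for example by walking the set bits of sigMask, so the savings depend on how that consumer is rewritten.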


Regards,
Min Chen



At 2022-03-05 04:24:09, "Pop, Sebastian" <spop at amazon.com> wrote:

Thanks Min Chen for your feedback.


Please see attached a patch that avoids one transfer from NEON to gpr by using `str h2, [x13]`.


I'm not sure how to optimize the loop; however, I see that the x86 avx2+bmi version has a much shorter loop.


Do you recommend following the avx2 implementation?





Thanks,


Sebastian





From: x265-devel <x265-devel-bounces at videolan.org> on behalf of chen <chenm003 at 163.com>
Sent: Wednesday, March 2, 2022 10:20 PM
To: Development for x265
Subject: RE: [EXTERNAL] [x265] [arm64] port costCoeffNxN
 


Hi Sebastian,


Thank you for your contribution; the code looks good.


Just a small comment for future performance improvement:
"fmov w12, s2" is expensive because it moves data between the NEON and integer register files, especially since it is inside the loop.
There are also some deep-seated data organization and algorithm problems. For example, we spend many instructions on absCoeff[numNonZero]; if we allow zeros to remain inside the array (sparse storage), we can eliminate many of those instructions.


Regards,
Min Chen




At 2022-03-02 07:28:15, "Pop, Sebastian" <spop at amazon.com> wrote:

Hi,





The attached patch fixes the registration of the costCoeffNxN function hook and removes the early return that I used for testing.





Sebastian





