[x265] [PATCH 0/3] AArch64 sse_pp Optimisations

Fri Jul 19 17:14:17 UTC 2024

Hi Chen,

Apologies for the delay in getting back to you.

Thank you for the comments on the patches.

>in the SSE_PP_8xN, how about two-lines format (.8b -> .16b), it just reduce one of UABD, I guess it is not performance change

You are correct that this does not improve performance: the additional instruction needed to merge two rows into a single .16b vector negates the benefit of removing the single UABD instruction.

>How about shared code in different size and reduce unroll?

For the block sizes that are currently fully unrolled, e.g. SSE_PP_16xN, reducing the unroll factor and sharing the code results in a performance regression.

We have, however, updated SSE_PP_32xN so that the block heights share the same code behind a small wrapper, as this gives the same performance while reducing the code size.
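
For readers less familiar with the pattern, here is a rough scalar C sketch of sharing one kernel behind small per-height wrappers. It is only an illustration and not the AArch64 assembly from the patch: the helper name sse_pp_32xh, the wrapper names, the argument order and the uint32_t return type are all assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared kernel: sum of squared differences over a block
 * that is 32 pixels wide and `height` rows tall. Strides are in bytes. */
static uint32_t sse_pp_32xh(const uint8_t *pix1, ptrdiff_t stride1,
                            const uint8_t *pix2, ptrdiff_t stride2,
                            int height)
{
    uint32_t sum = 0;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < 32; x++) {
            int d = pix1[x] - pix2[x];   /* difference in [-255, 255] */
            sum += (uint32_t)(d * d);    /* accumulate squared error */
        }
        pix1 += stride1;                 /* advance both blocks one row */
        pix2 += stride2;
    }
    return sum;
}

/* Hypothetical wrappers: each one only fixes the height and forwards to
 * the shared kernel, so the loop body exists once in the binary. */
uint32_t sse_pp_32x32(const uint8_t *pix1, ptrdiff_t stride1,
                      const uint8_t *pix2, ptrdiff_t stride2)
{
    return sse_pp_32xh(pix1, stride1, pix2, stride2, 32);
}

uint32_t sse_pp_32x64(const uint8_t *pix1, ptrdiff_t stride1,
                      const uint8_t *pix2, ptrdiff_t stride2)
{
    return sse_pp_32xh(pix1, stride1, pix2, stride2, 64);
}
```

In assembly the wrapper would typically just set the row count and branch to the shared loop, so the per-call overhead is a couple of instructions; whether that overhead is hidden depends on how much work each block does, which is consistent with the 32xN case behaving differently from the smaller, fully unrolled sizes.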

Many thanks,

Hari

-- 
2.42.1