[x265] [PATCH 0/3] AArch64 sse_pp Optimisations
Hari Limaye
hari.limaye at arm.com
Fri Jul 19 17:14:17 UTC 2024
Hi Chen,
Apologies for the delay in getting back to you.
Thank you for the comments on the patches.
>In SSE_PP_8xN, how about a two-line format (.8b -> .16b)? It only removes one UABD, so I guess there is no performance change.
You are correct that this is not beneficial for performance - the additional merge negates the benefit of removing the single UABD instruction.
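For context, a scalar C sketch of what an sse_pp kernel computes is given below. It is an illustration only, not the assembly from the patches: the name sse_pp_ref, the explicit width/height parameters and the uint32_t return type are assumptions made for this example. A row of an 8-pixel-wide block is 8 bytes, which fills a .8b register; processing two rows per .16b operation means first combining the two rows into one register, which is the merge mentioned above.

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar reference (hypothetical name): sum of squared differences
 * between two blocks of 8-bit pixels. */
static uint32_t sse_pp_ref(const uint8_t *pix1, intptr_t stride1,
                           const uint8_t *pix2, intptr_t stride2,
                           int width, int height)
{
    uint32_t sum = 0;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            /* Absolute difference of two bytes; a vector UABD does this
             * for every lane of a register at once. */
            uint32_t d = (uint32_t)abs((int)pix1[x] - (int)pix2[x]);
            /* Square and accumulate. */
            sum += d * d;
        }
        /* Advance both pointers to the next row. */
        pix1 += stride1;
        pix2 += stride2;
    }
    return sum;
}
```

With width = 8, each iteration of the outer loop touches exactly 8 bytes from each source, so a two-row variant saves one absolute-difference step per pair of rows but has to pay for assembling the 16-byte operand.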
>How about shared code in different size and reduce unroll?
For the block sizes that are fully unrolled at present, e.g. SSE_PP_16xN, reducing the unroll factor and sharing the code results in a performance regression.
We have, however, updated SSE_PP_32xN to share the same code via a small wrapper, as this gives the same performance while reducing the code size.
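The idea of sharing one body behind small wrappers can be sketched in C as follows. This is only an illustration of the structure, not the code from the patches: the names sse_pp_32xh and sse_pp_32x8/16/32, the macro, and the uint32_t return type are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical shared body: one loop handles every height of a
 * 32-pixel-wide block. */
static uint32_t sse_pp_32xh(const uint8_t *pix1, intptr_t stride1,
                            const uint8_t *pix2, intptr_t stride2,
                            int height)
{
    uint32_t sum = 0;
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < 32; x++) {
            int d = (int)pix1[x] - (int)pix2[x];
            sum += (uint32_t)(d * d);
        }
        pix1 += stride1;
        pix2 += stride2;
    }
    return sum;
}

/* Hypothetical wrappers: each entry point only fixes the height and
 * forwards to the shared body, so the loop exists once in the binary. */
#define SSE_PP_32XN(h)                                                  \
    uint32_t sse_pp_32x##h(const uint8_t *pix1, intptr_t stride1,      \
                           const uint8_t *pix2, intptr_t stride2)      \
    {                                                                   \
        return sse_pp_32xh(pix1, stride1, pix2, stride2, h);           \
    }

SSE_PP_32XN(8)
SSE_PP_32XN(16)
SSE_PP_32XN(32)
```

The trade-off is the one described above: a fully unrolled function per block size avoids loop overhead, while a shared body costs a loop counter and a branch per iteration but is emitted only once.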
Many thanks,
Hari
--
2.42.1