[x265] [PATCH 0/2] AArch64: Fix SVE DCT implementations

Jonathan Wright jonathan.wright at arm.com
Tue Jun 10 17:40:50 UTC 2025


Hi,

This patch series fixes bugs in the Arm SVE 16x16 and 32x32 DCT
implementations, and also mitigates a portion of the performance
regression due to the fix. Both SVE DCT implementations are still
significantly faster than the equivalent Neon paths.

Note that the DCT unit tests did not show these bugs. They were found
after differences in encoded output videos were observed between Arm
and x86 for the veryslow, slower and slow encoding presets. With these
patches applied, encoded output matches for all speed presets.

Thanks,
Jonathan

Jonathan Wright (2):
  AArch64: Fix SVE 16x16 and 32x32 DCT implementations
  AArch64: Specialize passes of 16x16 and 32x32 SVE DCTs

 source/common/aarch64/dct-prim-sve.cpp | 338 ++++++++++++++++++++++---
 1 file changed, 306 insertions(+), 32 deletions(-)

-- 
2.39.5 (Apple Git-154)
