[x265] [PATCH 0/2] AArch64: Fix SVE DCT implementations
Jonathan Wright
jonathan.wright at arm.com
Tue Jun 10 17:40:50 UTC 2025
Hi,
This patch series fixes bugs in the Arm SVE 16x16 and 32x32 DCT
implementations, and also mitigates a portion of the performance
regression due to the fix. Both SVE DCT implementations are still
significantly faster than the equivalent Neon paths.
Note that the DCT unit tests did not catch these bugs. They were found
after the encoded output videos were observed to differ between Arm and
x86 for the veryslow, slower and slow encoding presets. With these
patches applied, the encoded output matches for all speed presets.
Thanks,
Jonathan
Jonathan Wright (2):
AArch64: Fix SVE 16x16 and 32x32 DCT implementations
AArch64: Specialize passes of 16x16 and 32x32 SVE DCTs
source/common/aarch64/dct-prim-sve.cpp | 338 ++++++++++++++++++++++---
1 file changed, 306 insertions(+), 32 deletions(-)
--
2.39.5 (Apple Git-154)