[x265] [PATCH 0/2] AArch64: DCT optimization and NEON intrapred fix
Wiki Deng
wiki.deng at hj-micro.com
Fri Apr 10 09:18:49 UTC 2026
Hi,
This patch series contains two AArch64 NEON changes for x265, one optimization and one bug fix:
Patch 1: AArch64: Optimize DCT kernels with stride-aware implementations
The previous DCT implementation performed an unnecessary memcpy into a
contiguous buffer before running transforms. This patch introduces
stride-aware versions of all DCT butterfly kernels (4x4, 8x8, 16x16,
32x32) that load directly from the strided source, eliminating the
intermediate buffer copy. It also adds HIGH_BIT_DEPTH support and
replaces memcpy calls in intrapred with NEON-optimized helpers.
Performance: DCT 8x8 memory operations reduced by ~75%.
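
To illustrate the idea behind Patch 1, here is a minimal portable sketch of
the two shapes for a 4x4 first butterfly stage: one that copies the strided
block into a contiguous buffer first, and one that reads each row at
src + y * stride directly. It is plain scalar C++ rather than NEON, and the
function names (butterfly4_row, stage_with_copy, stage_stride_aware) are
hypothetical and not taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the patch): first butterfly stage of a
// 4-point transform on one row, i.e. the sums and differences of the
// mirrored sample pairs.
static void butterfly4_row(const int16_t *row, int32_t out[4])
{
    out[0] = row[0] + row[3];
    out[1] = row[1] + row[2];
    out[2] = row[0] - row[3];
    out[3] = row[1] - row[2];
}

// Copy-first shape: gather the strided 4x4 block into a contiguous
// temporary, then run the butterflies on the temporary.
static void stage_with_copy(const int16_t *src, ptrdiff_t stride,
                            int32_t out[4][4])
{
    assert(stride >= 4);
    int16_t tmp[4 * 4];
    for (int y = 0; y < 4; y++)
        std::memcpy(tmp + y * 4, src + y * stride, 4 * sizeof(int16_t));
    for (int y = 0; y < 4; y++)
        butterfly4_row(tmp + y * 4, out[y]);
}

// Stride-aware shape: address each row at src + y * stride, so the
// temporary buffer and the extra pass over the data are not needed.
static void stage_stride_aware(const int16_t *src, ptrdiff_t stride,
                               int32_t out[4][4])
{
    assert(stride >= 4);
    for (int y = 0; y < 4; y++)
        butterfly4_row(src + y * stride, out[y]);
}
```

Both functions produce the same output for any stride >= 4. The second one
just skips the store and reload of every sample through the temporary.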
Patch 2: AArch64: Fix 4x4 NEON memory overflow in intrapred helpers
The 8-bit width=4 NEON copy helpers used vld1_u8/vst1_u8, which
read/write 8 bytes instead of the required 4 bytes. In
all_angs_pred_neon<2>(), 4x4 mode outputs are packed contiguously
(16 bytes per mode), so writing 8 bytes per row overwrites adjacent
mode buffers.
Fixed by switching to scalar copy for exact 4-byte access, which also
avoids strict aliasing UB from uint32_t* casts and potential alignment
issues.
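
As a rough sketch of the fix described for Patch 2, the helper below copies
exactly 4 bytes per row into a packed 4x4 mode buffer. It assumes a
fixed-size memcpy as the exact-width copy; the patch itself may use a
different scalar form. The names copy4_u8 and fill_mode_4x4 are hypothetical
and are not the identifiers used in intrapred-prim.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the patch): copy exactly 4 bytes.
// A constant-size memcpy touches only those 4 bytes, has no alignment
// requirement, and avoids casting uint8_t* to uint32_t*.
static inline void copy4_u8(uint8_t *dst, const uint8_t *src)
{
    std::memcpy(dst, src, 4);
}

// Fill one packed 4x4 mode buffer (16 bytes, rows 4 bytes apart) from a
// source whose rows are refStride bytes apart.
static void fill_mode_4x4(uint8_t *modeBuf, const uint8_t *ref,
                          ptrdiff_t refStride)
{
    assert(refStride >= 4);
    for (int y = 0; y < 4; y++)
        copy4_u8(modeBuf + y * 4, ref + y * refStride);
}
```

With an 8-byte store per row, the write for the last row would extend 4
bytes past the 16-byte mode buffer into the next mode's output. With the
4-byte copy, the last byte written is byte 15 of the buffer.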
Affected files:
- source/common/aarch64/dct-prim.cpp
- source/common/aarch64/dct-prim-sve.cpp
- source/common/aarch64/intrapred-prim.cpp
Best regards,
Wiki Deng
wiki.deng at hj-micro.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0002-AArch64-Fix-4x4-NEON-memory-overflow-in-intrapred-he.patch
Type: application/octet-stream
Size: 3158 bytes
Desc: not available
URL: <http://mailman.videolan.org/pipermail/x265-devel/attachments/20260410/7588391a/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-AArch64-Optimize-DCT-kernels-with-stride-aware-imple.patch
Type: application/octet-stream
Size: 42557 bytes
Desc: not available
URL: <http://mailman.videolan.org/pipermail/x265-devel/attachments/20260410/7588391a/attachment-0003.obj>