[x264-devel] [PATCH 07/24] arm: Optimize x264_deblock_h_chroma_neon
Janne Grunau
janne-x264 at jannau.net
Tue Aug 18 10:44:04 CEST 2015
On 2015-08-13 23:59:28 +0300, Martin Storsjö wrote:
> Shuffle both chroma components together as a 16-bit unit, and
> don't write the unchanged columns (like in x264_deblock_h_luma_neon
> and in the aarch64 version of the function).
>
> This causes a minor slowdown for x264_deblock_v_chroma_neon, but
> it is negligible compared to the speedup.
>
> checkasm timing            Cortex-A7    A8    A9
> deblock_chroma[1]_c             4817  4057  3601
> deblock_chroma[1]_neon          1249   716   817 (before)
> deblock_chroma[1]_neon          1249   766   845 (after)
>
> deblock_h_chroma_420_c          3699  3275  2830
> deblock_h_chroma_420_neon       2068  1414  1400 (before)
> deblock_h_chroma_420_neon       1838  1355  1291 (after)
On Cortex-A8 the speed-up (1414 to 1355) and the slowdown (716 to 766)
are in the same range, but it is still worth doing. Patch OK.
Janne
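To illustrate the first paragraph of the patch description, here is a minimal scalar C sketch of the idea. It assumes x264's interleaved (NV12-style) chroma layout, where U and V bytes alternate within a row, and the standard H.264 chroma filter for bS < 4. Each (U,V) byte pair is handled as one 16-bit unit, so gathering the p1, p0, q0, q1 columns around the vertical edge becomes a 4-wide transpose of 16-bit elements. Because the chroma filter never modifies p1 or q1, only the two middle units (p0, q0) are stored back. The names `deblock_h_chroma_sketch` and `filter_sample`, the fixed 8 rows, and the single `tc` for all rows are hypothetical simplifications; this is not x264's NEON code or its C reference.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ROWS 8 /* rows handled per call in this sketch (assumption) */

static int clip3(int x, int lo, int hi) { return x < lo ? lo : x > hi ? hi : x; }

/* H.264 chroma edge filter (bS < 4) on one sample of one plane:
 * only p0 and q0 can change; p1 and q1 are read-only.
 * Note: >> on a negative int is arithmetic on mainstream compilers. */
static void filter_sample(int p1, uint8_t *p0, uint8_t *q0, int q1,
                          int alpha, int beta, int tc)
{
    int a = *p0, b = *q0;
    if (abs(a - b) < alpha && abs(p1 - a) < beta && abs(q1 - b) < beta) {
        int delta = clip3(((b - a) * 4 + (p1 - q1) + 4) >> 3, -tc, tc);
        *p0 = (uint8_t)clip3(a + delta, 0, 255);
        *q0 = (uint8_t)clip3(b - delta, 0, 255);
    }
}

/* Hypothetical sketch: filter a vertical edge at `pix` in interleaved
 * (U,V,U,V,...) chroma. Bytes pix[-4..3] of each row are
 * Up1 Vp1 Up0 Vp0 | Uq0 Vq0 Uq1 Vq1, i.e. four 16-bit (U,V) units. */
void deblock_h_chroma_sketch(uint8_t *pix, ptrdiff_t stride,
                             int alpha, int beta, int tc)
{
    uint16_t col[4][ROWS]; /* col[0..3] hold the p1, p0, q0, q1 units */

    /* Transpose 16-bit units: unit k of row y goes to col[k][y]. */
    for (int y = 0; y < ROWS; y++)
        for (int k = 0; k < 4; k++)
            memcpy(&col[k][y], pix + y * stride - 4 + 2 * k, 2);

    /* Filter both planes: byte 0 of each unit is U, byte 1 is V
     * (memcpy preserves byte order, so this holds on any endianness). */
    for (int y = 0; y < ROWS; y++) {
        uint8_t *p1 = (uint8_t *)&col[0][y], *p0 = (uint8_t *)&col[1][y];
        uint8_t *q0 = (uint8_t *)&col[2][y], *q1 = (uint8_t *)&col[3][y];
        for (int c = 0; c < 2; c++)
            filter_sample(p1[c], &p0[c], &q0[c], q1[c], alpha, beta, tc);
    }

    /* Store back only the p0 and q0 columns (4 bytes per row);
     * the p1 and q1 columns are unchanged, so they are not written. */
    for (int y = 0; y < ROWS; y++) {
        memcpy(pix + y * stride - 2, &col[1][y], 2);
        memcpy(pix + y * stride, &col[2][y], 2);
    }
}
```

Here `pix` points at the U sample of q0 in the top row and `stride` is the row pitch in bytes. A vectorized version would presumably replace the scalar transpose loop with register shuffles on whole rows; the sketch only mirrors the data movement and the halved store width.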