[x264-devel] [PATCH 07/24] arm: Optimize x264_deblock_h_chroma_neon

Janne Grunau janne-x264 at jannau.net
Tue Aug 18 10:44:04 CEST 2015


On 2015-08-13 23:59:28 +0300, Martin Storsjö wrote:
> Shuffle both chroma components together as a 16 bit unit, and
> don't write the unchanged columns (like in x264_deblock_h_luma_neon
> and in the aarch64 version of the function).
> 
> This causes a minor slowdown for x264_deblock_v_chroma_neon, but
> it is negligible compared to the speedup.
> 
> checkasm timing           Cortex-A7    A8    A9
> deblock_chroma[1]_c            4817  4057  3601
> deblock_chroma[1]_neon         1249   716   817  (before)
> deblock_chroma[1]_neon         1249   766   845  (after)
> 
> deblock_h_chroma_420_c         3699  3275  2830
> deblock_h_chroma_420_neon      2068  1414  1400  (before)
> deblock_h_chroma_420_neon      1838  1355  1291  (after)

On Cortex-A8 the speedup and the slowdown are in the same range, but it
is still worth doing. Patch ok.
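
For readers of the archive, the idea in the quoted commit message can be
sketched in plain C. This is only an illustration, not the NEON code from the
patch: it assumes interleaved chroma (U and V bytes alternating in each row),
and the helper names, the ROWS constant and the buffer layout are all
hypothetical. Each (U,V) pair is moved as one 16-bit unit, so the four columns
around the vertical edge (p1, p0, q0, q1) are gathered together, and only the
two columns the filter can change (p0 and q0) are written back.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical row count for the sketch; not taken from x264. */
enum { ROWS = 8 };

/*
 * Gather the four columns around a vertical edge as 16-bit (U,V) pairs.
 * 'pix' points at the U byte of q0 in the first row, so the 8 bytes
 * starting at pix - 4 hold p1, p0, q0, q1 for both chroma components.
 * col[0] = p1, col[1] = p0, col[2] = q0, col[3] = q1, one entry per row.
 */
static void gather_uv_columns(const uint8_t *pix, ptrdiff_t stride,
                              uint16_t col[4][ROWS])
{
    for (int y = 0; y < ROWS; y++) {
        uint16_t row[4];
        /* memcpy avoids unaligned 16-bit loads */
        memcpy(row, pix + y * stride - 4, sizeof(row));
        for (int x = 0; x < 4; x++)
            col[x][y] = row[x];
    }
}

/*
 * Write back only p0 and q0 (4 bytes per row). p1 and q1 are never
 * modified by the chroma filter, so their columns are not stored.
 */
static void scatter_uv_middle(uint8_t *pix, ptrdiff_t stride,
                              uint16_t col[4][ROWS])
{
    for (int y = 0; y < ROWS; y++) {
        uint16_t mid[2] = { col[1][y], col[2][y] };
        memcpy(pix + y * stride - 2, mid, sizeof(mid));
    }
}
```

A filter would run on the gathered columns between the two calls. The sketch
leaves it out on purpose, since the point here is only the data movement.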

Janne

