[x264-devel] [PATCH 2/3] arm: Implement some neon 8x16c intra predict functions
Janne Grunau
janne-x264 at jannau.net
Mon Aug 31 01:11:20 CEST 2015
On 2015-08-28 00:15:02 +0300, Martin Storsjö wrote:
> checkasm timing                  Cortex-A7     A8     A9
> intra_predict_8x16c_dct_c              862    540    590
> intra_predict_8x16c_dct_neon           608    511    657
> intra_predict_8x16c_h_c                972    707    719
> intra_predict_8x16c_h_neon             722    656    672
> intra_predict_8x16c_p_c              10183   9819   8655
> intra_predict_8x16c_p_neon            2622   1972   1983
>
> ---
> The dc_top function is the only one which is slower than the C
> version on one of the tested cpus (A9), and there the slowdown is
> smaller than the gain on A7.
A comment in x264_predict_8x16c_init_arm noting that the other functions
were not faster than C on ... CPU might be helpful. You left
x264_predict_8x16c_v_neon in predict-a.S. Adding the unused asm
functions too might not be a bad idea, but please add a comment that each
one is unused because it's slower than C with $COMPILER $VERSION. Also, the
function declarations are all there.
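
To illustrate the kind of comment meant here, a minimal sketch follows. It is not the actual x264 code: the enum indices, the CPU flag, the stub functions and the init function name are all hypothetical stand-ins, and the CPU/compiler placeholders in the comment are deliberately left unfilled.

```c
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; these are not x264's
 * real types, indices or flag values. */
typedef void (*predict_fn)(uint8_t *src);

enum { PRED_V, PRED_H, PRED_DC_TOP, PRED_P, PRED_COUNT };
#define CPU_FLAG_NEON 0x1u

/* Stub bodies: only the dispatch logic matters for this sketch. */
void predict_v_c(uint8_t *src)         { (void)src; }
void predict_h_c(uint8_t *src)         { (void)src; }
void predict_dc_top_c(uint8_t *src)    { (void)src; }
void predict_p_c(uint8_t *src)         { (void)src; }
void predict_v_neon(uint8_t *src)      { (void)src; }
void predict_h_neon(uint8_t *src)      { (void)src; }
void predict_dc_top_neon(uint8_t *src) { (void)src; }
void predict_p_neon(uint8_t *src)      { (void)src; }

void predict_8x16c_init_sketch(unsigned cpu, predict_fn pf[PRED_COUNT])
{
    /* Start from the C implementations. */
    pf[PRED_V]      = predict_v_c;
    pf[PRED_H]      = predict_h_c;
    pf[PRED_DC_TOP] = predict_dc_top_c;
    pf[PRED_P]      = predict_p_c;

    if (!(cpu & CPU_FLAG_NEON))
        return;

    pf[PRED_H]      = predict_h_neon;
    pf[PRED_DC_TOP] = predict_dc_top_neon;
    pf[PRED_P]      = predict_p_neon;

    /* predict_v_neon exists in the asm file but is intentionally not
     * hooked up: it was not faster than C on <CPU> when built with
     * <COMPILER> <VERSION>. Re-measure before enabling it. */
}
```

The point is only that the reason for leaving a function unhooked sits next to the place where a reader would expect to see it assigned.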
Otherwise ok.
Janne