[x264-devel] [PATCH 2/3] arm: Implement some neon 8x16c intra predict functions

Janne Grunau janne-x264 at jannau.net
Mon Aug 31 01:11:20 CEST 2015


On 2015-08-28 00:15:02 +0300, Martin Storsjö wrote:
> checkasm timing               Cortex-A7   A8      A9
> intra_predict_8x16c_dct_c     862         540     590
> intra_predict_8x16c_dct_neon  608         511     657
> intra_predict_8x16c_h_c       972         707     719
> intra_predict_8x16c_h_neon    722         656     672
> intra_predict_8x16c_p_c       10183       9819    8655
> intra_predict_8x16c_p_neon    2622        1972    1983
> 
> ---
> The dc_top function is the only one which is slower than the C
> version on one of the tested cpus (A9), and there the slowdown is
> smaller than the gain on A7.

A comment in x264_predict_8x16c_init_arm noting that the other functions
were not faster than C on ... CPU might be helpful. You left
x264_predict_8x16c_v_neon in predict-a.S. Adding the other unused asm
functions too might not be a bad idea, but please add a comment that they
are unused because they are slower than C with $COMPILER $VERSION. Also,
the function declarations are all there.

otherwise ok

Janne

