[x264-devel] [PATCH 2/3] arm: Implement some neon 8x16c intra predict functions

Martin Storsjö martin at martin.st
Mon Aug 31 08:00:07 CEST 2015


On Mon, 31 Aug 2015, Janne Grunau wrote:

> On 2015-08-28 00:15:02 +0300, Martin Storsjö wrote:
>> checkasm timing                Cortex-A7   A8     A9
>> intra_predict_8x16c_dct_c      862         540    590
>> intra_predict_8x16c_dct_neon   608         511    657
>> intra_predict_8x16c_h_c        972         707    719
>> intra_predict_8x16c_h_neon     722         656    672
>> intra_predict_8x16c_p_c        10183       9819   8655
>> intra_predict_8x16c_p_neon     2622        1972   1983
>>
>> ---
>> The dc_top function is the only one which is slower than the C
>> version on one of the tested CPUs (A9), and there the slowdown is
>> smaller than the gain on A7.
>
> A comment in x264_predict_8x16c_init_arm noting that the other functions
> were not faster than C on ... CPU might be helpful. You left
> x264_predict_8x16c_v_neon in predict-a.S.

Oops, that obviously wasn't intentional.

> Adding the unused asm functions too might not be a bad idea, but please
> add a comment that they are unused because they are slower than C with
> $COMPILER $VERSION.

> Also, the function declarations are all there.

Ah, good catch, I forgot about that.

I'll repost it without those, and with a comment.
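
For illustration, here is a rough sketch of what such a commented init function might look like. It is not the actual x264 code: the type, enum, flag and stub names below are all hypothetical stand-ins, and the CPU/compiler details in the comment are left as placeholders to be filled in from real measurements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real x264 definitions. */
typedef void (*predict_fn)(uint8_t *src);

enum { PRED_DC, PRED_H, PRED_V, PRED_P, PRED_DC_LEFT, PRED_DC_TOP, PRED_COUNT };

#define SKETCH_CPU_NEON 0x1u /* hypothetical CPU capability flag */

/* Stubs that only tag the buffer, so each one is distinguishable. */
static void stub_c(uint8_t *src)           { src[0] = 'c'; }
static void stub_h_neon(uint8_t *src)      { src[0] = 'h'; }
static void stub_p_neon(uint8_t *src)      { src[0] = 'p'; }
static void stub_dc_top_neon(uint8_t *src) { src[0] = 't'; }

/* Overrides entries of pf[] with NEON versions where they were measured
 * to be a net win; all other entries keep whatever the caller set (C). */
static void predict_8x16c_init_sketch(unsigned cpu, predict_fn pf[PRED_COUNT])
{
    assert(pf != NULL);

    if (!(cpu & SKETCH_CPU_NEON))
        return;

    pf[PRED_H]      = stub_h_neon;
    pf[PRED_P]      = stub_p_neon;
    pf[PRED_DC_TOP] = stub_dc_top_neon;

    /* PRED_V (and the remaining modes) intentionally stay on the C
     * versions: a NEON implementation was not faster than C on
     * ... CPU with $COMPILER $VERSION. */
}
```

A caller would first fill pf[] with the C functions and then call the init function with the detected CPU flags; entries without a worthwhile NEON version are simply left untouched.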

// Martin

