[x264-devel] [PATCH 11/24] arm: Implement neon 8x16c intra predict functions

Martin Storsjö martin at martin.st
Fri Aug 14 08:22:33 CEST 2015


On Thu, 13 Aug 2015, Henrik Gramner wrote:

> On Thu, Aug 13, 2015 at 10:59 PM, Martin Storsjö <martin at martin.st> wrote:
>> Some of the simpler ones actually turn out to be slower than the
>> plain C version, at least on some CPUs.
>
> Shouldn't we just skip having neon versions of those functions then?
> Maybe add a comment explaining it.
>
> Or am I missing something?

No, you're right.

I just posted all of them for reference (I guess I should have tagged the 
patch RFC, like some of the others), in case someone can point out something 
obvious I've missed. It might also be worthwhile to double-check the 
aarch64 versions and see whether they actually give a speedup or should 
also be disabled.

Although - not all of them are always slower; e.g. 
intra_predict_8x16c_dct_neon and intra_predict_8x16c_v_neon are faster 
than the C version on one CPU each (A9 and A7 respectively) but slower on 
the others.
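
For illustration, skipping the slower ones with a comment could look 
roughly like the sketch below. All names and the flag value here are 
hypothetical (this is not the actual x264 init code), and which functions 
end up in which group is only an assumption for the sake of the example; 
the real choice would depend on benchmark numbers.

```c
#include <stdint.h>

/* Hypothetical names and flag value, for illustration only. */
#define CPU_NEON 0x1

enum { PRED_V, PRED_H, PRED_DC, PRED_P, PRED_COUNT };

typedef void (*predict_fn)(uint8_t *src);

/* Stub bodies; real implementations would write the predicted block. */
static void predict_8x16c_v_c(uint8_t *src)    { (void)src; }
static void predict_8x16c_h_c(uint8_t *src)    { (void)src; }
static void predict_8x16c_dc_c(uint8_t *src)   { (void)src; }
static void predict_8x16c_p_c(uint8_t *src)    { (void)src; }
static void predict_8x16c_h_neon(uint8_t *src) { (void)src; }
static void predict_8x16c_p_neon(uint8_t *src) { (void)src; }

/* Fill the function pointer table for the given CPU flags. */
static void predict_8x16c_init(int cpu, predict_fn pf[PRED_COUNT])
{
    /* Start from the plain C versions. */
    pf[PRED_V]  = predict_8x16c_v_c;
    pf[PRED_H]  = predict_8x16c_h_c;
    pf[PRED_DC] = predict_8x16c_dc_c;
    pf[PRED_P]  = predict_8x16c_p_c;

    if (!(cpu & CPU_NEON))
        return;

    /* V and DC intentionally keep the C versions: in this sketch the
     * NEON variants are assumed to be slower on most of the tested
     * CPUs, so they are not hooked up. */
    pf[PRED_H] = predict_8x16c_h_neon;
    pf[PRED_P] = predict_8x16c_p_neon;
}
```

That way the comment sits next to the place where the override is left 
out, so nobody re-adds it later without benchmarking first.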

// Martin

