[x264-devel] [PATCH 11/24] arm: Implement neon 8x16c intra predict functions

Fri Aug 14 09:05:16 CEST 2015

On 2015-08-14 09:22:33 +0300, Martin Storsjö wrote:
> On Thu, 13 Aug 2015, Henrik Gramner wrote:
> 
> >On Thu, Aug 13, 2015 at 10:59 PM, Martin Storsjö <martin at martin.st> wrote:
> >>Some of the simpler ones actually turn out to be slower than the
> >>plain C version, at least on some CPUs.
> >
> >Shouldn't we just skip having neon versions of those functions then?
> >Maybe add a comment explaining it.
> >
> >Or am I missing something?
> 
> No, you're right.
> 
> I just posted all of them for reference (I guess I should have
> tagged the patch RFC as some of the others), in case someone can
> tell me something obvious I've missed. It might also be worthwhile
> to doublecheck the aarch64 versions and see if they add or if they
> also should be disabled.

IIRC all enabled aarch64 functions are faster on at least one CPU. If
they weren't, I either left them disabled with a comment saying that
they are slower, or didn't send them at all. There might be some
functions which are reused by other asm functions, where the combination
was clearly faster; I might have left some of those enabled. It's
probably a good idea to recheck now that more arm64 hardware is
available.

Janne