[x264-devel] [PATCH 11/24] arm: Implement neon 8x16c intra predict functions

Martin Storsjö martin at martin.st
Fri Aug 14 09:32:01 CEST 2015


On Fri, 14 Aug 2015, Janne Grunau wrote:

> On 2015-08-14 09:22:33 +0300, Martin Storsjö wrote:
>> On Thu, 13 Aug 2015, Henrik Gramner wrote:
>>
>>> On Thu, Aug 13, 2015 at 10:59 PM, Martin Storsjö <martin at martin.st> wrote:
>>>> Some of the simpler ones actually turn out to be slower than the
>>>> plain C version, at least on some CPUs.
>>>
>>> Shouldn't we just skip having neon versions of those functions then?
>>> Maybe add a comment explaining it.
>>>
>>> Or am I missing something?
>>
>> No, you're right.
>>
>> I just posted all of them for reference (I guess I should have
>> tagged the patch RFC like some of the others), in case someone can
>> tell me something obvious I've missed. It might also be worthwhile
>> to double-check the aarch64 versions and see whether they help or
>> should also be disabled.
>
> IIRC all enabled aarch64 functions are faster on at least one CPU. If
> they weren't, I either left them disabled, with a comment that they are
> slower, or didn't send them.

Ok, thanks for reconfirming.
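
The approach discussed above (keep the C version in the function table and
leave the slower SIMD routine unhooked, with a comment) could look roughly
like the sketch below. All names here (CPU_FLAG_SIMD, predict_fn,
predict_init, the PRED_* indices) are hypothetical and are not the x264 API,
and which routine is "slower" is an assumption made purely for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; this is not the x264 API. */
#define CPU_FLAG_SIMD 0x1u

typedef void (*predict_fn)(uint8_t *dst, int stride);

enum { PRED_DC_128, PRED_V, PRED_COUNT };

/* Plain C version: fill an 8x16 block with the constant 128. */
static void predict_dc_128_c(uint8_t *dst, int stride)
{
    for (int y = 0; y < 16; y++)
        memset(dst + y * stride, 128, 8);
}

/* Plain C version: copy the row above the block into all 16 rows. */
static void predict_v_c(uint8_t *dst, int stride)
{
    for (int y = 0; y < 16; y++)
        memcpy(dst + y * stride, dst - stride, 8);
}

/* Stand-in for a hand-written SIMD routine (assumed faster here). */
static void predict_v_simd(uint8_t *dst, int stride)
{
    predict_v_c(dst, stride);
}

static void predict_init(unsigned cpu_flags, predict_fn pf[PRED_COUNT])
{
    /* Start from the C versions so every slot is always valid. */
    pf[PRED_DC_128] = predict_dc_128_c;
    pf[PRED_V]      = predict_v_c;

    if (!(cpu_flags & CPU_FLAG_SIMD))
        return;

    /* PRED_DC_128: deliberately left on the C version. In this sketch we
     * assume a SIMD variant was benchmarked and found slower than what
     * the compiler generates, so it is not hooked up. */
    pf[PRED_V] = predict_v_simd;
}
```

Callers only ever go through the table, so re-enabling a routine later (for
instance after rechecking on newer hardware) is a one-line change next to
the comment that explains why it was off.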

FWIW, as an explanation for this whole patchset: Janne told me earlier that
when doing the aarch64 optimizations, he added a few more functions than
existed for arm before. This patchset adds arm versions of all those
that were missing for arm but had an aarch64 version (either as a
direct translation of the aarch64 version, or redone from scratch for arm,
taking inspiration from the aarch64 version). Those that are marked as
RFC are still either slower or have other issues.

I had to spill intermediate values to the stack in sa8d_satd and the luma
intra deblocking; in the former the total speedup is minimal, while the
luma intra deblocking is clearly useful even though it's a bit
register-starved.

> There might be some functions which are reused by other asm functions
> where the combination was clearly faster. I might have left some of them
> enabled. It's probably a good idea to recheck now that more arm64 hardware
> is available.

Yep, that's probably good.

// Martin

