[x264-devel] [PATCH 2/6] arm: Don't use vcmp.f64 for testing for an all-zeros register

Martin Storsjö martin at martin.st
Wed Nov 16 09:55:25 CET 2016


On Tue, 15 Nov 2016, Janne Grunau wrote:

> On 2016-11-14 23:54:49 +0200, Martin Storsjö wrote:
>> On iOS, vcmp.f64 can behave as if the register were zero if the
>> register, interpreted as an f64, holds a denormal number.
>> 
>> The vcmp.f64 (and other VFP instructions) will trap to the kernel
>> (which is supposed to implement the FP operation, which it apparently
>> doesn't do properly on iOS) if the value is a denormal. If this happens,
>> the whole comparison ends up way more costly.
>> ---
>> This is marginally slower, though. If we had another spare GPR,
>> we could have done
>>     vmov rX, rY, d28
>>     orr  rX, rX, rY
>>     cmp  rX, #0
>
> The cmp is not needed; just use orrs. We could easily make lr available
> by pushing it and using 'pop {pc}' instead of 'bx lr'.

Oh, indeed. That's about as fast as this patch on A53 and A8, and faster 
on A9.

// Martin
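
For illustration, the integer-side test discussed above (vmov of d28 into two GPRs followed by orrs) can be modelled in portable C. This is only a sketch of the logic, not code from the patch: the function names is_all_zeros, is_denormal_f64 and demo are hypothetical, and the model says nothing about timing or about how the iOS kernel handles the VFP trap. It shows why a bit pattern that is a denormal when read as an f64 is still correctly seen as non-zero by the integer test.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of "vmov rX, rY, d28; orrs rX, rX, rY":
 * split the 64-bit register into two 32-bit halves, OR them together,
 * and report whether the result is zero (the Z flag after orrs). */
int is_all_zeros(uint64_t d)
{
    uint32_t lo = (uint32_t)d;          /* rX: low half of d28  */
    uint32_t hi = (uint32_t)(d >> 32);  /* rY: high half of d28 */
    return (lo | hi) == 0;
}

/* Returns 1 if the bit pattern, read as an IEEE 754 binary64 value,
 * is a denormal: exponent field all zeros, mantissa non-zero. */
int is_denormal_f64(uint64_t d)
{
    uint64_t exponent = (d >> 52) & 0x7ff;
    uint64_t mantissa = d & 0xfffffffffffffULL;  /* low 52 bits */
    return exponent == 0 && mantissa != 0;
}

/* Usage example: a register with only its lowest bit set is a
 * denormal as an f64, yet the integer test sees it as non-zero. */
void demo(void)
{
    uint64_t d28 = 1;
    assert(is_denormal_f64(d28));
    assert(!is_all_zeros(d28));
}
```

The integer test never interprets the register contents as floating point, so denormal bit patterns cannot affect its result.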

