[x264-devel] [PATCH 2/6] arm: Don't use vcmp.f64 for testing for an all-zeros register
Martin Storsjö
martin at martin.st
Wed Nov 16 09:55:25 CET 2016
On Tue, 15 Nov 2016, Janne Grunau wrote:
> On 2016-11-14 23:54:49 +0200, Martin Storsjö wrote:
>> On iOS, vcmp.f64 can behave as if the register was zero if the
>> register, interpreted as an f64, holds a denormal number.
>>
>> The vcmp.f64 (and other VFP instructions) will trap to the kernel
>> (which is supposed to implement the FP operation, which it apparently
>> doesn't do properly on iOS) if the value is a denormal. If this happens,
>> the whole comparison ends up way more costly.
>> ---
>> This is marginally slower though. If we had another spare GPR,
>> we could have done:
>> vmov rX, rY, d28
>> orr rX, rX, rY
>> cmp rX, #0
>
> the cmp is not needed, just use orrs. We could make lr easily available
> by pushing it and using 'pop {pc}' instead of 'bx lr'
Oh, indeed. The orrs variant is about as fast as this patch on A53 and A8,
and faster on A9.
// Martin
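
For readers following the thread, here is a minimal C sketch of the idea
being discussed; it is an illustration, not code from the patch. It models
the 64-bit d register as a uint64_t and tests it for all-zeros by OR-ing
its two 32-bit halves, which is what "vmov rX, rY, d28" followed by
"orrs rX, rX, rY" does. The second helper shows why the floating-point
compare is risky: integer data with only low bits set reads as a denormal
when reinterpreted as an f64. Both function names are hypothetical, and the
sketch does not reproduce the iOS trap behaviour itself.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: integer-only test for an all-zeros 64-bit value.
 * Mirrors "vmov rX, rY, d28; orrs rX, rX, rY": split into two 32-bit
 * halves, OR them together, and check whether the result is zero. */
static int is_all_zero_bits(uint64_t d)
{
    uint32_t lo = (uint32_t)d;          /* would land in rX */
    uint32_t hi = (uint32_t)(d >> 32);  /* would land in rY */
    return (lo | hi) == 0;              /* orrs sets the Z flag on zero */
}

/* Hypothetical helper: reinterpret the same bits as an IEEE 754 double
 * and report whether that double is a denormal (subnormal) number. */
static int is_denormal_as_f64(uint64_t d)
{
    double f;
    memcpy(&f, &d, sizeof f);           /* bit-exact reinterpretation */
    return fpclassify(f) == FP_SUBNORMAL;
}
```

A value such as 0x0000000100000000 is clearly nonzero to the integer test,
yet it is a denormal when viewed as an f64, which is the case where
vcmp.f64 was reported to misbehave.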