[x264-devel] [PATCH 2/6] arm: Don't use vcmp.f64 for testing for an all-zeros register
Janne Grunau
janne-x264 at jannau.net
Tue Nov 15 23:53:04 CET 2016
On 2016-11-14 23:54:49 +0200, Martin Storsjö wrote:
> On iOS, vcmp.f64 can behave as if the register was zero, if the
> register (interpreted as a f64), was a denormal number.
>
> The vcmp.f64 (and other VFP instructions) will trap to the kernel
> (which is supposed to implement the FP operation, which it apparently
> doesn't do properly on iOS) if the value is a denormal. If this happens,
> the whole comparison ends up way more costly.
> ---
> This is marginally slower though. If we'd have another spare GPR,
> we could have done
> vmov rX, rY, d28
> orr rX, rX, rY
> cmp rX, #0
The cmp is not needed; just use orrs, which sets the flags itself. We could
easily make lr available by pushing it and returning with 'pop {pc}' instead
of 'bx lr'.
> instead.
> ---
> common/arm/deblock-a.S | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/common/arm/deblock-a.S b/common/arm/deblock-a.S
> index d781828..90ef844 100644
> --- a/common/arm/deblock-a.S
> +++ b/common/arm/deblock-a.S
> @@ -211,8 +211,10 @@ endfunc
> vclt.u8 q13, q4, q14 @ < (alpha >> 2) + 2 if_2
> vand q12, q7, q6 @ if_1
> vshrn.u16 d28, q12, #4
> - vcmp.f64 d28, #0
> - vmrs APSR_nzcv, FPSCR
> + vrev64.32 d29, d28
> + vorr d28, d28, d29
> + vmov.32 r2, d28[0]
> + cmp r2, #0
> beq 9f
>
> sub sp, sp, #32
Patch OK if the alternative above is not faster.
Janne