[x264-devel] [PATCH 2/6] arm: Don't use vcmp.f64 for testing for an all-zeros register

Janne Grunau janne-x264 at jannau.net
Tue Nov 15 23:53:04 CET 2016


On 2016-11-14 23:54:49 +0200, Martin Storsjö wrote:
> On iOS, vcmp.f64 can behave as if the register were zero if the
> register (interpreted as an f64) holds a denormal number.
> 
> vcmp.f64 (like other VFP instructions) traps to the kernel if the
> value is a denormal. The kernel is then supposed to implement the FP
> operation, which it apparently doesn't do properly on iOS. When this
> happens, the whole comparison also ends up far more costly.
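To make the failure mode concrete, here is a small C sketch (illustrative only, not part of the patch; the helper names as_f64 and is_denormal_pattern are hypothetical). It reinterprets a 64-bit register image as a double, the way vcmp.f64 sees d28. Any pattern with a zero exponent field and a nonzero mantissa, such as a mask with only low-order bits set, is a denormal. A flush-to-zero or mis-emulated compare can therefore report it as zero even though the integer bit pattern is not zero.

```c
#include <stdint.h>
#include <string.h>
#include <math.h>

/* Reinterpret a 64-bit register image as an IEEE-754 double, the way
 * vcmp.f64 sees d28. memcpy avoids strict-aliasing problems. */
static double as_f64(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* A pattern is a denormal (subnormal) double when its 11 exponent bits
 * (bits 62..52) are all zero but its 52 mantissa bits are not. */
static int is_denormal_pattern(uint64_t bits)
{
    uint64_t exponent = (bits >> 52) & 0x7FF;
    uint64_t mantissa = bits & 0x000FFFFFFFFFFFFFULL;
    return exponent == 0 && mantissa != 0;
}
```

The FP compare is not a pure bit test even without denormals: 0x8000000000000000 (negative zero) is a nonzero bit pattern that compares equal to 0.0. Testing the bits as integers avoids both issues.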
> ---
> This is marginally slower, though. If we had another spare GPR,
> we could have done
>     vmov rX, rY, d28
>     orr  rX, rX, rY
>     cmp  rX, #0

The cmp is not needed; just use orrs. We could easily make lr available
by pushing it and returning with 'pop {pc}' instead of 'bx lr'.

> instead.
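As a rough model of the alternative (a C sketch, not a drop-in replacement; gpr_all_zero is a hypothetical name): vmov rX, rY, d28 copies the low and high 32-bit halves of d28 into two core registers, and orrs ORs them while setting the Z flag from the result. That is why the separate cmp #0 is redundant: beq is taken exactly when the OR, and therefore all of d28, is zero.

```c
#include <stdint.h>

/* Hypothetical C model of the suggested GPR sequence:
 *   vmov  rX, rY, d28    @ rX = low 32 bits, rY = high 32 bits
 *   orrs  rX, rX, rY     @ OR the halves and set the Z flag
 *   beq   9f             @ taken iff every bit of d28 was zero
 * Returns nonzero when the branch would be taken. */
static int gpr_all_zero(uint64_t d28)
{
    uint32_t rx = (uint32_t)d28;          /* low word  */
    uint32_t ry = (uint32_t)(d28 >> 32);  /* high word */
    rx |= ry;                             /* orrs: Z = (rx == 0) */
    return rx == 0;
}
```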
> ---
>  common/arm/deblock-a.S | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/common/arm/deblock-a.S b/common/arm/deblock-a.S
> index d781828..90ef844 100644
> --- a/common/arm/deblock-a.S
> +++ b/common/arm/deblock-a.S
> @@ -211,8 +211,10 @@ endfunc
>      vclt.u8         q13, q4,  q14   @ < (alpha >> 2) + 2 if_2
>      vand            q12, q7,  q6    @ if_1
>      vshrn.u16       d28, q12,  #4
> -    vcmp.f64        d28, #0
> -    vmrs            APSR_nzcv, FPSCR
> +    vrev64.32       d29, d28
> +    vorr            d28, d28, d29
> +    vmov.32         r2,  d28[0]
> +    cmp             r2,  #0
>      beq             9f
>  
>      sub             sp,  sp,  #32
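For comparison, a C model of the NEON sequence the patch actually uses (again a hypothetical sketch; the helper names vrev64_32 and neon_all_zero are invented for illustration). vrev64.32 swaps the two 32-bit halves of d28 into d29, vorr leaves lo | hi in both halves, and vmov.32 moves the low half into r2. Unlike orrs, vmov does not set the condition flags, so this version still needs the explicit cmp.

```c
#include <stdint.h>

/* Model of vrev64.32 on a D register: swap the two 32-bit elements. */
static uint64_t vrev64_32(uint64_t d)
{
    return (d << 32) | (d >> 32);
}

/* Hypothetical C model of the patched sequence:
 *   vrev64.32  d29, d28        @ swap the two 32-bit halves
 *   vorr       d28, d28, d29   @ each half now holds lo | hi
 *   vmov.32    r2,  d28[0]     @ move the low half to a GPR
 *   cmp        r2,  #0         @ needed: vmov does not set flags
 *   beq        9f
 * Returns nonzero when the branch would be taken. */
static int neon_all_zero(uint64_t d28)
{
    uint64_t d29 = vrev64_32(d28);
    d28 |= d29;
    uint32_t r2 = (uint32_t)d28;  /* d28[0]: low 32-bit lane */
    return r2 == 0;
}
```

The result reduces to (lo | hi) == 0, an exact integer test of all 64 bits, independent of how those bits would be interpreted as a double.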

Patch OK if the above alternative is not faster.

Janne

