[x264-devel] [PATCHv2 2/6] arm: Don't use vcmp.f64 for testing for an all-zeros register
Janne Grunau
janne-x264 at jannau.net
Wed Nov 16 19:14:24 CET 2016
On 2016-11-16 10:56:14 +0200, Martin Storsjö wrote:
> On iOS, vcmp.f64 can behave as if the register were zero when the
> register, interpreted as an f64, holds a denormal number.
>
> vcmp.f64 (like other VFP instructions) traps to the kernel when the
> value is a denormal. The kernel is supposed to implement the FP
> operation, which it apparently doesn't do properly on iOS. When the
> trap happens, the whole comparison also ends up far more costly.
> ---
> Updated to use lr as temp register.
> ---
> common/arm/deblock-a.S | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/common/arm/deblock-a.S b/common/arm/deblock-a.S
> index d781828..41306e2 100644
> --- a/common/arm/deblock-a.S
> +++ b/common/arm/deblock-a.S
> @@ -211,8 +211,8 @@ endfunc
> vclt.u8 q13, q4, q14 @ < (alpha >> 2) + 2 if_2
> vand q12, q7, q6 @ if_1
> vshrn.u16 d28, q12, #4
> - vcmp.f64 d28, #0
> - vmrs APSR_nzcv, FPSCR
> + vmov r2, lr, d28
> + orrs r2, r2, lr
> beq 9f
>
> sub sp, sp, #32
> @@ -325,6 +325,7 @@ endfunc
> .endm
>
> function x264_deblock_v_luma_intra_neon
> + push {lr}
> vld1.64 {d0, d1}, [r0,:128], r1
> vld1.64 {d2, d3}, [r0,:128], r1
> vld1.64 {d4, d5}, [r0,:128], r1
> @@ -348,10 +349,11 @@ function x264_deblock_v_luma_intra_neon
> vst1.64 {d4, d5}, [r0,:128]
> 9:
> align_pop_regs
> - bx lr
> + pop {pc}
> endfunc
>
> function x264_deblock_h_luma_intra_neon
> + push {lr}
> sub r0, r0, #4
> vld1.64 {d22}, [r0], r1
> vld1.64 {d20}, [r0], r1
> @@ -397,7 +399,7 @@ function x264_deblock_h_luma_intra_neon
> vst1.64 {d7}, [r0], r1
> 9:
> align_pop_regs
> - bx lr
> + pop {pc}
> endfunc
>
> .macro h264_loop_filter_chroma
ok
Janne
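
For readers following the reasoning in the commit message, here is a minimal C sketch of why a floating-point compare is a fragile zero test for this mask, and what the replacement vmov/orrs pair computes. The helper names (is_all_zero_bits, bits_to_f64, mask_is_denormal) are hypothetical and not part of x264, and the sample value 0x0F is an assumed mask with only the lowest nibble set, not a value taken from the patch.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: integer zero test on a 64-bit mask, analogous to
 *     vmov  r2, lr, d28
 *     orrs  r2, r2, lr
 * which ORs the two 32-bit halves and sets the Z flag if both are zero. */
static int is_all_zero_bits(uint64_t d)
{
    uint32_t lo = (uint32_t)d;
    uint32_t hi = (uint32_t)(d >> 32);
    return (lo | hi) == 0;
}

/* Hypothetical helper: reinterpret the 64 mask bits as an IEEE 754 double,
 * which is how vcmp.f64 would view the contents of d28. */
static double bits_to_f64(uint64_t d)
{
    double f;
    memcpy(&f, &d, sizeof f);
    return f;
}

/* Hypothetical helper: true when the mask bits, read as a double, form a
 * denormal (subnormal) value, i.e. exponent field zero, mantissa nonzero. */
static int mask_is_denormal(uint64_t d)
{
    return fpclassify(bits_to_f64(d)) == FP_SUBNORMAL;
}
```

With the assumed mask 0x0F, mask_is_denormal() reports a denormal while is_all_zero_bits() correctly reports a nonzero register. Any nonzero mask whose bits 52 to 62 are all clear falls into the same case, so the integer test avoids depending on how denormals are handled.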