[x264-devel] [PATCH 2/6] arm: Don't use vcmp.f64 for testing for an all-zeros register
Martin Storsjö
martin at martin.st
Mon Nov 14 22:54:49 CET 2016
On iOS, vcmp.f64 can behave as if the register were zero when the
register, interpreted as an f64, holds a denormal number.

vcmp.f64 (and other VFP instructions) traps to the kernel when the
value is a denormal. The kernel is supposed to implement the FP
operation, which it apparently does not do properly on iOS. When the
trap happens, the whole comparison also ends up far more costly.
---
This is marginally slower, though. If we had another spare GPR,
we could have done
    vmov rX, rY, d28
    orr  rX, rX, rY
    cmp  rX, #0
instead.
---
common/arm/deblock-a.S | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/common/arm/deblock-a.S b/common/arm/deblock-a.S
index d781828..90ef844 100644
--- a/common/arm/deblock-a.S
+++ b/common/arm/deblock-a.S
@@ -211,8 +211,10 @@ endfunc
vclt.u8 q13, q4, q14 @ < (alpha >> 2) + 2 if_2
vand q12, q7, q6 @ if_1
vshrn.u16 d28, q12, #4
- vcmp.f64 d28, #0
- vmrs APSR_nzcv, FPSCR
+ vrev64.32 d29, d28
+ vorr d28, d28, d29
+ vmov.32 r2, d28[0]
+ cmp r2, #0
beq 9f
sub sp, sp, #32
--
2.7.4
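
For illustration only, here is a minimal C sketch of why the integer test is
the safe one. It is not part of the patch, and the helper names
(mask_is_zero, is_f64_denormal) are hypothetical. It models d28 as a plain
64-bit value, folds the two 32-bit halves together the way the
vrev64.32/vorr/vmov.32/cmp sequence does, and shows that a mask with only a
few low bits set has the bit pattern of an f64 denormal (zero exponent field,
nonzero mantissa), which is the case the commit message describes vcmp.f64
mishandling.

```c
#include <stdint.h>

/* Integer-only zero test, mirroring the patch: OR the high and low
 * 32-bit halves of the 64-bit mask and test the 32-bit result. */
static int mask_is_zero(uint64_t bits)
{
    uint32_t lo = (uint32_t)bits;
    uint32_t hi = (uint32_t)(bits >> 32);
    return (lo | hi) == 0;
}

/* Returns 1 if the bit pattern, read as an IEEE 754 binary64 value,
 * is a denormal: the 11-bit exponent field is zero and the 52-bit
 * mantissa is nonzero. */
static int is_f64_denormal(uint64_t bits)
{
    uint64_t exponent = (bits >> 52) & 0x7FF;
    uint64_t mantissa = bits & 0xFFFFFFFFFFFFFULL;
    return exponent == 0 && mantissa != 0;
}
```

Any nonzero mask whose top 12 bits are clear falls into the denormal range
when read as an f64, so a floating-point compare against zero is only
reliable if denormals are handled correctly. The integer fold does not
depend on that.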