[x264-devel] [PATCH 2/6] arm: Don't use vcmp.f64 for testing for an all-zeros register

Martin Storsjö martin at martin.st
Mon Nov 14 22:54:49 CET 2016


On iOS, vcmp.f64 can behave as if the register were zero when the
register, interpreted as an f64, holds a denormal number.

vcmp.f64 (like other VFP instructions) traps to the kernel if the
value is a denormal. The kernel is supposed to implement the FP
operation, which it apparently doesn't do properly on iOS. When the
trap happens, the whole comparison also ends up far more costly.
---
This is marginally slower, though. If we had another spare GPR,
we could have done
    vmov rX, rY, d28
    orr  rX, rX, rY
    cmp  rX, #0
instead.
---
 common/arm/deblock-a.S | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/common/arm/deblock-a.S b/common/arm/deblock-a.S
index d781828..90ef844 100644
--- a/common/arm/deblock-a.S
+++ b/common/arm/deblock-a.S
@@ -211,8 +211,10 @@ endfunc
     vclt.u8         q13, q4,  q14   @ < (alpha >> 2) + 2 if_2
     vand            q12, q7,  q6    @ if_1
     vshrn.u16       d28, q12,  #4
-    vcmp.f64        d28, #0
-    vmrs            APSR_nzcv, FPSCR
+    vrev64.32       d29, d28
+    vorr            d28, d28, d29
+    vmov.32         r2,  d28[0]
+    cmp             r2,  #0
     beq             9f
 
     sub             sp,  sp,  #32
-- 
2.7.4
