[x264-devel] [PATCH 1/1] checkasm: fix arm64 register clobber test

Janne Grunau janne-x264 at jannau.net
Mon Aug 24 20:11:31 CEST 2015


On 2015-08-24 12:42:47 +0300, Martin Storsjö wrote:
> On Mon, 17 Aug 2015, Janne Grunau wrote:
> 
> >Hi Martin,
> >
> >the code has a couple of problems, see below in the commit message.
> >feel free squash this patch. I still need to test on ios though.
> >IIRC it won't work there since the ABI for stack arguments and variadic
> >arguments is incompatible. Stack arguments take only there size + padding
> >for natural alignment but variadic arguments are allocated to slots of
> >8 bytes. There are iirc one or two functions which require args on the
> >stack. If those arguments are smaller than 8 bytes this won't work.
> >I don't think this is fixable, i.e. arm64 x264_checkasm_call can't work
> >on ios.
> >
> >Janne
> >
> >---8<---
> >The stackpointer SP must be 16-byte aligned if it's used for meory
> >access. Solve this by creating a proper frame record.
> >
> >Use the frame pointer to access the original arguments on the stack.
> >
> >Provide a macro to test neon args.
> >
> >Do not write past int ok.
> >---
> >tools/checkasm-aarch64.S | 58 ++++++++++++++++++++++--------------------------
> >1 file changed, 26 insertions(+), 32 deletions(-)
> >
> >diff --git a/tools/checkasm-aarch64.S b/tools/checkasm-aarch64.S
> >index 2b86d62..ec36518 100644
> >--- a/tools/checkasm-aarch64.S
> >+++ b/tools/checkasm-aarch64.S
> >@@ -56,11 +56,11 @@ error_message:
> >// max number of args used by any x264 asm function.
> >#define MAX_ARGS 15
> >
> >-#define ARG_STACK 8*(MAX_ARGS - 6)
> >-#define PUSHED 8*8 + 8*12
> >+#define ARG_STACK ((8*(MAX_ARGS - 6) + 15) & ~15)
> >
> >function x264_checkasm_call, export=1
> >-    str         x30, [sp, #-8]!
> >+    stp         x29, x30, [sp, #-16]!
> >+    mov         x29, sp
> >    stp         x19, x20, [sp, #-16]!
> >    stp         x21, x22, [sp, #-16]!
> >    stp         x23, x24, [sp, #-16]!
> >@@ -82,12 +82,13 @@ function x264_checkasm_call, export=1
> >    ldp         x25, x26, [x9], #16
> >    ldp         x27, x28, [x9], #16
> >
> >-    str         x1,  [sp, #-8]!
> >+    str         x1,  [sp, #-16]!
> >
> >    sub         sp,  sp,  #ARG_STACK
> >.equ pos, 0
> >+// first two stacked args are copied to x6, x7
> >.rept MAX_ARGS-6
> >-    ldr         x9, [sp, #ARG_STACK + PUSHED + 16 + pos]
> >+    ldr         x9, [x29, #16 + 16 + pos]
> >    str         x9, [sp, #pos]
> >.equ pos, pos + 8
> >.endr
> >@@ -99,33 +100,26 @@ function x264_checkasm_call, export=1
> >    mov         x3,  x5
> >    mov         x4,  x6
> >    mov         x5,  x7
> >-    ldp         x6,  x7,  [sp, #ARG_STACK + PUSHED]
> >+    ldp         x6,  x7,  [x29, #16]
> >    blr         x12
> >    add         sp,  sp,  #ARG_STACK
> >-    ldr         x2,  [sp], #8
> >-
> >-    stp         x0,  x1, [sp, #-16]!
> >+    ldr         x2,  [sp]
> >+    stp         x0,  x1, [sp]
> >    movrel      x9, register_init
> >-    ldp         d0,  d1,  [x9], #16
> >-    ldp         d2,  d3,  [x9], #16
> >-    ldp         d4,  d5,  [x9], #16
> >-    ldp         d6,  d7,  [x9], #16
> >-    eor         v0.8b,  v0.8b,  v8.8b
> >-    eor         v1.8b,  v1.8b,  v9.8b
> >-    eor         v2.8b,  v2.8b,  v10.8b
> >-    eor         v3.8b,  v3.8b,  v11.8b
> >-    eor         v4.8b,  v4.8b,  v12.8b
> >-    eor         v5.8b,  v5.8b,  v13.8b
> >-    eor         v6.8b,  v6.8b,  v14.8b
> >-    eor         v7.8b,  v7.8b,  v15.8b
> >-    orr         v0.8b,  v0.8b,  v1.8b
> >-    orr         v0.8b,  v0.8b,  v2.8b
> >-    orr         v0.8b,  v0.8b,  v3.8b
> >-    orr         v0.8b,  v0.8b,  v4.8b
> >-    orr         v0.8b,  v0.8b,  v5.8b
> >-    orr         v0.8b,  v0.8b,  v6.8b
> >-    orr         v0.8b,  v0.8b,  v7.8b
> >-    fmov        x3,  d0
> >+    movi        v3.8h,  #0
> >+
> >+.macro check_reg_neon reg1, reg2
> >+    ldr         q0,  [x9], #16
> >+    uzp1        v1.2d,  v\reg1\().2d, v\reg2\().2d
> >+    eor         v0.16b, v0.16b, v1.16b
> >+    orr         v3.16b, v3.16b, v0.16b
> >+.endm
> >+    check_reg_neon  8,  9
> >+    check_reg_neon  10, 11
> >+    check_reg_neon  12, 13
> >+    check_reg_neon  14, 15
> >+    xtn         v3.8b,  v3.8h
> 
> Doesn't this drop half of the bits in the registers, i.e. a bit set
> in the upper half of the halfwords would be discarded and missed? An
> uqxtn would probably handle that, right?

half (16-bit) or double (64-bit) words? in any case the code is correct.  
it takes the the lower 64-bit of the vector registers and writes them 
into v1. The ABI just mandates that the lower 64-bit of the refisters 
should be preserved, i.e. the equivalent of the ARM NEON d8-d15.

> Did you ever get back to testing on iOS? I guess it'd require
> something like (ARCH_AARCH64 && !defined(__APPLE__)) in checkasm.c,
> to skip it there.

I forgot to test but if it works now that's by coincidence. We would 
have to guarantee that the 7th and 8th paramter is 8 bytes large.  So 
it's best to disable it on ios. your preprocessor condition should work.

Janne


More information about the x264-devel mailing list