[x264-devel] [PATCH 1/1] checkasm: fix arm64 register clobber test

Mon Aug 24 20:59:44 CEST 2015

On 2015-08-24 21:51:44 +0300, Martin Storsjö wrote:
> On Mon, 24 Aug 2015, Janne Grunau wrote:
> 
> >On 2015-08-24 12:42:47 +0300, Martin Storsjö wrote:
> >>On Mon, 17 Aug 2015, Janne Grunau wrote:
> >>
> >>>+.macro check_reg_neon reg1, reg2
> >>>+    ldr         q0,  [x9], #16
> >>>+    uzp1        v1.2d,  v\reg1\().2d, v\reg2\().2d
> >>>+    eor         v0.16b, v0.16b, v1.16b
> >>>+    orr         v3.16b, v3.16b, v0.16b
> >>>+.endm
> >>>+    check_reg_neon  8,  9
> >>>+    check_reg_neon  10, 11
> >>>+    check_reg_neon  12, 13
> >>>+    check_reg_neon  14, 15
> >>>+    xtn         v3.8b,  v3.8h
> >>
> >>Doesn't this drop half of the bits in the registers, i.e. a bit set
> >>in the upper half of the halfwords would be discarded and missed? An
> >>uqxtn would probably handle that, right?
> >
> >half (16-bit) or double (64-bit) words?
> 
> halfwords
> 
> >in any case the code is correct.
> 
> No, I just reproduced a clobbered register that it failed to notice.
> E.g., before the check_reg_neon, do "mov x0, #42; mov v8.b[1], w0",
> and it will fail to notice. For odd indices between 1 and 7, it will
> fail to notice the corrupted register, while even indices between 0
> and 6 are caught properly (and 8-15 ignored, as they should).
> 
> >it takes the lower 64 bits of the vector registers and writes
> >them into v1. The ABI just mandates that the lower 64 bits of the
> >registers should be preserved, i.e. the equivalent of the ARM NEON
> >d8-d15.
> 
> Yes, it takes the lower 64-bit half of two 128-bit vector registers
> and writes them into v1. But then, at the end, once you have the
> bitfield in v3 showing which bits had errors (where every bit is
> significant; the upper 64 bits of this register come from the
> errors in v9, v11 etc), you discard every other byte in the
> register.

Right, I completely missed that you were talking about the xtn. uqxtn
should indeed do the trick.

Janne
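
The xtn/uqxtn difference discussed above can be modelled lane by lane in
a few lines of Python. This is only a sketch, not the checkasm code: the
helper names are hypothetical, the error mask stands in for v3 after the
check_reg_neon invocations, and it assumes that the narrowed result is
afterwards simply tested for being nonzero and that the corrupted byte
differs from the expected value (so its XOR difference is nonzero).

```python
def xtn(halfwords):
    """Model of 'xtn v3.8b, v3.8h': keep only the low 8 bits of each 16-bit lane."""
    return [h & 0xFF for h in halfwords]


def uqxtn(halfwords):
    """Model of 'uqxtn v3.8b, v3.8h': saturate each unsigned 16-bit lane to 8 bits."""
    return [min(h, 0xFF) for h in halfwords]


def error_mask_for_corrupt_byte(byte_index, diff=42):
    """Hypothetical helper: the eight halfword lanes of the error mask (v3)
    when exactly one byte of the mask is nonzero.

    Bytes 0-7 of the mask come from the first register of a pair (e.g. v8),
    bytes 8-15 from the second (e.g. v9). Lanes are little-endian, so an odd
    byte index is the upper half of its halfword.
    """
    lanes = [0] * 8
    lanes[byte_index // 2] = diff << (8 * (byte_index % 2))
    return lanes


def detected(narrowed):
    """Assumed final check: any nonzero byte means a clobbered register."""
    return any(narrowed)


if __name__ == "__main__":
    for i in range(8):
        mask = error_mask_for_corrupt_byte(i)
        print(i, detected(xtn(mask)), detected(uqxtn(mask)))
```

With xtn, a difference in an odd byte lands in the upper half of a
halfword lane and is truncated away, which matches the failure reproduced
with "mov v8.b[1], w0". With uqxtn, any nonzero halfword saturates to a
nonzero byte, so every corrupted byte is reported.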