[x264-devel] commit: Compile fixes for pre-ARMv6T2 and/or PIC (David Conrad)
Måns Rullgård
mans at mansr.com
Mon Sep 7 04:52:03 CEST 2009
David Conrad <lessen42 at gmail.com> writes:
> On Sep 6, 2009, at 2:45 PM, Måns Rullgård wrote:
>
>> git at videolan.org (git version control) writes:
>>
>>> x264 | branch: master | David Conrad <lessen42 at gmail.com> | Wed
>>> Sep 2 16:14:59 2009 -0700|
>>> [e390cbf993d180b1db413746272e232ac3068dad] | committer: Jason
>>> Garrett-Glaser
>>>
>>> Compile fixes for pre-ARMv6T2 and/or PIC
>>>
>>> +.macro movconst rd, val
>>> +#ifdef HAVE_ARMV6T2
>>> + movw \rd, #:lower16:\val
>>> +.if \val >> 16
>>> + movt \rd, #:upper16:\val
>>> +.endif
>>> +#else
>>> + ldr \rd, =\val
>>> +#endif
>>> +.endm
>>> +
>>> @@ -1209,9 +1203,8 @@ function x264_pixel_ssim_end4_neon, export=1
>>> vshl.s32 q2, q2, #6
>>> vadd.s32 q1, q8, q8
>>>
>>> - mov r3, #416 // ssim_c1= .01*.01*255*255*64
>>> - movw ip, #39355 // ssim_c2= .03*.03*255*255*64*63 - 3<<16
>>> - movt ip, #3
>>> + mov r3, #416 // ssim_c1 = .01*.01*255*255*64
>>> + movconst ip, 235963 // ssim_c2 = .03*.03*255*255*64*63
>>> vdup.32 q14, r3
>>> vdup.32 q15, ip
>>>
>>> diff --git a/common/arm/predict-a.S b/common/arm/predict-a.S
>>> index 46e687b..8ff61a2 100644
>>> --- a/common/arm/predict-a.S
>>> +++ b/common/arm/predict-a.S
>>> @@ -102,7 +102,7 @@ function x264_predict_4x4_ddr_armv6, export=1
>>> add r4, r4, r3, lsl #8
>>> add r5, r5, r4, lsl #8
>>> add r6, r6, r5, lsl #8
>>> - ldr ip, pb_1
>>> + ldr ip, =0x01010101
>>
>> Why not use movconst here?
>
> Oops, it should. That was the first change I made (I didn't know
> about the syntax yet); movconst came last. I'll fix it along with the
> iPhone support.

In this particular case (loading 0x01010101), a shift/or sequence might
be faster on pre-T2 CPUs:

        mov     ip, #0x01
        orr     ip, ip, #0x0100
        orr     ip, ip, ip, lsl #16

For best results, interleave these with other instructions to avoid a
stall between the second and third lines, since shifted operands are
required one cycle earlier than non-shifted ones.

>>> + # arm-gcc-4.2 produces incorrect output with -ffast-math
>>> + # and it doesn't save any speed anyway on 4.4, so disable it
>>> + CFLAGS="-O4 -fno-fast-math $CFLAGS"
>>
>> Details?
>
> The output wasn't bitexact with x86 using either CodeSourcery 2007q3 or
> Apple gcc 4.2, even though those two compilers were bitexact with each
> other and the output looked fine (no obvious artifacts). The stats
> showed a higher bitrate in crf mode, with many more I macroblocks than
> P/B, so Jason suggested that something was probably being messed up in
> the SAD scores while RD was left intact. I didn't investigate exactly
> what gcc miscompiled, but gcc 4.4 and Apple gcc 4.0 both matched the
> x86 output, as did both 4.2 variants with -fno-fast-math, and I didn't
> measure any speedup from -ffast-math on ARM with gcc 4.4.

OK, sounds like a real bug.
--
Måns Rullgård
mans at mansr.com