[x264-devel] commit: Compile fixes for pre-ARMv6T2 and/or PIC (David Conrad)
David Conrad
lessen42 at gmail.com
Mon Sep 7 02:21:52 CEST 2009
On Sep 6, 2009, at 2:45 PM, Måns Rullgård wrote:
> git at videolan.org (git version control) writes:
>
>> x264 | branch: master | David Conrad <lessen42 at gmail.com> | Wed
>> Sep 2 16:14:59 2009 -0700|
>> [e390cbf993d180b1db413746272e232ac3068dad] | committer: Jason
>> Garrett-Glaser
>>
>> Compile fixes for pre-ARMv6T2 and/or PIC
>>
>> +.macro movconst rd, val
>> +#ifdef HAVE_ARMV6T2
>> + movw \rd, #:lower16:\val
>> +.if \val >> 16
>> + movt \rd, #:upper16:\val
>> +.endif
>> +#else
>> + ldr \rd, =\val
>> +#endif
>> +.endm
>> +
>> @@ -1209,9 +1203,8 @@ function x264_pixel_ssim_end4_neon, export=1
>> vshl.s32 q2, q2, #6
>> vadd.s32 q1, q8, q8
>>
>> - mov r3, #416 // ssim_c1= .01*.01*255*255*64
>> - movw ip, #39355 // ssim_c2= .03*.03*255*255*64*63 - 3<<16
>> - movt ip, #3
>> + mov r3, #416 // ssim_c1 = .01*.01*255*255*64
>> + movconst ip, 235963 // ssim_c2 = .03*.03*255*255*64*63
>> vdup.32 q14, r3
>> vdup.32 q15, ip
>>
>> diff --git a/common/arm/predict-a.S b/common/arm/predict-a.S
>> index 46e687b..8ff61a2 100644
>> --- a/common/arm/predict-a.S
>> +++ b/common/arm/predict-a.S
>> @@ -102,7 +102,7 @@ function x264_predict_4x4_ddr_armv6, export=1
>> add r4, r4, r3, lsl #8
>> add r5, r5, r4, lsl #8
>> add r6, r6, r5, lsl #8
>> - ldr ip, pb_1
>> + ldr ip, =0x01010101
>
> Why not use movconst here?
Oops, it should. That was the first thing I did (since I didn't know
about the syntax earlier); movconst was the last. I'll change it along
with iPhone support.
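For reference, the arithmetic behind the movconst split can be checked with a short sketch. This is Python rather than assembly, and the helper names (movconst_halves, needs_movt) are hypothetical, made up for illustration; it assumes non-negative 32-bit constants and assumes the SSIM constants are rounded to nearest by adding .5 before truncating.

```python
def movconst_halves(val):
    """Split a 32-bit constant into the halves that #:lower16: and
    #:upper16: would select (hypothetical helper, not part of x264)."""
    val &= 0xFFFFFFFF
    return val & 0xFFFF, val >> 16


def needs_movt(val):
    """Mirror the macro's `.if \\val >> 16` test: movt is only emitted
    when the upper half is non-zero."""
    return (val & 0xFFFFFFFF) >> 16 != 0


# Constants from the diff comments; the +.5 rounding is an assumption.
ssim_c1 = int(.01 * .01 * 255 * 255 * 64 + .5)
ssim_c2 = int(.03 * .03 * 255 * 255 * 64 * 63 + .5)

print(ssim_c1, needs_movt(ssim_c1))
print(ssim_c2, movconst_halves(ssim_c2))
print([hex(h) for h in movconst_halves(0x01010101)])
```

The old movw #39355 / movt #3 pair is the same split of 235963 done by hand, and 0x01010101 has a non-zero upper half, so movconst would emit both instructions for it on ARMv6T2.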
>> + # arm-gcc-4.2 produces incorrect output with -ffast-math
>> + # and it doesn't save any speed anyway on 4.4, so disable it
>> + CFLAGS="-O4 -fno-fast-math $CFLAGS"
>
> Details?
The output wasn't bitexact to x86 with either CodeSourcery 2007q3 or
Apple gcc 4.2, although it was bitexact between those two compilers
and looked fine (no obvious artifacts). The stats showed a higher
bitrate (crf) with many more I macroblocks than P/B, so Jason
suggested that something was probably being messed up in SAD scores
while RD was left fine. I didn't investigate exactly what gcc screwed
up, but gcc 4.4 and Apple gcc 4.0 both matched x86 output, as did both
4.2 variants with -fno-fast-math, and I didn't measure a speedup on
ARM with gcc 4.4 with -ffast-math.
-fno-fast-math is passed explicitly, rather than just leaving out
-ffast-math, because Apple gcc 4.2 apparently uses -ffast-math by
default.
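As a general illustration only (not a diagnosis of what gcc 4.2 actually did, which wasn't investigated), here is a minimal Python sketch of why a flag that lets the compiler reassociate floating-point operations can break bit-exactness. The values are arbitrary and have nothing to do with x264's code.

```python
# Floating-point addition is not associative, so regrouping the same
# operands (one of the things -ffast-math permits) can change the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left == right)
print(left.hex(), right.hex())
```

The two sums differ in the last bit, which is enough to make downstream decisions diverge between builds.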