[x264-devel] commit: Compile fixes for pre-ARMv6T2 and/or PIC (David Conrad)

Mon Sep 7 02:21:52 CEST 2009

On Sep 6, 2009, at 2:45 PM, Måns Rullgård wrote:

> git at videolan.org (git version control) writes:
>
>> x264 | branch: master | David Conrad <lessen42 at gmail.com> | Wed Sep  2 16:14:59 2009 -0700| [e390cbf993d180b1db413746272e232ac3068dad] | committer: Jason Garrett-Glaser
>>
>> Compile fixes for pre-ARMv6T2 and/or PIC
>>
>> +.macro movconst rd, val
>> +#ifdef HAVE_ARMV6T2
>> +    movw        \rd, #:lower16:\val
>> +.if \val >> 16
>> +    movt        \rd, #:upper16:\val
>> +.endif
>> +#else
>> +    ldr         \rd, =\val
>> +#endif
>> +.endm
>> +
>> @@ -1209,9 +1203,8 @@ function x264_pixel_ssim_end4_neon, export=1
>>     vshl.s32    q2,  q2,  #6
>>     vadd.s32    q1,  q8,  q8
>>
>> -    mov         r3, #416        // ssim_c1= .01*.01*255*255*64
>> -    movw        ip, #39355      // ssim_c2= .03*.03*255*255*64*63 - 3<<16
>> -    movt        ip, #3
>> +    mov         r3, #416        // ssim_c1 = .01*.01*255*255*64
>> +    movconst    ip, 235963      // ssim_c2 = .03*.03*255*255*64*63
>>     vdup.32     q14, r3
>>     vdup.32     q15, ip
>>
>> diff --git a/common/arm/predict-a.S b/common/arm/predict-a.S
>> index 46e687b..8ff61a2 100644
>> --- a/common/arm/predict-a.S
>> +++ b/common/arm/predict-a.S
>> @@ -102,7 +102,7 @@ function x264_predict_4x4_ddr_armv6, export=1
>>     add     r4, r4, r3, lsl #8
>>     add     r5, r5, r4, lsl #8
>>     add     r6, r6, r5, lsl #8
>> -    ldr     ip, pb_1
>> +    ldr     ip, =0x01010101
>
> Why not use movconst here?

Oops, it should. That was the first thing I changed (back when I didn't know about the syntax), and movconst was the last thing I added. I'll switch it over as part of the iPhone support.

>> +    # arm-gcc-4.2 produces incorrect output with -ffast-math
>> +    # and it doesn't save any speed anyway on 4.4, so disable it
>> +    CFLAGS="-O4 -fno-fast-math $CFLAGS"
>
> Details?

With both CodeSourcery 2007q3 and Apple gcc 4.2, the output wasn't bitexact with x86, although those two compilers were bitexact with each other and the video looked fine (no obvious artifacts). The stats showed a higher bitrate (in crf mode) with many more I macroblocks than P/B ones, so Jason suggested that something was probably going wrong in the SAD scores while RD was left intact. I didn't investigate exactly what gcc miscompiled, but gcc 4.4 and Apple gcc 4.0 both matched the x86 output, as did both 4.2 variants with -fno-fast-math. I also didn't measure any speedup on ARM from -ffast-math with gcc 4.4.

-fno-fast-math is passed explicitly, rather than just omitting -ffast-math, because Apple gcc 4.2 apparently enables -ffast-math by default.