[x264-devel] Failure to build x264 with ASM on i386

Sun Feb 23 22:10:41 CET 2014

On 23/02/14 2:56 PM, BugMaster wrote:
> On Sat, 22 Feb 2014 21:54:48 -0500, Brad Smith wrote:
>> On 17/02/14 7:43 PM, Brad Smith wrote:
>>> On 04/02/14 5:04 PM, Loren Merritt wrote:
>>>> On Tue, 4 Feb 2014, Martin Storsjö wrote:
>>>>> On Mon, 3 Feb 2014, Dimitry Andric wrote:
>>>>>> On 03 Feb 2014, at 18:39, Loren Merritt <lorenm at u.washington.edu>
>>>>>> wrote:
>>>>>> ...
>>>>>>> Otoh, gcc works with -fPIC. I can confirm that 5 registers is
>>>>>>> enough for
>>>>>>> the inline asm blocks in question. If clang thinks they need more
>>>>>>> than 5
>>>>>>> registers, that's a bug in clang's register allocator.
>>>>>>
>>>>>> No, gcc uses 7 registers.  For example, with -fPIC and gcc 4.8, the
>>>>>> allocation for x264_predictor_clip_mmx2() is as follows:
>>>>>>
>>>>>> %0 = %eax
>>>>>> %1 = %edx
>>>>>> %2 = %ecx
>>>>>> %4 = %ebp
>>>>>> %5 = %esi
>>>>>> %6 = %edi
>>>>>> %7 = %ebx
>>>>
>>>> I tried with gcc-4.6.3 -fPIC -fno-omit-frame-pointer
>>>> -mpreferred-stack-boundary=2
>>>> (Although as far as register pressure goes, unaligned stack just
>>>> forces it to
>>>> use a frame-pointer, and thus isn't any worse than
>>>> -fno-omit-frame-pointer alone.)
>>>>
>>>> %0 = %ecx
>>>> %1 = nothing (or you could call it the same reg as %5)
>>>> %2 = %eax
>>>> %3 = %edx
>>>> %4 = %edi
>>>> %5 = %esi
>>>> %6 = on the stack
>>>> %7 = %ebx (this is the same reg that -fPIC itself reserves, and thus
>>>>        doesn't count against the 5 regs left after PIC.)
>>>> %8 = nothing (or you could call it the same reg as %0)
>>>>
>>>>>> The reason is that gcc assumes a 16 byte stack alignment on i386, which
>>>>>> is only valid for Linux after ~2006, not most BSDs.  If you force
>>>>>> gcc to
>>>>>> assume a 4 byte stack alignment, it also cannot compile the inline
>>>>>> assembly:
>>>>>
>>>>> FWIW, the similar cases within libav are handled by adding
>>>>> __attribute__((force_align_arg_pointer)) to all public entry points
>>>>> into the
>>>>> libraries, which adds a special prologue to these functions that
>>>>> realign the
>>>>> stack to 16 bytes, and adding -mincoming-stack-boundary=4 to the cflags,
>>>>> telling the compiler to assume a 16 byte aligned stack in all
>>>>> functions, so
>>>>> only the public ones need to care about fixing the alignment.
>>>>
>>>> x264 does in fact realign the stack at all entrypoints (using yasm rather
>>>> than force_align_arg_pointer since we added that feature before
>>>> force_align_arg_pointer existed). And yes we do that so that we can
>>>> tell the compiler to assume aligned stack even if the OS doesn't provide
>>>> it. I didn't previously know that about BSD, but Win32 also doesn't.
>>>>
>>>> --Loren Merritt
>>>
>>> Still looking for any possible options that could result in being able
>>> to build x264 on i386. I haven't seen any suggestions yet that would
>>> help.
>
>> ping.
>
> Your question was already answered by Loren Merritt so you should
> "give up":
>   - give up on --enable-pic as x264's asm optimizations for i386
>   wouldn't be PIC anyway (even if you didn't get this compilation
>   error) and will have textrels;
>   - give up on asm and use --disable-asm if you for some stupid reason
>   no matter what need PIC;
>   - give up on i386 and build for x86_64 if need both PIC and asm.
>
> P.S. You can also give up on clang and use gcc but that wouldn't
> change the fact that resulting library wouldn't be really PIC.

So I'll stick with --disable-asm, since the other options are either
not usable or don't make any sense. Giving up on Clang is not an
option. I need the code to actually build.
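
For anyone following the stack-alignment part of the thread, here is a
rough sketch of the force_align_arg_pointer approach Martin describes for
libav. It is not x264 code (x264 realigns the stack at its entry points
with yasm instead), and the names REALIGN_STACK, ALIGNED16 and
example_sum16 are made up for illustration. It assumes gcc or clang on
x86; on other compilers or targets the macros expand to nothing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical macros, not taken from x264 or libav. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Ask the compiler for a prologue that realigns the stack on entry. */
#define REALIGN_STACK __attribute__((force_align_arg_pointer))
#else
#define REALIGN_STACK
#endif

#if defined(__GNUC__)
#define ALIGNED16 __attribute__((aligned(16)))
#else
#define ALIGNED16
#endif

/*
 * Hypothetical public entry point. A caller on an OS that only guarantees
 * 4-byte stack alignment can call this safely: the stack is realigned
 * before the 16-byte aligned local buffer is set up.
 *
 * Copies at most 16 bytes from src into the buffer, returns their sum,
 * and reports through aligned_out whether the buffer ended up on a
 * 16-byte boundary.
 */
REALIGN_STACK
int example_sum16(const uint8_t *src, size_t n, int *aligned_out)
{
    ALIGNED16 uint8_t buf[16] = {0};
    int sum = 0;
    size_t i;

    if (n > sizeof(buf))
        n = sizeof(buf);
    memcpy(buf, src, n);

    if (aligned_out)
        *aligned_out = ((uintptr_t)buf % 16) == 0;

    for (i = 0; i < sizeof(buf); i++)
        sum += buf[i];
    return sum;
}
```

With only the public entry points marked like this, the rest of the
library can be built with -mincoming-stack-boundary=4 so the compiler
assumes a 16-byte aligned stack everywhere else, as described in the
quoted message. This only addresses alignment; it does not change the
PIC/textrel situation BugMaster points out.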
