[x264-devel] Failure to build x264 with ASM on i386
Brad Smith
brad at comstyle.com
Tue Feb 18 01:43:24 CET 2014
On 04/02/14 5:04 PM, Loren Merritt wrote:
> On Tue, 4 Feb 2014, Martin Storsjö wrote:
>> On Mon, 3 Feb 2014, Dimitry Andric wrote:
>>> On 03 Feb 2014, at 18:39, Loren Merritt <lorenm at u.washington.edu> wrote:
>>> ...
>>>> Otoh, gcc works with -fPIC. I can confirm that 5 registers is enough for
>>>> the inline asm blocks in question. If clang thinks they need more than 5
>>>> registers, that's a bug in clang's register allocator.
>>>
>>> No, gcc uses 7 registers. For example, with -fPIC and gcc 4.8, the
>>> allocation for x264_predictor_clip_mmx2() is as follows:
>>>
>>> %0 = %eax
>>> %1 = %edx
>>> %2 = %ecx
>>> %4 = %ebp
>>> %5 = %esi
>>> %6 = %edi
>>> %7 = %ebx
>
> I tried with gcc-4.6.3 -fPIC -fno-omit-frame-pointer -mpreferred-stack-boundary=2
> (Although as far as register pressure goes, unaligned stack just forces it to
> use a frame-pointer, and thus isn't any worse than -fno-omit-frame-pointer alone.)
>
> %0 = %ecx
> %1 = nothing (or you could call it the same reg as %5)
> %2 = %eax
> %3 = %edx
> %4 = %edi
> %5 = %esi
> %6 = on the stack
> %7 = %ebx (this is the same reg that -fPIC itself reserves, and thus
> doesn't count against the 5 regs left after PIC.)
> %8 = nothing (or you could call it the same reg as %0)
>
>>> The reason is that gcc assumes a 16 byte stack alignment on i386, which
>>> is only valid for Linux after ~2006, not most BSDs. If you force gcc to
>>> assume a 4 byte stack alignment, it also cannot compile the inline
>>> assembly:
>>
>> FWIW, the similar cases within libav are handled by adding
>> __attribute__((force_align_arg_pointer)) to all public entry points into the
>> libraries, which adds a special prologue to these functions that realigns the
>> stack to 16 bytes, and adding -mincoming-stack-boundary=4 to the cflags,
>> telling the compiler to assume a 16 byte aligned stack in all functions, so
>> only the public ones need to care about fixing the alignment.
>
> x264 does in fact realign the stack at all entrypoints (using yasm rather
> than force_align_arg_pointer since we added that feature before
> force_align_arg_pointer existed). And yes we do that so that we can
> tell the compiler to assume aligned stack even if the OS doesn't provide
> it. I didn't previously know that about BSD, but Win32 also doesn't.
>
> --Loren Merritt
Still looking for any options that would make it possible to build
x264 with ASM on i386. I haven't seen any suggestions yet that would
help.
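
For reference, here is a rough sketch of the libav-style approach Martin describes above, assuming a GCC-compatible compiler. The macro name ENTRY_REALIGN and the function example_entry_is_aligned() are made up for illustration; this is not x264's code (per Loren, x264 does its realignment in yasm). The attribute is only applied on 32-bit x86 and the macro expands to nothing elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper macro: request a stack-realigning prologue on
 * 32-bit x86 with GCC-compatible compilers; a no-op everywhere else. */
#if defined(__GNUC__) && defined(__i386__)
#define ENTRY_REALIGN __attribute__((force_align_arg_pointer))
#else
#define ENTRY_REALIGN
#endif

/* Hypothetical public entry point. A caller following a 4-byte-aligned
 * i386 ABI may enter here with a misaligned stack; the attribute asks
 * the compiler to realign it before the body runs. */
ENTRY_REALIGN int example_entry_is_aligned(void)
{
    /* A local that must sit on a 16-byte boundary, as SSE code would need. */
    _Alignas(16) unsigned char buf[16];
    buf[0] = 0;

    /* Report whether the local ended up 16-byte aligned (1 = yes). */
    return ((uintptr_t)buf % 16) == 0;
}
```

As Martin notes, libav pairs this with -mincoming-stack-boundary=4 in the cflags, so that only the public entry points pay for the realignment. Whether this helps with the register pressure in the inline asm blocks in question is a separate matter.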