[x264-devel] Failure to build x264 with ASM on i386

Loren Merritt lorenm at u.washington.edu
Tue Feb 4 23:04:23 CET 2014


On Tue, 4 Feb 2014, Martin Storsjö wrote:
> On Mon, 3 Feb 2014, Dimitry Andric wrote:
>> On 03 Feb 2014, at 18:39, Loren Merritt <lorenm at u.washington.edu> wrote:
>> ...
>>> Otoh, gcc works with -fPIC. I can confirm that 5 registers is enough for
>>> the inline asm blocks in question. If clang thinks they need more than 5
>>> registers, that's a bug in clang's register allocator.
>>
>> No, gcc uses 7 registers.  For example, with -fPIC and gcc 4.8, the
>> allocation for x264_predictor_clip_mmx2() is as follows:
>>
>> %0 = %eax
>> %1 = %edx
>> %2 = %ecx
>> %4 = %ebp
>> %5 = %esi
>> %6 = %edi
>> %7 = %ebx

I tried with gcc-4.6.3 -fPIC -fno-omit-frame-pointer -mpreferred-stack-boundary=2.
(As far as register pressure goes, an unaligned stack just forces gcc to use a
frame pointer, so it isn't any worse than -fno-omit-frame-pointer alone.)
The allocation I get is:

%0 = %ecx
%1 = nothing (or you could call it the same reg as %5)
%2 = %eax
%3 = %edx
%4 = %edi
%5 = %esi
%6 = on the stack
%7 = %ebx (this is the same reg that -fPIC itself reserves, and thus
     doesn't count against the 5 regs left after PIC.)
%8 = nothing (or you could call it the same reg as %0)
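
To illustrate how an operand can end up "on the stack" rather than in a
register, here is a minimal sketch. It is not the x264 code: add_rm is a
hypothetical helper, and the constraints are chosen only for illustration.
An operand whose constraint allows memory ("m" or "rm") gives the compiler
the option of not spending a register on it, whereas an "r" operand always
costs one.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from x264: returns a + b. */
static inline int32_t add_rm(int32_t a, int32_t b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "+r": a must be in a register; the asm reads and writes it.
     * "rm": b may be in a register or in memory, so under register
     *       pressure the compiler is free to leave it in a stack slot.
     * "cc": addl modifies the flags. */
    __asm__ ("addl %1, %0" : "+r"(a) : "rm"(b) : "cc");
    return a;
#else
    /* Plain C fallback for other compilers and targets. */
    return a + b;
#endif
}
```

Which alternative the compiler actually picks depends on the surrounding
code and flags, so the generated assembly has to be inspected (e.g. with
-S) to see the real allocation.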

>> The reason is that gcc assumes a 16 byte stack alignment on i386, which
>> is only valid for Linux after ~2006, not most BSDs.  If you force gcc to
>> assume a 4 byte stack alignment, it also cannot compile the inline
>> assembly:
>
> FWIW, the similar cases within libav are handled by adding
> __attribute__((force_align_arg_pointer)) to all public entry points into the
> libraries, which adds a special prologue to these functions that realigns the
> stack to 16 bytes, and adding -mincoming-stack-boundary=4 to the cflags,
> telling the compiler to assume a 16 byte aligned stack in all functions, so
> only the public ones need to care about fixing the alignment.

x264 does in fact realign the stack at all entry points (using yasm rather
than force_align_arg_pointer, since we added that feature before
force_align_arg_pointer existed). And yes, we do that so that we can
tell the compiler to assume an aligned stack even if the OS doesn't provide
one. I didn't previously know that BSD doesn't, but Win32 doesn't either.
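
For reference, the force_align_arg_pointer approach Martin describes can be
sketched roughly as below. This is not what x264 does (we realign in yasm);
DEMO_ENTRY, demo_entry and demo_is_aligned16 are hypothetical names used
only for illustration. The attribute is applied on 32-bit x86 with a
GNU-compatible compiler and compiled out elsewhere.

```c
#include <stdint.h>
#include <string.h>

/* Only request the realigning prologue on i386 with GCC-compatible
 * compilers; on other targets the macro expands to nothing. */
#if defined(__GNUC__) && defined(__i386__)
#define DEMO_ENTRY __attribute__((force_align_arg_pointer))
#else
#define DEMO_ENTRY
#endif

/* Hypothetical internal helper: returns 1 if p is 16-byte aligned. */
static int demo_is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

/* Hypothetical public entry point. The caller's stack may be only
 * 4-byte aligned; the attribute asks the compiler to realign it in
 * the prologue, so internal functions called from here can be built
 * with -mincoming-stack-boundary=4 and assume 16-byte alignment. */
DEMO_ENTRY int demo_entry(void)
{
    _Alignas(16) uint8_t scratch[16];   /* 16-byte aligned local */
    memset(scratch, 0, sizeof scratch);
    return demo_is_aligned16(scratch);
}
```

Only the functions reachable from outside the library need the attribute;
everything behind them inherits the aligned stack.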

--Loren Merritt

