[x264-devel] Solaris 10x86 AMD and SSSE3 bug
Mike Moya
moyman at ecn.purdue.edu
Thu Aug 12 22:41:49 CEST 2010
The latest x264, compiled on an AMD machine, produces a binary that fails to run because it uses (Intel-only) SSSE3. No matter what I do, building x264 on the latest Solaris 10 x86 ends up with SSSE3 in the result.
Here is the machine:
# psrinfo -pv
The physical processor has 4 virtual processors (0-3)
x86 (chipid 0x0 AuthenticAMD family 16 model 4 step 2 clock 2700 MHz)
Quad-Core AMD Opteron(tm) Processor 2384
The physical processor has 4 virtual processors (4-7)
x86 (chipid 0x1 AuthenticAMD family 16 model 4 step 2 clock 2700 MHz)
Quad-Core AMD Opteron(tm) Processor 2384
What it supports (no SSSE3):
# isainfo -v
64-bit amd64 applications
amd_lzcnt popcnt amd_sse4a tscp cx16 mon sse3 sse2 sse fxsr amd_3dnowx
amd_3dnow amd_mmx mmx cmov amd_sysc cx8 tsc fpu
32-bit i386 applications
amd_lzcnt popcnt amd_sse4a tscp cx16 mon sse3 sse2 sse fxsr amd_3dnowx
amd_3dnow amd_mmx mmx cmov amd_sysc cx8 tsc fpu
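To script this check rather than read the list by eye, a minimal sketch follows. The name has_isa_flag is a hypothetical helper, not part of Solaris or x264, and it assumes the flags appear as whitespace-separated words as in the output above. It matches whole words only, so that sse3 is not mistaken for ssse3.

```shell
#!/bin/sh
# has_isa_flag FLAG
# Hypothetical helper: reads isainfo -v style text on stdin and succeeds
# (exit status 0) only if FLAG appears as a whole word.
has_isa_flag() {
    # Put one word per line, then require an exact whole-line match,
    # so that "sse3" does not match "ssse3".
    tr -s ' \t' '\n\n' | grep -qx "$1"
}
```

On the machine above it could be used as: isainfo -v | has_isa_flag ssse3 || echo "no SSSE3"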
I updated to the latest yasm but it made no difference:
# yasm --version
yasm 1.1.0.2352
Compiled on Aug 12 2010.
Copyright (c) 2001-2010 Peter Johnson and other Yasm developers.
Run yasm --license for licensing overview and summary.
I pulled the latest x264 from git and ran configure:
# ./configure
Platform: X86
System: SunOS
asm: yes
avs: no
lavf: no
ffms: no
gpac: no
pthread: yes
filters: crop select_every
debug: no
gprof: no
PIC: no
shared: no
visualize: no
bit depth: 8
It compiles cleanly. Here is the tail end of the build output:
...etc...
gcc -Wshadow -O3 -ffast-math -Wall -I. -march=i686 -mfpmath=sse -msse -std=gnu99 -s -fomit-frame-pointer -fno-tree-vectorize -c -o common/x86/predict-c.o common/x86/predict-c.c
yasm -O2 -f elf -Icommon/x86/ -o common/x86/const-a.o common/x86/const-a.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/cabac-a.o common/x86/cabac-a.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/dct-a.o common/x86/dct-a.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/deblock-a.o common/x86/deblock-a.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/mc-a.o common/x86/mc-a.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/mc-a2.o common/x86/mc-a2.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/pixel-a.o common/x86/pixel-a.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/predict-a.o common/x86/predict-a.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/quant-a.o common/x86/quant-a.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/sad-a.o common/x86/sad-a.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/cpu-a.o common/x86/cpu-a.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/dct-32.o common/x86/dct-32.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/bitstream-a.o common/x86/bitstream-a.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/pixel-32.o common/x86/pixel-32.asm
ar rc libx264.a common/mc.o common/predict.o common/pixel.o common/macroblock.o common/frame.o common/dct.o common/cpu.o common/cabac.o common/common.o common/mdate.o common/rectangle.o common/set.o common/quant.o common/deblock.o common/vlc.o common/mvpred.o common/bitstream.o encoder/analyse.o encoder/me.o encoder/ratecontrol.o encoder/set.o encoder/macroblock.o encoder/cabac.o encoder/cavlc.o encoder/encoder.o encoder/lookahead.o common/threadpool.o common/x86/mc-c.o common/x86/predict-c.o common/x86/const-a.o common/x86/cabac-a.o common/x86/dct-a.o common/x86/deblock-a.o common/x86/mc-a.o common/x86/mc-a2.o common/x86/pixel-a.o common/x86/predict-a.o common/x86/quant-a.o common/x86/sad-a.o common/x86/cpu-a.o common/x86/dct-32.o common/x86/bitstream-a.o common/x86/pixel-32.o
ranlib libx264.a
gcc -o x264 x264.o input/input.o input/timecode.o input/raw.o input/y4m.o output/raw.o output/matroska.o output/matroska_ebml.o output/flv.o output/flv_bytestream.o filters/filters.o filters/video/video.o filters/video/source.o filters/video/internal.o filters/video/resize.o filters/video/cache.o filters/video/fix_vfr_pts.o filters/video/select_every.o filters/video/crop.o input/thread.o extras/getopt.o libx264.a -lm -lpthread -s
#
The resulting binary then promptly fails at startup because of the SSSE3 code, since this is not an Intel processor:
# ./x264 --version
ld.so.1: x264: fatal: hardware capability unsupported: 0x400000 [ SSSE3 ]
Killed
# file ./x264
./x264: ELF 32-bit LSB executable 80386 Version 1 [SSSE3 SSE MMX CMOV FPU], dynamically linked, stripped
# ldd ./x264
x264: warning: hardware capability unsupported: 0x400000 [ SSSE3 ]
libm.so.2 => /usr/lib/libm.so.2
libpthread.so.1 => /usr/lib/libpthread.so.1
libc.so.1 => /usr/lib/libc.so.1
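The comparison between what the binary is tagged with and what the CPU offers can be sketched as below. The name missing_caps is a hypothetical helper. It assumes the bracketed list in the file output names the capabilities recorded in the binary, and that lower-casing those names gives the spelling isainfo uses. That holds for the names shown in this message but is not verified in general.

```shell
#!/bin/sh
# missing_caps FILE_LINE ISA_WORDS
# Hypothetical helper: prints each capability named inside the [...] part of a
# `file` output line that does not appear in ISA_WORDS (isainfo -v style text).
missing_caps() {
    # Extract the text between the brackets and lower-case it.
    caps=$(printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p' | tr '[:upper:]' '[:lower:]')
    # Flatten the ISA list onto one line so every word is space-delimited.
    isa=$(printf '%s' "$2" | tr -s '\n\t' '  ')
    for cap in $caps; do
        case " $isa " in
            *" $cap "*) ;;                 # listed by the CPU, nothing to report
            *) printf '%s\n' "$cap" ;;     # recorded in the binary, not listed by the CPU
        esac
    done
}
```

Fed the file line and the 32-bit isainfo list from this message, it prints only ssse3, which matches the capability named in the ld.so.1 error.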
Is this expected? Why does it not use SSE3, SSE2, or MMX, all of which are supported? I can disable asm (--disable-asm) and the binary then works, but at a very substantial performance penalty. I have tried many different AMD processors, and both gcc and Sun Studio, all with the same result. Something in the x264 code appears to be incorrectly declaring SSSE3 support. Is there a way for me to force it to use SSE3 instead of SSSE3?
--mike