[x264-devel] Solaris 10x86 AMD and SSSE3 bug

Mike Moya moyman at ecn.purdue.edu
Thu Aug 12 22:41:49 CEST 2010


When I compile the latest x264 on an AMD machine, the resulting binary fails to run because it uses SSSE3, which is Intel-only. No matter what I try, building x264 on the latest Solaris 10 x86 ends up using SSSE3.

Here is the machine:

# psrinfo -pv
The physical processor has 4 virtual processors (0-3)
  x86 (chipid 0x0 AuthenticAMD family 16 model 4 step 2 clock 2700 MHz)
	Quad-Core AMD Opteron(tm) Processor 2384
The physical processor has 4 virtual processors (4-7)
  x86 (chipid 0x1 AuthenticAMD family 16 model 4 step 2 clock 2700 MHz)
	Quad-Core AMD Opteron(tm) Processor 2384

What it supports (no SSSE3):

# isainfo -v
64-bit amd64 applications
	amd_lzcnt popcnt amd_sse4a tscp cx16 mon sse3 sse2 sse fxsr amd_3dnowx 
	amd_3dnow amd_mmx mmx cmov amd_sysc cx8 tsc fpu 
32-bit i386 applications
	amd_lzcnt popcnt amd_sse4a tscp cx16 mon sse3 sse2 sse fxsr amd_3dnowx 
	amd_3dnow amd_mmx mmx cmov amd_sysc cx8 tsc fpu 
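
As a cross-check that does not depend on isainfo, the same flags can be read straight from CPUID leaf 1, where ECX bit 0 reports SSE3 and ECX bit 9 reports SSSE3. The following is only a minimal sketch: it assumes a GCC that ships `<cpuid.h>`, and the helper names (`has_sse3`, `has_ssse3`, `report_host_simd`) are hypothetical and not part of x264.

```c
#include <stdio.h>

/* CPUID leaf 1 feature bits: ECX bit 0 = SSE3, ECX bit 9 = SSSE3. */
#define CPUID1_ECX_SSE3  (1u << 0)
#define CPUID1_ECX_SSSE3 (1u << 9)

/* Hypothetical helpers: decode a raw ECX value from CPUID leaf 1. */
static int has_sse3(unsigned int ecx)  { return (ecx & CPUID1_ECX_SSE3)  != 0; }
static int has_ssse3(unsigned int ecx) { return (ecx & CPUID1_ECX_SSSE3) != 0; }

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>

/* Assumption: GCC's <cpuid.h> is available. Queries the host CPU and
 * prints the two flags; returns 0 on success, -1 if leaf 1 is unsupported. */
int report_host_simd(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;
    printf("sse3=%d ssse3=%d\n", has_sse3(ecx), has_ssse3(ecx));
    return 0;
}
#endif
```

Going by the isainfo listing above, I would expect such a check to report SSE3 but not SSSE3 on this Opteron, though I have not run it here.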

I updated to the latest yasm but it made no difference:

# yasm --version
yasm 1.1.0.2352
Compiled on Aug 12 2010.
Copyright (c) 2001-2010 Peter Johnson and other Yasm developers.
Run yasm --license for licensing overview and summary.

I pulled the latest x264 from git and ran configure:

# ./configure
Platform:   X86
System:     SunOS
asm:        yes
avs:        no
lavf:       no
ffms:       no
gpac:       no
pthread:    yes
filters:    crop select_every
debug:      no
gprof:      no
PIC:        no
shared:     no
visualize:  no
bit depth:  8

It compiles cleanly. Here is the end of the build output:

...etc...
gcc -Wshadow -O3 -ffast-math  -Wall -I. -march=i686 -mfpmath=sse -msse -std=gnu99 -s -fomit-frame-pointer -fno-tree-vectorize   -c -o common/x86/predict-c.o common/x86/predict-c.c
yasm -O2 -f elf -Icommon/x86/ -o common/x86/const-a.o common/x86/const-a.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/cabac-a.o common/x86/cabac-a.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/dct-a.o common/x86/dct-a.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/deblock-a.o common/x86/deblock-a.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/mc-a.o common/x86/mc-a.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/mc-a2.o common/x86/mc-a2.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/pixel-a.o common/x86/pixel-a.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/predict-a.o common/x86/predict-a.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/quant-a.o common/x86/quant-a.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/sad-a.o common/x86/sad-a.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/cpu-a.o common/x86/cpu-a.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/dct-32.o common/x86/dct-32.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/bitstream-a.o common/x86/bitstream-a.asm
yasm -O2 -f elf -Icommon/x86/ -o common/x86/pixel-32.o common/x86/pixel-32.asm
ar rc libx264.a common/mc.o common/predict.o common/pixel.o common/macroblock.o common/frame.o common/dct.o common/cpu.o common/cabac.o common/common.o common/mdate.o common/rectangle.o common/set.o common/quant.o common/deblock.o common/vlc.o common/mvpred.o common/bitstream.o encoder/analyse.o encoder/me.o encoder/ratecontrol.o encoder/set.o encoder/macroblock.o encoder/cabac.o encoder/cavlc.o encoder/encoder.o encoder/lookahead.o common/threadpool.o common/x86/mc-c.o common/x86/predict-c.o common/x86/const-a.o common/x86/cabac-a.o common/x86/dct-a.o common/x86/deblock-a.o common/x86/mc-a.o common/x86/mc-a2.o common/x86/pixel-a.o common/x86/predict-a.o common/x86/quant-a.o common/x86/sad-a.o common/x86/cpu-a.o common/x86/dct-32.o common/x86/bitstream-a.o common/x86/pixel-32.o
ranlib libx264.a
gcc -o x264 x264.o input/input.o input/timecode.o input/raw.o input/y4m.o output/raw.o output/matroska.o output/matroska_ebml.o output/flv.o output/flv_bytestream.o filters/filters.o filters/video/video.o filters/video/source.o filters/video/internal.o filters/video/resize.o filters/video/cache.o filters/video/fix_vfr_pts.o filters/video/select_every.o filters/video/crop.o input/thread.o extras/getopt.o libx264.a  -lm -lpthread -s
#

The binary then promptly fails because it requires SSSE3, which this non-Intel processor does not have:

# ./x264 --version
ld.so.1: x264: fatal: hardware capability unsupported: 0x400000  [ SSSE3 ]
Killed
# file ./x264
./x264:		ELF 32-bit LSB executable 80386 Version 1 [SSSE3 SSE MMX CMOV FPU], dynamically linked, stripped
# ldd ./x264
x264: warning: hardware capability unsupported: 0x400000  [ SSSE3 ]
	libm.so.2 =>	 /usr/lib/libm.so.2
	libpthread.so.1 =>	 /usr/lib/libpthread.so.1
	libc.so.1 =>	 /usr/lib/libc.so.1


Is this expected? Why would it not use SSE3, SSE2, or MMX, which are supported? I can disable asm (--disable-asm) and it works, but at a very substantial performance penalty. I have tried many different AMD processors, all with the same result, and both gcc and Sun Studio, also with the same result. Something in the x264 code seems to be incorrectly declaring SSSE3 support. Is there a way to force it to use SSE3 instead of SSSE3?
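
To make the request concrete, this is the kind of behaviour I am after: the code path is picked from the instruction sets the CPU actually reports, so SSSE3 is skipped when it is absent. This is a purely illustrative sketch; the `CAP_*` flags and `pick_path` are hypothetical names and are not taken from the x264 source.

```c
/* Hypothetical capability flags for illustration only;
 * these are not x264's real constants. */
enum {
    CAP_MMX   = 1 << 0,
    CAP_SSE2  = 1 << 1,
    CAP_SSE3  = 1 << 2,
    CAP_SSSE3 = 1 << 3
};

/* Return the name of the most capable code path allowed by caps,
 * falling back to plain C when no SIMD flag is set. */
static const char *pick_path(unsigned int caps)
{
    if (caps & CAP_SSSE3) return "ssse3";
    if (caps & CAP_SSE3)  return "sse3";
    if (caps & CAP_SSE2)  return "sse2";
    if (caps & CAP_MMX)   return "mmx";
    return "c";
}
```

With the flags this Opteron reports (MMX, SSE2, SSE3), such a selection would stop at the SSE3 path.
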
--mike




