[x264-devel] [patch] slightly faster quant

Wed Sep 21 03:17:51 CEST 2005

Hi,

attached is a patch based on x264 rev 293 that replaces the MMX 
optimized quant8x8 and quant4x4 with new ones that run about 20% 
faster. There is also an MMXEXT optimized version that runs only a few 
percent (less than 5) slower than the new MMX version but allows the 
entries of quant_mf to be uint16_t (in contrast to int16_t for MMX). The 
patch also includes an AMD64 version (untested), and refines the decision 
of which quant functions will be used, based on the values of quant_mf.
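
To illustrate only the storage-range part of that decision, here is a 
minimal sketch in C. It is not taken from the patch: the enum labels and 
the function name are hypothetical, and the real selection also has to 
consider intermediate value ranges, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical labels for illustration; these are not x264 identifiers. */
enum quant_variant { QUANT_MMX15, QUANT_MMXEXT16, QUANT_MMXEXT32 };

/* Pick the narrowest variant whose element type can hold every quant_mf
 * entry: int16_t for the MMX version, uint16_t for the MMXEXT version,
 * and a 32-bit fallback otherwise. */
static enum quant_variant pick_quant_variant(const int *quant_mf, size_t n)
{
    assert(quant_mf != NULL && n > 0);

    int max_mf = 0;
    for (size_t i = 0; i < n; i++) {
        assert(quant_mf[i] >= 0);      /* assumption: entries are non-negative */
        if (quant_mf[i] > max_mf)
            max_mf = quant_mf[i];
    }

    if (max_mf <= INT16_MAX)
        return QUANT_MMX15;            /* fits in int16_t */
    if (max_mf <= UINT16_MAX)
        return QUANT_MMXEXT16;         /* fits in uint16_t */
    return QUANT_MMXEXT32;             /* needs 32-bit entries */
}
```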

The overall encoding speed gain was only minimal, but it was not zero or 
negative, so...
test system: Athlon-XP 3000+ WinXP/MinGW
test parameters: --bframes 2 --ref 8 --8x8dct --analyse all (+defaults)
test clip: 720x576 at 25fps 2050 frames

encoding fps
C              3.9184
MMXEXT32       3.9996
MMX-AI (old)   4.0084
MMXEXT16 (new) 4.0143
MMXEXT15 (new) 4.0195

(Alexander Izvorski's original version runs at 4.0294 fps)

Here are some "user friendly" rules that specify when each version will 
be used. I hand-checked each one of them on input coefficients that 
maxed out their theoretical value ranges. (cqm >= x means that all 
entries of the corresponding custom quant matrix must be at least x)

MMXEXT32 quant2x2dc can be used if cqm >= 2.
MMXEXT32 quant4x4dc can be used if cqm >= 4.
MMXEXT32 quant4x4 can always be used.
MMXEXT32 quant8x8 can be used if cqm >= 2.

MMXEXT16 quant2x2dc can be used if cqm >= 4.
MMXEXT16 quant4x4dc can be used if cqm >= 4.
MMXEXT16 quant4x4 can be used if cqm >= 4.
MMXEXT16 quant8x8 can be used if cqm >= 7.

MMX15 quant2x2dc can be used if cqm >= 7.
MMX15 quant4x4dc can be used if cqm >= 7.
MMX15 quant4x4 can be used if cqm >= 7.
MMX15 quant8x8 can be used if cqm >= 11.

These are very safe assumptions to give you a general idea about the 
bounds. Matrices may contain even lower entries than that, and still 
work if the low entries are in the right positions.
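
The conservative rules listed above can be written down as a small 
lookup table. The following C sketch is only an illustration: the enum 
names, the table and the function names are hypothetical and not part of 
the patch, and the preference order (MMX15 first, then MMXEXT16, then 
MMXEXT32, then C) is my assumption. It uses the smallest cqm entry, so 
it is exactly as pessimistic as the rules themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical identifiers for illustration only. */
enum quant_func { Q_2X2DC, Q_4X4DC, Q_4X4, Q_8X8, Q_FUNC_COUNT };
enum quant_impl { IMPL_MMX15, IMPL_MMXEXT16, IMPL_MMXEXT32, IMPL_C };

/* Smallest cqm entry each version tolerates, per quant function.
 * A threshold of 0 means "can always be used". */
static const int min_cqm_required[3][Q_FUNC_COUNT] = {
    /* MMX15    */ { 7, 7, 7, 11 },
    /* MMXEXT16 */ { 4, 4, 4,  7 },
    /* MMXEXT32 */ { 2, 4, 0,  2 },
};

/* Smallest entry of a custom quant matrix stored as a flat array. */
static int cqm_min(const uint8_t *cqm, size_t n)
{
    assert(cqm != NULL && n > 0);
    int m = cqm[0];
    for (size_t i = 1; i < n; i++)
        if (cqm[i] < m)
            m = cqm[i];
    return m;
}

/* Return the first version (in the assumed preference order) whose
 * threshold is met by every cqm entry; fall back to C otherwise. */
static enum quant_impl pick_impl(enum quant_func f,
                                 const uint8_t *cqm, size_t n)
{
    int m = cqm_min(cqm, n);
    for (int impl = IMPL_MMX15; impl <= IMPL_MMXEXT32; impl++)
        if (m >= min_cqm_required[impl][f])
            return (enum quant_impl)impl;
    return IMPL_C;
}
```

With a flat matrix of 16s every function gets the MMX15 version, while a 
single entry of 1 pushes quant8x8 all the way back to C.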

If you want to be perfectly safe, avoid the value 1 in 8x8 quant 
matrices and in the first coefficient of the 4x4 quant matrix. 
Theoretically even the C implementation screws up in this case (because 
quant_mf[x][y]*dct[x][y] may not fit in an int32_t), but in practice 
this hardly ever matters, because the output coefficients of the dct 
phase seldom come close to their theoretical bounds.
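
For anyone who wants to check the int32_t concern on concrete numbers, 
here is a minimal C sketch. The function names are hypothetical and the 
code is not from x264; it just widens the product to 64 bits and 
compares it against the int32_t range.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if mf * coef can be stored in an int32_t, 0 otherwise.
 * The product is computed in 64 bits so the check itself cannot wrap. */
static int product_fits_int32(int32_t mf, int32_t coef)
{
    int64_t p = (int64_t)mf * (int64_t)coef;
    return p >= INT32_MIN && p <= INT32_MAX;
}

/* Largest coefficient magnitude that is safe for a given positive
 * quant_mf entry, i.e. the biggest c with mf * c <= INT32_MAX. */
static int32_t max_safe_abs_coef(int32_t mf)
{
    assert(mf > 0);
    return INT32_MAX / mf;
}
```

For example, an entry of 65536 leaves room for coefficients up to 32767 
in magnitude; the larger the quant_mf entry (that is, the smaller the 
cqm entry), the tighter this bound gets.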

This was only the short version. In case someone is interested, I could 
put together a document or doxygen comments specifying theoretical value 
ranges for dct and quant output coefficients.

Best regards,
Christian Heine

-------------- next part --------------
A non-text attachment was scrubbed...
Name: x264rev293-quant-mmx16.diff.gz
Type: application/gzip
Size: 3636 bytes
Desc: not available
Url : http://mailman.videolan.org/pipermail/x264-devel/attachments/20050921/248af12b/attachment.bin