[x264-devel] [patch] checkasm, mmx dequant bug

Fri Feb 10 00:12:18 CET 2006

Hi,

I finally found some spare time to work on x264 again. The attached 
patch adds more rigorous testing of the optimized quant and dequant 
functions. For the quant functions, every quantizer and quantization 
matrix is now checked, and the worst-case input range was corrected. 
This correction exposed major flaws in the mmx dequant functions for 
certain quantizers, and these flaws have been corrected. With these 
fixes, not one of 10000 checkasm runs failed. It is possible, though 
unlikely, that this bug degraded the image quality of compressed streams.

* in detail: input range for quant
If you look at dct4x4 closely, you will find that the first 1D dct 
maps values in the range of -255 to 255 to the output value range 
-255*(4,6,4,6) to 255*(4,6,4,6). After applying the second 1D transform, 
the output is in the range

       / 16 24 16 24 \          / 16 24 16 24 \
-255* | 24 36 24 36 |  to 255* | 24 36 24 36 |
       | 16 24 16 24 |          | 16 24 16 24 |
       \ 24 36 24 36 /          \ 24 36 24 36 /

This is higher than the old checkasm assumed (it assumed 16 at each 
position of the matrix), by up to a factor of 2.25. For dct8x8 it is 
worse. The output range of the first transform is 
-255*(8,7.5,6,7.5,8,7.5,6,7.5) to 255*(8,7.5,6,7.5,8,7.5,6,7.5). The 
output of dct8x8 for each coefficient dct[x][y] is thus in 
-255*scale8[x]*scale8[y] to 255*scale8[x]*scale8[y] with 
scale8[] = {8, 7.5, 6, 7.5, 8, 7.5, 6, 7.5}. This is up to 4 times 
higher than checkasm assumed. The checkasm in the patch now generates 
random values in the whole output range of dct, and thus the whole 
input range of quant.
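
To make the ranges above concrete, here is a minimal C sketch that 
computes the per-coefficient bound from the scale factors given in this 
mail and draws a random test value inside it. It is not the code from 
the attached patch: the names scale4, dct4x4_max, dct8x8_max and 
rand_coef are made up for illustration, and scale4 is simply the 
(4,6,4,6) vector from the text.

```c
#include <stdlib.h>

/* Per-axis gains of the 1D transforms, taken from the text of this mail. */
static const double scale4[4] = { 4, 6, 4, 6 };
static const double scale8[8] = { 8, 7.5, 6, 7.5, 8, 7.5, 6, 7.5 };

/* Largest magnitude dct[x][y] can reach for inputs in -255..255
 * (illustrative helpers, not x264 functions). */
int dct4x4_max(int x, int y)
{
    return (int)(255 * scale4[x] * scale4[y]);
}

int dct8x8_max(int x, int y)
{
    return (int)(255 * scale8[x] * scale8[y]);
}

/* Hypothetical helper: a random coefficient in [-max, max], so that a
 * test covers the whole range at that position. The modulo introduces a
 * small bias, which should not matter for a range-coverage test. */
int rand_coef(int max)
{
    return rand() % (2 * max + 1) - max;
}
```

For example, dct4x4_max(0,0) is 255*16 = 4080, dct4x4_max(3,3) is 
255*36 = 9180, and dct8x8_max(0,0) is 255*64 = 16320, which is the 
factor of 4 over the old assumption mentioned above.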

* in detail: dequant bug
Adjusting the input range of quant resulted in a larger input range for 
dequant. While earlier versions of checkasm found the dequant bug only 
rarely, it now shows up consistently at qp 30..35 for dequant8x8 and qp 
18..23 for dequant4x4. This is because those functions assume that 
dct[x][y]*dequant_mf[x][y] fits in 16 bits for the given quantizers. 
Because of the larger value range of dct[x][y], this is no longer true. 
Everything based on that assumption has been replaced by the more 
general 32-bit computation that was previously used only at qp<30 
(dequant8x8) and qp<18 (dequant4x4). This results in correct behavior 
over the entire input range of dequant, but comes with a small speed 
penalty. This bug has been corrected for i386 and amd64, but only the 
i386 part has been tested.
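
The 16-bit assumption can be illustrated with a small C sketch. This is 
not the mmx code from the patch. It only mimics, in portable C, what 
happens when just the low 16 bits of a product are kept, compared with 
holding the product in 32 bits. The function names and the sample 
values are hypothetical and are not taken from the x264 dequant tables.

```c
#include <stdint.h>

/* Full product held in 32 bits: always exact for 16-bit operands. */
int32_t mul32(int16_t coef, int16_t mf)
{
    return (int32_t)coef * mf;
}

/* Product with only the low 16 bits kept, reinterpreted as signed.
 * The wrap is done by hand to avoid implementation-defined casts. */
int mul16_wrapped(int16_t coef, int16_t mf)
{
    uint16_t lo = (uint16_t)((int32_t)coef * mf);
    return lo >= 0x8000 ? (int)lo - 0x10000 : (int)lo;
}

/* Returns 1 if the exact product is representable in a signed 16-bit
 * value, i.e. if the 16-bit shortcut would give the right answer. */
int fits_in_16bit(int16_t coef, int16_t mf)
{
    int32_t p = (int32_t)coef * mf;
    return p >= INT16_MIN && p <= INT16_MAX;
}
```

With the made-up values coef = 2000 and mf = 40, mul32 gives 80000 
while mul16_wrapped gives 14464, so a result computed under the 16-bit 
assumption is silently wrong. With coef = 100 both give 4000, which is 
why a test using too small an input range can miss the problem.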

best regards,
Christian Heine
-------------- next part --------------
A non-text attachment was scrubbed...
Name: x264-r426-dequant-mmx-bugfix.diff.gz
Type: application/gzip
Size: 1916 bytes
Desc: not available
Url : http://mailman.videolan.org/pipermail/x264-devel/attachments/20060210/229635fa/attachment.bin