[x264-devel] [patch] checkasm, mmx dequant bug
Christian Heine
sennindemokrit at gmx.net
Fri Feb 10 00:12:18 CET 2006
Hi,
I finally found some spare time to work on x264 again. The attached
patch adds more rigorous testing of the optimized quant and dequant
functions. For the quant functions, every quantizer and quantization
matrix is now checked, and the worst-case input range was corrected.
This correction exposed major flaws in the mmx dequant functions for
certain quantizers, and these flaws have been fixed. With the patch
applied, not one of 10000 checkasm runs failed. It is possible, however
unlikely, that this bug resulted in degraded image quality of the
compressed stream.
* in detail: input range for quant
If you look closely at dct4x4, you will find that the first 1D dct
transforms values in the range -255 to 255 into the output value range
-255*(4,6,4,6) to 255*(4,6,4,6). After applying the second 1D transform
the output is in the range
       / 16 24 16 24 \            / 16 24 16 24 \
-255 * | 24 36 24 36 |  to  255 * | 24 36 24 36 |
       | 16 24 16 24 |            | 16 24 16 24 |
       \ 24 36 24 36 /            \ 24 36 24 36 /
This is up to 2.25 times higher than the old checkasm assumed (it
assumed 16 at each position of the matrix). For dct8x8 it is worse. The
output range of the first transform is -255*(8,7.5,6,7.5,8,7.5,6,7.5)
to 255*(8,7.5,6,7.5,8,7.5,6,7.5). The output of dct8x8 for each
coefficient dct[x][y] is thus in -255*scale8[x]*scale8[y] to
255*scale8[x]*scale8[y] with scale8[]={8, 7.5, 6, 7.5, 8, 7.5, 6, 7.5}.
This is up to 4 times higher than checkasm assumed. The checkasm in the
patch now generates random values over the whole output range of dct,
which is also the input range of quant.
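
As a rough illustration of the bounds above (not the checkasm code from
the patch), here is a minimal Python sketch. It derives the 4x4 gains
from the standard H.264 forward core transform matrix and takes the 8x8
gains as stated in this mail. The names CF4, SCALE8, row_gains,
gain_table, worst_case_bounds and random_dct_block are hypothetical
helpers introduced only for this example.

```python
import random

# H.264 forward 4x4 core transform matrix (each row is one basis vector).
CF4 = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

# Per-coefficient gains of the first 1D 8x8 transform, taken as given
# from the mail above (assumption: not re-derived here).
SCALE8 = [8, 7.5, 6, 7.5, 8, 7.5, 6, 7.5]


def row_gains(matrix):
    """Worst-case gain of a 1D transform: sum of |entries| per row."""
    return [sum(abs(v) for v in row) for row in matrix]


def gain_table(gains):
    """Gain of the separable 2D transform: outer product of 1D gains."""
    return [[gy * gx for gx in gains] for gy in gains]


def worst_case_bounds(gains, max_input=255):
    """Largest magnitude each 2D coefficient can reach for |input| <= max_input."""
    return [[max_input * g for g in row] for row in gain_table(gains)]


def random_dct_block(gains, rng):
    """Draw one block of random coefficients covering the full output range."""
    return [[rng.randint(-int(b), int(b)) for b in row]
            for row in worst_case_bounds(gains)]


if __name__ == "__main__":
    for row in gain_table(row_gains(CF4)):
        print(row)
    print(max(max(row) for row in gain_table(SCALE8)))
```

The old assumption of a flat factor of 16 corresponds to a bound of
255*16 at every position, so comparing it with worst_case_bounds() shows
where the earlier test inputs fell short.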
* in detail: dequant bug
Adjusting the input range of quant resulted in a higher input range for
dequant. While the dequant bug was found only rarely by earlier versions
of checkasm, it now shows up consistently at qp 30..35 for dequant8x8
and qp 18..23 for dequant4x4. This is because those functions assume
that dct[x][y]*dequant_mf[x][y] fits in 16 bits for the given
quantizers. Because of the higher value range of dct[x][y] this is no
longer true. Everything based on that assumption has been replaced by
the more general 32-bit computation that was previously used only at
qp<30 (dequant8x8) and qp<18 (dequant4x4). This results in correct
behavior over the entire input range of dequant, but comes with a small
speed penalty. The bug has been corrected for i386 and amd64, but only
the i386 part has been tested.
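
To make the failure mode concrete, here is a small Python sketch of what
happens when a product that needs more than 16 bits is kept in a 16-bit
lane, as a low-word multiply such as MMX pmullw would do. It is only an
illustration under assumed values: the coefficient 300 and multiplier
160 are made-up numbers, not entries of x264's dequant_mf tables, and
the function names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def wrap_int16(v):
    """Keep only the low 16 bits of v, interpreted as a signed value."""
    return ((v + 0x8000) & 0xFFFF) - 0x8000


def mul_16bit(coef, mf):
    """Product as a 16-bit low-word multiply would store it."""
    return wrap_int16(coef * mf)


def mul_32bit(coef, mf):
    """Product computed with enough headroom (the general path)."""
    return coef * mf


def max_safe_coef(mf):
    """Largest |coef| whose product with mf still fits in signed 16 bits."""
    return INT16_MAX // abs(mf)


if __name__ == "__main__":
    coef, mf = 300, 160          # assumed example values
    print(mul_32bit(coef, mf))   # exact product
    print(mul_16bit(coef, mf))   # wrapped product
    print(max_safe_coef(mf))     # threshold for this multiplier
```

Once the test inputs exceed max_safe_coef() for a given multiplier, the
16-bit and 32-bit results diverge, which matches the kind of mismatch
the wider checkasm input range now triggers.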
best regards,
Christian Heine
-------------- next part --------------
A non-text attachment was scrubbed...
Name: x264-r426-dequant-mmx-bugfix.diff.gz
Type: application/gzip
Size: 1916 bytes
Desc: not available
Url : http://mailman.videolan.org/pipermail/x264-devel/attachments/20060210/229635fa/attachment.bin