[x264-devel] Re: [Alexander Izvorski <aizvorski at gmail.com>] [patch] SSE2 pixel routines
Loren Merritt
lorenm at u.washington.edu
Thu Sep 8 19:22:04 CEST 2005
On Fri, 22 Jul 2005, Alexander Izvorski wrote:
> Why is the result of satd divided by two?! That throws away one bit
> of precision which would have a small but noticeable impact on PSNR.
> (see the "shr eax,1" in MMX_SUM_MM).
The rightshift in SATD doesn't lose any precision.
Proof by example:
Put a printf before the shift in the C version, and see that the sum is
always even.
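The same experiment can be sketched outside the encoder. The Python below is an illustration only, not the x264 C code: the names H, matmul and satd_sum_4x4 are made up here, and the row order of H is an assumption (it does not affect the sum of absolute values).

```python
import random

# 4x4 Hadamard matrix; every entry is +1 or -1 (row order is an assumption).
H = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]

def matmul(a, b):
    """Plain 4x4 integer matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def satd_sum_4x4(residual):
    """Sum of absolute transformed values, before any right shift."""
    # This H is symmetric, so H * R * H is the 2D Hadamard transform.
    t = matmul(matmul(H, residual), H)
    return sum(abs(v) for row in t for v in row)

rng = random.Random(0)
sums = []
for _ in range(5000):
    residual = [[rng.randint(-255, 255) for _ in range(4)] for _ in range(4)]
    sums.append(satd_sum_4x4(residual))

# Stand-in for "printf before the shift": check that no sum is odd.
all_even = all(s % 2 == 0 for s in sums)
print("all sums even:", all_even)
```

A random search is not a proof, of course; it only fails to find a counterexample.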
Proof by induction:
The Hadamard matrix contains only entries of 1 and -1. So each element in
the Hadamard transformed residual depends on each element of the input
residual by a coefficient of 1 or -1. So adding or subtracting 1 to any
element of the residual changes each of the transformed elements by +/- 1.
The same happens to their absolute values, so the parity of every one of
them flips. There are an even number of elements (16), so the parity of
the sum is unchanged.
Base case: SATD(0) = 0 is even.
Therefore SATD (before the shift) is always even.
Note: the same argument applies to each column individually, so you can
shift in MMX_SUM_MM before adding, still without loss of precision.
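A quick way to convince yourself of the per-column claim is to compare "shift after adding" with "shift each partial sum, then add". The sketch below is hypothetical Python, not the MMX macro; line_sums and the other names are invented for illustration, and it checks both rows and columns because which of the two ends up in one MMX word depends on the data layout.

```python
import random

# 4x4 Hadamard matrix with entries +1/-1 (row order is an assumption).
H = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def line_sums(residual):
    """Return (column sums, row sums) of |H * R * H|, before any shift."""
    t = matmul(matmul(H, residual), H)
    cols = [sum(abs(t[i][j]) for i in range(4)) for j in range(4)]
    rows = [sum(abs(v) for v in t[i]) for i in range(4)]
    return cols, rows

rng = random.Random(1)
partials_even = True
shift_order_irrelevant = True
for _ in range(5000):
    residual = [[rng.randint(-255, 255) for _ in range(4)] for _ in range(4)]
    cols, rows = line_sums(residual)
    if any(v % 2 for v in cols + rows):
        partials_even = False
    # Shifting each partial sum first must give the same total.
    if sum(c >> 1 for c in cols) != (sum(cols) >> 1):
        shift_order_irrelevant = False

print(partials_even, shift_order_irrelevant)
```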
> Why is MMX_SUM_MM called once for every four 4x4 blocks in satd
> functions? The maximum sum from a 4x4 block, as I understand it, is
> 2*256*4*4, and that will be split between four unsigned doublewords
(Correction: four unsigned words, not doublewords.)
> with each one getting no more than 256*4*4. So (even before we divide
> the result by two) it is impossible to saturate the result with less
> than sixteen 4x4 blocks.
The maximum sum before shifting is 4*255*4*4 = 16320 per 4x4 block, with
each column getting at most 2*255*4*4 = 8160.
This is still only a problem in the MMX version. With SSE2 the sum is
accumulated in 8 unsigned words so each column doesn't overflow, and
SUM_MM_SSE2 can be fixed to deal with the max sum.
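The arithmetic behind this can be written out as below. It is a sketch under stated assumptions, not a description of the actual assembly: it assumes 16-bit unsigned accumulators, a 16x16 partition (sixteen 4x4 blocks) as the worst case, and that the SSE2 version spreads those blocks over its 8 words so that each word sees 8 of them. All variable names are made up for illustration.

```python
PIXEL_MAX = 255
WORD_MAX = 0xFFFF  # largest value an unsigned 16-bit word can hold

# Bounds from the text, before the right shift.
max_block_sum = 4 * PIXEL_MAX * 4 * 4    # whole 4x4 block
max_column_sum = 2 * PIXEL_MAX * 4 * 4   # one column of a 4x4 block

# How many worst-case blocks one 16-bit word can absorb.
blocks_per_word = WORD_MAX // max_column_sum

# MMX: 4 words, every block adds into every word.
mmx_after_4_blocks = 4 * max_column_sum
mmx_after_16_blocks = 16 * max_column_sum

# SSE2 (assumed layout): 8 words, each word sees 8 of the 16 blocks.
sse2_after_16_blocks = 8 * max_column_sum

# Total over a 16x16 partition; the final horizontal sum must hold this.
total_16x16 = 16 * max_block_sum

print(max_block_sum, max_column_sum, blocks_per_word)
print(mmx_after_4_blocks <= WORD_MAX, mmx_after_16_blocks <= WORD_MAX)
print(sse2_after_16_blocks <= WORD_MAX, total_16x16 <= WORD_MAX)
```

Under these assumptions a word survives 8 worst-case blocks but not 16, which is why summing every four blocks is safe for MMX, while the SSE2 words just fit and only the final total needs more than 16 bits.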
A residual that achieves the max SATD is:
+255 -255 -255 -255
-255 +255 +255 +255
-255 +255 +255 +255
-255 +255 +255 +255
transforms to
+1020 -1020 -1020 -1020
-1020 +1020 +1020 +1020
-1020 +1020 +1020 +1020
-1020 +1020 +1020 +1020
A residual that achieves the max per column is:
-255 +255 +255 +255
-255 +255 +255 +255
-255 +255 +255 +255
-255 +255 +255 +255
transforms to
+2040 0 0 0
-2040 0 0 0
-2040 0 0 0
-2040 0 0 0
(The real Hadamard transform is the transpose of this, but the
MMX version skips that.)
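Both worked examples can be checked numerically. The following is an illustrative Python sketch, not the x264 code; the function names and the row order of H are assumptions. It computes the full 2D transform H * R * H and transposes the second result to match the layout shown above.

```python
# 4x4 Hadamard matrix with entries +1/-1 (row order is an assumption).
H = [
    [1,  1,  1,  1],
    [1,  1, -1, -1],
    [1, -1, -1,  1],
    [1, -1,  1, -1],
]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def hadamard_4x4(residual):
    """2D transform H * R * H (this H is symmetric)."""
    return matmul(matmul(H, residual), H)

def transpose(m):
    return [list(col) for col in zip(*m)]

def abs_sum(m):
    return sum(abs(v) for row in m for v in row)

# Residual that reaches the maximum total.
max_total = [
    [+255, -255, -255, -255],
    [-255, +255, +255, +255],
    [-255, +255, +255, +255],
    [-255, +255, +255, +255],
]
t_total = hadamard_4x4(max_total)

# Residual that concentrates the sum in a single column.
max_column = [[-255, +255, +255, +255] for _ in range(4)]
t_column = transpose(hadamard_4x4(max_column))  # untransposed, as in the MMX version

for row in t_total:
    print(row)
print("total:", abs_sum(t_total))
for row in t_column:
    print(row)
print("largest column:", max(sum(abs(v) for v in col) for col in zip(*t_column)))
```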
--Loren Merritt