[x264-devel] Re: [Alexander Izvorski <aizvorski at gmail.com>] [patch] SSE2 pixel routines

Loren Merritt lorenm at u.washington.edu
Thu Sep 8 19:22:04 CEST 2005


On Fri, 22 Jul 2005, Alexander Izvorski wrote:

> Why is the result of satd divided by two?!  That throws away one bit
> of precision which would have a small but noticeable impact on PSNR.
> (see the "shr eax,1" in MMX_SUM_MM).

The rightshift in SATD doesn't lose any precision.

Proof by example:
Put a printf before the shift in the C version, and see that the sum is 
always even.

Proof by induction:
The Hadamard matrix contains only entries of 1 and -1. So each element in 
the Hadamard transformed residual depends on each element of the input 
residual by a coefficient of 1 or -1. So adding or subtracting 1 to any 
element of the residual changes each of the transformed elements by +/- 1. 
Since the values are integers, each absolute value also changes by exactly 
+/- 1. There are an even number of elements (16), so the sum changes by an 
even amount and its parity is unchanged.
Base case: SATD(0) = 0 is even.
Therefore SATD (before shift) is always even.

Note: the same argument applies to each column individually, so you can 
shift in MMX_SUM_MM before adding, still without loss of precision.
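
The parity argument above can be checked with a small C sketch. This is not 
the x264 source: the names hadamard4, satd4x4_unshifted and check_parity are 
hypothetical, and the residual range [-255, 255] is assumed from 8-bit pixels. 
It computes the sum before the shift, plus the four per-column partial sums, 
and tests that all of them are even.

```c
#include <stdlib.h>

/* Hypothetical helper (not x264 code): in-place 4-point Hadamard butterfly
 * on v[0], v[stride], v[2*stride], v[3*stride]. */
static void hadamard4(int *v, int stride)
{
    int a = v[0] + v[stride];
    int b = v[0] - v[stride];
    int c = v[2 * stride] + v[3 * stride];
    int d = v[2 * stride] - v[3 * stride];
    v[0]          = a + c;
    v[stride]     = b + d;
    v[2 * stride] = a - c;
    v[3 * stride] = b - d;
}

/* Sum of absolute transformed values BEFORE the final >>1.
 * If col_sums is non-NULL it receives the four per-column partial sums. */
static int satd4x4_unshifted(const int resid[16], int col_sums[4])
{
    int t[16];
    int total = 0;
    for (int i = 0; i < 16; i++)
        t[i] = resid[i];
    for (int r = 0; r < 4; r++)
        hadamard4(t + 4 * r, 1);        /* transform each row */
    for (int c = 0; c < 4; c++)
        hadamard4(t + c, 4);            /* transform each column */
    for (int c = 0; c < 4; c++) {
        int s = 0;
        for (int r = 0; r < 4; r++)
            s += abs(t[4 * r + c]);
        if (col_sums)
            col_sums[c] = s;
        total += s;
    }
    return total;
}

/* Returns 1 if the total and every column sum are even for n pseudo-random
 * residuals with entries in [-255, 255], 0 otherwise. */
static int check_parity(unsigned seed, int n)
{
    srand(seed);
    for (int k = 0; k < n; k++) {
        int resid[16], cols[4];
        for (int i = 0; i < 16; i++)
            resid[i] = rand() % 511 - 255;
        if (satd4x4_unshifted(resid, cols) & 1)
            return 0;
        for (int c = 0; c < 4; c++)
            if (cols[c] & 1)
                return 0;
    }
    return 1;
}
```

Calling check_parity(1, 100000) from a test program is one way to exercise 
it. A random test is only a sanity check; the induction is what covers every 
input.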

> Why is MMX_SUM_MM called once for every four 4x4 blocks in satd
> functions?  The maximum sum from a 4x4 block, as I understand it, is
> 2*256*4*4, and that will be split between four unsigned doublewords

Four unsigned words, not doublewords.

> with each one getting no more than 256*4*4.  So (even before we divide
> the result by two) it is impossible to saturate the result with less
> than sixteen 4x4 blocks.

The maximum sum before shifting is 4*255*4*4 per 4x4 block, with each 
column getting at most 2*255*4*4.
This is still only a problem in the MMX version. With SSE2 the sum is 
accumulated in 8 unsigned words so each column doesn't overflow, and 
SUM_MM_SSE2 can be fixed to deal with the max sum.
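
As a rough sketch of the arithmetic behind these bounds, the following C 
fragment counts how many worst-case contributions fit in one unsigned 16-bit 
accumulator. The names are hypothetical, and it assumes plain unsigned 16-bit 
lanes; it says nothing about how the actual x264 macros accumulate.

```c
#include <stdint.h>

enum {
    MAX_BLOCK_SUM  = 4 * 255 * 4 * 4,  /* 16320: whole 4x4 block, before shift */
    MAX_COLUMN_SUM = 2 * 255 * 4 * 4   /*  8160: a single column, before shift */
};

/* Number of worst-case contributions of size max_sum that fit in an
 * unsigned 16-bit accumulator without wrapping (integer division). */
static int blocks_before_overflow(int max_sum)
{
    return UINT16_MAX / max_sum;
}
```

Under that assumption, blocks_before_overflow(MAX_COLUMN_SUM) gives the 
number of 4x4 blocks one column lane can absorb, and 
blocks_before_overflow(MAX_BLOCK_SUM) gives the same for a lane holding 
whole-block sums.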

A residual that achieves the max SATD is:
+255 -255 -255 -255
-255 +255 +255 +255
-255 +255 +255 +255
-255 +255 +255 +255
transforms to
+1020 -1020 -1020 -1020
-1020 +1020 +1020 +1020
-1020 +1020 +1020 +1020
-1020 +1020 +1020 +1020

A residual that achieves the max per column is:
-255 +255 +255 +255
-255 +255 +255 +255
-255 +255 +255 +255
-255 +255 +255 +255
transforms to
+2040   0    0    0
-2040   0    0    0
-2040   0    0    0
-2040   0    0    0
(The real Hadamard transform is the transpose of this, but the
MMX version skips that.)
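
The two worst cases above can be reproduced with a short C sketch. It is not 
taken from x264: the matrix H is one assumed ordering of the 4x4 Hadamard 
matrix (row order does not affect sums of absolute values), and hadamard2d, 
abs_sum, row_abs_sum, max_total and max_line are hypothetical names. Because 
it computes the full 2-D transform, the second case comes out as the 
transpose of the layout printed above, so the post's "column" is a row here.

```c
#include <stdlib.h>

/* Assumed ordering of the 4x4 Hadamard matrix; it is symmetric. */
static const int H[4][4] = {
    { 1,  1,  1,  1 },
    { 1,  1, -1, -1 },
    { 1, -1, -1,  1 },
    { 1, -1,  1, -1 },
};

/* out = H * in * H, the 2-D Hadamard transform of a 4x4 residual. */
static void hadamard2d(int in[4][4], int out[4][4])
{
    int tmp[4][4];
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            tmp[i][j] = 0;
            for (int k = 0; k < 4; k++)
                tmp[i][j] += H[i][k] * in[k][j];
        }
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++) {
            out[i][j] = 0;
            for (int k = 0; k < 4; k++)
                out[i][j] += tmp[i][k] * H[k][j];
        }
}

/* Sum of absolute values over one row of the transformed block. */
static int row_abs_sum(int t[4][4], int r)
{
    int s = 0;
    for (int j = 0; j < 4; j++)
        s += abs(t[r][j]);
    return s;
}

/* Sum of absolute values over the whole block (SATD before the shift). */
static int abs_sum(int t[4][4])
{
    int s = 0;
    for (int r = 0; r < 4; r++)
        s += row_abs_sum(t, r);
    return s;
}

/* Residual from the post that maximises the total sum. */
static int max_total[4][4] = {
    { +255, -255, -255, -255 },
    { -255, +255, +255, +255 },
    { -255, +255, +255, +255 },
    { -255, +255, +255, +255 },
};

/* Residual from the post that maximises a single line of the transform. */
static int max_line[4][4] = {
    { -255, +255, +255, +255 },
    { -255, +255, +255, +255 },
    { -255, +255, +255, +255 },
    { -255, +255, +255, +255 },
};
```

Running hadamard2d on max_total and then abs_sum on the result should give 
4*255*4*4; running it on max_line and then row_abs_sum on row 0 should give 
2*255*4*4.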

--Loren Merritt
