[x264-devel] Re: Questions about add8x8_idct8

Loren Merritt lorenm at u.washington.edu
Fri Sep 15 18:56:50 CEST 2006

On Fri, 15 Sep 2006, Guillaume POIRIER wrote:

> I'm working on converting add8x8_idct8 to Altivec, so far, so good, I've got 
> an Altivec version of macro IDCT8_1D.
> However, there's something that troubles me, and it's the part where rounding is done
> "for the >>6 at the end".
> I'm a little confused regarding why pw_32 isn't just 32 followed by the 
> relevant number of zeros, and I'm also wondering which of my vector register 
> should be summed with pw_32 (probably the one that loaded dct[0][0]), if it 
> matters at all.
> It looks like a minor thing, but it confuses me.

The standard says that you should perform the idct, and then for each 
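coefficient do
    x = (x + 32) >> 6;
But that's equivalent to adding 32 to the dc coefficient, then doing 
the idct, and then
    x = x >> 6;
because the dc coefficient contributes with weight 1 to every output of 
the transform. It's also equivalent to doing the row idct, then adding 32 
to all 8 column dc coefficients, then doing the column idct; that's why 
pw_32 holds 32 in every word rather than 32 followed by zeros.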

In each implementation, I picked whatever was convenient. In sse2 there 
was some latency to fill between the transpose and the column idct, so 
putting a paddw there was free. Whereas in the mmx version, where a 
register holds only 4 words, that method would have taken 2x paddw, which 
was slower than 1x add to the dc coefficient. (Of course, the
results might be different on a different cpu.)

> Another question: I was also wondering if it was possible to somehow get the 
> memory pointed by uint8_t *dst to be aligned.
> It looks like it's not possible from what I've been able to figure out in 
> encoder/macroblock.c and in common/common.h, but maybe I was just blind...

It should already be 8-byte aligned, and can't be 16-byte aligned:
add8x8_idct8 is called on adjacent 8x8 blocks, and horizontally 
neighbouring blocks start 8 bytes apart, so they can't all be 16-byte 
aligned.

--Loren Merritt

This is the x264-devel mailing-list