[x264-devel] Re: Questions about add8x8_idct8

Fri Sep 15 18:56:50 CEST 2006

On Fri, 15 Sep 2006, Guillaume POIRIER wrote:

> I'm working on converting add8x8_idct8 to Altivec, so far, so good, I've got 
> an Altivec version of macro IDCT8_1D.
>
> However, there is something that troubles me: the part where rounding is done 
> "for the >>6 at the end".
[...]
> I'm a little confused regarding why pw_32 isn't just 32 followed by the 
> relevant number of zeros, and I'm also wondering which of my vector registers 
> should be summed with pw_32 (probably the one that loaded dct[0][0]), if it 
> matters at all.
>
> It looks like a minor thing, but it confuses me.

The standard says that you should perform the idct, and then for each 
coefficient do
     c[i][j]=(c[i][j]+32)>>6;
But that's equivalent to adding 32 to the dc coefficient, then doing 
the idct, and then
     c[i][j]=c[i][j]>>6;
It's also equivalent to doing the row idct, then adding 32 to all 8 
column dc coefficients, then doing the column idct.
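
The three variants can be checked against each other with a minimal C
sketch. The 1-D butterfly below is written in the style of the H.264 8x8
inverse transform and is an assumption, not a copy of the x264 source; the
names idct8_1d, idct8x8, round_mode and count_rounding_mismatches are
hypothetical. The equivalence relies on input 0 of each 1-D pass reaching
all 8 outputs with weight 1 and without being shifted, so a constant added
there comes out unchanged on every output.

```c
#include <stdint.h>
#include <string.h>

/* One 1-D pass of an H.264-style 8-point inverse transform, in place,
 * over 8 values spaced `stride` ints apart. Assumes >> on a negative
 * int is an arithmetic shift, as on common compilers. */
void idct8_1d(int *v, int stride)
{
    const int s0 = v[0*stride], s1 = v[1*stride], s2 = v[2*stride], s3 = v[3*stride];
    const int s4 = v[4*stride], s5 = v[5*stride], s6 = v[6*stride], s7 = v[7*stride];
    /* Even part: s0 enters unshifted with weight 1. */
    const int a0 = s0 + s4;
    const int a2 = s0 - s4;
    const int a4 = (s2 >> 1) - s6;
    const int a6 = (s6 >> 1) + s2;
    const int b0 = a0 + a6, b2 = a2 + a4, b4 = a2 - a4, b6 = a0 - a6;
    /* Odd part: does not involve s0 at all. */
    const int a1 = -s3 + s5 - s7 - (s7 >> 1);
    const int a3 =  s1 + s7 - s3 - (s3 >> 1);
    const int a5 = -s1 + s7 + s5 + (s5 >> 1);
    const int a7 =  s3 + s5 + s1 + (s1 >> 1);
    const int b1 = (a7 >> 2) + a1;
    const int b3 =  a3 + (a5 >> 2);
    const int b5 = (a3 >> 2) - a5;
    const int b7 =  a7 - (a1 >> 2);
    v[0*stride] = b0 + b7;  v[7*stride] = b0 - b7;
    v[1*stride] = b2 + b5;  v[6*stride] = b2 - b5;
    v[2*stride] = b4 + b3;  v[5*stride] = b4 - b3;
    v[3*stride] = b6 + b1;  v[4*stride] = b6 - b1;
}

typedef enum { ROUND_AT_END, ROUND_ON_DC, ROUND_BETWEEN_PASSES } round_mode;

/* 2-D transform: row pass, column pass, then >>6. `mode` selects where
 * the +32 rounding term is added. */
void idct8x8(const int in[64], int out[64], round_mode mode)
{
    int t[64];
    memcpy(t, in, sizeof t);
    if (mode == ROUND_ON_DC)
        t[0] += 32;                       /* dc coefficient, before any pass */
    for (int r = 0; r < 8; r++)
        idct8_1d(t + 8*r, 1);             /* row pass */
    if (mode == ROUND_BETWEEN_PASSES)
        for (int c = 0; c < 8; c++)
            t[c] += 32;                   /* input 0 of each of the 8 columns */
    for (int c = 0; c < 8; c++)
        idct8_1d(t + c, 8);               /* column pass */
    for (int i = 0; i < 64; i++)
        out[i] = (t[i] + (mode == ROUND_AT_END ? 32 : 0)) >> 6;
}

/* Runs `trials` pseudo-random blocks through all three variants and
 * returns how many output samples disagree. */
int count_rounding_mismatches(int trials)
{
    uint32_t seed = 1;
    int mismatches = 0;
    for (int n = 0; n < trials; n++) {
        int in[64], a[64], b[64], c[64];
        for (int i = 0; i < 64; i++) {
            seed = seed * 1664525u + 1013904223u;   /* simple LCG */
            in[i] = (int)(seed >> 20) - 2048;       /* range -2048..2047 */
        }
        idct8x8(in, a, ROUND_AT_END);
        idct8x8(in, b, ROUND_ON_DC);
        idct8x8(in, c, ROUND_BETWEEN_PASSES);
        for (int i = 0; i < 64; i++)
            if (a[i] != b[i] || a[i] != c[i])
                mismatches++;
    }
    return mismatches;
}
```

Calling count_rounding_mismatches() from a small main() and comparing the
result against 0 is enough to exercise the sketch. In vector terms,
ROUND_ON_DC needs a constant with 32 in one lane only, while
ROUND_BETWEEN_PASSES needs 32 in all 8 lanes of the row holding the column
dc values.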

In each implementation, I picked whatever was convenient. In sse2 there 
was some latency to fill between the transpose and the column idct, so 
putting a paddw there was free. In the mmx version that method would 
have taken 2x paddw (a register holds only 4 of the 8 words), which was 
slower than 1x add on the dc coefficient. (Of course, the results might 
be different on a different cpu.)

> Another question: I was also wondering if it was possible to somehow get the 
> memory pointed by uint8_t *dst to be aligned.
> It looks like it's not possible from what I've been able to figure out in 
> encoder/macroblock.c and in common/common.h, but maybe I was just blind...

It should already be 8-byte aligned, and can't be 16-byte aligned: 
add8x8_idct8 is called on adjacent 8x8 blocks, so every other block 
starts 8 bytes into a row.
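
A small C sketch of the alignment argument, under assumed conditions: a
16x16 pixel macroblock whose origin is 16-byte aligned and whose row
stride is a multiple of 16. The buffer mb, the constant STRIDE and the
function dst_offset_mod16 are hypothetical names for illustration and are
not taken from x264.

```c
#include <stdalign.h>
#include <stdint.h>

#define STRIDE 32   /* hypothetical row stride in bytes, a multiple of 16 */

/* Hypothetical 16x16 macroblock of pixels, 16-byte aligned at its origin. */
static alignas(16) uint8_t mb[16 * STRIDE];

/* Address of 8x8 block `idx` (0..3 in raster order) modulo 16.
 * Blocks 0 and 2 sit in the left half, blocks 1 and 3 in the right half. */
int dst_offset_mod16(int idx)
{
    const uint8_t *dst = mb + (idx >> 1) * 8 * STRIDE + (idx & 1) * 8;
    return (int)((uintptr_t)dst & 15);
}
```

Under these assumptions the left-hand blocks come out 16-byte aligned and
the right-hand blocks only 8-byte aligned, so a routine that takes any of
the four blocks can rely on 8-byte alignment at most.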

--Loren Merritt
