[x264-devel] Re: Questions about add8x8_idct8
Loren Merritt
lorenm at u.washington.edu
Fri Sep 15 18:56:50 CEST 2006
On Fri, 15 Sep 2006, Guillaume POIRIER wrote:
> I'm working on converting add8x8_idct8 to Altivec, so far, so good, I've got
> an Altivec version of macro IDCT8_1D.
>
> However, there's something that troubles me, and it's the part where rounding is done
> "for the >>6 at the end".
[...]
> I'm a little confused regarding why pw_32 isn't just 32 followed by the
> relevant number of zeros, and I'm also wondering which of my vector register
> should be summed with pw_32 (probably the one that loaded dct[0][0]), if it
> matters at all.
>
> It looks like a minor thing, but it confuses me.
The standard says that you should perform the idct, and then for each
coefficient do
    c[i][j] = (c[i][j] + 32) >> 6;
But the dc coefficient is added unshifted into every output, so that's
equivalent to adding 32 to the dc coefficient, then doing the idct, and then
    c[i][j] = c[i][j] >> 6;
It's also equivalent to doing the row idct, then adding 32 to the dc input
of each of the 8 column transforms, then doing the column idct.
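
To check the equivalence numerically, here is a minimal C sketch. It is not the x264 code: idct8_1d() is a scalar butterfly written from memory in the style of the H.264 8x8 inverse transform, and the names idct8x8(), the ROUND_* modes and rounding_variants_agree() are made up for this illustration. The argument only needs the dc input to enter the butterfly unshifted, so the three rounding placements should give bit-identical results.

```c
#include <stdlib.h>
#include <string.h>

/* One 1-D pass over 8 values spaced `stride` ints apart (hypothetical helper).
 * Assumes >> on negative ints is an arithmetic shift, as on common compilers. */
static void idct8_1d(int *v, int stride)
{
    int s[8];
    for (int i = 0; i < 8; i++)
        s[i] = v[i * stride];

    /* Even part: s[0] (the dc input) is only ever added or subtracted whole. */
    int a0 = s[0] + s[4];
    int a2 = s[0] - s[4];
    int a4 = (s[2] >> 1) - s[6];
    int a6 = (s[6] >> 1) + s[2];
    int b0 = a0 + a6, b2 = a2 + a4, b4 = a2 - a4, b6 = a0 - a6;

    /* Odd part: does not involve s[0] at all. */
    int a1 = -s[3] + s[5] - s[7] - (s[7] >> 1);
    int a3 =  s[1] + s[7] - s[3] - (s[3] >> 1);
    int a5 = -s[1] + s[7] + s[5] + (s[5] >> 1);
    int a7 =  s[3] + s[5] + s[1] + (s[1] >> 1);
    int b1 = (a7 >> 2) + a1, b3 = a3 + (a5 >> 2);
    int b5 = (a3 >> 2) - a5, b7 = a7 - (a1 >> 2);

    v[0 * stride] = b0 + b7;  v[7 * stride] = b0 - b7;
    v[1 * stride] = b2 + b5;  v[6 * stride] = b2 - b5;
    v[2 * stride] = b4 + b3;  v[5 * stride] = b4 - b3;
    v[3 * stride] = b6 + b1;  v[4 * stride] = b6 - b1;
}

enum { ROUND_AFTER, ROUND_DC_FIRST, ROUND_BETWEEN_PASSES };

/* 2-D transform on a row-major 8x8 block, with the +32 placed per `mode`. */
static void idct8x8(const int in[64], int out[64], int mode)
{
    memcpy(out, in, 64 * sizeof(int));

    if (mode == ROUND_DC_FIRST)
        out[0] += 32;                      /* only the dc coefficient */

    for (int r = 0; r < 8; r++)
        idct8_1d(out + 8 * r, 1);          /* row pass */

    if (mode == ROUND_BETWEEN_PASSES)
        for (int c = 0; c < 8; c++)
            out[c] += 32;                  /* dc input of each column pass */

    for (int c = 0; c < 8; c++)
        idct8_1d(out + c, 8);              /* column pass */

    for (int i = 0; i < 64; i++)
        out[i] = (out[i] + (mode == ROUND_AFTER ? 32 : 0)) >> 6;
}

/* Returns 1 if all three placements agree on `trials` random blocks. */
static int rounding_variants_agree(unsigned seed, int trials)
{
    srand(seed);
    for (int t = 0; t < trials; t++) {
        int in[64], a[64], b[64], c[64];
        for (int i = 0; i < 64; i++)
            in[i] = rand() % 4096 - 2048;
        idct8x8(in, a, ROUND_AFTER);
        idct8x8(in, b, ROUND_DC_FIRST);
        idct8x8(in, c, ROUND_BETWEEN_PASSES);
        if (memcmp(a, b, sizeof a) || memcmp(a, c, sizeof a))
            return 0;
    }
    return 1;
}
```

Calling rounding_variants_agree(1, 1000) exercises all three modes on random input. Which placement is cheapest in a SIMD version depends on where a spare instruction slot happens to be.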
In each implementation, I picked whatever was convenient. In sse2 there
was some latency to fill between the transpose and the column idct, so
putting a paddw there was free. Whereas in the mmx version that method
would have taken 2x paddw, which was slower than 1x add. (Of course, the
results might be different on a different cpu.)
> Another question: I was also wondering if it was possible to somehow get the
> memory pointed by uint8_t *dst to be aligned.
> It looks like it's not possible from what I've been able to figure out in
> encoder/macroblock.c and in common/common.h, but maybe I was just blind...
It should already be 8-byte aligned, but it can't always be 16-byte aligned:
add8x8_idct8 is called on adjacent 8x8 blocks, which are 8 bytes apart.
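
A small C sketch of why that is, under assumptions that are not taken from the x264 source: the four 8x8 blocks of a 16x16 macroblock sit in a 2x2 grid of 8-bit pixels, the macroblock's top-left pixel is 16-byte aligned, and the row stride is a multiple of 16. STRIDE, block8x8_offset() and block8x8_misalign16() are hypothetical names chosen for the illustration.

```c
#include <stddef.h>

enum { STRIDE = 32 };  /* hypothetical row stride in bytes, a multiple of 16 */

/* Byte offset of 8x8 block `idx` (0..3, raster order) from the
 * 16-byte-aligned top-left pixel of its macroblock. */
static size_t block8x8_offset(int idx)
{
    size_t x = (size_t)(idx & 1) * 8;    /* left or right half */
    size_t y = (size_t)(idx >> 1) * 8;   /* top or bottom half */
    return x + y * STRIDE;
}

/* How far the block's dst pointer is from 16-byte alignment. */
static unsigned block8x8_misalign16(int idx)
{
    return (unsigned)(block8x8_offset(idx) % 16);
}
```

With these assumptions the left-hand blocks come out 16-byte aligned and the right-hand blocks are off by 8, so a routine that handles any of the four can only count on 8-byte alignment.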
--Loren Merritt