[x264-devel] Questions about add8x8_idct8
Guillaume POIRIER
gpoirier at mplayerhq.hu
Fri Sep 15 17:47:51 CEST 2006
Hello folks,
I'm working on converting add8x8_idct8 to Altivec. So far, so good: I've
got an Altivec version of the IDCT8_1D macro.
However, there's something that troubles me, and it's the part where
rounding is done "for the >>6 at the end".
Here is what the C version does:
static void add8x8_idct8( uint8_t *dst, int16_t dct[8][8] )
{
    int i;
    dct[0][0] += 32; // rounding for the >>6 at the end
    [..]
}
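A short derivation of why a single dct[0][0] += 32 is enough. It assumes, as holds for the 8-point inverse transform butterfly in the H.264 spec, that coefficient 0 reaches every 1D output with weight +1 and only through additions, never through a shift. The symbols are introduced just for this sketch: T is the 1D inverse transform, e_0 the unit vector at index 0, E_00 the 8x8 matrix with a single 1 at (0,0), and Y the 2D result before the final shift. Rows are taken first for concreteness; the argument is symmetric. The last line rounds Y/64 to the nearest integer with halves rounded up, assuming an arithmetic right shift.

```latex
\begin{align*}
T(d + c\,e_0) &= T(d) + c\,\mathbf{1}, \qquad c \in \mathbb{Z} \\
T_{\mathrm{row}}(D + 32\,E_{00}) &= T_{\mathrm{row}}(D) + 32\,e_0\mathbf{1}^{\mathsf{T}} \\
Y(D + 32\,E_{00}) &= T_{\mathrm{col}}\bigl(T_{\mathrm{row}}(D) + 32\,e_0\mathbf{1}^{\mathsf{T}}\bigr) = Y(D) + 32\,\mathbf{1}\mathbf{1}^{\mathsf{T}} \\
\bigl(Y_{ij}(D) + 32\bigr) \gg 6 &= \left\lfloor \frac{Y_{ij}(D) + 32}{64} \right\rfloor
\end{align*}
```

The second line is the interesting one for the SIMD versions: after the first pass, the +32 has become +32 on all eight entries of row 0.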
And now here's what the SSE2 version does:
x264_add8x8_idct8_sse2:
    [..]
    [..] xmm9, xmm8
    SSE2_TRANSPOSE8x8 xmm9, xmm1, xmm7, xmm3, xmm4, xmm0, xmm2, xmm6, xmm5
    paddw       xmm9, [pw_32 GLOBAL]  ; rounding for the >>6 at the end
    IDCT8_1D    xmm9, xmm0, xmm6, xmm3, xmm5, xmm4, xmm7, xmm1, xmm8, xmm2
But when I look at the definition of pw_32, it reads:
    pw_32: times 8 dw 32
which means that pw_32 is eight words of 32 (32, 32, 32, 32, 32, 32, 32, 32).
The equivalent MMX code just does a simple
    add word [eax], 32
(which is exactly what the C code does).
I'm a little confused about why pw_32 isn't just 32 followed by the
relevant number of zeros. I'm also wondering which of my vector
registers should be summed with pw_32 (probably the one that loaded
dct[0][0]), if it matters at all.
It looks like a minor thing, but it confuses me.
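One way to probe the pw_32 question numerically is the sketch below. It assumes the 8-point butterfly from the H.264 spec's 8x8 inverse transform (assumed, not checked, to compute the same thing as IDCT8_1D), runs the row pass first, and compares two ways of rounding: adding 32 to dct[0][0] up front, as the C code does, versus adding 32 to each of the eight column inputs at index 0 between the passes. The second is what an 8-lane paddw with pw_32 would do if the register holds one such input per lane. All function names and the small pseudo-random generator are hypothetical.

```c
#include <string.h>

/* 8-point inverse transform butterfly from the H.264 spec's 8x8 inverse
 * transform; assumed here to match IDCT8_1D. Relies on >> being an
 * arithmetic shift for negative ints, as on common compilers. */
static void idct8_1d(const int s[8], int d[8])
{
    int e0 = s[0] + s[4];
    int e1 = -s[3] + s[5] - s[7] - (s[7] >> 1);
    int e2 = s[0] - s[4];
    int e3 = s[1] + s[7] - s[3] - (s[3] >> 1);
    int e4 = (s[2] >> 1) - s[6];
    int e5 = -s[1] + s[7] + s[5] + (s[5] >> 1);
    int e6 = s[2] + (s[6] >> 1);
    int e7 = s[3] + s[5] + s[1] + (s[1] >> 1);

    int f0 = e0 + e6, f1 = e1 + (e7 >> 2);
    int f2 = e2 + e4, f3 = e3 + (e5 >> 2);
    int f4 = e2 - e4, f5 = (e3 >> 2) - e5;
    int f6 = e0 - e6, f7 = e7 - (e1 >> 2);

    d[0] = f0 + f7; d[1] = f2 + f5; d[2] = f4 + f3; d[3] = f6 + f1;
    d[4] = f6 - f1; d[5] = f4 - f3; d[6] = f2 - f5; d[7] = f0 - f7;
}

/* 2D inverse transform (rows, then columns) followed by >> 6.
 * If pw32_style is nonzero, 32 is added to all eight entries of row 0
 * between the passes, i.e. to input 0 of every column transform. */
static void idct8x8_shift6(int in[8][8], int out[8][8], int pw32_style)
{
    int tmp[8][8], col[8], res[8];

    for (int i = 0; i < 8; i++)
        idct8_1d(in[i], tmp[i]);            /* first pass: rows */
    if (pw32_style)
        for (int j = 0; j < 8; j++)
            tmp[0][j] += 32;                /* eight lanes of 32 */
    for (int j = 0; j < 8; j++) {           /* second pass: columns */
        for (int i = 0; i < 8; i++)
            col[i] = tmp[i][j];
        idct8_1d(col, res);
        for (int i = 0; i < 8; i++)
            out[i][j] = res[i] >> 6;
    }
}

/* Compare C-style rounding with pw_32-style rounding on a pseudo-random
 * coefficient block; returns how many output samples differ. */
static int count_mismatches(unsigned seed)
{
    int a[8][8], b[8][8], out_a[8][8], out_b[8][8];
    int mismatches = 0;

    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++) {
            seed = seed * 1103515245u + 12345u;          /* simple LCG */
            a[i][j] = (int)((seed >> 16) % 512u) - 256;  /* [-256, 255] */
        }
    memcpy(b, a, sizeof a);

    a[0][0] += 32;                  /* C style: dct[0][0] += 32 */
    idct8x8_shift6(a, out_a, 0);
    idct8x8_shift6(b, out_b, 1);    /* pw_32 style, between passes */

    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++)
            mismatches += out_a[i][j] != out_b[i][j];
    return mismatches;
}
```

Under these assumptions count_mismatches() returns 0 for any seed, because coefficient 0 only travels through additions and so leaves the first pass as +32 on every entry of its row. That would suggest the eight 32s belong on the register feeding input 0 of the second IDCT8_1D (after the transpose), while adding them to the raw register holding dct[0][0..7] before the first pass would also shift the AC coefficients dct[0][1..7].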
Another question: is it possible to somehow get the memory pointed to by
uint8_t *dst to be aligned?
From what I've been able to figure out in encoder/macroblock.c and in
common/common.h it doesn't look possible, but maybe I was just
blind...
It looks like I would have to fiddle with this code in common.h:
    DECLARE_ALIGNED( uint8_t, fenc_buf[24*FENC_STRIDE], 16 );
    DECLARE_ALIGNED( uint8_t, fdec_buf[27*FDEC_STRIDE], 16 );
    /* pointer over mb of the frame to be compressed */
    uint8_t *p_fenc[3];
    /* pointer over mb of the frame to be reconstructed */
    uint8_t *p_fdec[3];
    /* pointer over mb of the references */
    uint8_t *p_fref[2][16][4+2]; /* last: lN, lH, lV, lHV, cU, cV */
    uint16_t *p_integral[2][16];
but I fear it would break a lot of things if I did...
Non-aligned accesses (especially stores) are a major pain (both slower to
execute and harder to code), so it would really help if *dst could be
16-byte aligned.
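A small sketch of what alignment dst can have at best, under two assumptions not confirmed above: that add8x8_idct8 is called once per 8x8 luma block with dst at byte offset (i&1)*8 + (i>>1)*8*FDEC_STRIDE from the top-left pixel of the reconstructed macroblock, and that this top-left pixel is itself 16-byte aligned. SKETCH_STRIDE is a hypothetical stand-in for FDEC_STRIDE; any multiple of 16 gives the same result.

```c
#include <stddef.h>

/* Hypothetical stride for this sketch; the real value is FDEC_STRIDE. */
#define SKETCH_STRIDE 32

/* Largest power of two, capped at 16, that divides a byte offset. */
static int alignment_of(size_t offset)
{
    int a = 16;
    while (a > 1 && offset % (size_t)a != 0)
        a >>= 1;
    return a;
}

/* Byte offset of 8x8 luma block idx (0..3, raster order) from the
 * top-left pixel of a 16x16 macroblock stored with the given stride. */
static size_t block8x8_offset(int idx, size_t stride)
{
    return (size_t)(idx & 1) * 8 + (size_t)(idx >> 1) * 8 * stride;
}

/* Alignment of dst for each of the four blocks, assuming the macroblock
 * itself starts on a 16-byte boundary. */
static void dst_alignments(int out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = alignment_of(block8x8_offset(i, SKETCH_STRIDE));
}
```

Under these assumptions dst_alignments() yields 16, 8, 16, 8: the two right-hand blocks start 8 bytes into a 16-byte line, so raising the alignment of fdec_buf alone would not make every dst 16-byte aligned.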
Guillaume
--
This is the x264-devel mailing-list