[x264-devel] Re: [PATCH] Altivec optimizations for quant4x4, quant4x4dc, quant8x8, sub8x8_dct8, sub16x16_dct8, pixel_sa8d_8x8, pixel_sa8d_16x16, idct8

Mon Sep 25 00:46:32 CEST 2006

On Sun, 24 Sep 2006, Guillaume POIRIER wrote:
> On 9/18/06, Loren Merritt <lorenm at u.washington.edu> wrote:
>
>> pixel_sa8d_8x8_core_altivec could use a VEC_DIFF with one of the pointers
>> 8-byte aligned.
>
> So far I've been able to use VEC_DIFF_H_8BYTE_ALIGNED with the
> following pattern:
>
> +    VEC_DIFF_H_8BYTE_ALIGNED( pix1, i_pix1, pix2, i_pix2, 8, diff0v );
> +    VEC_DIFF_H( pix1, i_pix1, pix2, i_pix2, 8, diff1v );
> +    VEC_DIFF_H_8BYTE_ALIGNED( pix1, i_pix1, pix2, i_pix2, 8, diff2v );
> +    VEC_DIFF_H( pix1, i_pix1, pix2, i_pix2, 8, diff3v );
> +
> +    VEC_DIFF_H_8BYTE_ALIGNED( pix1, i_pix1, pix2, i_pix2, 8, diff4v );
> +    VEC_DIFF_H( pix1, i_pix1, pix2, i_pix2, 8, diff5v );
> +    VEC_DIFF_H_8BYTE_ALIGNED( pix1, i_pix1, pix2, i_pix2, 8, diff6v );
> +    VEC_DIFF_H( pix1, i_pix1, pix2, i_pix2, 8, diff7v );
>
> I have not looked into this problem much, but as far as I can see,
> every other call to VEC_DIFF* is made with a
> different alignment of pix1 and pix2;
> i.e. each call to VEC_DIFF_H_8BYTE_ALIGNED is made with pix1 and
> pix2 both 8-byte or both 16-byte aligned, whereas the interleaved calls
> to VEC_DIFF_H are made with pix1 and pix2 aligned differently (i.e.
> one is 8-byte aligned and the other is 16-byte aligned).

Weird. That would indicate that the stride is only a multiple of 8, which 
does happen for pix2 during slicetype and chroma_me, but only for sad and 
satd, not sa8d.
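If a plane's stride is a multiple of 8 but not of 16, consecutive rows alternate between two offsets relative to a 16-byte AltiVec vector, which would produce exactly the every-other-row pattern described above. A minimal sketch; `row_offset16` is a hypothetical helper, not an x264 function, and positive strides are assumed:

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of the start of `row` from the previous 16-byte boundary,
 * i.e. its misalignment relative to a 16-byte vector load.
 * Assumes a positive stride, as in this sketch. */
static unsigned row_offset16(const uint8_t *base, int stride, int row)
{
    assert(stride > 0 && row >= 0);
    return (unsigned)((uintptr_t)(base + (intptr_t)stride * row) & 15u);
}
```

With a 16-byte-aligned base and a stride of 24, rows 0, 1, 2, 3 start at offsets 0, 8, 0, 8; with a stride of 32 every row starts at offset 0, so a single load path suffices.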

> I'll see what I can do, but I imagine it's possible to make do without
> using VEC_DIFF (which doesn't care about alignment at all).

sad, satd, and sa8d can all be optimized for:
pix1 is aligned to whatever the block size is.
pix2 is unaligned.
stride1 is a multiple of 16.
stride2 is a multiple of 8, and I could easily make it 16.
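The layout guarantees listed above can be written down as a precondition check. A minimal sketch, assuming 8-bit pixels (so the block width in pixels equals its width in bytes); `pixel_cmp_layout_ok` and `is_aligned` are hypothetical names, not x264 functions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if p is aligned to `align` bytes
 * (align must be a power of two). */
static bool is_aligned(const void *p, uintptr_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical check of the assumptions for a block_w-wide sad/satd/sa8d
 * call: pix1 aligned to the block width, stride1 a multiple of 16,
 * stride2 a multiple of 8. pix2 itself may be unaligned, so it is not
 * checked. */
static bool pixel_cmp_layout_ok(const uint8_t *pix1, int stride1,
                                const uint8_t *pix2, int stride2,
                                int block_w)
{
    assert(block_w == 4 || block_w == 8 || block_w == 16);
    (void)pix2; /* deliberately unconstrained */
    return is_aligned(pix1, (uintptr_t)block_w)
        && stride1 % 16 == 0
        && stride2 % 8 == 0;
}
```

Under these guarantees every row of pix1 can be fetched with a plain aligned vector load, while pix2 still needs an unaligned load (on AltiVec, typically two `vec_ld` combined with `vec_perm` using a `vec_lvsl` permute mask).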

Additionally, in the current usage of sa8d, pix2 is also aligned to the 
blocksize. But don't count on that remaining so.

> Now I have a question regarding a bug I've found in the Altivec quant code.
> In some encodes I've done with that patch, I'm getting
> isolated green or blue blocks that sometimes leave green trails
> in the first pass, and in the final encode, I just get blocks that
> "pop in and pop out" (as in: motion compensation doesn't turn them
> into green trails).
>
> It _appears_ that the more high-quality options I enable (RD,
> trellis), the fewer artifacts I get. I imagine this means that
> the different code paths taken with those options either trigger
> the bug less often or compensate for it.
>
> What's funny is that the bug is not reproducible: if I take a
> sample and encode it once, I get some green/blue blocks, say at frames
> 5 and 7... and if I re-encode the same source with the same
> options, I don't get the blocks at the same frames or at the same
> positions in the frame.

The only causes of nondeterminism in single-threaded programs are 
uninitialized memory and deliberate randomness (e.g. time()). 
So try valgrind.
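As an illustration of the uninitialized-memory case (not code from the quant patch; `count_nonzero`, `buggy` and `fixed` are hypothetical names), a buffer that is only partly written before being read gives results that depend on leftover stack contents, which can differ from run to run:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NCOEF 16

/* Hypothetical quant-like routine: counts nonzero coefficients.
 * It reads every entry of dct[], so every entry must have been written. */
static int count_nonzero(const int16_t dct[NCOEF])
{
    int nz = 0;
    for (int i = 0; i < NCOEF; i++)
        if (dct[i] != 0)   /* data-dependent branch */
            nz++;
    return nz;
}

/* Buggy pattern: only the first n entries are filled; the rest hold
 * whatever was on the stack, so the result is nondeterministic. */
static int buggy(const int16_t *src, int n)
{
    int16_t dct[NCOEF];            /* never fully initialized */
    assert(n >= 0 && n <= NCOEF);
    for (int i = 0; i < n; i++)
        dct[i] = src[i];
    return count_nonzero(dct);
}

/* Fixed pattern: zero the whole buffer first, then fill. */
static int fixed(const int16_t *src, int n)
{
    int16_t dct[NCOEF];
    assert(n >= 0 && n <= NCOEF);
    memset(dct, 0, sizeof dct);
    for (int i = 0; i < n; i++)
        dct[i] = src[i];
    return count_nonzero(dct);
}
```

Under valgrind's default memcheck tool, the branch in `count_nonzero` reached through `buggy` would typically be reported as "Conditional jump or move depends on uninitialised value(s)", and `--track-origins=yes` points back at the stack allocation that was never written.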

--Loren Merritt

-- 
This is the x264-devel mailing-list
To unsubscribe, go to: http://developers.videolan.org/lists.html
