[x264-devel] Re: [PATCH] Altivec optimizations for quant4x4, quant4x4dc, quant8x8, sub8x8_dct8, sub16x16_dct8, pixel_sa8d_8x8, pixel_sa8d_16x16, *idct8*

Guillaume POIRIER gpoirier at mplayerhq.hu
Sun Sep 24 21:25:56 CEST 2006


Hi,

On 9/18/06, Loren Merritt <lorenm at u.washington.edu> wrote:
> On Mon, 18 Sep 2006, Guillaume POIRIER wrote:
>
> > The attached patch adds *idct8* routines to my whole patchset.
> >
> > Please test and review.
> >
> > IDCT8 can be made faster with fewer loads and stores, but right now, I don't
> > know exactly how to do it. Suggestions welcome.
>
> x264_sub8x8_dct8_altivec could use VEC_DIFF_H_8BYTE_ALIGNED.

Done


> pixel_sa8d_8x8_core_altivec could use a VEC_DIFF with one of the pointers
> 8byte aligned.


So far I've been able to use VEC_DIFF_H_8BYTE_ALIGNED with the
following pattern:

+    VEC_DIFF_H_8BYTE_ALIGNED( pix1, i_pix1, pix2, i_pix2, 8, diff0v );
+    VEC_DIFF_H( pix1, i_pix1, pix2, i_pix2, 8, diff1v );
+    VEC_DIFF_H_8BYTE_ALIGNED( pix1, i_pix1, pix2, i_pix2, 8, diff2v );
+    VEC_DIFF_H( pix1, i_pix1, pix2, i_pix2, 8, diff3v );
+
+    VEC_DIFF_H_8BYTE_ALIGNED( pix1, i_pix1, pix2, i_pix2, 8, diff4v );
+    VEC_DIFF_H( pix1, i_pix1, pix2, i_pix2, 8, diff5v );
+    VEC_DIFF_H_8BYTE_ALIGNED( pix1, i_pix1, pix2, i_pix2, 8, diff6v );
+    VEC_DIFF_H( pix1, i_pix1, pix2, i_pix2, 8, diff7v );

I have not looked too closely at this problem, but as far as I can see,
every other call to VEC_DIFF* is made with a different relative
alignment of pix1 and pix2;
i.e. each call to VEC_DIFF_H_8BYTE_ALIGNED is made with both pix1 and
pix2 8-byte or 16-byte aligned, whereas in the code above the calls
to VEC_DIFF_H are made with pix1 and pix2 aligned differently (i.e.
one is 8-byte aligned and the other is 16-byte aligned).

I'll see what I can do, but I imagine it's possible to make do without
using VEC_DIFF (which doesn't care about alignment at all).
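
To see which rows land on an 8-byte versus a 16-byte boundary, a small
helper like the one below can be used. This is only a sketch: the function
name is made up for illustration, and the stride values in the note after
it are examples, not the strides x264 actually passes in.

```c
#include <stdint.h>

/* Hypothetical helper: offset (0..15) of row `row` from the previous
 * 16-byte boundary, given the offset of row 0 and the stride in bytes. */
static unsigned row_offset16(unsigned base_offset, unsigned stride, unsigned row)
{
    return (base_offset + row * stride) % 16;
}
```

With a stride that is an odd multiple of 8 (e.g. 8 or 24), the offset
alternates between two values from one row to the next; with a stride that
is a multiple of 16, every row keeps the offset of row 0.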


> ALTIVEC_STORE_SUM_CLIP is 8byte aligned, so it could have two versions like
>    #define ALTIVEC_STORE_SUM_CLIP_ALIGN8_A(dest, idctv) {\
>      vec_u8_t dstv = vec_ld(0, dest);\
>      vec_s16_t idct_sh6 = vec_sra(idctv, sixv);\
>      vec_u16_t dst16h = vec_mergeh(zero_u8v, dstv);\
>      vec_u16_t dst16l = vec_mergel(zero_u8v, dstv);\
>      vec_s16_t sum16 = vec_adds(idct_sh6, (vec_s16_t)dst16h);\
>      vec_u8_t sum8 = vec_packsu(dst16l, sum16);\
>      vec_st(sum8, 0, dest);\
>    }
> ... and swap dst16l with dst16h for the other parity.

Ok, done. Works beautifully for idct8 8x8... but broke 16x16 idct8 for
some reason. I need to see what's wrong there.
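
For comparing the vector output against something known-good while hunting
the 16x16 breakage, a scalar model of one row may help. This is a sketch
based on my reading of the macro quoted above (arithmetic shift right by 6,
add to 8 destination pixels, clip to unsigned 8 bits); the function names
are hypothetical and not part of x264.

```c
#include <stdint.h>

/* Clamp an int to the 0..255 range of an unsigned 8-bit pixel. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Scalar model of one 8-pixel row: dst[i] = clip(dst[i] + (idct[i] >> 6)).
 * Assumes >> on a negative value is an arithmetic shift, as vec_sra is;
 * that is implementation-defined in C but holds on common compilers. */
static void store_sum_clip_ref(uint8_t *dst, const int16_t idct[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = clip_uint8(dst[i] + (idct[i] >> 6));
}
```

Running this on the same input as each parity of the vector macro, then
comparing all 16 bytes of the destination block, should show whether the
wrong half of the block is being overwritten.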

No patch today because I forgot to bring it along with me. ;-)


Now I have a question regarding a bug I've found in the Altivec quant code.
On some encodes I've done with that patch, I'm getting
some isolated green or blue blocks that sometimes create green trails
on the first pass, whereas on the final encode I'm just getting blocks that
"pop in and pop out" (as in: the motion compensation doesn't turn them
into green trails).

It _appears_ that the more high quality options I activate (RD,
trellis), the fewer artifacts I get. I imagine that means that
the different codepaths taken with high quality options may not trigger
the bug as often, or maybe compensate for it.

What's funny is that the bug is not reproducible, as in: if I take a
sample and encode it once, I'll get some green/blue blocks, say at frames
5 and 7... and if I re-encode with the same source and the same
options, I won't get the blocks at the same frames or at the same
locations in the frame.

This is puzzling.

I suspected that I could have an overflow problem and thus
converted the routine to use only saturating arithmetic, but it didn't
change anything...
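
For reference, the difference between a saturating and a wrapping 16-bit
add can be modelled in scalar C as below. AltiVec's vec_adds saturates
while vec_add wraps; the two function names here are made up for
illustration and are not part of x264.

```c
#include <stdint.h>

/* Saturating signed 16-bit add: the result is clamped to the int16_t range,
 * which is what vec_adds does per element. */
static int16_t sat_add_s16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

/* Wrapping (modular) signed 16-bit add, as vec_add does per element.
 * The final conversion to int16_t is implementation-defined before C23,
 * but wraps on common two's-complement compilers. */
static int16_t wrap_add_s16(int16_t a, int16_t b)
{
    return (int16_t)(uint16_t)((uint16_t)a + (uint16_t)b);
}
```

If both versions give the same result on every input the routine actually
sees, overflow is unlikely to be the cause.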

I've also suspected that I could be bitten by some unaligned loads and
stores, but that doesn't _seem_ to be the problem (both because I checked
with some if (pointer%16) printf ("unaligned access\n") here and
there, and because I imagine that if there were unaligned memory access
problems, I would be getting a whole lot more image corruption).
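
A compilable version of that kind of check might look like the sketch
below. The helper names are hypothetical. Such a check matters because
vec_ld and vec_st ignore the low four bits of the address rather than
trapping, so a misaligned pointer silently accesses the enclosing 16-byte
block.

```c
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if p is 16-byte aligned, 0 otherwise. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15) == 0;
}

/* Reports a misaligned pointer on stderr; returns 1 if aligned, 0 if not. */
static int check_aligned16(const void *p, const char *name)
{
    if (!is_aligned16(p)) {
        fprintf(stderr, "unaligned access: %s = %p\n", name, (void *)p);
        return 0;
    }
    return 1;
}
```

Calling check_aligned16(dct, "dct") at the top of each quant routine would
flag any call made with a pointer that is not 16-byte aligned.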

Anyway, I don't know where to look now to solve this problem, so I'm
asking you guys if you have some brilliant ideas to further narrow
down and solve my problem.


Guillaume
-- 
With DADVSI (http://en.wikipedia.org/wiki/DADVSI), France finally has
a lead on USA on selling out individuals right to corporations!
Vive la France!
