[x264-devel] Re: [PATCH] Altivec optimizations for quant4x4, quant4x4dc, quant8x8, sub8x8_dct8, sub16x16_dct8, pixel_sa8d_8x8, pixel_sa8d_16x16, idct8

Mon Oct 2 04:49:00 CEST 2006

On Sun, 1 Oct 2006, Guillaume POIRIER wrote:
>
> Ok, I was just being careless in my tests. It's just that I've tested only
> with checkasm, and not with a real-world encoder. That was foolish of me.
>
> After I ran some tests, it looks like there are only 3 useful VEC_DIFF_xx 
> patterns encountered in real life:
>
> - Both arrays are always at least 16-byte aligned, and i_pix1 and
> i_pix2 are multiples of 16. That means that no special trick has to be
> done to load a full line... and that I should maybe create a macro for
> VEC_DIFF_16BYTES_ALIGNED.
>
> - Both arrays are 8-byte aligned and i_pix1 and i_pix2 are multiples of 16.
> That means that all loads would need some permutation, and that
> VEC_DIFF_H_8BYTE_ALIGNED() has to be used everywhere.
>
> Note that the 2 above are by far the most common cases.
>
> The third case is when i_pix2 is a multiple of 8, so the alignment of the
> memory accesses changes from one line to the next (this is a case tested in
> checkasm, but one that I didn't see in the real world with the options I've
> tested). In that case, the interleaved VEC_DIFF_H_8BYTE_ALIGNED/VEC_DIFF_H
> takes care of it. This could probably be improved somehow, but since that
> case isn't common in my experience, I don't see the point of optimizing
> it... but I could be wrong (as I tested a small subset of encoding options).

checkasm is just testing that the two strides are different, and that pix2 
is unaligned. I didn't think about mod16 when I coded that. This case 
should be gone now.
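
For reference, the three patterns described above can be told apart from the
base address and the stride alone. The following is only a sketch: the enum
and the function name are hypothetical and not part of x264, and the address
is passed as an integer so that the logic is easy to check.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names, for illustration only; not part of x264. */
typedef enum {
    ALIGN_16_ALL_ROWS,   /* base and stride are both multiples of 16      */
    ALIGN_8_ALL_ROWS,    /* base is 8 mod 16, stride is a multiple of 16  */
    ALIGN_8_ALTERNATING, /* base is a multiple of 8, stride is 8 mod 16   */
    ALIGN_NONE           /* no usable guarantee: treat as fully unaligned */
} align_case_t;

/* Classify how the rows of a block starting at 'addr' with row pitch
 * 'stride' sit relative to 16-byte vector boundaries. */
static align_case_t classify_alignment(uintptr_t addr, int stride)
{
    assert(stride > 0);
    if (addr % 8 != 0 || stride % 8 != 0)
        return ALIGN_NONE;
    if (stride % 16 != 0)
        return ALIGN_8_ALTERNATING; /* row alignment flips between 0 and 8 */
    return (addr % 16 == 0) ? ALIGN_16_ALL_ROWS : ALIGN_8_ALL_ROWS;
}
```

Only the first case allows plain aligned loads on every row; the other
cases need a permute, and the last one needs the general unaligned path.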

>> sad, satd, and sa8d can all be optimized for:
>> pix1 is aligned to whatever the block size is.
>> pix2 is unaligned.
>
> unaligned, as in: _any_ alignment, or as in "sometimes 8 or 16 bytes aligned"?
> Also, aren't alignment patterns different for sad, satd, and sa8d?
>
>> Additionally, in the current usage of sa8d, pix2 is also aligned to the 
>> blocksize. But don't count on that remaining so.
>
> Ah crap! What kind of alignment should I assume in the future? No alignment, 
> or 4, 8, ... bytes aligned?

sad, satd, sa8d should all assume pix2 is only a multiple of 1.
Yes, the fraction of calls that happen to have aligned pix2 is different 
between the functions, so it may be useful to have separate aligned 
versions of some of them.
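
As a rough illustration of that contract, here is a scalar reference SAD
that makes no assumption about pix2, plus a check that a caller could use to
pick a separate aligned version when one exists. The function names are
hypothetical, and this is plain C, not the AltiVec code under review.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Scalar 8x8 SAD: works for any pix2 address and any stride. */
static int sad_8x8_ref(const uint8_t *pix1, int i_pix1,
                       const uint8_t *pix2, int i_pix2)
{
    int sum = 0;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++)
            sum += abs(pix1[x] - pix2[x]);
        pix1 += i_pix1;
        pix2 += i_pix2;
    }
    return sum;
}

/* Returns 1 if every row starting at 'addr' with pitch 'stride' is
 * aligned to 'align' bytes, i.e. an aligned fast path would be safe. */
static int rows_aligned(uintptr_t addr, int stride, int align)
{
    assert(align > 0);
    return addr % (uintptr_t)align == 0 && stride % align == 0;
}
```

A dispatcher would call the aligned variant only when rows_aligned() holds
for pix2, and fall back to the unaligned one otherwise.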

The application I have in mind that needs unaligned sa8d is: predict 
whether the current macroblock will use 8x8 vs 4x4 dct, and if the guess 
is 8x8 then use sa8d instead of satd in motion estimation.
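
A minimal sketch of that selection, assuming the transform-size guess is
already available as a flag. The enum and function names are hypothetical,
and how the guess itself is made is left open here.

```c
#include <assert.h>

/* Hypothetical names, for illustration only; not part of x264. */
typedef enum { METRIC_SATD, METRIC_SA8D } me_metric_t;

/* Pick the motion-estimation cost metric from the predicted transform
 * size: sa8d matches the 8x8 dct, satd matches the 4x4 dct. */
static me_metric_t choose_me_metric(int predicted_8x8dct)
{
    return predicted_8x8dct ? METRIC_SA8D : METRIC_SATD;
}
```

Since motion estimation evaluates candidates at arbitrary offsets into the
reference frame, the sa8d picked this way would be called with unaligned
pix2.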

--Loren Merritt

-- 
This is the x264-devel mailing-list
To unsubscribe, go to: http://developers.videolan.org/lists.html
