[x264-devel] Re: [PATCH] Altivec optimizations for quant4x4, quant4x4dc, quant8x8, sub8x8_dct8, sub16x16_dct8, pixel_sa8d_8x8, pixel_sa8d_16x16, *idct8*

Loren Merritt lorenm at u.washington.edu
Mon Sep 18 21:24:12 CEST 2006


On Mon, 18 Sep 2006, Guillaume POIRIER wrote:

> The attached patch adds *idct8* routines to my whole patchset.
>
> Please test and review.
>
> IDCT8 can be made faster with fewer loads and stores, but right now I don't
> know exactly how to do it. Suggestions welcome.

x264_sub8x8_dct8_altivec could use VEC_DIFF_H_8BYTE_ALIGNED.
pixel_sa8d_8x8_core_altivec could use a VEC_DIFF variant that assumes one of
the pointers is 8-byte aligned.

The destination of ALTIVEC_STORE_SUM_CLIP is 8-byte aligned, so it could have
two versions, one per position within the 16-byte line, like
   #define ALTIVEC_STORE_SUM_CLIP_ALIGN8_A(dest, idctv) {\
     /* dest is 16-byte aligned: its 8 pixels are the high half of dstv */\
     vec_u8_t dstv = vec_ld(0, dest);\
     vec_s16_t idct_sh6 = vec_sra(idctv, sixv);\
     vec_s16_t dst16h = (vec_s16_t)vec_mergeh(zero_u8v, dstv);\
     vec_s16_t dst16l = (vec_s16_t)vec_mergel(zero_u8v, dstv);\
     vec_s16_t sum16 = vec_adds(idct_sh6, dst16h);\
     vec_u8_t sum8 = vec_packsu(sum16, dst16l);\
     vec_st(sum8, 0, dest);\
   }
... and for the other parity (dest 8 bytes into its 16-byte line), add to
dst16l instead and pack with vec_packsu(dst16h, sum16).
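
For anyone without PowerPC hardware at hand, the read-modify-write idea behind the two macro variants can be modelled in portable scalar C. This is only an illustrative sketch, not code from the patch: store_sum_clip_align8 and clip_u8 are hypothetical names, the model assumes that vec_ld/vec_st ignore the low four address bits (so they always touch the enclosing 16-byte line), and it assumes >> on a negative value is an arithmetic shift, as vec_sra is.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Clamp to the 0..255 pixel range, as vec_packsu does per lane. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Hypothetical scalar model of the ALIGN8_A / ALIGN8_B macro pair.
 * dest must be 8-byte aligned. The whole 16-byte line containing dest
 * is read and written back; only the 8 bytes at dest are modified. */
static void store_sum_clip_align8(uint8_t *dest, const int16_t idct[8])
{
    uintptr_t addr = (uintptr_t)dest;
    assert((addr & 7) == 0);

    uint8_t *line = dest - (addr & 15); /* start of the 16-byte line */
    size_t half = addr & 8;             /* 0: "A" variant, 8: "B" variant */
    uint8_t out[16];

    memcpy(out, line, 16);              /* models vec_ld(0, dest) */
    for (int i = 0; i < 8; i++) {
        /* assumes arithmetic right shift for negative coefficients */
        int sum = (idct[i] >> 6) + line[half + i];
        out[half + i] = clip_u8(sum);
    }
    memcpy(line, out, 16);              /* models vec_st(sum8, 0, dest) */
}
```

Calling it with dest at offset 0 of a 16-byte-aligned buffer exercises the "A" case and offset 8 the "B" case. In the AltiVec version the branch on `half` disappears because each macro hard-codes one case.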

--Loren Merritt
