[x264-devel] Re: [PATCH] Altivec optimizations for quant4x4, quant4x4dc, quant8x8, sub8x8_dct8, sub16x16_dct8, pixel_sa8d_8x8, pixel_sa8d_16x16, *idct8*
David Wolstencroft
wolstencroft at alum.rpi.edu
Tue Sep 19 11:30:20 CEST 2006
In the second macro (the 8-byte-aligned but not 16-byte-aligned case),
> #define ALTIVEC_STORE_SUM_CLIP_ALIGN8_A
and
> vec_s16_t sum16 = vec_adds(idct_sh6, (vec_s16_t)dst16);\
should be
#define ALTIVEC_STORE_SUM_CLIP_ALIGN8_B(dest, idctv) {\
and
vec_s16_t sum16 = vec_adds(idct_sh6, (vec_s16_t)dst16l);\
Sorry - tired :(
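
For reference, here is a rough scalar C sketch of what the two variants are meant to compute. It is only a model under stated assumptions, not x264 code: the names store_sum_clip_scalar, shift_right_6 and clip_u8 and the offset parameter are made up for illustration. block stands for the 16 bytes that vec_ld/vec_st access once the address is rounded down to a 16-byte boundary, and big-endian AltiVec element order is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Floor division by 64, the per-lane effect of vec_sra(idctv, sixv).
 * Written without >> on negative values, whose result is
 * implementation-defined in C. */
static int shift_right_6(int16_t v)
{
    return v >= 0 ? v >> 6 : -((-(int)v + 63) >> 6);
}

/* Clamp to 0..255, the per-lane effect of vec_packsu on a signed 16-bit input. */
static uint8_t clip_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Hypothetical scalar model of the store-sum-clip step.
 * block:  the 16 bytes covered by the aligned vector load/store.
 * offset: 0 when dest is 16-byte aligned, 8 when it is only 8-byte aligned. */
static void store_sum_clip_scalar(uint8_t block[16], const int16_t idct[8], int offset)
{
    assert(offset == 0 || offset == 8);
    for (int i = 0; i < 8; i++) {
        /* The sum lies in -512..766, so the saturation in vec_adds never triggers;
         * only the final clamp to 0..255 matters. */
        block[offset + i] = clip_u8(block[offset + i] + shift_right_6(idct[i]));
    }
    /* The other 8 bytes of block keep their original values, which is what
     * packing the zero-extended destination half back alongside the sum does. */
}
```

With offset 0 this corresponds to vec_packsu(sum16, dst16l), and with offset 8 to vec_packsu(dst16h, sum16), since the first argument of vec_packsu supplies the first eight output bytes.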
On Sep 19, 2006, at 2:27 AM, David Wolstencroft wrote:
> Sooo, ok, since I'm wired (don't know why I can't sleep....)
>
>
> if (dst is 16 byte aligned)
>
> #define ALTIVEC_STORE_SUM_CLIP_ALIGN8_A(dest, idctv) {\
> vec_u8_t dstv = vec_ld(0, dest);\
> vec_s16_t idct_sh6 = vec_sra(idctv, sixv);\
> vec_u16_t dst16h = vec_mergeh(zero_u8v, dstv);\
> vec_u16_t dst16l = vec_mergel(zero_u8v, dstv);\
> vec_s16_t sum16 = vec_adds(idct_sh6, (vec_s16_t)dst16h);\
> vec_u8_t sum8 = vec_packsu(sum16, dst16l);\ <- I swear pengvado made a mistake here, if that's possible
> vec_st(sum8, 0, dest);\
>
> else (8 byte aligned but not 16 byte aligned)
>
> #define ALTIVEC_STORE_SUM_CLIP_ALIGN8_A(dest, idctv) {\
> vec_u8_t dstv = vec_ld(0, dest);\
> vec_s16_t idct_sh6 = vec_sra(idctv, sixv);\
> vec_u16_t dst16h = vec_mergeh(zero_u8v, dstv);\
> vec_u16_t dst16l = vec_mergel(zero_u8v, dstv);\
> vec_s16_t sum16 = vec_adds(idct_sh6, (vec_s16_t)dst16);\
> vec_u8_t sum8 = vec_packsu(dst16h, sum16);\
> vec_st(sum8, 0, dest);\
>
>
>
> On Sep 18, 2006, at 12:24 PM, Loren Merritt wrote:
>
>> #define ALTIVEC_STORE_SUM_CLIP_ALIGN8_A(dest, idctv) {\
>> vec_u8_t dstv = vec_ld(0, dest);\
>> vec_s16_t idct_sh6 = vec_sra(idctv, sixv);\
>> vec_u16_t dst16h = vec_mergeh(zero_u8v, dstv);\
>> vec_u16_t dst16l = vec_mergel(zero_u8v, dstv);\
>> vec_s16_t sum16 = vec_adds(idct_sh6, (vec_s16_t)dst16h);\
>> vec_u8_t sum8 = vec_packsu(dst16l, sum16);\
>> vec_st(sum8, 0, dest);\
>