[x264-devel] [PATCH] faster mc_chroma_altivec
Guillaume POIRIER
gpoirier at mplayerhq.hu
Thu Feb 5 23:10:45 CET 2009
Hi,
On Wed, Feb 4, 2009 at 6:06 PM, Guillaume POIRIER <gpoirier at mplayerhq.hu> wrote:
> Hello,
>
> 2009/2/4 <maaanuuu at gmx.net>:
>>>> Finally, I put width == 2 into its own function because at the moment the
>>>> code that is used for it is actually slower than plain C.
>>>
>>> I'm not surprised. There's too little work to do to use AltiVec here.
>>> Did you try to do some pseudo-SIMD using general purpose registers?
>>
>> Do you mean parallel loop unrolling? I tried that and it wasn't faster on a
>> PPC970.
>
> Nope, more something like what's done in x264/common/predict.c or
> what's described here too http://guru.multimedia.cx/simd-without-simd/
> The key idea is to use a variable of a bigger size (say a short to
> represent two char) to compute 2 char values at a time.
>
>
> Note that I don't know if it's possible to write an efficient
> pseudo-SIMD version of width==2 code.
So instead of this code:
+    for( y = 0; y < i_height; y++ )
+    {
+        dst[0] = ( cA*src[0] + cB*src[0+1] +
+                   cC*srcp[0] + cD*srcp[0+1] + 32 ) >> 6;
+        dst[1] = ( cA*src[1] + cB*src[1+1] +
+                   cC*srcp[1] + cD*srcp[1+1] + 32 ) >> 6;
+
+        src  += i_src_stride;
+        srcp += i_src_stride;
+        dst  += i_dst_stride;
+    }
you may want to try:
    uint16_t thirtytwo = 0x2020;  /* rounding constant 32, once per byte */
    for( y = 0; y < i_height; y++ )
    {
        /* Re-derive the 16-bit views every row: src, srcp and dst
         * advance by a byte stride. */
        uint16_t *src16_0  = (uint16_t*)src;
        uint16_t *src16_1  = (uint16_t*)(src+1);
        uint16_t *srcp16_0 = (uint16_t*)srcp;
        uint16_t *srcp16_1 = (uint16_t*)(srcp+1);
        uint16_t *dst16    = (uint16_t*)dst;

        dst16[0] = ( cA*src16_0[0] + cB*src16_1[0] +
                     cC*srcp16_0[0] + cD*srcp16_1[0] + thirtytwo ) >> 6;

        src  += i_src_stride;
        srcp += i_src_stride;
        dst  += i_dst_stride;
    }
Note that I haven't checked whether it compiles or whether it's correct;
it's just to show you what I'm talking about.
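One caveat with packing two 8-bit pixels into a single 16-bit word: each weighted sum can need up to 14 bits, so the two results would bleed into each other. Below is a minimal sketch of the same idea using two 16-bit lanes inside a uint32_t instead. It assumes the usual bilinear weights, i.e. cA, cB, cC and cD are non-negative and sum to 64, so every lane stays below 65536. The names mc_chroma_w2_ref, mc_chroma_w2_swar and pack2 are made up for illustration and are not taken from the patch. The pixels are packed from individual bytes rather than through pointer casts, which sidesteps unaligned loads and endianness questions.

```c
#include <assert.h>
#include <stdint.h>

/* Plain C reference for width == 2 (same arithmetic as the patch fragment). */
static void mc_chroma_w2_ref( uint8_t *dst, int i_dst_stride,
                              const uint8_t *src, int i_src_stride,
                              int cA, int cB, int cC, int cD, int i_height )
{
    const uint8_t *srcp = src + i_src_stride;
    for( int y = 0; y < i_height; y++ )
    {
        dst[0] = ( cA*src[0] + cB*src[1] + cC*srcp[0] + cD*srcp[1] + 32 ) >> 6;
        dst[1] = ( cA*src[1] + cB*src[2] + cC*srcp[1] + cD*srcp[2] + 32 ) >> 6;
        src  += i_src_stride;
        srcp += i_src_stride;
        dst  += i_dst_stride;
    }
}

/* Put two pixels into the low and high 16-bit lanes of a 32-bit word. */
static inline uint32_t pack2( uint8_t lo, uint8_t hi )
{
    return (uint32_t)lo | ( (uint32_t)hi << 16 );
}

/* Pseudo-SIMD variant: both output pixels are computed with one set of
 * 32-bit multiplies. Assumes non-negative weights that sum to 64. */
static void mc_chroma_w2_swar( uint8_t *dst, int i_dst_stride,
                               const uint8_t *src, int i_src_stride,
                               int cA, int cB, int cC, int cD, int i_height )
{
    assert( cA >= 0 && cB >= 0 && cC >= 0 && cD >= 0 );
    assert( cA + cB + cC + cD == 64 );

    const uint32_t a = (uint32_t)cA, b = (uint32_t)cB;
    const uint32_t c = (uint32_t)cC, d = (uint32_t)cD;
    const uint8_t *srcp = src + i_src_stride;

    for( int y = 0; y < i_height; y++ )
    {
        /* Each lane holds at most 64*255 + 32 = 16352, so no carry
         * crosses from the low lane into the high lane. */
        uint32_t sum = a * pack2( src[0],  src[1]  )
                     + b * pack2( src[1],  src[2]  )
                     + c * pack2( srcp[0], srcp[1] )
                     + d * pack2( srcp[1], srcp[2] )
                     + 0x00200020u;            /* +32 in each lane */

        dst[0] = (uint8_t)( ( sum >>  6 ) & 0xff );   /* low lane  >> 6 */
        dst[1] = (uint8_t)( ( sum >> 22 ) & 0xff );   /* high lane >> 6 */

        src  += i_src_stride;
        srcp += i_src_stride;
        dst  += i_dst_stride;
    }
}
```

Whether this beats the plain C loop on a PPC970 would have to be measured; the packing work may well cost more than the multiplies it saves.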
Guillaume
--
Only a very small fraction of our DNA does anything; the rest is all
comments and ifdefs.
Lucille Ball - "The secret of staying young is to live honestly, eat
slowly, and lie about your age."