[x264-devel] [PATCH] faster mc_chroma_altivec

Guillaume POIRIER gpoirier at mplayerhq.hu
Thu Feb 5 23:10:45 CET 2009


Hi,

On Wed, Feb 4, 2009 at 6:06 PM, Guillaume POIRIER <gpoirier at mplayerhq.hu> wrote:
> Hello,
>
> 2009/2/4  <maaanuuu at gmx.net>:
>>>> Finally, I put width == 2 into its own function because at the moment the
>>>> code that is used for it is actually slower than plain C.
>>>
>>> I'm not surprised. There's too little work to do to use AltiVec here.
>>> Did you try to do some pseudo-SIMD using general purpose registers?
>>
>> Do you mean parallel loop unrolling? I tried that and it wasn't faster on a
>> PPC970.
>
> Nope, more something like what's done in x264/common/predict.c or
> what's described here too http://guru.multimedia.cx/simd-without-simd/
> The key idea is to use a variable of a bigger size (say a short to
> represent two char) to compute 2 char values at a time.
>
>
> Note that I don't know if it's possible to write an efficient
> pseudo-SIMD version of width==2 code.


So instead of this code:

+    for( y = 0; y < i_height; y++ )
+    {
+        dst[0] = ( cA*src[0] +  cB*src[0+1] +
+                  cC*srcp[0] + cD*srcp[0+1] + 32 ) >> 6;
+        dst[1] = ( cA*src[1] +  cB*src[1+1] +
+                  cC*srcp[1] + cD*srcp[1+1] + 32 ) >> 6;
+
+        src  += i_src_stride;
+        srcp += i_src_stride;
+        dst  += i_dst_stride;
+    }


you may want to try:

uint16_t thirtytwo = 0x2020;

    for( y = 0; y < i_height; y++ )
    {
        /* re-derive the 16-bit views on every row, since
           src, srcp and dst are advanced below */
        uint16_t *src16_0  = (uint16_t*)src;
        uint16_t *src16_1  = (uint16_t*)(src+1);
        uint16_t *srcp16_0 = (uint16_t*)srcp;
        uint16_t *srcp16_1 = (uint16_t*)(srcp+1);
        uint16_t *dst16    = (uint16_t*)dst;

        dst16[0] = ( cA*src16_0[0] +  cB*src16_1[0] +
                  cC*srcp16_0[0] + cD*srcp16_1[0] + thirtytwo) >> 6;

        src  += i_src_stride;
        srcp += i_src_stride;
        dst  += i_dst_stride;
    }

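Note that I haven't checked whether it compiles or whether it's correct
(in particular, each product can exceed 8 bits, so the two bytes packed
in a uint16_t may spill into each other); it's just to show you what I'm
talking about.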
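
For reference, here is a minimal sketch of how the two-pixels-per-register
idea could be made exact. It is an illustration under assumptions, not the
code from the patch: the names mc_chroma_w2_ref, mc_chroma_w2_swar and
pack2 are hypothetical, srcp is assumed to point at the row below src, and
the weights cA..cD are assumed to be non-negative and to sum to 64 (as the
+32 and >> 6 suggest). Under those assumptions each per-pixel sum can reach
64*255 + 32 = 16352, which needs 14 bits, so the sketch keeps each pixel in
a 16-bit lane of a uint32_t instead of in the two bytes of a uint16_t.

```c
#include <stdint.h>

/* Plain C reference for width == 2 (same arithmetic as the loop above). */
static void mc_chroma_w2_ref( uint8_t *dst, int i_dst_stride,
                              const uint8_t *src, int i_src_stride,
                              int i_height, int cA, int cB, int cC, int cD )
{
    const uint8_t *srcp = src + i_src_stride; /* assumption: next row */
    for( int y = 0; y < i_height; y++ )
    {
        for( int x = 0; x < 2; x++ )
            dst[x] = ( cA*src[x]  + cB*src[x+1] +
                       cC*srcp[x] + cD*srcp[x+1] + 32 ) >> 6;
        src  += i_src_stride;
        srcp += i_src_stride;
        dst  += i_dst_stride;
    }
}

/* Put two adjacent 8-bit pixels into the two 16-bit lanes of a uint32_t. */
static inline uint32_t pack2( const uint8_t *p )
{
    return (uint32_t)p[0] | ( (uint32_t)p[1] << 16 );
}

/* Pseudo-SIMD variant: both output pixels are computed in one uint32_t.
   Each lane stays <= 16352, so nothing carries from the low lane into
   the high lane. */
static void mc_chroma_w2_swar( uint8_t *dst, int i_dst_stride,
                               const uint8_t *src, int i_src_stride,
                               int i_height, uint32_t cA, uint32_t cB,
                               uint32_t cC, uint32_t cD )
{
    const uint8_t *srcp = src + i_src_stride; /* assumption: next row */
    for( int y = 0; y < i_height; y++ )
    {
        uint32_t sum = cA*pack2( src )  + cB*pack2( src+1 ) +
                       cC*pack2( srcp ) + cD*pack2( srcp+1 ) +
                       0x00200020u;          /* +32 in each lane */
        dst[0] = (uint8_t)( sum >> 6 );      /* low lane, bits 6..13 */
        dst[1] = (uint8_t)( sum >> 22 );     /* high lane, bits 22..29 */
        src  += i_src_stride;
        srcp += i_src_stride;
        dst  += i_dst_stride;
    }
}
```

Whether this beats the plain C loop is untested; the explicit packing may
well cost more than the multiplications it saves.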

Guillaume
-- 
Only a very small fraction of our DNA does anything; the rest is all
comments and ifdefs.

Lucille Ball  - "The secret of staying young is to live honestly, eat
slowly, and lie about your age."

