[x264-devel] [PATCH] faster mc_chroma_altivec

maaanuuu at gmx.net maaanuuu at gmx.net
Wed Feb 4 17:20:26 CET 2009


Hello,

The attached patch no longer fails on dimensions that are not a
multiple of 16. I'll check whether the mod 16 chroma stride patch
allows the use of VEC_LOAD.

>> mc_chroma_w4_altivec now needs dst to be aligned to a 4 byte
>> boundary, is that OK?
>
> Fine by me. The question now is of course: is that ensured by properly
> aligned allocations and strides?

I used grep to search for all calls to mc_chroma. All strides for dst  
are either 8, 16 or FDEC_STRIDE, so that should be fine. The  
alignments seem to be OK too.
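
The reasoning above can be written down as a check: if the first row of
dst is 4-byte aligned and the stride is a multiple of 4, every later
row is 4-byte aligned too. A minimal sketch, assuming a hypothetical
helper name (rows_are_4byte_aligned is not part of x264):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of x264.
 * Returns 1 if every row of a block that starts at dst and advances by
 * stride bytes per row begins on a 4-byte boundary. That holds exactly
 * when both the base address and the stride are multiples of 4. */
static int rows_are_4byte_aligned( const uint8_t *dst, ptrdiff_t stride )
{
    return ( (uintptr_t)dst % 4 == 0 ) && ( stride % 4 == 0 );
}
```

Strides of 8 and 16 pass this check with any 4-byte-aligned base
pointer. A check like this could sit in an assert() at the top of the
w4 function during debugging.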


>> Finally, I put width == 2 into its own function because at the
>> moment the code that is used for it is actually slower than plain C.
>
> I'm not surprised. There's too little work to do to use AltiVec here.
> Did you try to do some pseudo-SIMD using general purpose registers?

Do you mean parallel loop unrolling? I tried that and it wasn't faster  
on a PPC970.
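
For reference, one possible reading of "pseudo-SIMD using general
purpose registers" is to pack both pixels of a width-2 row into one
32-bit integer and filter them together. The sketch below is an
illustration only, not code from the patch: the function names are
hypothetical, and it assumes the usual bilinear chroma weights cA..cD
that sum to 64. Each 16-bit lane holds at most 255*64 + 32 = 16352, so
no carry crosses from the low lane into the high one.

```c
#include <stdint.h>

/* Plain C reference for one row of two chroma pixels.
 * cA..cD are the bilinear weights and are assumed to sum to 64. */
static void mc_chroma_w2_row_ref( uint8_t *dst, const uint8_t *src, int stride,
                                  int cA, int cB, int cC, int cD )
{
    for( int x = 0; x < 2; x++ )
        dst[x] = ( cA*src[x]        + cB*src[x+1]
                 + cC*src[x+stride] + cD*src[x+stride+1] + 32 ) >> 6;
}

/* Put two neighbouring 8-bit pixels into the two 16-bit lanes of a word. */
static uint32_t pack2( const uint8_t *p )
{
    return (uint32_t)p[0] | ( (uint32_t)p[1] << 16 );
}

/* Same filter, but both pixels are computed in one 32-bit register. */
static void mc_chroma_w2_row_swar( uint8_t *dst, const uint8_t *src, int stride,
                                   int cA, int cB, int cC, int cD )
{
    uint32_t sum = (uint32_t)cA * pack2( src )
                 + (uint32_t)cB * pack2( src + 1 )
                 + (uint32_t)cC * pack2( src + stride )
                 + (uint32_t)cD * pack2( src + stride + 1 )
                 + 0x00200020u;            /* rounding term 32 in each lane */
    dst[0] = ( sum >> 6 )  & 0xff;         /* low lane,  shifted right by 6 */
    dst[1] = ( sum >> 22 ) & 0xff;         /* high lane, shifted right by 6 */
}
```

Whether this beats the plain loop depends on the CPU and the compiler,
so it would need to be benchmarked like any other variant.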


>> The patch passes checkasm and leads to a 2-3% performance gain
>> overall using the default settings. Please note that I have NOT done
>> extensive regression tests.
>
> Please do so. Run an encode of several hundred frames with and without
> this patch, and make sure that the MD5 matches.

I ran encodings using different input dimensions and different  
parameters, and the output files were always identical.
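
A byte-for-byte comparison gives the same answer as comparing MD5 sums
of the two outputs. A minimal sketch of such a comparison, using a
hypothetical helper name and only the C standard library:

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if both streams yield exactly the same
 * bytes from their current positions up to end of file, 0 otherwise
 * (including when a read error occurs on either stream). */
static int streams_identical( FILE *a, FILE *b )
{
    int ca, cb;
    do
    {
        ca = getc( a );
        cb = getc( b );
        if( ca != cb )
            return 0;           /* differing byte, or one stream is shorter */
    } while( ca != EOF );
    return !ferror( a ) && !ferror( b );
}
```

The two streams would be the encoder outputs produced with and without
the patch, opened in binary mode.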


>> Comments and suggestions are welcome :)
>
> Well, if you ask...
>
> +        src0v_16A = vec_u8_to_u16( src0v_8A );
> +        src0v_16B = vec_u8_to_u16( src0v_8B );
> +        dstv_16A = vec_mladd( src0v_16A, coeff0v, k32v );
> +        dstv_16B = vec_mladd( src0v_16B, coeff0v, k32v );
>
> Could you put this in a macro to factorize some code?

Done
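
The macro in the attached diff is not reproduced here. As an
illustration of the factoring being asked for, the sketch below folds
the widen-then-multiply-add pair into one macro. Everything prefixed
sim_ is a scalar stand-in invented so the example compiles without
AltiVec; WIDEN_MLADD is likewise a hypothetical name, not the one used
in the patch.

```c
#include <stdint.h>

/* Scalar stand-ins for 128-bit vectors (assumptions for illustration). */
typedef struct { uint8_t  v[16]; } sim_u8x16;
typedef struct { uint16_t v[8];  } sim_u16x8;

/* Stand-in for widening: zero-extend the first 8 bytes to 16 bits. */
static sim_u16x8 sim_u8_to_u16( sim_u8x16 in )
{
    sim_u16x8 out;
    for( int i = 0; i < 8; i++ )
        out.v[i] = in.v[i];
    return out;
}

/* Stand-in for a per-element multiply-add, a*b + c modulo 2^16. */
static sim_u16x8 sim_mladd( sim_u16x8 a, sim_u16x8 b, sim_u16x8 c )
{
    sim_u16x8 out;
    for( int i = 0; i < 8; i++ )
        out.v[i] = (uint16_t)( (uint32_t)a.v[i] * b.v[i] + c.v[i] );
    return out;
}

/* One macro replaces each repeated widen + multiply-add pair. */
#define WIDEN_MLADD( dst16, src8, coeff, bias ) \
    ( (dst16) = sim_mladd( sim_u8_to_u16( src8 ), (coeff), (bias) ) )
```

With this in place, the four quoted lines would collapse to two macro
invocations, one for the A half and one for the B half.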

>
>
> So far, this new code looks alright, though I have to admit I'd prefer
> smaller, self-contained patches to simplify the reviewing process...

How should I split them up? I tried to make the patch a little
cleaner.


Manuel


-------------- next part --------------
A non-text attachment was scrubbed...
Name: faster_mc_chroma_altivec2.diff
Type: application/octet-stream
Size: 8405 bytes
Desc: not available
Url : http://mailman.videolan.org/pipermail/x264-devel/attachments/20090204/b650ccb9/attachment.obj 