[x264-devel] [PATCH] faster mc_chroma_altivec
maaanuuu at gmx.net
maaanuuu at gmx.net
Wed Feb 4 17:20:26 CET 2009
Hello,
the attached patch now doesn't fail on dimensions that are not mod 16.
I'll check whether the mod 16 chroma stride patch allows the use of
VEC_LOAD.
>> mc_chroma_w4_altivec now needs dst to be aligned to a 4 byte
>> boundary, is
>> that OK?
>
> Fine by me. The question now is off course: it that ensured by proper
> aligned allocations and strides?
I used grep to search for all calls to mc_chroma. All strides for dst
are either 8, 16 or FDEC_STRIDE, so that should be fine. The
alignments seem to be OK too.
>> Finally, I put width == 2 into its own function because at the
>> moment the
>> code that is used for it is actually slower than plain C.
>
> I'm not surprised. There's too little work to do to use AltiVec here.
> Did you try to do some pseudo-SIMD using general purpose registers?
Do you mean parallel loop unrolling? I tried that and it wasn't faster
on a PPC970.
>> The patch passes checkasm and leads to a 2-3% performance gain
>> overall using
>> the default settings. Please note that I have NOT done extensive
>> regression
>> tests.
>
> Please do so. Run a encode of several hundred frames with and without
> this patch, and make sure that the MD5 matches.
I ran encodings using different input dimensions and different
parameters, and the output files were always identical.
>> Comments and suggestions are welcome :)
>
> Well, if you ask...
>
> + src0v_16A = vec_u8_to_u16( src0v_8A );
> + src0v_16B = vec_u8_to_u16( src0v_8B );
> + dstv_16A = vec_mladd( src0v_16A, coeff0v, k32v );
> + dstv_16B = vec_mladd( src0v_16B, coeff0v, k32v );
>
> Could you put this in a macro to factorize some code?
Done
>
>
> So far, this new code looks alright, though I have to admit I'd prefer
> smaller, self-contained patches to simplify the reviewing process...
How should I split them up? I tried to make the patch cleaner a little
bit.
Manuel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: faster_mc_chroma_altivec2.diff
Type: application/octet-stream
Size: 8405 bytes
Desc: not available
Url : http://mailman.videolan.org/pipermail/x264-devel/attachments/20090204/b650ccb9/attachment.obj
-------------- next part --------------
More information about the x264-devel
mailing list