[x264-devel] [PATCH] faster mc_chroma_altivec

Tue Feb 3 01:28:55 CET 2009

>Now VEC_LOAD is used instead of VEC_LOAD_G
I made that change before and had to revert it. In rare cases (when the width
of the input video is not a multiple of 16), VEC_LOAD gives incorrect results.
I have not yet sent in a patch for checkasm.c that checks for these cases.

The mod 16 chroma stride patch from Guillaume might prevent that, but please
verify that this is actually the case.
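
As a rough illustration of why the stride matters (a sketch only: the helper
names below are made up for this example, and none of it is x264 code), here
is the alignment arithmetic in plain C. If the stride is not a multiple of 16,
each row starts at a different offset within a 16-byte block, so a load that
assumes the same alignment for every row can pick up the wrong bytes. That
this is the exact failure mode of VEC_LOAD here is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset (0..15) of row y's first byte within its
 * 16-byte block, for a plane whose base address has offset base_off. */
static unsigned row_offset_mod16(unsigned base_off, ptrdiff_t stride, int y)
{
    assert(stride > 0 && y >= 0);
    return (unsigned)((base_off + (size_t)stride * (size_t)y) & 15);
}

/* Hypothetical helper: returns 1 if each of the first `rows` rows starts
 * at the same offset within a 16-byte block as row 0, and 0 otherwise. */
static int rows_share_alignment(unsigned base_off, ptrdiff_t stride, int rows)
{
    unsigned first = row_offset_mod16(base_off, stride, 0);
    for (int y = 1; y < rows; y++)
        if (row_offset_mod16(base_off, stride, y) != first)
            return 0;
    return 1;
}
```

With a stride of 32 every row keeps the alignment of row 0; with a stride of
24 the second row already starts 8 bytes into a 16-byte block.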

2009/2/2 <maaanuuu at gmx.net>

> Hello,
>
> the attached patch improves mc_chroma_altivec:
>
> Now VEC_LOAD is used instead of VEC_LOAD_G, vec_mladd is used more
> efficiently and the loop is unrolled 2x.
> mc_chroma_w4_altivec now needs dst to be aligned to a 4-byte boundary, is
> that OK?
> Finally, I put width == 2 into its own function because at the moment the
> code that is used for it is actually slower than plain C.
>
> The patch passes checkasm and leads to a 2-3% performance gain overall
> using the default settings. Please note that I have NOT done extensive
> regression tests.
> Comments and suggestions are welcome :)
>
>
> Manuel