[x264-devel] [PATCH] faster mc_chroma_altivec
manuelro at gmx.de
Tue Feb 3 17:13:29 CET 2009
Ah, you're right, I'll have to change that back for now...
Thank you for your comment!
Manuel
On 03.02.2009 at 01:28, David Wolstencroft wrote:
> > Now VEC_LOAD is used instead of VEC_LOAD_G
>
> I had to change that back before. In rare cases (when the width of
> the input video is not a multiple of 16), using VEC_LOAD will give
> incorrect results. I have not sent in a patch to checkasm.c to check
> for these cases.
>
> The mod 16 chroma stride patch from Guillaume might prevent that,
> but please be certain this is the case.
>
> 2009/2/2 <maaanuuu at gmx.net>
> Hello,
>
> the attached patch improves mc_chroma_altivec:
>
> Now VEC_LOAD is used instead of VEC_LOAD_G, vec_mladd is used more
> efficiently and the loop is unrolled 2x.
> mc_chroma_w4_altivec now needs dst to be aligned to a 4-byte
> boundary, is that OK?
> Finally, I put width == 2 into its own function because at the
> moment the code that is used for it is actually slower than plain C.
>
> The patch passes checkasm and leads to a 2-3% performance gain
> overall using the default settings. Please note that I have NOT done
> extensive regression tests.
> Comments and suggestions are welcome :)
>
>
> Manuel
>
> _______________________________________________
> x264-devel mailing list
> x264-devel at videolan.org
> http://mailman.videolan.org/listinfo/x264-devel
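
To illustrate the stride concern raised in the quoted reply, here is a minimal sketch in plain C. It assumes (this is not stated in the thread) that the incorrect results come from rows starting at different offsets within a 16-byte block when the stride is not a multiple of 16. The helper rows_share_alignment is hypothetical and not part of x264; it makes no assumption about how VEC_LOAD or VEC_LOAD_G are implemented.

```c
#include <stdint.h>

/* Hypothetical helper, not part of x264.
 * Returns 1 if every row of a plane starts at the same offset within a
 * 16-byte block as row 0, and 0 otherwise. `base` is the address of row 0
 * and `stride` is the distance in bytes between consecutive rows
 * (assumed non-negative here). */
static int rows_share_alignment( uintptr_t base, int stride, int height )
{
    for( int y = 1; y < height; y++ )
    {
        uintptr_t row = base + (uintptr_t)y * (uintptr_t)stride;
        if( ( row & 15 ) != ( base & 15 ) )
            return 0;   /* this row is offset differently from row 0 */
    }
    return 1;
}
```

With a 16-byte-aligned base, a stride of 176 (a multiple of 16) keeps every row at the same offset, while a stride of 180 does not, so any load that relies on a fixed alignment would only be safe in the first case.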