[x264-devel] [PATCH] faster mc_chroma_altivec

manuelro at gmx.de manuelro at gmx.de
Tue Feb 3 17:13:29 CET 2009


Ah, you're right, I'll have to change that back for now...
Thank you for your comment!

Manuel

On 03.02.2009 at 01:28, David Wolstencroft wrote:

> >Now VEC_LOAD is used instead of VEC_LOAD_G
>
> I had to revert that same change earlier. In rare cases (when the width
> of the input video is not a multiple of 16), using VEC_LOAD gives
> incorrect results. I have not yet sent in a patch to checkasm.c that
> checks for these cases.
>
> The mod 16 chroma stride patch from Guillaume might prevent that,  
> but please be certain this is the case.
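
To illustrate the concern above, here is a minimal, portable C sketch (no
AltiVec intrinsics). It assumes, which the thread does not confirm, that
VEC_LOAD reuses a permute vector computed once for the first row, while
VEC_LOAD_G recomputes it for every load. Under that assumption the reused
permute is only valid if every row starts at the same offset within its
16-byte block. The helper names row_misalignment and permute_reusable are
hypothetical and are not part of x264.

```c
#include <assert.h>
#include <stddef.h>

/* Offset (0..15) of row y's first byte within its 16-byte block.
 * base_offset is the misalignment of row 0; stride is in bytes. */
static unsigned row_misalignment(unsigned base_offset, size_t stride, size_t y)
{
    return (unsigned)((base_offset + y * stride) % 16);
}

/* Returns 1 if all rows share the misalignment of row 0, meaning a
 * permute vector computed once (hypothetically, as VEC_LOAD might do)
 * would remain valid for every row; returns 0 otherwise. */
static int permute_reusable(unsigned base_offset, size_t stride, size_t rows)
{
    assert(rows > 0);
    for (size_t y = 1; y < rows; y++)
        if (row_misalignment(base_offset, stride, y) !=
            row_misalignment(base_offset, stride, 0))
            return 0;
    return 1;
}
```

With a stride of 32 the check passes for any starting offset; with a stride
of 24 the second row already starts at a different offset, which is the
kind of case a mod 16 chroma stride would rule out.
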
>
> 2009/2/2 <maaanuuu at gmx.net>
> Hello,
>
> the attached patch improves mc_chroma_altivec:
>
> Now VEC_LOAD is used instead of VEC_LOAD_G, vec_mladd is used more
> efficiently, and the loop is unrolled 2x.
> mc_chroma_w4_altivec now needs dst to be aligned to a 4-byte
> boundary. Is that OK?
> Finally, I put width == 2 into its own function because at the  
> moment the code that is used for it is actually slower than plain C.
>
> The patch passes checkasm and leads to a 2-3% performance gain  
> overall using the default settings. Please note that I have NOT done  
> extensive regression tests.
> Comments and suggestions are welcome :)
>
>
> Manuel
>
> _______________________________________________
> x264-devel mailing list
> x264-devel at videolan.org
> http://mailman.videolan.org/listinfo/x264-devel
>
