>Now VEC_LOAD is used instead of VEC_LOAD_G<div><br></div><div>I had to revert that change once before. In rare cases (when the width of the input video is not a multiple of 16), using VEC_LOAD gives incorrect results. I have not yet sent in a patch for checkasm.c that checks for these cases.</div>
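<div><br></div><div>To illustrate the kind of failure I mean, here is a minimal portable C sketch. It is not the x264 macros: load16_general, load16_cached and count_mismatched_rows are hypothetical names. It assumes the problem comes from reusing an alignment shift computed once for the first row, while the row stride is not a multiple of 16.</div>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 1024

/* Hypothetical "general" load: the shift inside the 16-byte block is
 * recomputed from the offset on every call, so it is always correct. */
static void load16_general(const uint8_t *buf, size_t off, uint8_t out[16])
{
    size_t base  = off & ~(size_t)15;   /* start of the aligned block */
    size_t shift = off & 15;            /* position inside that block */
    assert(base + shift + 16 <= BUF_SIZE);
    memcpy(out, buf + base + shift, 16);
}

/* Hypothetical "cached" load: the shift was computed once (assumption:
 * from the first row) and is reused for every later row. */
static void load16_cached(const uint8_t *buf, size_t off,
                          size_t cached_shift, uint8_t out[16])
{
    size_t base = off & ~(size_t)15;
    assert(base + cached_shift + 16 <= BUF_SIZE);
    memcpy(out, buf + base + cached_shift, 16);
}

/* Counts the rows on which the cached load differs from the general one. */
static int count_mismatched_rows(size_t stride, int rows)
{
    uint8_t buf[BUF_SIZE];
    for (size_t i = 0; i < BUF_SIZE; i++)
        buf[i] = (uint8_t)i;            /* distinct bytes within 256 */

    size_t start = 3;                   /* deliberately unaligned start */
    size_t cached_shift = start & 15;   /* computed once, for row 0 */
    int bad = 0;

    for (int y = 0; y < rows; y++) {
        uint8_t a[16], b[16];
        size_t off = start + (size_t)y * stride;
        load16_general(buf, off, a);
        load16_cached(buf, off, cached_shift, b);
        if (memcmp(a, b, 16) != 0)
            bad++;
    }
    return bad;
}
```

<div>With 8 rows, count_mismatched_rows(32, 8) returns 0, while count_mismatched_rows(24, 8) returns 4: every odd row starts at a different position inside its 16-byte block, so the cached shift picks the wrong bytes. A checkasm test would need a stride like the second one to catch this.</div>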
<div><br></div><div>The mod 16 chroma stride patch from Guillaume might prevent that, but please verify that this is actually the case.<br><br><div class="gmail_quote">2009/2/2 <span dir="ltr">&lt;<a href="mailto:maaanuuu@gmx.net">maaanuuu@gmx.net</a>&gt;</span><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">Hello,<br>
<br>
the attached patch improves mc_chroma_altivec:<br>
<br>
Now VEC_LOAD is used instead of VEC_LOAD_G, vec_mladd is used more efficiently, and the loop is unrolled 2x.<br>
mc_chroma_w4_altivec now needs dst to be aligned to a 4-byte boundary; is that OK?<br>
Finally, I put width == 2 into its own function because the code currently used for it is actually slower than plain C.<br>
<br>
The patch passes checkasm and leads to a 2-3% performance gain overall using the default settings. Please note that I have NOT done extensive regression tests.<br>
Comments and suggestions are welcome :)<br>
<br>
<br>
Manuel<br>
<br></blockquote></div><br></div>