vec_perm and vec_s(r|l) are executed on different units (VPERM vs. VSFX, the vector simple integer unit). In theory, you can issue them to their respective units in the same cycle and have them execute and complete at the same time, provided they do not depend on one another. By contrast, if you code two independent instructions in succession that both rely on the VSFX unit, they cannot execute at the same time: the second instruction has to wait until the VSFX unit can accept another instruction.<div>
<br></div><div>If your code is VSFX-heavy, then offloading instructions to VPERM where possible may still speed it up, even if the VPERM version has slightly worse throughput or latency.<br><div><div><br></div><div><br><br>
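<div>As a rough illustration of what "offloading to VPERM" can look like, here is a minimal sketch in plain C. It is only a scalar model, not real AltiVec code: the type vec128 and the functions model_perm, model_sl_u16 and shift_left_8_via_perm are hypothetical names made up for this example, and the byte layout assumes big-endian element order. It shows that a shift by a whole number of bytes in each 16-bit lane gives the same result as a byte permute against a zero vector. Which unit a given instruction really issues to depends on the CPU, so check the timing tables for your target before relying on this.</div>

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of a 128-bit vector register: 16 bytes in big-endian
   element order (byte 0 is the most significant byte of lane 0).
   Hypothetical type for illustration, not part of any AltiVec header. */
typedef struct { uint8_t b[16]; } vec128;

/* Model of vec_perm(a, b, pattern): each pattern byte uses its low
   5 bits to select one byte from the 32-byte concatenation a:b. */
static vec128 model_perm(vec128 a, vec128 b, vec128 pattern)
{
    vec128 r;
    for (int i = 0; i < 16; i++) {
        uint8_t sel = pattern.b[i] & 0x1F;
        r.b[i] = (sel < 16) ? a.b[sel] : b.b[sel - 16];
    }
    return r;
}

/* Model of vec_sl on eight 16-bit lanes, with the same shift count
   (taken modulo 16) applied to every lane. */
static vec128 model_sl_u16(vec128 a, unsigned shift)
{
    vec128 r;
    for (int lane = 0; lane < 8; lane++) {
        uint16_t v = (uint16_t)((a.b[2 * lane] << 8) | a.b[2 * lane + 1]);
        v = (uint16_t)(v << (shift & 15));
        r.b[2 * lane]     = (uint8_t)(v >> 8);
        r.b[2 * lane + 1] = (uint8_t)(v & 0xFF);
    }
    return r;
}

/* Same result as model_sl_u16(a, 8), expressed as a permute: the old
   low byte of each lane becomes the new high byte, and the new low
   byte is taken from a zero vector. */
static vec128 shift_left_8_via_perm(vec128 a)
{
    vec128 zero;
    vec128 pattern;
    memset(&zero, 0, sizeof zero);
    for (int lane = 0; lane < 8; lane++) {
        pattern.b[2 * lane]     = (uint8_t)(2 * lane + 1); /* byte from a */
        pattern.b[2 * lane + 1] = 16;                      /* byte 0 of zero */
    }
    return model_perm(a, zero, pattern);
}

/* Self-check: both formulations must agree on an arbitrary input. */
static void check_equivalence(void)
{
    vec128 a;
    for (int i = 0; i < 16; i++)
        a.b[i] = (uint8_t)(0x11 * i + 3);
    vec128 s = model_sl_u16(a, 8);
    vec128 p = shift_left_8_via_perm(a);
    assert(memcmp(s.b, p.b, sizeof s.b) == 0);
}
```

<div>In real code the permute pattern would be a constant built once outside the loop, so the swap costs one permute per vector in place of one shift. Whether that is a win depends on how busy each unit already is in the surrounding code.</div><br>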
<div class="gmail_quote">On Sat, Jan 24, 2009 at 5:53 AM, Guillaume POIRIER <span dir="ltr"><<a href="mailto:gpoirier@mplayerhq.hu">gpoirier@mplayerhq.hu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div class="Ih2E3d">On Sat, Jan 24, 2009 at 2:47 AM, Holger Lubitz<br>
<<a href="mailto:Holger.Lubitz@informatik.uni-oldenburg.de">Holger.Lubitz@informatik.uni-oldenburg.de</a>> wrote:<br>
>> The 8x8 doesn't get such a big speed-up because the data is 8-byte<br>
>> aligned, not 16-byte aligned, so it's necessary to permute it before<br>
>> using it.<br>
><br>
> I do not know much about altivec at all, but it seems the permute may be more<br>
> expensive than a shift. Have you tried just shifting things into place?<br>
<br>
</div>vec_perm and vec_s(r|l)* have the same throughput and latencies.<br>
That's what's cool about it ;-)<br>
<div class="Ih2E3d"><br>
Guillaume<br>
--<br>
Only a very small fraction of our DNA does anything; the rest is all<br>
comments and ifdefs.<br>
<br>
</div>Marilyn Monroe - "It's not true that I had nothing on. I had the radio on."<br>
<div><div></div><div class="Wj3C7c">
</div></div></blockquote></div><br></div></div></div>