[x264-devel] 8x8 and 16x16 Altivec implementation of variance

Sat Jan 24 18:24:19 CET 2009

vec_perm and vec_s(r|l) are computed on different units (VPERM vs.
VSFX, the vector simple integer unit). You can, in theory, send them to their
respective units on the same cycle and have them execute and complete at the
same time, as long as they do not depend on one another. By contrast, if you
code two independent instructions in succession that both rely on the VSFX
unit, they cannot be executed at the same time: the second instruction has to
wait until the VSFX can accept another instruction.
If your code is VSFX-heavy, then offloading instructions to VPERM where
possible may still speed it up, even if the permute variant has slightly worse
latency or throughput on paper.
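A toy model may make the unit-pairing argument concrete. This is a sketch under loud assumptions, not a model of any real G4/G5 pipeline: every instruction is tagged 'P' (permute unit) or 'S' (simple integer unit), all instructions are independent, they issue strictly in order, each unit accepts at most one new instruction per cycle, and latency is ignored. The function name issue_cycles is hypothetical.

```c
#include <string.h>

/* Toy issue model (assumption, not real PowerPC timing):
 *   'P' = instruction for the permute unit (e.g. vec_perm)
 *   'S' = instruction for the simple integer unit (e.g. vec_sl/vec_sr)
 * Instructions are independent and issued in order; each unit takes at
 * most one new instruction per cycle. Returns the number of cycles needed
 * to issue the whole stream, or -1 if the stream has an unknown tag. */
static int issue_cycles(const char *stream)
{
    size_t n = strlen(stream);
    for (size_t k = 0; k < n; k++)
        if (stream[k] != 'P' && stream[k] != 'S')
            return -1;               /* reject tags the model doesn't know */

    int cycles = 0;
    size_t i = 0;
    while (i < n) {
        int used_p = 0, used_s = 0;
        /* Issue consecutive instructions until one needs a busy unit. */
        while (i < n) {
            if (stream[i] == 'P' && !used_p)      { used_p = 1; i++; }
            else if (stream[i] == 'S' && !used_s) { used_s = 1; i++; }
            else break;              /* in-order: stall until next cycle */
        }
        cycles++;
    }
    return cycles;
}
```

In this model issue_cycles("SSSS") needs 4 cycles, while issue_cycles("SPSP") needs only 2, because each cycle can pair one VSFX instruction with one VPERM instruction. Ordering matters too: "SSPP" needs 3 cycles, since the in-order issue cannot pull a later 'P' ahead of a stalled 'S'.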

On Sat, Jan 24, 2009 at 5:53 AM, Guillaume POIRIER <gpoirier at mplayerhq.hu> wrote:

> On Sat, Jan 24, 2009 at 2:47 AM, Holger Lubitz
> <Holger.Lubitz at informatik.uni-oldenburg.de> wrote:
> >> The 8x8 doesn't get such a big speed-up because the data is 8-byte
> >> aligned, not 16-byte aligned, so it's necessary to permute it before
> >> using it.
> >
> > I do not know much about altivec at all, but it seems the permute may be
> > more expensive than a shift. Have you tried just shifting things into place?
>
> vec_perm and vec_s(r|l)* have the same throughput and latencies.
> That's what's cool about it ;-)
>
> Guillaume
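To see why a plain shift can stand in for the permute in the 8-byte-aligned case, here is a sketch of the byte movement only. Real AltiVec code needs <altivec.h> and a PowerPC target, so this uses hypothetical scalar stand-ins (model_vec_perm, model_vec_sld, model_lvsl) that follow the documented byte semantics of vec_perm, vec_sld and vec_lvsl in big-endian order. Picking vec_sld as "the shift" is an assumption of this sketch, and the model says nothing about which execution unit a real shift uses or what it costs. The point it shows: when an 8-pixel row starts 8 bytes into an aligned 16-byte block, all 8 wanted bytes sit in that one block, so a constant 8-byte shift of a single aligned load puts them in front, matching what the general lvsl/perm sequence produces.

```c
#include <stdint.h>
#include <string.h>

/* 16-byte vector modelled as a plain byte array (big-endian lane order). */
typedef struct { uint8_t b[16]; } vec16;

/* Concatenate a||b into a 32-byte scratch buffer. */
static void concat32(uint8_t out[32], vec16 a, vec16 b)
{
    memcpy(out, a.b, 16);
    memcpy(out + 16, b.b, 16);
}

/* Stand-in for vec_perm: result[i] = (a||b)[idx[i] & 31]. */
static vec16 model_vec_perm(vec16 a, vec16 b, vec16 idx)
{
    uint8_t cat[32];
    vec16 r;
    concat32(cat, a, b);
    for (int i = 0; i < 16; i++)
        r.b[i] = cat[idx.b[i] & 31];
    return r;
}

/* Stand-in for vec_sld: bytes sh..sh+15 of (a||b); sh must be 0..15. */
static vec16 model_vec_sld(vec16 a, vec16 b, int sh)
{
    uint8_t cat[32];
    vec16 r;
    concat32(cat, a, b);
    memcpy(r.b, cat + sh, 16);
    return r;
}

/* Stand-in for vec_lvsl on an address whose low 4 bits are `off`:
 * permute control {off, off+1, ..., off+15}. */
static vec16 model_lvsl(int off)
{
    vec16 r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (uint8_t)(off + i);
    return r;
}

/* block holds two aligned 16-byte loads; the 8-pixel row starts at block+8.
 * Returns 1 if a constant 8-byte shift of the first load yields the same
 * first 8 bytes as the general lvsl/perm sequence (and both are the row). */
static int shift_matches_perm(const uint8_t block[32])
{
    vec16 lo, hi;
    memcpy(lo.b, block, 16);
    memcpy(hi.b, block + 16, 16);
    vec16 p = model_vec_perm(lo, hi, model_lvsl(8)); /* general path */
    vec16 s = model_vec_sld(lo, lo, 8);              /* one load + shift */
    return memcmp(p.b, s.b, 8) == 0 && memcmp(s.b, block + 8, 8) == 0;
}
```

Only the first 8 bytes are compared, since an 8x8 block uses 8 pixels per row; the remaining bytes of the shifted vector are just the wrapped-around start of the same load.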