[vlc-devel] [patch] avx2 acceleration for i420_yuy2/i422_yuy2/i420_rgb
jnqnfe at gmail.com
Sat Jan 26 17:10:23 CET 2019
I'll look into it...
On Sat, 2019-01-26 at 00:44 +0100, Jean-Baptiste Kempf wrote:
> Shouldn't that be moved to nasm/yasm syntax?
>
> On Tue, 22 Jan 2019, at 22:58, jnqnfe at gmail.com wrote:
> > The attached patch adds AVX2 acceleration for the
> > i420_yuy2/i422_yuy2/i420_rgb chroma converters.
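A minimal scalar sketch of what the i420_yuy2 conversion computes may help when reviewing the SIMD versions. It assumes 8-bit planes, an even width, and packed YUY2 byte order Y0 U0 Y1 V0. The function names and parameters are hypothetical and are not taken from VLC's converters. With I420 each chroma row is shared by two luma rows. I422 would instead use one chroma row per luma row.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: packs one row of planar samples into YUY2
 * (Y0 U0 Y1 V0 ...). width is the number of luma samples. */
static void pack_yuy2_row_c(uint8_t *dst, const uint8_t *y,
                            const uint8_t *u, const uint8_t *v,
                            size_t width)
{
    assert(width % 2 == 0);
    for (size_t x = 0; x < width / 2; x++) {
        dst[4 * x + 0] = y[2 * x];
        dst[4 * x + 1] = u[x];
        dst[4 * x + 2] = y[2 * x + 1];
        dst[4 * x + 3] = v[x];
    }
}

/* Hypothetical I420 -> YUY2 frame conversion: each chroma row is used
 * for two consecutive luma rows (vertical 2:1 subsampling). */
static void i420_to_yuy2_c(uint8_t *dst, size_t dst_pitch,
                           const uint8_t *y, size_t y_pitch,
                           const uint8_t *u, const uint8_t *v,
                           size_t c_pitch, size_t width, size_t height)
{
    for (size_t row = 0; row < height; row++)
        pack_yuy2_row_c(dst + row * dst_pitch, y + row * y_pitch,
                        u + (row / 2) * c_pitch, v + (row / 2) * c_pitch,
                        width);
}
```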
> >
> > It is built on top of two other submissions sent in today: one adding
> > an AVX2 module to configure, and a set of various patches to these
> > plugins.
> >
> > Its design is based on the SSE2 implementation.
> >
> > I've not yet been in a position to compile it, but I've put a lot of
> > work into perfecting it over the past few days.
> >
> > Benefits:
> > - twice as much data processed at a time (256-bit YMM vs 128-bit XMM
> >   registers)
> > - VEX-encoded instructions are, I believe, more compact, meaning less
> >   machine code
> > - the non-destructive (three-operand) VEX instructions made it
> >   possible to eliminate many of the register copies done in the SSE2
> >   version
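To illustrate the "twice as much data" point, here is a hedged sketch (x86 only, GCC or Clang) of how an SSE2-style YUY2 packing step might be widened to AVX2. It is not code from the patch, and the helper names are hypothetical. One catch when porting from SSE2 is that `_mm256_unpacklo_epi8`/`_mm256_unpackhi_epi8` interleave within each 128-bit lane, so the results have to be regrouped with `_mm256_permute2x128_si256` before storing. The `_mm_` intrinsics inside the AVX2-targeted function are emitted with VEX encoding, so they do not mix VEX and legacy-SSE instructions. Runtime dispatch uses the GCC/Clang builtin `__builtin_cpu_supports`; a real converter would choose the implementation once rather than per row.

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): packs 32 Y, 16 U and 16 V
 * samples into 64 bytes of YUY2 (Y0 U0 Y1 V0 ...). */
static __attribute__((target("avx2")))
void pack_yuy2_32px_avx2(uint8_t *dst, const uint8_t *y,
                         const uint8_t *u, const uint8_t *v)
{
    __m256i yy = _mm256_loadu_si256((const __m256i *)y);
    __m128i uu = _mm_loadu_si128((const __m128i *)u);
    __m128i vv = _mm_loadu_si128((const __m128i *)v);

    /* U0 V0 U1 V1 ...: lane 0 holds chroma pairs 0-7, lane 1 pairs 8-15.
     * The undefined upper half from the cast is overwritten by the insert. */
    __m256i uv = _mm256_inserti128_si256(
        _mm256_castsi128_si256(_mm_unpacklo_epi8(uu, vv)),
        _mm_unpackhi_epi8(uu, vv), 1);

    /* The 256-bit unpacks operate within each 128-bit lane:
     * lo = pixels 0-7 | 16-23, hi = pixels 8-15 | 24-31. */
    __m256i lo = _mm256_unpacklo_epi8(yy, uv);
    __m256i hi = _mm256_unpackhi_epi8(yy, uv);

    /* Regroup lanes into pixel order: (lo.0, hi.0), then (lo.1, hi.1). */
    _mm256_storeu_si256((__m256i *)dst,
                        _mm256_permute2x128_si256(lo, hi, 0x20));
    _mm256_storeu_si256((__m256i *)(dst + 32),
                        _mm256_permute2x128_si256(lo, hi, 0x31));
}

/* Hypothetical row packer: AVX2 for full 32-pixel chunks when the CPU
 * supports it, scalar code for the remainder (or the whole row). */
static void pack_yuy2_row(uint8_t *dst, const uint8_t *y, const uint8_t *u,
                          const uint8_t *v, size_t width)
{
    assert(width % 2 == 0);
    size_t x = 0;
    if (__builtin_cpu_supports("avx2")) {
        for (; x + 32 <= width; x += 32)
            pack_yuy2_32px_avx2(dst + 2 * x, y + x, u + x / 2, v + x / 2);
    }
    for (; x < width; x += 2) {
        dst[2 * x + 0] = y[x];
        dst[2 * x + 1] = u[x / 2];
        dst[2 * x + 2] = y[x + 1];
        dst[2 * x + 3] = v[x / 2];
    }
}
```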
> >
> > ---
> > An aside:
> > One small thing I'll mention that I don't like is that no
> > `_mm256_loadl_epi128` intrinsic exists, so I had to use
> > `_mm256_inserti128_si256`. In assembly, `vmovdqa` is used for both
> > 256-bit and 128-bit aligned loads (`vmovdqu` for unaligned), with the
> > amount of data loaded depending on whether you reference a YMM or an
> > XMM register; with an XMM register, it zeros out the upper 128 bits.
> > Using `_mm256_inserti128_si256` in the intrinsics-based implementation
> > does not zero out the upper portion (unless it is already zero), but
> > that should not cause any problem since we don't use that data. Note
> > that an older non-VEX (`_mm_` instead of `_mm256_`) instruction could
> > be used, but A) this would have the same effect, and B) I have a copy
> > of a 2014 paper suggesting that mixing VEX and non-VEX YMM/XMM
> > instructions incurs a big performance penalty. I don't yet know
> > whether there's a better solution than `_mm256_inserti128_si256` (to
> > load 1x128 bits with the upper half properly zeroed, as with asm).
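One possible answer to the closing question, offered as a sketch rather than a verified fix: insert the 128-bit load into an explicit zero vector, which guarantees a zeroed upper half by the intrinsics' own semantics. `_mm256_castsi128_si256` alone does not help, because it leaves the upper 128 bits undefined. Newer compilers also provide `_mm256_zextsi128_si256`, which expresses the zero extension directly, but older GCC releases lack it. When compiling with AVX enabled, `_mm_loadu_si128` is itself emitted as a VEX-encoded `vmovdqu`, so using `_mm_` intrinsics does not by itself mix VEX and legacy-SSE encodings. Whether the compiler reduces the insert below to a single 128-bit load should be checked in the generated assembly. Both helper names are hypothetical.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: loads 16 bytes into the low 128 bits of a YMM
 * value; the upper 128 bits are guaranteed zero because the load is
 * inserted into an all-zero vector. */
static inline __attribute__((target("avx2")))
__m256i load_128_zext(const void *p)
{
    __m128i lo = _mm_loadu_si128((const __m128i *)p);
    return _mm256_inserti128_si256(_mm256_setzero_si256(), lo, 0);
}

/* Hypothetical demo: stores the widened value to a 32-byte buffer so the
 * zeroed upper half can be inspected from code compiled without AVX. */
static __attribute__((target("avx2")))
void widen_128_to_256(uint8_t out[32], const uint8_t in[16])
{
    _mm256_storeu_si256((__m256i *)out, load_128_zext(in));
}
```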
> > Email had 1 attachment:
> > + chroma_avx2.patch
> > 174k (text/x-patch)