[vlc-devel] [patch] avx2 acceleration for i420_yuy2/i422_yuy2/i420_rgb

jnqnfe at gmail.com jnqnfe at gmail.com
Sun Jan 27 12:28:47 CET 2019


ok, I've done some more work, and converted to nasm.

hopefully I've caught you before you've started reviewing v2 of the
patch sent yesterday evening.

if I have, then the "patch4_pre_avx2.patch" and "chroma_avx2_v3.patch"
attachments will be of interest.
 - the former is the previous "patch4" i420_yuy2/i422_yuy2/i420_rgb doc
fix, but rebased to before the AVX2 addition, so only applying to the
existing MMX/SSE2 code, and thus can be merged along with the first set
of chroma patches already reviewed.
 - the latter is the big i420_yuy2/i422_yuy2/i420_rgb AVX2 addition,
with all of the subsequent fixes merged in, including the portion of
the just mentioned doc fix that was applicable to it, **AND** now
converted to nasm.

if you've already started review, and you'd prefer not to start again,
the "chroma_avx2_nasm.patch" attachment can be used instead, which is
simply the conversion of the previous v2 of the patch to nasm.

On Sat, 2019-01-26 at 00:44 +0100, Jean-Baptiste Kempf wrote:
> Shouldn't that be moved to nasm/yasm syntax?
> 
> On Tue, 22 Jan 2019, at 22:58, jnqnfe at gmail.com wrote:
> > the attached patch adds AVX2 acceleration for
> > i420_yuy2/i422_yuy2/i420_rgb chroma converters
> > 
> > it is built on top of two other submissions sent in today, one to add
> > an AVX2 module to configure, and the other was a set of various patches
> > to these plugins
> > it is designed based upon the SSE2 implementation
> > 
> > i've not yet been in any position to compile it, but I've put a lot of
> > work into it over the past few days perfecting it
> > 
> > benefits:
> >  - twice as much data at a time
> >  - VEX instructions are more compact, I believe, so less machine code
> >  - use of non-destructive instructions enabled eliminating many of the
> > copies done in the SSE2 version
> > 
> > ---
> > an aside:
> > one small thing I'll mention that I don't like is that a
> > `_mm256_loadl_epi128` function does not exist, so I had to use
> > `_mm256_inserti128_si256`. With the assembly, `vmovdqa` is used for
> > both 256-bit and 128-bit aligned loads (`vmovdqu` for unaligned), with
> > how much data depending on whether you reference a YMM or XMM register;
> > and if an XMM register, it zeros out the top portion. Use of
> > `_mm256_inserti128_si256` in the function-based implementation does not
> > zero out the top portion (unless already zero), but should not cause
> > any problem since we don't use that data. Note that an older non-VEX
> > ("_mm_" instead of "_mm256_") instruction could be used, but A) this
> > would have the same effect, and B) I have a copy of a 2014 paper which
> > suggests mixing VEX/non-VEX YMM+XMM instructions brings a big
> > performance penalty. I don't know yet whether or not there's a better
> > solution than `_mm256_inserti128_si256` (to properly load 1x128 with
> > the upper half zeroed, as with asm).
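
On the open question above (a 128-bit load with the upper half zeroed), one
possibility is to insert the loaded XMM value into an all-zero YMM value.
The following is only a minimal sketch under assumptions, not code from the
patch: the helper names `load128_zero_upper*` are hypothetical, the `target`
attribute and `__builtin_cpu_supports` assume GCC or Clang, and a scalar
fallback is included so that it also runs on machines without AVX2. Newer
compilers also provide `_mm256_zextsi128_si256`, which I believe expresses
the same zero-extension directly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Hypothetical helper (not from the patch): load 16 bytes into the low
 * lane of a 256-bit value and force the upper lane to zero. */
__attribute__((target("avx2")))
static void load128_zero_upper_avx2(const uint8_t *src, uint8_t out[32])
{
    __m128i lo = _mm_loadu_si128((const __m128i *)src);
    /* Inserting into an all-zero vector guarantees the upper 128 bits are 0,
     * unlike inserting into a register holding stale data. */
    __m256i v = _mm256_inserti128_si256(_mm256_setzero_si256(), lo, 0);
    _mm256_storeu_si256((__m256i *)out, v);
}
#endif

/* Scalar reference producing the same bytes; used when AVX2 is unavailable. */
static void load128_zero_upper_scalar(const uint8_t *src, uint8_t out[32])
{
    memcpy(out, src, 16);
    memset(out + 16, 0, 16);
}

/* Runtime dispatch so the sketch runs with or without AVX2 support. */
void load128_zero_upper(const uint8_t *src, uint8_t out[32])
{
    assert(src != NULL && out != NULL);
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2")) {
        load128_zero_upper_avx2(src, out);
        return;
    }
#endif
    load128_zero_upper_scalar(src, out);
}
```

Whether the compiler folds the AVX2 path into a single XMM load is not
guaranteed; that would need checking in the generated assembly.
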
> > Email had 1 attachment:
> > + chroma_avx2.patch
> >   174k (text/x-patch)
> 
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: chroma_avx2_v3.patch
Type: text/x-patch
Size: 122237 bytes
Desc: not available
URL: <http://mailman.videolan.org/pipermail/vlc-devel/attachments/20190127/f6beb6c1/attachment-0003.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch4_pre_avx2.patch
Type: text/x-patch
Size: 5346 bytes
Desc: not available
URL: <http://mailman.videolan.org/pipermail/vlc-devel/attachments/20190127/f6beb6c1/attachment-0004.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: chroma_avx2_nasm.patch
Type: text/x-patch
Size: 74171 bytes
Desc: not available
URL: <http://mailman.videolan.org/pipermail/vlc-devel/attachments/20190127/f6beb6c1/attachment-0005.bin>

