[vlc-devel] [patch] avx2 acceleration for i420_yuy2/i422_yuy2/i420_rgb
Jean-Baptiste Kempf
jb at videolan.org
Sat Jan 26 00:44:25 CET 2019
Shouldn't that be moved to nasm/yasm syntax?
On Tue, 22 Jan 2019, at 22:58, jnqnfe at gmail.com wrote:
> the attached patch adds AVX2 acceleration for
> i420_yuy2/i422_yuy2/i420_rgb chroma converters
>
> it is built on top of two other submissions sent in today: one adds
> an AVX2 module to configure, and the other is a set of various patches
> to these plugins
>
> it is designed based upon the SSE2 implementation
>
> I've not yet been in any position to compile it, but I've put a lot of
> work into it over the past few days perfecting it
>
> benefits:
> - processes twice as much data at a time (256-bit instead of 128-bit
> registers)
> - VEX-encoded instructions are more compact, I believe, meaning less
> machine code
> - non-destructive (three-operand) instructions eliminate many of the
> register copies done in the SSE2 version
>
> ---
> an aside:
> one small thing I don't like is that no `_mm256_loadl_epi128`
> intrinsic exists, so I had to use `_mm256_inserti128_si256`. In
> assembly, `vmovdqa` handles both 256-bit and 128-bit aligned loads
> (`vmovdqu` for unaligned); the amount of data loaded depends on whether
> you reference a YMM or an XMM register, and with an XMM register the
> upper half of the corresponding YMM register is zeroed. Using
> `_mm256_inserti128_si256` in the intrinsics-based implementation does
> not zero the upper half (it keeps whatever was already there), but this
> should not cause any problem since we don't use that data. Note that an
> older non-VEX (`_mm_` instead of `_mm256_`) intrinsic could be used,
> but A) this would have the same effect, and B) I have a copy of a 2014
> paper which suggests that mixing VEX and non-VEX (YMM + XMM)
> instructions brings a big performance penalty. I don't yet know whether
> there is a better solution than `_mm256_inserti128_si256` for loading
> 128 bits with the upper half properly zeroed, as the asm does.
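
To make the difference described in the aside concrete, here is a small
portable sketch in C. It is only a scalar model of the two behaviours and
does not use real intrinsics or touch real registers; `ymm_model`,
`model_insert_low128`, `model_vex_load128` and `model_upper_is_zero` are
hypothetical names invented for this illustration and are not part of the
patch. It assumes the semantics as stated above: an insert into the low
128 bits leaves the upper 128 bits untouched, while a VEX-encoded 128-bit
load zeroes them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a 256-bit register as two 128-bit lanes:
 * lane[0] is the low half, lane[1] is the upper half. */
typedef struct { uint8_t lane[2][16]; } ymm_model;

/* Models inserting 128 bits into the low lane (index 0):
 * the upper lane keeps whatever dst already held. */
ymm_model model_insert_low128(ymm_model dst, const uint8_t src[16]) {
    memcpy(dst.lane[0], src, 16);
    return dst;
}

/* Models a VEX-encoded 128-bit load into an XMM register:
 * the low lane is loaded and the upper lane is zeroed. */
ymm_model model_vex_load128(const uint8_t src[16]) {
    ymm_model r;
    memset(&r, 0, sizeof r);
    memcpy(r.lane[0], src, 16);
    return r;
}

/* Returns 1 if the upper lane is all zero bytes, 0 otherwise. */
int model_upper_is_zero(ymm_model r) {
    static const uint8_t zero[16] = {0};
    return memcmp(r.lane[1], zero, 16) == 0;
}

/* Usage check: the low lanes agree, but the upper lanes differ when the
 * destination held stale (non-zero) data before the insert. */
void model_demo(void) {
    uint8_t src[16];
    ymm_model stale, a, b;

    memset(src, 0xAB, sizeof src);
    memset(&stale, 0xFF, sizeof stale);

    a = model_insert_low128(stale, src);
    b = model_vex_load128(src);

    assert(memcmp(a.lane[0], b.lane[0], 16) == 0);
    assert(!model_upper_is_zero(a));
    assert(model_upper_is_zero(b));
}
```

As long as later code only reads the low 128 bits, both variants give the
same result, which matches the reasoning in the aside that the stale upper
half should be harmless.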
> Email had 1 attachment:
> + chroma_avx2.patch
> 174k (text/x-patch)
--
Jean-Baptiste Kempf - President
+33 672 704 734