[vlc-devel] [patch] avx2 acceleration for i420_yuy2/i422_yuy2/i420_rgb

jnqnfe at gmail.com jnqnfe at gmail.com
Tue Jan 22 22:58:58 CET 2019

the attached patch adds AVX2 acceleration for
i420_yuy2/i422_yuy2/i420_rgb chroma converters

it is built on top of two other submissions sent in today: one adds an
AVX2 module to configure, and the other is a set of various patches to
these plugins

its design is based on the existing SSE2 implementation

I've not yet been in a position to compile it, but I've put a lot of
work into perfecting it over the past few days

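advantages over the SSE2 version:
 - processes twice as much data at a time (256-bit YMM registers rather
than 128-bit XMM registers)
 - VEX-encoded instructions are more compact, I believe, which means
less machine code
 - the non-destructive (three-operand) instruction forms let me
eliminate many of the register copies done in the SSE2 version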

an aside:
one small thing I don't like is that a `_mm256_loadl_epi128` function
does not exist, so I had to use `_mm256_inserti128_si256`. In assembly,
`vmovdqa` is used for both 256-bit and 128-bit aligned loads (`vmovdqu`
for unaligned), with the amount of data loaded depending on whether you
reference a YMM or an XMM register; with an XMM register, the upper
half of the corresponding YMM register is zeroed. Using
`_mm256_inserti128_si256` in the intrinsics-based implementation does
not zero the upper half (it keeps whatever was already there), but that
should not cause any problem since we don't use that data. Note that an
older non-VEX (`_mm_` rather than `_mm256_`) intrinsic could be used
instead, but A) this would have the same effect, and B) I have a copy
of a 2014 paper which suggests that mixing VEX and non-VEX (YMM + XMM)
instructions carries a big performance penalty. I don't yet know
whether there is a better solution than `_mm256_inserti128_si256` for
loading a single 128-bit value with the upper half properly zeroed, as
the asm does.
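
To make the question above concrete, here is a minimal sketch (not taken
from the patch) of one way to get a 128-bit load with a zeroed upper
half using only intrinsics: insert the loaded value into a register
obtained from `_mm256_setzero_si256()`. The function names
`load128_zero_upper` and `load128_zero_upper_avx2` are hypothetical, the
`target("avx2")` attribute and `__builtin_cpu_supports` are assumed to be
available (GCC or Clang on x86), and the scalar fallback exists only so
the sketch also runs on machines without AVX2. Newer compilers also
provide `_mm256_zextsi128_si256`, which expresses the same zero
extension directly, whereas `_mm256_castsi128_si256` leaves the upper
half undefined.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define SKETCH_HAVE_X86 1
#else
#define SKETCH_HAVE_X86 0
#endif

#if SKETCH_HAVE_X86
/* Hypothetical helper: load 16 bytes into the low lane of a YMM value
 * and force the high lane to zero by inserting into a zeroed register. */
__attribute__((target("avx2")))
static void load128_zero_upper_avx2(const uint8_t *src, uint8_t out[32])
{
    /* Unaligned 128-bit load; src does not need 16-byte alignment. */
    __m128i lo = _mm_loadu_si128((const __m128i *)src);

    /* Lane 0 receives the loaded data, lane 1 stays zero. */
    __m256i v = _mm256_inserti128_si256(_mm256_setzero_si256(), lo, 0);

    /* Store all 32 bytes so the caller can inspect both lanes. */
    _mm256_storeu_si256((__m256i *)out, v);
}
#endif

/* Hypothetical wrapper: uses the AVX2 path when the CPU supports it,
 * otherwise produces the same 32-byte result with plain C. */
void load128_zero_upper(const uint8_t *src, uint8_t out[32])
{
    assert(src != NULL && out != NULL);
#if SKETCH_HAVE_X86
    if (__builtin_cpu_supports("avx2")) {
        load128_zero_upper_avx2(src, out);
        return;
    }
#endif
    memcpy(out, src, 16);      /* low 128 bits: the loaded data */
    memset(out + 16, 0, 16);   /* high 128 bits: zero */
}
```

Whether the compiler folds the zeroing and the insert into a single
128-bit `vmovdqu`/`vmovdqa` is up to its optimizer, so the generated
assembly is worth checking before relying on this in the converters.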
-------------- next part --------------
A non-text attachment was scrubbed...
Name: chroma_avx2.patch
Type: text/x-patch
Size: 126926 bytes
Desc: not available
URL: <http://mailman.videolan.org/pipermail/vlc-devel/attachments/20190122/1b738ca2/attachment-0001.bin>