[vlc-devel] [patch] avx2 acceleration for i420_yuy2/i422_yuy2/i420_rgb

jnqnfe at gmail.com jnqnfe at gmail.com
Fri Mar 8 08:11:48 CET 2019


Ok, I have looked into and addressed:
1) The misunderstanding around use of NASM. The attached patch set
discussed below goes back to using MASM for the big chroma AVX2
enhancement as originally used (a proper switch to separate NASM-based
asm files can perhaps be done later), and otherwise drops the other
invalid NASM syntax conversion patches.
2) The compilation failures pointed out with #17 of the earlier set.
3) The compilation issue seen by Remi (someone else's patch already
merged may have also helped).

I have also corrected a few other small mistakes that I found whilst
ensuring that this latest revision compiles correctly.

Working and writing this offline, I am out of sync with any corrective
changes made to git-master regarding the troublesome commits causing
compilation issues... When I was last online yesterday no corrective
action had yet been taken, except one tiny new configure patch added.
So the first few patches in the attached set focus on getting in
sync...

#1 reverts the MMX purge patch (clashes with the next revert)
#2 reverts the big chroma AVX2 enhancement that we now know to have been
fundamentally flawed (nasm misunderstanding)

Of course, feel completely free to instead just drop the two respective
merged commits (11fc7ec9835e47a8a8cd56a1e0b71a209f1e517b and
e04412a31bf8bc246b076843ff8eeb6d7ad1cf1b) and force-update the repo
(preferred), or make your own revert commits (to save reviewing my
revert commit changes), if you haven't already taken such action...

#3 is the packetizer startcode-helper AVX2 enhancement patch (unchanged
from the merged version)

I mistakenly suggested in my previous email (and in separate public
discussion) that this was one to revert/remove, thinking it contained a
NASM syntax conversion when actually that was done later; so here it is
again in case it has already been incorrectly removed/reverted,
otherwise just ignore it...

#4: fixes the AVX/AVX2 configure inline asm support check, which was
actually broken (I overlooked that it used a bad early attempt at
AVX/VEX asm)
#5: addresses the fact that the VLC_AVX define was missing
#6: applies the revised, now MASM-based big chroma AVX2 enhancement
(and includes a few other compilation fixes)
#7: fixes broken compilation of the packetizer startcode-helper
#8: reapplies the MMX purge
#9: applies a fixed version of #17 from the previous set
#10-17: are the remainder of the not-yet-merged patches from before, up
to the nasm conversions, which are now dropped

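For anyone hitting the same thing: the "output number N not directly
addressable" errors quoted below are, as far as I can tell, what GCC
reports when an output operand's constraint is a bare "=" with no
register or memory letter, as in the quoted "=" (i_score_sse) line. A
minimal sketch of a well-formed output constraint follows. The function
name is hypothetical, and this is neither the VLC code nor the actual
fix in the patch set, just an illustration assuming GCC-style inline asm
on x86 with SSE2:

```c
#include <stdint.h>

/* Hypothetical helper (not VLC code): moves a 32-bit value through an
 * SSE register and reads it back into a C variable. */
static inline int32_t low_dword_via_xmm(int32_t value)
{
    int32_t result;
#if defined(__SSE2__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ volatile (
        "movd %1, %%xmm7\n\t"   /* general-purpose register -> xmm7 */
        "movd %%xmm7, %0\n\t"   /* xmm7 -> general-purpose register */
        : "=r" (result)         /* "=r": write-only, in a register; a bare
                                 * "=" names no location at all */
        : "r" (value)
        : "xmm7", "memory");
#else
    /* Assumption: on targets without SSE2 just pass the value through,
     * so the sketch still builds. */
    result = value;
#endif
    return result;
}
```

Using "=m" instead would be the other obvious choice if the result
should be written straight to memory.
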
I have checked that this compiles successfully.

Regards,

On Wed, 2019-03-06 at 11:17 -0500, Jean-Baptiste Kempf wrote:
> Hello Lyndon,
> 
> I applied what I could, aka 1-16.
> 
> 17 does not work, because it fails to compile
> 
> CC       video_filter/deinterlace/libdeinterlace_plugin_la-helpers.lo
> ../../modules/video_filter/deinterlace/helpers.c: In function
> ‘EstimateNumBlocksWithMotion’:
> ../../modules/video_filter/deinterlace/helpers.c:243:5: error: output
> number 0 not directly addressable
> __asm__ volatile (
> ^~~~~~~
> ../../modules/video_filter/deinterlace/helpers.c:243:5: error: output
> number 1 not directly addressable
> ../../modules/video_filter/deinterlace/helpers.c:243:5: error: output
> number 2 not directly addressable
> ../../modules/video_filter/deinterlace/helpers.c: In function
> ‘CalculateInterlaceScore’:
> ../../modules/video_filter/deinterlace/helpers.c:607:5: error: output
> number 0 not directly addressable
> __asm__ volatile ("movd %%xmm7, %0\n" : "=" (i_score_sse) ::
> "memory");
> ^~~~~~~
> make[4]: *** [Makefile:21748:
> video_filter/deinterlace/libdeinterlace_plugin_la-helpers.lo] Error 1
> 
> Best,
> 
> 
> On Sun, 3 Mar 2019, at 12:11, Jean-Baptiste Kempf wrote:
> > Btw, you moved to the nasm syntax, but not to nasm files? (patches
> > 26-31)
> > 
> > On Thu, 28 Feb 2019, at 00:54, Jean-Baptiste Kempf wrote:
> > > Merci.
> > > 
> > > On Wed, 27 Feb 2019, at 19:48, jnqnfe at gmail.com wrote:
> > > > Ok, I've rebased and attached
> > > > 
> > > > Regarding your issue applying the previous copy, I wonder
> > > > whether you perhaps missed the note in the first post which
> > > > pointed out that this built on other patches submitted days
> > > > earlier? I had no issues rebasing onto git master (except an
> > > > expected clash with the subsequent MMX purge with someone
> > > > else's deinterlace work).
> > > > 
> > > > The attached rebased patch collection bundles (in order):
> > > > - The initial set of general i420_rgb/i420_yuy2/i422_yuy2 fixes
> > > > (patches 1-11)
> > > > - The i420_rgb intrinsics buffer overflow fix (patch 12)
> > > > - The i420_rgb/i420_yuy2/i422_yuy2 AVX2 enhancement (patches
> > > > 13-14)
> > > > - The packetizer/startcode_helper AVX2 enhancement (patch 15)
> > > > - The purging of MMX/MMXEXT/3Dnow (patches 16-25)
> > > > - The conversion of asm to nasm syntax (patches 26-31)
> > > > 
> > > > The first of these was approved by one person, who just had a
> > > > question about the second patch (removal of what I believe are
> > > > unused RV24 artifacts, which he may have misunderstood as
> > > > removing RV24 support). I responded to clarify, but then
> > > > nothing happened. With the second (the buffer overflow patch)
> > > > there was some discussion, but no merge of the patch...
> > > > 
> > > > I have taken the opportunity of bundling all of this together,
> > > > to:
> > > > - fixup one patch (fixed a mistake in a larger previous one)
> > > > - fix a few commit message typos and such
> > > > 
> > > > I have also reworked the MMX purge with the deinterlace plugin
> > > > on top of the now merged work of Janne Grunau
> > > > 
> > > > Oh, I should point out that with the last set, the nasm
> > > > conversion, which came about from someone suggesting using this
> > > > for the AVX2 asm (as I then reworked it to use), all I have
> > > > done is to switch the syntax itself, I have not made any change
> > > > to the build files (does a hint not need to be given to the
> > > > compiler to get it to invoke the right assembler?)
> > > > 
> > > > Regards,
> > > > Lyndon
> > > > 
> > > > On Sat, 2019-02-23 at 12:00 -0500, Jean-Baptiste Kempf wrote:
> > > > > Hello Lyndon,
> > > > > 
> > > > > Can you repost all the patches, in the correct order? I
> > > > > cannot apply any of them, in my tree.
> > > > > 
> > > > > Best,
> > > > > 
> > > > > On Thu, 31 Jan 2019, at 09:39, jnqnfe at gmail.com wrote:
> > > > > > sigh, so here's v4
> > > > > > 
> > > > > > > v3 did not include a conversion of the RGB15 AVX2 to nasm,
> > > > > > > while the v2 -> v3 diff patch (sent in case you'd already
> > > > > > > started reviewing v2) did.
> > > > > > > 
> > > > > > > I'm perplexed as to how on earth that happened, since the
> > > > > > > diff patch was a simple fixup into v2 to produce v3...
> > > > > > 
> > > > > > Attachments:
> > > > > > chroma_avx2_v4.patch
> > > > > 
> > > > > --
> > > > > Jean-Baptiste Kempf -  President
> > > > > +33 672 704 734
> > > > >  
> > > > > 
> > > > > 
> > > > > 
> > > > 
> > > > Attachments:
> > > > patches.zip
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patches.zip
Type: application/zip
Size: 65401 bytes
Desc: not available
URL: <http://mailman.videolan.org/pipermail/vlc-devel/attachments/20190308/d495a810/attachment.zip>