[vlc-devel] Software decoding in Hardware buffers

Mon Aug 12 14:19:45 CEST 2019

Software decoding is not a rare case. And yes, this does break push completely for software decoding, since it requires decoders and filters to pull their buffers through decoder_NewPicture() or filter_NewPicture().
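
To make the distinction concrete, here is a minimal sketch in C. Every name in it (toy_picture, toy_decode_pull, and so on) is made up for illustration and is not the VLC API; it only models who gets to decide how an output buffer is allocated.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: these names are hypothetical and are NOT the VLC API. */
typedef struct {
    const char *allocated_by; /* who chose how the buffer was allocated */
} toy_picture;

/* Callback type standing in for "ask downstream for a picture". */
typedef toy_picture (*toy_new_picture_cb)(void);

/* Downstream allocator, e.g. buffers owned by a device or a display. */
static toy_picture toy_device_new_picture(void)
{
    toy_picture pic = { "device" };
    return pic;
}

/* Pull: the decoder has to obtain every output buffer from downstream. */
static toy_picture toy_decode_pull(toy_new_picture_cb new_picture)
{
    assert(new_picture != NULL);
    return new_picture();
}

/* Push: the decoder allocates as it sees fit and hands the result on. */
static toy_picture toy_decode_push(void)
{
    toy_picture pic = { "decoder" };
    return pic;
}

/* Returns 1 if the decoder itself decided on the allocation, 0 otherwise. */
static int toy_upstream_decided(toy_picture pic)
{
    return strcmp(pic.allocated_by, "decoder") == 0;
}
```

With toy_decode_pull() the allocation policy is imposed from downstream; with toy_decode_push() it stays with the decoder. The second property is the one I am saying the proposal gives up for software decoders and filters.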

On 12 August 2019 15:00:00 GMT+03:00, Steve Lhomme <robux4 at ycbcr.xyz> wrote:
>On 2019-08-12 13:32, Rémi Denis-Courmont wrote:
>> Hi,
>> 
>> Your proposal does break push completely. It literally requires all
>> software decoders and filters to pull buffers from the device. The whole
>> point of push is intrinsically that the upstream decides how it
>> allocates its buffers: you cannot require decoders or filters to
>> allocate in any one specific way.
>
>No, it's not all the time, only in rare cases. Here that's for MMAL only,
>but as I said it could be optionally enabled for D3D11 if there are
>performance gains.
>
>It's only for decoders/filters that don't use a video context of their
>own. So the way the picture planes are allocated doesn't matter to them
>as long as they are usable. It's transparent to them.
>
>In decoder_NewPicture() and filter_NewPicture() there are already
>different cases depending on whether there's a video context to allocate
>the pictures or not. It makes no difference there either. The only
>difference is that a video context was added to them without their
>knowledge.
>
>> On 12 August 2019 12:09:36 GMT+03:00, Steve Lhomme <robux4 at ycbcr.xyz> wrote:
>>> On 2019-08-12 10:15, Alexandre Janniaux wrote:
>>>> Hi,
>>>>
>>>>> No, it is not my opinion. It is what was agreed collectively. Unlike
>>>>> your opinion, which engages only you.
>>>>>
>>>>> I am very fed up with people misconstruing earlier decisions as my
>>>>> opinion. You can not have it both ways.
>>>>
>>>> Sorry, even if you are right, for my part I cannot agree with being
>>>> included in this kind of message when what we decided hasn't been
>>>> summarized anywhere. You can't point out design issues against a
>>>> non-existent design, so it doesn't seem right to describe someone
>>>> else's work as "misconstruing" the decision we took.
>>>>
>>>> I took notes for myself and tried later to summarize them, after the
>>>> first questions about what we decided were raised on this mailing list,
>>>> but I don't have enough background to finish them without being biased
>>>> by my own ideas on the implementation behind the push model. Even the
>>>> design and naming itself has evolved since the second vout workshop.
>>>> Maybe it would help if I published them as a draft, with the scan of
>>>> the notes, so that they can be completed somewhere publicly?
>>>
>>> IMO we should have a formal summary of what was decided during
>>> workshops. It should go into details so there's no confusion. Subjects
>>> that are left to be decided should also be mentioned. It would also
>>> give people who were not at the workshop an idea of where things are
>>> headed and what is going to change.
>>>
>>> This is not the first time we disagree on directions after we had a
>>> workshop. Implementing things always leads to some corner cases that we
>>> may not have seen during the meeting.
>>>
>>>> I don't see any issues with trying to include more cases, even if I
>>>> would prefer having the first layers first before starting to
>>>> experiment with the push architecture.
>>>>
>>>> Thank you for the constructive arguments on push for the SoC case
>>>> though. IMHO they are good arguments to show that it should work on SoC
>>>> and that delaying full support for some decoders, at the cost of an
>>>> extra copy, is acceptable.
>>>>
>>>> On Mon, Aug 12, 2019 at 08:57:25AM +0200, Steve Lhomme wrote:
>>>>> On 2019-08-11 13:45, Rémi Denis-Courmont wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Indeed we did agree that some old and crappy APIs may require
>>>>>> memory copying if they have no sane ways to allocate and reference
>>>>>> picture buffers, including but not limited to old OpenGL versions.
>>>>
>>>> OpenGL isn't really handling native resource management, and
>>>> everything related to that, except pixel buffers and CPU upload, is
>>>> done by the layer below (EGL, GLX, ANGLE). It might not be the best
>>>> example when it comes to pictures. However, I agree that additional
>>>> copies on edge cases should not prevent us from shifting to a push
>>>> model.
>>>
>>> Nothing is preventing the push model we designed. It's only a matter of
>>> not losing performance because of it.
>>>
>>>>>
>>>>> Yes we said that for OpenGL there were cases where some copies could
>>>>> be needed. But there was already a copy from CPU to GPU in that case,
>>>>> be it in our code or in the driver (we're talking about software
>>>>> decoding). So it doesn't really matter who does it.
>>>>>
>>>>> In the case of MMAL (or a SoC architecture in general) the case
>>>>> wasn't raised AFAIK. Here we introduce an extra copy that didn't exist
>>>>> before.
>>>>>
>>>>> In the case of OpenGL it's only on rare/old OpenGL implementations
>>>>> that we handle the copy ourselves. In the case of MMAL that's the
>>>>> default implementation for everyone.
>>>>>
>>>>
>>>> Maybe the optimization can be tackled after the design has been set
>>>> up on the whole architecture. I haven't checked but I believe it could
>>>> be replaced (now or later) by the more or less standard GBM + v4l2
>>>> layer like on Linux instead of relying on the mmal API, which would
>>>> make the raspberry push-friendly.
>>>
>>> From what I saw the MMAL vout is really a standalone one and not
>>> related to that. I suppose MMAL decoding should also work in OpenGL but
>>> that doesn't seem implemented. Maybe it is in John Cox's branch. It may
>>> also be worth investigating this, as if there's PBO then we're back to
>>> the regular case where we don't need a copy. And that's likely the way
>>> we want to go forward. If an inferior display module (current MMAL) is
>>> used then we can live with some drawbacks.
>>>
>>>> GBM is the graphics allocation layer used everywhere but NVidia (which
>>>> has its own EGLStream which works more or less the same, for a change).
>>>> You get an object which can be turned into a file descriptor, and be
>>>> imported in either a graphics API like Vulkan or EGL, or a windowing
>>>> API like X11 or Wayland. It's a very simple API so it's quite
>>>> future-proof even if it will eventually be replaced again.
>>>>
>>>> I don't know what's available for BSD-like systems or other exotic
>>>> systems but I guess the first challenge would be GPU support even
>>>> before supporting the push model for most SBCs available on the market,
>>>> and as they are not officially supported for now, we might avoid
>>>> considering them in the design as well.
>>>>
>>>> Greats,
>>>> --
>>>> Alexandre Janniaux
>>>> VideoLabs
>>>>
>>>>>
>>>>>> On 9 August 2019 16:11:55 GMT+03:00, Steve Lhomme <robux4 at ycbcr.xyz> wrote:
>>>>>>> Did we agree that MMAL will get extra copies due to our design
>>>>>>> decisions ?
>>>>>>>
>>>>>>> On 2019-08-09 13:55, Rémi Denis-Courmont wrote:
>>>>>>>> No, it is not my opinion. It is what was agreed collectively. Unlike
>>>>>>>> your opinion, which engages only you.
>>>>>>>>
>>>>>>>> I am very fed up with people misconstruing earlier decisions as my
>>>>>>>> opinion. You can not have it both ways.
>>>>>>>>
>>>>>>>> On 9 August 2019 12:57:53 GMT+03:00, Steve Lhomme <robux4 at ycbcr.xyz> wrote:
>>>>>>>>> On 2019-08-09 10:16, Rémi Denis-Courmont wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> The MMAL plugins are unmaintained. By definition, if the
>>>>>>>>>> implementation that actually sees users and updates is another one,
>>>>>>>>>> then ours is unmaintained.
>>>>>>>>>>
>>>>>>>>>> And the point is that I don't want to change the core for a
>>>>>>>>>> misdesigned plugin. I have not seen any technically valid
>>>>>>>>>> justification for adding yet another way to allocate pictures, nor
>>>>>>>>>> how this would work.
>>>>>>>>>>
>>>>>>>>>> You cannot expect software decoders and filters to allocate
>>>>>>>>>> pictures from decoder device or video context. That's complete
>>>>>>>>>> denial of everything that was agreed upon, and reintroduces a whole
>>>>>>>>>> lot of problems that push was supposed to fix.
>>>>>>>>>
>>>>>>>>> That's your opinion and I don't share it. We made some design
>>>>>>>>> choices but until they were implemented we had no idea if all use
>>>>>>>>> cases were covered. And it turns out not all use cases are covered.
>>>>>>>>> With MMAL (be it the old module or the new module) the design we
>>>>>>>>> have is not good enough. We are forcing copies where they didn't
>>>>>>>>> exist before.
>>>>>>>>>
>>>>>>>>> And what I propose is still push design. It's still the decoder that
>>>>>>>>> creates a video context and pushes it forward. It just may not be
>>>>>>>>> aware it's using it at all.
>>>>>>>>>
>>>>>>>>> And as I experienced with this idea, for D3D11 it would make perfect
>>>>>>>>> sense to allow the decoder device to be the creator of the video
>>>>>>>>> context. They are highly related and one doesn't really exist
>>>>>>>>> without the other.
>>>>>>>>>
>>>>>>>>>> On 9 August 2019 08:50:43 GMT+03:00, Steve Lhomme <robux4 at ycbcr.xyz> wrote:
>>>>>>>>>>> On 2019-08-08 18:27, Rémi Denis-Courmont wrote:
>>>>>>>>>>>> On Thursday 8 August 2019, 15:29:30 EEST, Steve Lhomme wrote:
>>>>>>>>>>>>> Any opinion ?
>>>>>>>>>>>>
>>>>>>>>>>>> I don't see why we should mess up the architecture for a
>>>>>>>>>>>> hardware-specific implementation-specific unmaintained module.
>>>>>>>>>>>
>>>>>>>>>>> It's not unmaintained, I was planning to revive it to make sure
>>>>>>>>>>> that the default player on Raspberry Pi remains VLC when we release
>>>>>>>>>>> 4.0. It seems there's a different implementation so I'll adapt that
>>>>>>>>>>> one.
>>>>>>>>>>>
>>>>>>>>>>> One reason for that is to make sure our new push architecture is
>>>>>>>>>>> sound and can adapt to many use cases. Supporting SoC architectures
>>>>>>>>>>> should still be possible with the new architecture. Allocating all
>>>>>>>>>>> buffers once in the display was making this easy and efficient (in
>>>>>>>>>>> terms of copy, not memory usage). We should aim for the same level
>>>>>>>>>>> of efficiency.
>>>>>>>>>>>
>>>>>>>>>>> Also let me remind you the VLC motto: "VLC plays everything and
>>>>>>>>>>> runs everywhere".
>>>>>>>>>>>
>>>>>>>>>>>> Even when the GPU uses the same RAM as the CPU, it typically uses
>>>>>>>>>>>> a different pixel format, tile format and/or memory coherency
>>>>>>>>>>>> protocol, or it might simply not have a suitable IOMMU. As such,
>>>>>>>>>>>> VLC cannot render directly in it.
>>>>>>>>>>>>
>>>>>>>>>>>> And if it could, then by definition, it implies that the decoder
>>>>>>>>>>>> and filters can allocate and *reference* picture buffers as they
>>>>>>>>>>>> see fit, regardless of the hardware. Which means the software on
>>>>>>>>>>>> CPU side is doing the allocation. If so, then there are no good
>>>>>>>>>>>> technical reasons why push cannot work - misdesigning the display
>>>>>>>>>>>> plugin is not a good reason.
>>>>>>>>>>>
>>>>>>>>>>> I haven't proposed any design change to the display plugin, other
>>>>>>>>>>> than already discussed. What I proposed is a way to allocate CPU
>>>>>>>>>>> pictures from the GPU. My current solution involves optionally
>>>>>>>>>>> creating a video context when the decoder doesn't provide one.
>>>>>>>>>>>
>>>>>>>>>>> It could even be used on desktop. For example on Intel platforms
>>>>>>>>>>> it's possible to do it without much performance penalty. I used to
>>>>>>>>>>> do it in D3D11 until I realized it sucked for separate GPU memory.
>>>>>>>>>>> But I had no way to know exactly the impact of the switch because
>>>>>>>>>>> the code was quite different. Now it might be possible to tell. I
>>>>>>>>>>> have a feeling that on Intel it may actually be better to decode in
>>>>>>>>>>> "GPU" buffers directly. The driver can take shortcuts that we
>>>>>>>>>>> can't. It may do the copy more efficiently if it needs one (or
>>>>>>>>>>> maybe it doesn't need one). It can do the copy asynchronously (like
>>>>>>>>>>> every command sent to an ID3D11DeviceContext) as long as it's ready
>>>>>>>>>>> when it needs to be displayed.
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> vlc-devel mailing list
>>>>>>>>>>> To unsubscribe or modify your subscription options:
>>>>>>>>>>> https://mailman.videolan.org/listinfo/vlc-devel
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Sent from my Android device with K-9 Mail. Please excuse my brevity.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.