[vlc-devel] Software decoding in Hardware buffers

Mon Aug 12 11:09:36 CEST 2019

On 2019-08-12 10:15, Alexandre Janniaux wrote:
> Hi,
> 
>> No, it is not my opinion. It is what was agreed collectively. Unlike your opinion, which engages only you.
>>
>> I am very fed up with people misconstruing earlier decisions as my opinion. You can not have it both ways.
> 
> Sorry, even if you are right, for my part I do not accept being included in
> this kind of message when what we decided hasn't been summarized anywhere.
> You can't point out design issues in a non-existent design, so it doesn't seem
> right to describe someone else's work as "misconstruing" the decision we took.
> 
> I took notes for myself and tried later to summarize them, after the first
> questions about what we decided were raised on this mailing list, but I don't
> have enough background to finish them without being biased by my own ideas
> on the implementation behind the push model. Even the design and naming have
> evolved since the second vout workshop. Maybe it would help if I published
> them as a draft, with the scan of the notes, so that they can be completed
> somewhere publicly?

IMO we should have a formal summary of what was decided during
workshops. It should go into detail so there's no confusion. Subjects
that are left to be decided should also be mentioned. It would also give
people who were not at the workshop an idea of where things are headed
and what is going to change.

This is not the first time we disagree on the direction after a
workshop. Implementing things always leads to some corner cases that we
may not have seen during the meeting.

> I don't see any issues with trying to include more cases, even if I would
> prefer having the first layers first before starting to experiment with
> the push architecture.
> 
> Thank you for the constructive arguments on push for the SoC case though.
> IMHO they are good arguments to show that it should work on SoC and that
> delaying full support for some decoders, with an extra copy in the
> meantime, is acceptable.
> 
> On Mon, Aug 12, 2019 at 08:57:25AM +0200, Steve Lhomme wrote:
>> On 2019-08-11 13:45, Rémi Denis-Courmont wrote:
>>> Hi,
>>>
>>> Indeed we did agree that some old and crappy APIs may require memory copying if they have no sane way to allocate and reference picture buffers, including but not limited to old OpenGL versions.
> 
> OpenGL doesn't really handle native resource management itself: everything
> related to it, other than pixel buffers and CPU upload, is done by the
> layer below (EGL, GLX, ANGLE). It might not be the best example when it
> comes to pictures. However, I agree that additional copies on edge cases
> should not prevent us from shifting to a push model.

Nothing is preventing the push model we designed. It's only a matter of 
not losing performance because of it.

>>
>> Yes we said that for OpenGL there were cases where some copies could be
>> needed. But there was already a copy from CPU to GPU in that case, be it in
>> our code or in the driver. (we're talking about software decoding) So it
>> doesn't really matter who does it.
>>
>> In the case of MMAL (or a SoC architecture in general) the case wasn't
>> raised AFAIK. Here we introduce an extra copy that didn't exist before.
>>
>> In the case of OpenGL it's only on rare/old OpenGL implementations that we
>> handle the copy ourselves. In the case of MMAL that's the default
>> implementation for everyone.
>>
> 
> Maybe the optimization can be tackled after the design has been set up on
> the whole architecture. I haven't checked but I believe it could be
> replaced (now or later) by the more or less standard GBM + v4l2 layer
> like on Linux instead of relying on the MMAL API, which would make the
> Raspberry Pi push-friendly.

From what I saw the MMAL vout is really a standalone one and not
related to that. I suppose MMAL decoding should also work in OpenGL but
that doesn't seem to be implemented. Maybe it is in John Cox's branch. It
may also be worth investigating, because if there's PBO then we're back
to the regular case where we don't need a copy. And that's likely the way
we want to go forward. If an inferior display module (the current MMAL
one) is used then we can live with some drawbacks.
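
To make the copy accounting concrete, here is a minimal sketch in C of
the two paths for software decoding: decoding into a private CPU buffer
and copying into the display-owned buffer, versus decoding straight into
the display-owned buffer. It is not VLC code: toy_picture, toy_stats and
the toy_* functions are names invented for this illustration, and a
memset() stands in for the decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy types: none of these names exist in VLC. */
typedef struct {
    unsigned char *data;
    size_t size;
} toy_picture;

typedef struct {
    size_t copies; /* number of full-frame memcpy() calls performed */
} toy_stats;

/* Stand-in for a software decoder: fills one frame with a single value. */
void toy_decode(toy_picture *dst, unsigned char value)
{
    assert(dst != NULL && dst->data != NULL);
    memset(dst->data, value, dst->size);
}

/* Path A: decode into a private CPU buffer, then copy the frame into the
 * display-owned buffer. This is the extra copy being discussed. */
int toy_render_with_copy(toy_picture *display_buf, unsigned char value,
                         toy_stats *stats)
{
    toy_picture tmp = { malloc(display_buf->size), display_buf->size };
    if (tmp.data == NULL)
        return -1;
    toy_decode(&tmp, value);
    memcpy(display_buf->data, tmp.data, tmp.size);
    stats->copies++;
    free(tmp.data);
    return 0;
}

/* Path B: decode straight into the display-owned buffer, so no frame copy
 * is made on our side. */
int toy_render_direct(toy_picture *display_buf, unsigned char value,
                      toy_stats *stats)
{
    (void)stats; /* nothing to count: no copy happens here */
    toy_decode(display_buf, value);
    return 0;
}
```

Rendering one frame through toy_render_with_copy() leaves stats.copies
at 1, while toy_render_direct() leaves it at 0. That per-frame
difference is the cost we want to avoid in the SoC case.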

> GBM is the graphics allocation layer used everywhere but NVidia (which has
> its own EGLStream which works more or less the same, for a change). You get
> an object which can be turned into a file descriptor, and be imported in
> either a graphics API like Vulkan or EGL, or a windowing API like X11 or
> Wayland. It's a very simple API so it's quite future-proof even if it will
> eventually be replaced again.
> 
> I don't know what's available for BSD-like systems or other exotic systems
> but I guess the first challenge would be GPU support even before supporting
> the push model for most SBCs available on the market, and as they are not
> officially supported for now, we might avoid considering them in the design
> as well.
> 
> Regards,
> --
> Alexandre Janniaux
> VideoLabs
> 
>>
>>> On 9 August 2019 16:11:55 GMT+03:00, Steve Lhomme <robux4 at ycbcr.xyz> wrote:
>>>> Did we agree that MMAL will get extra copies due to our design
>>>> decisions?
>>>>
>>>> On 2019-08-09 13:55, Rémi Denis-Courmont wrote:
>>>>> No, it is not my opinion. It is what was agreed collectively. Unlike
>>>>> your opinion, which engages only you.
>>>>>
>>>>> I am very fed up with people misconstruing earlier decisions as my
>>>>> opinion. You can not have it both ways.
>>>>>
>>>>> On 9 August 2019 12:57:53 GMT+03:00, Steve Lhomme <robux4 at ycbcr.xyz> wrote:
>>>>>> On 2019-08-09 10:16, Rémi Denis-Courmont wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> The MMAL plugins are unmaintained. By definition, if the
>>>>>>> implementation that actually sees users and updates is another one,
>>>>>>> then ours is unmaintained.
>>>>>>>
>>>>>>> And the point is that I don't want to change the core for a
>>>>>>> misdesigned plugin. I have not seen any technically valid
>>>>>>> justification for adding yet another way to allocate pictures, nor
>>>>>>> how this would work.
>>>>>>>
>>>>>>> You cannot expect software decoders and filters to allocate pictures
>>>>>>> from decoder device or video context. That's complete denial of
>>>>>>> everything that was agreed upon, and reintroduces a whole lot of
>>>>>>> problems that push was supposed to fix.
>>>>>>
>>>>>> That's your opinion and I don't share it. We made some design choices
>>>>>> but until they were implemented we had no idea if all use cases were
>>>>>> covered. And it turns out not all use cases are covered. With MMAL (be
>>>>>> it the old module or the new module) the design we have is not good
>>>>>> enough. We are forcing copies where they didn't exist before.
>>>>>>
>>>>>> And what I propose is still push design. It's still the decoder that
>>>>>> creates a video context and pushes it forward. It just may not be aware
>>>>>> it's using it at all.
>>>>>>
>>>>>> And as I experienced with this idea, for D3D11 it would make perfect
>>>>>> sense to allow the decoder device be the creator of video context. They
>>>>>> are highly related and one doesn't really exist without the other.
>>>>>>
>>>>>>> On 9 August 2019 08:50:43 GMT+03:00, Steve Lhomme <robux4 at ycbcr.xyz> wrote:
>>>>>>>> On 2019-08-08 18:27, Rémi Denis-Courmont wrote:
>>>>>>>>> On Thursday 8 August 2019, 15:29:30 EEST, Steve Lhomme wrote:
>>>>>>>>>> Any opinion?
>>>>>>>>>
>>>>>>>>> I don't see why we should mess the architecture for a
>>>>>>>>> hardware-specific implementation-specific unmaintained module.
>>>>>>>>
>>>>>>>> It's not unmaintained, I was planning to revive it to make sure that
>>>>>>>> the default player on Raspberry Pi remains VLC when we release 4.0. It
>>>>>>>> seems there's a different implementation so I'll adapt that one.
>>>>>>>>
>>>>>>>> One reason for that is to make sure our new push architecture is sound
>>>>>>>> and can adapt to many use cases. Supporting SoC architectures should
>>>>>>>> still be possible with the new architecture. Allocating all buffers
>>>>>>>> once in the display was making this easy and efficient (in terms of
>>>>>>>> copy, not memory usage). We should aim for the same level of
>>>>>>>> efficiency.
>>>>>>>>
>>>>>>>> Also let me remind you the VLC motto: "VLC plays everything and runs
>>>>>>>> everywhere".
>>>>>>>>
>>>>>>>>> Even when the GPU uses the same RAM as the CPU, it typically uses
>>>>>>>>> different pixel format, tile format and/or memory coherency protocol,
>>>>>>>>> or it might simply not have a suitable IOMMU. As such, VLC cannot
>>>>>>>>> render directly in it.
>>>>>>>>>
>>>>>>>>> And if it could, then by definition, it implies that the decoder and
>>>>>>>>> filters can allocate and *reference* picture buffers as they see fit,
>>>>>>>>> regardless of the hardware. Which means the software on CPU side is
>>>>>>>>> doing the allocation. If so, then there are no good technical reasons
>>>>>>>>> why push cannot work - misdesigning the display plugin is not a good
>>>>>>>>> reason.
>>>>>>>>
>>>>>>>> I haven't proposed any design change to the display plugin, other than
>>>>>>>> already discussed. What I proposed is a way to allocate CPU pictures
>>>>>>>> from the GPU. My current solution involves creating a video context
>>>>>>>> optionally when the decoder doesn't provide one.
>>>>>>>>
>>>>>>>> It could even be used on desktop. For example on Intel platform it's
>>>>>>>> possible to do it without much performance penalty. I used to do it in
>>>>>>>> D3D11 until I realized it sucked for separate GPU memory. But I had no
>>>>>>>> way to know exactly the impact of the switch because the code was
>>>>>>>> quite different. Now it might be possible to tell. I have a feeling on
>>>>>>>> Intel it may actually be better to decode in "GPU" buffers directly.
>>>>>>>> The driver can take shortcuts that we can't. It may do the copy more
>>>>>>>> efficiently if it needs one (or maybe it doesn't need one). It can do
>>>>>>>> the copy asynchronously (as every command sent to a
>>>>>>>> ID3D11DeviceContext) as long as it's ready when it needs to be
>>>>>>>> displayed.
>>>>>>>> _______________________________________________
>>>>>>>> vlc-devel mailing list
>>>>>>>> To unsubscribe or modify your subscription options:
>>>>>>>> https://mailman.videolan.org/listinfo/vlc-devel
>>>>>>>
>>>>>>> --
>>>>>>> Sent from my Android device with K-9 Mail. Please excuse my brevity.