[vlc-devel] Software decoding in Hardware buffers

Tue Aug 13 11:03:00 CEST 2019

On 2019-08-12 14:19, Rémi Denis-Courmont wrote:
> Software decoding is not a rare case.

I agree, and even more so on SoCs which only support a few codecs.

> Yes this breaks push completely for software decoding since it requires decoders and filters to pull buffers from decoder_NewPicture or filter_NewPicture.

Indeed there are some places left which use picture_NewFromFormat(), a
lot of them for SPU regions. In codecs only VideoToolbox remains (but
we'll see if it should use a video context or not). In filters only
"edgedetection" and "fps" remain (both can be easily fixed).

We're not doing push just to change the code but to solve issues, like 
the impossibility to use GPU filters in many cases. It's already not a 
pure push model where decoders just push whatever they want. The 
hardware decoders already depend on the decoder device from their 
"owner", same thing for GPU filters (that receive a NULL video context). 
We're already tied to the output. And whatever we output still needs to
be validated by the output, and rejected if it cannot work.

Before a CPU decoder can output pictures, it still needs to tell the
output about them. This may fail, in which case the decoder should not
be used.
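
As a rough illustration of that negotiation step, here is a minimal C
sketch. It is a toy model under stated assumptions, not VLC code: every
toy_* type and function is hypothetical and only mimics the idea that a
decoder announces its format to the output it is tied to, and that the
output may reject it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins. These are NOT VLC's real types. */
enum toy_chroma { TOY_CHROMA_I420, TOY_CHROMA_NV12, TOY_CHROMA_RGBA };

struct toy_format {
    enum toy_chroma chroma;
    unsigned width, height;
};

struct toy_output {
    const enum toy_chroma *supported; /* chromas this output can display */
    size_t count;
};

struct toy_decoder {
    const struct toy_output *owner_output; /* output the decoder is tied to */
    struct toy_format fmt;                 /* last accepted format */
    bool format_accepted;
};

/* The output validates a proposed format and rejects what cannot work. */
static bool toy_output_accepts(const struct toy_output *out,
                               const struct toy_format *fmt)
{
    if (fmt->width == 0 || fmt->height == 0)
        return false;
    for (size_t i = 0; i < out->count; i++)
        if (out->supported[i] == fmt->chroma)
            return true;
    return false;
}

/* Announce the decoder's output format before any picture is produced.
 * Returns 0 on success, -1 if the output rejected the format. */
static int toy_decoder_update_format(struct toy_decoder *dec,
                                     const struct toy_format *fmt)
{
    assert(dec != NULL && fmt != NULL);
    dec->format_accepted = toy_output_accepts(dec->owner_output, fmt);
    if (!dec->format_accepted)
        return -1;
    dec->fmt = *fmt;
    return 0;
}
```

A caller would check the return value and give up on that decoder (or
try another format) when it gets -1, before asking for any picture.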

It doesn't seem too far-fetched to handle the output allocations for the
decoders/filters. I can already make the changes in the 2 filters above
in master. picture_NewFromFormat() could even be hidden in the core in
the future.
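
To show what "hidden in the core" could look like, here is a minimal C
sketch. Again this is a toy model and an assumption on my side, not the
actual decoder_NewPicture() code: all toy_* names are hypothetical. The
caller asks its owner for a picture and never learns whether the planes
came from a video context or from plain CPU memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins. These are NOT VLC's real types. */
struct toy_picture {
    unsigned char *pixels;
    size_t size;
    bool from_context; /* true if the planes belong to the video context */
};

struct toy_video_context {
    unsigned char *pool; /* output-owned memory, split in equal slots */
    size_t slot_size;
    size_t slots;
    size_t next;         /* next free slot (no recycling in this sketch) */
};

struct toy_owner {
    struct toy_video_context *vctx; /* NULL means plain CPU allocation */
    size_t picture_size;
};

/* Single entry point: the decoder/filter does not choose the allocator. */
static struct toy_picture *toy_owner_new_picture(struct toy_owner *owner)
{
    assert(owner != NULL);
    struct toy_picture *pic = malloc(sizeof(*pic));
    if (pic == NULL)
        return NULL;
    pic->size = owner->picture_size;

    struct toy_video_context *vctx = owner->vctx;
    if (vctx != NULL) {
        /* A context was attached, possibly without the caller knowing. */
        if (vctx->next >= vctx->slots || pic->size > vctx->slot_size) {
            free(pic);
            return NULL;
        }
        pic->pixels = vctx->pool + vctx->next * vctx->slot_size;
        vctx->next++;
        pic->from_context = true;
        return pic;
    }

    /* No context: regular CPU memory. */
    pic->pixels = malloc(pic->size);
    if (pic->pixels == NULL) {
        free(pic);
        return NULL;
    }
    pic->from_context = false;
    return pic;
}

static void toy_picture_release(struct toy_picture *pic)
{
    if (pic == NULL)
        return;
    if (!pic->from_context)
        free(pic->pixels); /* context memory stays owned by the context */
    free(pic);
}
```

The point of the sketch is only that both paths return the same kind of
object, so a software decoder writing into pic->pixels works unchanged
in either case.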

> On 12 August 2019 15:00:00 GMT+03:00, Steve Lhomme <robux4 at ycbcr.xyz> wrote:
>> On 2019-08-12 13:32, Rémi Denis-Courmont wrote:
>>> Hi,
>>>
>>> Your proposal does break push completely. It literally requires all
>>> software decoders and filters to pull buffers from the device. The
>>> whole point of push is intrinsically that the upstream decides how it
>>> allocates its buffers: you cannot require decoders or filters to
>>> allocate in any one specific way.
>>
>> No it's not all the time, only in rare cases. Here that's for MMAL
>> only, but as I said it could be optionally enabled for D3D11 if there
>> are performance gains.
>>
>> It's only for decoders/filters that don't use a video context of their
>> own. So the way the picture planes are allocated doesn't matter to them
>> as long as they are usable. It's transparent to them.
>>
>> In decoder_NewPicture() and filter_NewPicture() there are already
>> different cases depending on whether there's a video context to
>> allocate the pictures or not. It makes no difference there either. The
>> only difference is that a video context was added for them without
>> their knowledge.
>>
>>> On 12 August 2019 12:09:36 GMT+03:00, Steve Lhomme <robux4 at ycbcr.xyz> wrote:
>>>> On 2019-08-12 10:15, Alexandre Janniaux wrote:
>>>>> Hi,
>>>>>
>>>>>> No, it is not my opinion. It is what was agreed collectively.
>>>>>> Unlike your opinion, which engages only you.
>>>>>>
>>>>>> I am very fed up with people misconstruing earlier decisions as my
>>>>>> opinion. You can not have it both ways.
>>>>>
>>>>> Sorry, even if you are right, for my part I don't agree with being
>>>>> included in this kind of message when what we decided hasn't been
>>>>> summarized anywhere. You can't point out design issues against a
>>>>> nonexistent design, so it doesn't seem right to describe someone
>>>>> else's work as "misconstruing" the decision we took.
>>>>>
>>>>> I took notes for myself and tried later to summarize them, after the
>>>>> first questions about what we decided were raised on this mailing
>>>>> list, but I don't have enough background to finish them without being
>>>>> biased by my own ideas on the implementation behind the push model.
>>>>> Even the design and naming itself has evolved since the second vout
>>>>> workshop. Maybe it would help if I published them as a draft, with
>>>>> the scan of the notes, so that they can be completed somewhere
>>>>> publicly?
>>>>
>>>> IMO we should have a formal summary of what was decided during
>>>> workshops. It should go into details so there's no confusion.
>>>> Subjects that are left to be decided should also be mentioned. It
>>>> would also give people who were not in the workshop an idea of where
>>>> things are headed and what is going to change.
>>>>
>>>> This is not the first time we disagree on directions after we had a
>>>> workshop. Implementing things always leads to some corner cases that
>>>> we may not have seen during the meeting.
>>>>
>>>>> I don't see any issues with trying to include more cases, even if I
>>>>> would prefer having the first layers first before starting to
>>>>> experiment with the push architecture.
>>>>>
>>>>> Thank you for the constructive arguments on push for the SoC case
>>>>> though. IMHO they are good arguments to show that it should work on
>>>>> SoC and that delaying full support for some decoders with an extra
>>>>> copy is acceptable.
>>>>>
>>>>> On Mon, Aug 12, 2019 at 08:57:25AM +0200, Steve Lhomme wrote:
>>>>>> On 2019-08-11 13:45, Rémi Denis-Courmont wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Indeed we did agree that some old and crappy APIs may require
>>>>>>> memory copying if they have no sane ways to allocate and reference
>>>>>>> picture buffers, including but not limited to old OpenGL versions.
>>>>>
>>>>> OpenGL isn't really handling native resource management; everything
>>>>> related to that, except pixel buffers and CPU upload, is done by the
>>>>> layer below (EGL, GLX, ANGLE). It might not be the best example when
>>>>> it comes to pictures. However, I agree that additional copies in edge
>>>>> cases should not prevent us from shifting to a push model.
>>>>
>>>> Nothing is preventing the push model we designed. It's only a matter
>>>> of not losing performance because of it.
>>>>
>>>>>>
>>>>>> Yes we said that for OpenGL there were cases where some copies
>>>>>> could be needed. But there was already a copy from CPU to GPU in
>>>>>> that case, be it in our code or in the driver (we're talking about
>>>>>> software decoding). So it doesn't really matter who does it.
>>>>>>
>>>>>> In the case of MMAL (or a SoC architecture in general) the case
>>>>>> wasn't raised AFAIK. Here we introduce an extra copy that didn't
>>>>>> exist before.
>>>>>>
>>>>>> In the case of OpenGL, it's only on rare/old OpenGL implementations
>>>>>> that we handle the copy ourselves. In the case of MMAL that's the
>>>>>> default implementation for everyone.
>>>>>>
>>>>>
>>>>> Maybe the optimization can be tackled after the design has been set
>>>>> up on the whole architecture. I haven't checked but I believe it
>>>>> could be replaced (now or later) by the more or less standard GBM +
>>>>> v4l2 layer like on Linux instead of relying on the MMAL API, which
>>>>> would make the Raspberry Pi push-friendly.
>>>>
>>>> From what I saw the MMAL vout is really a standalone one and not
>>>> related to that. I suppose MMAL decoding should also work in OpenGL
>>>> but that doesn't seem implemented. Maybe it is in John Cox's branch.
>>>> It may also be worth investigating this, since if there's PBO then
>>>> we're back to the regular case where we don't need a copy. And that's
>>>> likely the way we want to go forward. If an inferior display module
>>>> (current MMAL) is used then we can live with some drawbacks.
>>>>
>>>>> GBM is the graphics allocation layer used everywhere but NVidia
>>>>> (which has its own EGLStream which works more or less the same, for a
>>>>> change). You get an object which can be turned into a file
>>>>> descriptor, and be imported in either a graphics API like Vulkan or
>>>>> EGL, or a windowing API like X11 or Wayland. It's a very simple API
>>>>> so it's quite future-proof even if it will eventually be replaced
>>>>> again.
>>>>>
>>>>> I don't know what's available for BSD-like systems or other exotic
>>>>> systems but I guess the first challenge would be GPU support even
>>>>> before supporting the push model for most SBCs available on the
>>>>> market, and as they are not officially supported for now, we might
>>>>> avoid considering them in the design as well.
>>>>>
>>>>> Greats,
>>>>> --
>>>>> Alexandre Janniaux
>>>>> VideoLabs
>>>>>
>>>>>>
>>>>>>> On 9 August 2019 16:11:55 GMT+03:00, Steve Lhomme <robux4 at ycbcr.xyz> wrote:
>>>>>>>> Did we agree that MMAL will get extra copies due to our design
>>>>>>>> decisions ?
>>>>>>>>
>>>>>>>> On 2019-08-09 13:55, Rémi Denis-Courmont wrote:
>>>>>>>>> No, it is not my opinion. It is what was agreed collectively.
>>>>>>>>> Unlike your opinion, which engages only you.
>>>>>>>>>
>>>>>>>>> I am very fed up with people misconstruing earlier decisions as
>>>>>>>>> my opinion. You can not have it both ways.
>>>>>>>>>
>>>>>>>>> On 9 August 2019 12:57:53 GMT+03:00, Steve Lhomme <robux4 at ycbcr.xyz> wrote:
>>>>>>>>>> On 2019-08-09 10:16, Rémi Denis-Courmont wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> The MMAL plugins are unmaintained. By definition, if the
>>>>>>>>>>> implementation that actually sees users and updates is another
>>>>>>>>>>> one, then ours is unmaintained.
>>>>>>>>>>>
>>>>>>>>>>> And the point is that I don't want to change the core for a
>>>>>>>>>>> misdesigned plugin. I have not seen any technically valid
>>>>>>>>>>> justification for adding yet another way to allocate pictures,
>>>>>>>>>>> nor how this would work.
>>>>>>>>>>>
>>>>>>>>>>> You cannot expect software decoders and filters to allocate
>>>>>>>>>>> pictures from decoder device or video context. That's complete
>>>>>>>>>>> denial of everything that was agreed upon, and reintroduces a
>>>>>>>>>>> whole lot of problems that push was supposed to fix.
>>>>>>>>>>
>>>>>>>>>> That's your opinion and I don't share it. We made some design
>>>>>>>>>> choices but until they were implemented we had no idea if all
>>>>>>>>>> use cases were covered. And it turns out not all use cases are
>>>>>>>>>> covered. With MMAL (be it the old module or the new module) the
>>>>>>>>>> design we have is not good enough. We are forcing copies where
>>>>>>>>>> they didn't exist before.
>>>>>>>>>>
>>>>>>>>>> And what I propose is still push design. It's still the decoder
>>>>>>>>>> that creates a video context and pushes it forward. It just may
>>>>>>>>>> not be aware it's using it at all.
>>>>>>>>>>
>>>>>>>>>> And as I experimented with this idea, for D3D11 it would make
>>>>>>>>>> perfect sense to allow the decoder device to be the creator of
>>>>>>>>>> the video context. They are highly related and one doesn't
>>>>>>>>>> really exist without the other.
>>>>>>>>>>
>>>>>>>>>>> On 9 August 2019 08:50:43 GMT+03:00, Steve Lhomme <robux4 at ycbcr.xyz> wrote:
>>>>>>>>>>>> On 2019-08-08 18:27, Rémi Denis-Courmont wrote:
>>>>>>>>>>>>> On Thursday 8 August 2019, 15:29:30 EEST, Steve Lhomme wrote:
>>>>>>>>>>>>>> Any opinion ?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't see why we should mess up the architecture for a
>>>>>>>>>>>>> hardware-specific, implementation-specific, unmaintained
>>>>>>>>>>>>> module.
>>>>>>>>>>>>
>>>>>>>>>>>> It's not unmaintained, I was planning to revive it to make
>>>>>>>>>>>> sure that the default player on Raspberry Pi remains VLC when
>>>>>>>>>>>> we release 4.0. It seems there's a different implementation
>>>>>>>>>>>> so I'll adapt that one.
>>>>>>>>>>>>
>>>>>>>>>>>> One reason for that is to make sure our new push architecture
>>>>>>>>>>>> is sound and can adapt to many use cases. Supporting SoC
>>>>>>>>>>>> architectures should still be possible with the new
>>>>>>>>>>>> architecture. Allocating all buffers once in the display was
>>>>>>>>>>>> making this easy and efficient (in terms of copy, not memory
>>>>>>>>>>>> usage). We should aim for the same level of efficiency.
>>>>>>>>>>>>
>>>>>>>>>>>> Also let me remind you the VLC motto: "VLC plays everything
>>>>>>>>>>>> and runs everywhere".
>>>>>>>>>>>>
>>>>>>>>>>>>> Even when the GPU uses the same RAM as the CPU, it typically
>>>>>>>>>>>>> uses different pixel format, tile format and/or memory
>>>>>>>>>>>>> coherency protocol, or it might simply not have a suitable
>>>>>>>>>>>>> IOMMU. As such, VLC cannot render directly in it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> And if it could, then by definition, it implies that the
>>>>>>>>>>>>> decoder and filters can allocate and *reference* picture
>>>>>>>>>>>>> buffers as they see fit, regardless of the hardware. Which
>>>>>>>>>>>>> means the software on CPU side is doing the allocation. If
>>>>>>>>>>>>> so, then there are no good technical reasons why push cannot
>>>>>>>>>>>>> work - misdesigning the display plugin is not a good reason.
>>>>>>>>>>>>
>>>>>>>>>>>> I haven't proposed any design change to the display plugin,
>>>>>>>>>>>> other than already discussed. What I proposed is a way to
>>>>>>>>>>>> allocate CPU pictures from the GPU. My current solution
>>>>>>>>>>>> involves creating a video context optionally when the decoder
>>>>>>>>>>>> doesn't provide one.
>>>>>>>>>>>>
>>>>>>>>>>>> It could even be used on desktop. For example on Intel
>>>>>>>>>>>> platforms it's possible to do it without much performance
>>>>>>>>>>>> penalty. I used to do it in D3D11 until I realized it sucked
>>>>>>>>>>>> for separate GPU memory. But I had no way to know exactly the
>>>>>>>>>>>> impact of the switch because the code was quite different.
>>>>>>>>>>>> Now it might be possible to tell. I have a feeling on Intel
>>>>>>>>>>>> it may actually be better to decode in "GPU" buffers
>>>>>>>>>>>> directly. The driver can take shortcuts that we can't. It may
>>>>>>>>>>>> do the copy more efficiently if it needs one (or maybe it
>>>>>>>>>>>> doesn't need one). It can do the copy asynchronously (as with
>>>>>>>>>>>> every command sent to an ID3D11DeviceContext) as long as it's
>>>>>>>>>>>> ready when it needs to be displayed.
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> vlc-devel mailing list
>>>>>>>>>>>> To unsubscribe or modify your subscription options:
>>>>>>>>>>>> https://mailman.videolan.org/listinfo/vlc-devel
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Sent from my Android device with K-9 Mail. Please excuse my brevity.