[vlc-devel] Software decoding in Hardware buffers

Fri Aug 9 15:11:55 CEST 2019

Did we agree that MMAL would get extra copies because of our design decisions?

On 2019-08-09 13:55, Rémi Denis-Courmont wrote:
> No, it is not my opinion. It is what was agreed collectively. Unlike your opinion, which engages only you.
> 
> I am very fed up with people misconstruing earlier decisions as my opinion. You cannot have it both ways.
> 
> On 9 August 2019 12:57:53 GMT+03:00, Steve Lhomme <robux4 at ycbcr.xyz> wrote:
>> On 2019-08-09 10:16, Rémi Denis-Courmont wrote:
>>> Hi,
>>>
>>> The MMAL plugins are unmaintained. By definition, if the implementation
>>> that actually sees users and updates is another one, then ours is
>>> unmaintained.
>>>
>>> And the point is that I don't want to change the core for a misdesigned
>>> plugin. I have not seen any technically valid justification for adding
>>> yet another way to allocate pictures, nor how this would work.
>>>
>>> You cannot expect software decoders and filters to allocate pictures
>>> from decoder device or video context. That's complete denial of
>>> everything that was agreed upon, and reintroduces a whole lot of
>>> problems that push was supposed to fix.
>>
>> That's your opinion and I don't share it. We made some design choices
>> but until they were implemented we had no idea if all use cases were
>> covered. And it turns out not all use cases are covered. With MMAL (be
>> it the old module or the new module) the design we have is not good
>> enough. We are forcing copies where they didn't exist before.
>>
>> And what I propose is still push design. It's still the decoder that
>> creates a video context and pushes it forward. It just may not be aware
>> it's using it at all.
>>
>> And as I experienced with this idea, for D3D11 it would make perfect
>> sense to allow the decoder device to be the creator of the video context.
>> They are highly related and one doesn't really exist without the other.
>>
>>> On 9 August 2019 08:50:43 GMT+03:00, Steve Lhomme <robux4 at ycbcr.xyz> wrote:
>>>> On 2019-08-08 18:27, Rémi Denis-Courmont wrote:
>>>>> On Thursday 8 August 2019, 15.29.30 EEST, Steve Lhomme wrote:
>>>>>> Any opinion ?
>>>>>
>>>>> I don't see why we should mess the architecture for a hardware-specific
>>>>> implementation-specific unmaintained module.
>>>>
>>>> It's not unmaintained, I was planning to revive it to make sure that the
>>>> default player on Raspberry Pi remains VLC when we release 4.0. It seems
>>>> there's a different implementation so I'll adapt that one.
>>>>
>>>> One reason for that is to make sure our new push architecture is sound
>>>> and can adapt to many use cases. Supporting SoC architectures should
>>>> still be possible with the new architecture. Allocating all buffers once
>>>> in the display was making this easy and efficient (in terms of copy, not
>>>> memory usage). We should aim for the same level of efficiency.
>>>>
>>>> Also let me remind you the VLC motto: "VLC plays everything and runs
>>>> everywhere".
>>>>
>>>>> Even when the GPU uses the same RAM as the CPU, it typically uses
>>>>> different pixel format, tile format and/or memory coherency protocol, or
>>>>> it might simply not have a suitable IOMMU. As such, VLC cannot render
>>>>> directly in it.
>>>>>
>>>>> And if it could, then by definition, it implies that the decoder and
>>>>> filters can allocate and *reference* picture buffers as they see fit,
>>>>> regardless of the hardware. Which means the software on CPU side is
>>>>> doing the allocation. If so, then there are no good technical reasons
>>>>> why push cannot work - misdesigning the display plugin is not a good
>>>>> reason.
>>>>
>>>> I haven't proposed any design change to the display plugin, other than
>>>> already discussed. What I proposed is a way to allocate CPU pictures
>>>> from the GPU. My current solution involves creating a video context
>>>> optionally when the decoder doesn't provide one.
>>>>
>>>> It could even be used on desktop. For example, on Intel platforms it's
>>>> possible to do it without much performance penalty. I used to do it in
>>>> D3D11 until I realized it sucked for separate GPU memory. But I had no
>>>> way to know exactly the impact of the switch because the code was quite
>>>> different. Now it might be possible to tell. I have a feeling that on
>>>> Intel it may actually be better to decode in "GPU" buffers directly. The
>>>> driver can take shortcuts that we can't. It may do the copy more
>>>> efficiently if it needs one (or maybe it doesn't need one). It can do
>>>> the copy asynchronously (like every command sent to an
>>>> ID3D11DeviceContext) as long as it's ready when it needs to be displayed.
>>>> _______________________________________________
>>>> vlc-devel mailing list
>>>> To unsubscribe or modify your subscription options:
>>>> https://mailman.videolan.org/listinfo/vlc-devel
>>>
>>> -- 
>>> Sent from my Android device with K-9 Mail. Please excuse my brevity.
>>>
>>>