[vlc-devel] Software decoding in Hardware buffers
Steve Lhomme
robux4 at ycbcr.xyz
Fri Aug 9 07:50:43 CEST 2019
On 2019-08-08 18:27, Rémi Denis-Courmont wrote:
> On Thursday, 8 August 2019, 15:29:30 EEST, Steve Lhomme wrote:
>> Any opinion ?
>
> I don't see why we should mess the architecture for a hardware-specific
> implementation-specific unmaintained module.
It's not unmaintained; I was planning to revive it to make sure that the
default player on the Raspberry Pi remains VLC when we release 4.0. It
seems there's a different implementation, so I'll adapt that one.
One reason for that is to make sure our new push architecture is sound
and can adapt to many use cases. Supporting SoC architectures should
still be possible with the new architecture. Allocating all buffers once
in the display made this easy and efficient (in terms of copies, not
memory usage). We should aim for the same level of efficiency.
Also, let me remind you of the VLC motto: "VLC plays everything and runs
everywhere".
> Even when the GPU uses the same RAM as the CPU, it typically uses different
> pixel format, tile format and/or memory coherency protocol, or it might simply
> not have a suitable IOMMU. As such, VLC cannot render directly in it.
>
> And if it could, then by definition, it implies that the decoder and filters can
> allocate and *reference* picture buffers as they see fit, regardless of the
> hardware. Which means the software on CPU side is doing the allocation. If so,
> then there are no good technical reasons why push cannot work - misdesigning
> the display plugin is not a good reason.
I haven't proposed any design change to the display plugin, other than
what was already discussed. What I propose is a way to allocate CPU
pictures from the GPU. My current solution involves optionally creating
a video context when the decoder doesn't provide one.
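To make that concrete, here is a minimal sketch of the fallback,
assuming the VLC 4.0 push types (decoder_t, vlc_decoder_device,
vlc_video_context); the helper vlc_video_context_CreateCPU() is
hypothetical, only there to show where the GPU-backed allocation of
"CPU" pictures would be hooked:

/* Sketch only, not actual VLC code: if the decoder did not push a video
 * context, the owner creates one from the decoder device so that the
 * pictures written by the software decoder can still live in
 * GPU-reachable memory. */
#include <vlc_common.h>
#include <vlc_codec.h>

static vlc_video_context *
EnsureVideoContext(decoder_t *dec, vlc_decoder_device *dec_dev,
                   vlc_video_context *vctx_from_decoder)
{
    if (vctx_from_decoder != NULL)
        /* Hardware decoder: it already provides its own video context. */
        return vctx_from_decoder;

    /* Software decoder: create a video context on top of the decoder
     * device so the allocation comes from the GPU side
     * (vlc_video_context_CreateCPU is a hypothetical helper). */
    return vlc_video_context_CreateCPU(dec_dev, &dec->fmt_out.video);
}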
It could even be used on desktop. For example, on Intel platforms it's
possible to do it without much performance penalty. I used to do it in
D3D11 until I realized it sucked with separate GPU memory. But I had no
way to know the exact impact of the switch because the code was quite
different. Now it might be possible to tell. I have a feeling that on
Intel it may actually be better to decode into "GPU" buffers directly.
The driver can take shortcuts that we can't. It may do the copy more
efficiently if it needs one (or maybe it doesn't need one at all). It
can do the copy asynchronously (like every command sent to an
ID3D11DeviceContext) as long as the result is ready when it needs to be
displayed.
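For reference, a hedged sketch of what that asynchronous upload looks
like through a D3D11 staging texture; the texture creation and the
plane layout (pic_data/pic_pitch/height) are assumptions for
illustration, not VLC code:

/* Copy one plane of a CPU-decoded picture into GPU memory.  The memcpy
 * happens on the CPU, but the CopyResource is only queued on the
 * ID3D11DeviceContext and executed asynchronously by the driver. */
#define COBJMACROS
#include <d3d11.h>
#include <stdint.h>
#include <string.h>

static void upload_plane(ID3D11DeviceContext *ctx,
                         ID3D11Texture2D *staging, /* D3D11_USAGE_STAGING, CPU write access */
                         ID3D11Texture2D *gpu_tex, /* D3D11_USAGE_DEFAULT, used by the display */
                         const uint8_t *pic_data, unsigned pic_pitch,
                         unsigned height)
{
    D3D11_MAPPED_SUBRESOURCE map;
    if (FAILED(ID3D11DeviceContext_Map(ctx, (ID3D11Resource *)staging, 0,
                                       D3D11_MAP_WRITE, 0, &map)))
        return;

    for (unsigned y = 0; y < height; y++)
        memcpy((uint8_t *)map.pData + y * map.RowPitch,
               pic_data + y * pic_pitch,
               pic_pitch < map.RowPitch ? pic_pitch : map.RowPitch);

    ID3D11DeviceContext_Unmap(ctx, (ID3D11Resource *)staging, 0);

    /* Queued command: the CPU does not wait, the driver only has to make
     * sure the copy is done before gpu_tex is displayed. */
    ID3D11DeviceContext_CopyResource(ctx, (ID3D11Resource *)gpu_tex,
                                     (ID3D11Resource *)staging);
}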