[vlc-devel] Software decoding in Hardware buffers
robux4 at ycbcr.xyz
Thu Aug 8 17:26:08 CEST 2019
On 2019-08-08 16:10, John Cox wrote:
>> On 2019-08-08 14:55, John Cox wrote:
>>>> I'm looking at the display pool in the MMAL (Raspberry Pi) code and it
>>>> seems that we currently decode in "hardware" buffers all the time.
>>>> Either the opaque decoder output, or when the decoder outputs I420.
>>>> In push we don't want to use this pool anymore. The decoder will have its own
>>>> pool and the display just deals with what it receives. In most cases
>>>> that means copy from CPU memory to GPU memory. This doesn't work with a
>>>> SoC like on the Raspberry Pi where the memory is the same and can be
>>>> used directly from both sides.
>>>> The idea was that current decoders continue to use decoder_NewPicture()
>>>> as they used to. The pictures will come from the decoder video context,
>>>> if there's one (hardware decoding) or from picture_NewFromFormat() if
>>>> there's none. That means for MMAL we would need to copy this CPU
>>>> allocated memory to the "port allocated" memory (the mechanism to get
>>>> buffers from the display). Given the limited resources that's something
>>>> we should avoid.
>>>> I think we should have a third way to provide pictures: from the decoder
>>>> device. In case of software decoding there is no video context, but
>>>> there is a decoder device (an MMAL one in this case).
>>>> So I suggest the decoders (and filters) get their output picture from:
>>>> - the video context if there is one
>>>> - the decoder device if there is one and it has an allocator
>>>> - picture_NewFromFormat() otherwise
>>>> Any opinion ?
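
The three-step fallback proposed above could be sketched roughly as follows. This is only an illustration: every type and function name here (sketch_picture_t, sketch_new_picture(), the alloc callbacks) is hypothetical and not part of the VLC API, and alloc_generic() merely stands in for picture_NewFromFormat().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not the VLC 4.0 API. */
typedef enum {
    SRC_NONE,
    SRC_VIDEO_CONTEXT,   /* hardware decoding */
    SRC_DECODER_DEVICE,  /* software decoding into device-owned buffers */
    SRC_GENERIC          /* plain CPU allocation */
} sketch_source_t;

typedef struct { sketch_source_t source; } sketch_picture_t;
typedef int (*sketch_alloc_fn)(sketch_picture_t *pic);

typedef struct { sketch_alloc_fn alloc; } sketch_video_context_t;
typedef struct { sketch_alloc_fn alloc; /* NULL: no allocator */ } sketch_decoder_device_t;

int alloc_from_vctx(sketch_picture_t *pic)   { pic->source = SRC_VIDEO_CONTEXT;  return 0; }
int alloc_from_device(sketch_picture_t *pic) { pic->source = SRC_DECODER_DEVICE; return 0; }
/* Stands in for picture_NewFromFormat(). */
int alloc_generic(sketch_picture_t *pic)     { pic->source = SRC_GENERIC;        return 0; }

/* Pick the allocator in the order given in the list above. */
int sketch_new_picture(const sketch_video_context_t *vctx,
                       const sketch_decoder_device_t *dev,
                       sketch_picture_t *pic)
{
    assert(pic != NULL);
    if (vctx != NULL)                     /* 1. video context, if there is one */
        return vctx->alloc(pic);
    if (dev != NULL && dev->alloc != NULL) /* 2. decoder device with an allocator */
        return dev->alloc(pic);
    return alloc_generic(pic);            /* 3. generic CPU memory otherwise */
}
```

With a software decoder on the Pi, the first argument would be NULL and the MMAL decoder device would supply the buffers, so no CPU-to-"port allocated" copy would be needed.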
>>> I'm sure I should have an opinion (though I'm not quite sure what it is)
>>> as I've got a substantial rewrite of the entire mmal/Pi modules here,
>>> which I intended to upstream when it was a bit closer to finished (it is
>>> currently in use as the default shipped with Pis but there is still work
>>> to be done before I want it set in anything like stone).
>> Oh ! Then ours is never used unless people build it from scratch
>> themselves ? If it's a complete rewrite I guess we can ditch the old code ?
> Yes, you can ditch the old code. My code replaces it. It is based, in
> part, on the old code but it is substantially new (by now).
Is it available on GitHub or a GitLab ? That way I could comment before
you submit, if I have remarks on how to do things.
>> In any case since that's the one currently used by most people it would
>> be good to merge even if not perfect. The one we have in 4.0 won't even
>> compile as it is given all the changes in 4.0.
>>> There are a number of buffer types that I now pass around - all
>>> currently declared as h/w though some have a plausible existence in CPU
>>> memory (though not all). All end up as having their actual allocation
>>> done at source (decoder or filter) though the picture_t is allocated by
>>> the "display"
>> That's how it's supposed to work in 3.0. In 4.0 things will be radically
>> different. MMAL is the last part I'm looking at for this redesign. It's
>> a good example of SoC use of VLC, compared to the other display/decoders
>> that are more oriented towards desktops with a GPU.
>> Since you seem to know a bit more about MMAL than me, is it possible to
>> allocate memory in heap and then wrap it inside a MMAL_BUFFER_HEADER_T ?
>> It doesn't look like it from what I see in mmal_buffer.h. If not, we
>> have the problem described in my original post (to avoid a copy we
>> need software decoders to be able to use these hardware buffers directly).
> As with many things MMAL the answer is both yes & no. The easy way of
> writing the code is to let MMAL deal with the allocs - but if you like
> pain you can do it all yourself (I appear to like pain a lot).
> You can have ARM-side allocated buffers, but if you do then they will be
> copied into contiguous "GPU-side" buffers by MMAL before use by the
> hardware so you don't save anything. Also note that I420 buffers
> require to be in a single contiguous chunk rather than 3 separate
It's OK, that's already how we allocate planes for all software
decoders. Intel has the same requirement as well for their decoders.
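
To make the "single contiguous chunk" requirement concrete, here is a minimal sketch of an I420 buffer whose three planes are carved out of one allocation. It is not VLC's actual allocator: i420_buffer_t, i420_alloc_contiguous() and i420_free() are hypothetical names, and the pitch alignment and padding a real allocator would add are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical type for illustration; real allocators also pad and align. */
typedef struct {
    uint8_t *base;      /* the one allocation that owns all three planes */
    uint8_t *plane[3];  /* Y, U, V pointers into base */
    size_t   pitch[3];  /* bytes per line for each plane */
    size_t   lines[3];  /* number of lines for each plane */
    size_t   size;      /* total bytes allocated */
} i420_buffer_t;

/* Allocate Y, U and V back to back in one chunk. Returns 0 on success. */
int i420_alloc_contiguous(i420_buffer_t *buf, unsigned width, unsigned height)
{
    assert(buf != NULL);
    if (width == 0 || height == 0)
        return -1;

    /* Chroma planes are subsampled by 2 in both directions (rounded up). */
    const size_t cw = ((size_t)width + 1) / 2;
    const size_t ch = ((size_t)height + 1) / 2;
    const size_t y_size = (size_t)width * height;
    const size_t c_size = cw * ch;

    buf->pitch[0] = width;  buf->lines[0] = height;
    buf->pitch[1] = cw;     buf->lines[1] = ch;
    buf->pitch[2] = cw;     buf->lines[2] = ch;

    buf->size = y_size + 2 * c_size;
    buf->base = malloc(buf->size);
    if (buf->base == NULL)
        return -1;

    buf->plane[0] = buf->base;                 /* Y at the start */
    buf->plane[1] = buf->base + y_size;        /* U right after Y */
    buf->plane[2] = buf->plane[1] + c_size;    /* V right after U */
    return 0;
}

void i420_free(i420_buffer_t *buf)
{
    free(buf->base);
    buf->base = NULL;
}
```

For a 640x480 picture this gives one 460800-byte block, with U starting 307200 bytes in and V a further 76800 bytes after that.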
> It is possible to allocate buffers that can be used by the ARM & the GPU
> without copy and I'm working on making that work with VLC's avcodec right
> now (needed for Pi4 H265 decode), though at this precise point in time I
> have issues with getting an ARM-cachable lump of memory, which somewhat
> impacts decode speed :-(
If we can make this work it would be great.
We still have the issue for other SoCs that may not allow this. But
for now they would be unsupported.
> For my info (I haven't looked at 4.x yet) how do you deal with:
> decoder -> filter -> filter -> display in the new world? In particular
> what happens if the filter wants a "h/w" buffer as input (MMAL has
> deinterlace and rescale filters that want this)?
Each stage allocates its own output pictures. It may be through a
specific video context it creates or just generic code that does
picture_NewFromFormat like now. The display doesn't have a picture
allocator anymore, it just tells what picture format (and optionally
video context) it wants and the core matches that. (later we may even
recreate the display module when something changes on the input)
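
As a rough illustration of that model, the sketch below has each stage hand out its own output pictures while the display only states a wanted format. All names are hypothetical and none of this is the 4.0 API; in particular, treating VCTX_NONE on the display side as "no video context requirement" is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; not the VLC 4.0 API. */
typedef enum { VCTX_NONE, VCTX_MMAL } sketch_vctx_type_t;

typedef struct {
    unsigned           chroma; /* e.g. a fourcc value */
    sketch_vctx_type_t vctx;   /* VCTX_NONE: plain CPU pictures */
} sketch_format_t;

typedef struct {
    sketch_format_t out;       /* format of the pictures this stage allocates */
    unsigned        allocated; /* pictures handed out by this stage so far */
} sketch_stage_t;

typedef struct {
    const sketch_stage_t *owner; /* stage whose allocator produced the picture */
    sketch_format_t       fmt;
} stage_picture_t;

/* Each stage (decoder or filter) allocates its own output picture;
 * nothing is borrowed from the display. */
stage_picture_t stage_new_output(sketch_stage_t *stage)
{
    assert(stage != NULL);
    stage->allocated++;
    return (stage_picture_t){ .owner = stage, .fmt = stage->out };
}

/* The display only says what it wants. Assumption for this sketch:
 * VCTX_NONE on the display side means "no video context requirement". */
bool display_accepts(const sketch_format_t *wanted, const stage_picture_t *pic)
{
    if (wanted->chroma != pic->fmt.chroma)
        return false;
    return wanted->vctx == VCTX_NONE || wanted->vctx == pic->fmt.vctx;
}
```

In a decoder -> filter -> filter -> display chain, a filter that wants "h/w" input would receive pictures allocated by the previous stage and allocate its own for its output.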
For MMAL one different thing is that the pictures may come from
different MMAL pools (one per output) as that's how buffers are
allocated. It may be possible to add some more pictures in the decoder
pool (the biggest and the first to be created) and share it with the
filters. But in the end you allocate more even if you're not going to
use it. That's not a nice thing to do on a Raspberry Pi. So hopefully
buffers from different pools can still be used as if they came from the