[vlc-devel] Software decoding in Hardware buffers

John Cox jc at kynesim.co.uk
Thu Aug 8 16:10:40 CEST 2019


>On 2019-08-08 14:55, John Cox wrote:
>> Hi
>>> Hi,
>>> I'm looking at the display pool in the MMAL (Raspberry Pi) code and it
>>> seems that we currently decode in "hardware" buffers all the time.
>>> Either the opaque decoder output, or when the decoder outputs I420.
>>> In push we don't use this pool anymore. The decoder will have its own
>>> pool and the display just deals with what it receives. In most cases
>>> that means copy from CPU memory to GPU memory. This doesn't work with a
>>> SoC like on the Raspberry Pi where the memory is the same and can be
>>> used directly from both sides.
>>> The idea was that current decoders continue to use decoder_NewPicture()
>>> as they used to. The pictures will come from the decoder video context,
>>> if there's one (hardware decoding) or from picture_NewFromFormat() if
>>> there's none. That means for MMAL we would need to copy this CPU
>>> allocated memory to the "port allocated" memory (the mechanism to get
>>> buffers from the display). Given the limited resources that's something
>>> we should avoid.
>>> I think we should have a third way to provide pictures: from the decoder
>>> device. In case of software decoding there is no video context, but
>>> there is a decoder device (A MMAL one in this case).
>>> So I suggest the decoders (and filters) get their output picture from:
>>> - the video context if there is one
>>> - the decoder device if there is one and it has an allocator
>>> - picture_NewFromFormat() otherwise
>>> Any opinion ?
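
A minimal sketch of the lookup order proposed above, for illustration only. Every type and function name here is a hypothetical stand-in, not VLC or MMAL API; in particular the heap fallback merely stands in for picture_NewFromFormat().

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; none of these are real VLC or MMAL types. */
typedef enum { SRC_VIDEO_CONTEXT, SRC_DECODER_DEVICE, SRC_HEAP } picture_source;

typedef struct picture { picture_source source; } picture;

typedef struct video_context {
    picture *(*alloc)(void);   /* pool used when decoding in hardware */
} video_context;

typedef struct decoder_device {
    picture *(*alloc)(void);   /* NULL when the device has no allocator */
} decoder_device;

static picture *new_picture(picture_source source)
{
    picture *p = malloc(sizeof(*p));
    if (p != NULL)
        p->source = source;
    return p;
}

static picture *alloc_from_vctx(void)   { return new_picture(SRC_VIDEO_CONTEXT); }
static picture *alloc_from_device(void) { return new_picture(SRC_DECODER_DEVICE); }
static picture *alloc_from_heap(void)   { return new_picture(SRC_HEAP); }

/* Try the video context first, then the decoder device's allocator,
 * and fall back to plain CPU memory only when neither is available. */
static picture *get_output_picture(const video_context *vctx,
                                   const decoder_device *dev)
{
    if (vctx != NULL && vctx->alloc != NULL)
        return vctx->alloc();
    if (dev != NULL && dev->alloc != NULL)
        return dev->alloc();
    return alloc_from_heap();
}
```

With software decoding on a Pi, vctx would be NULL and dev would carry an allocator, so the second branch hands out a buffer both sides can use.
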
>> I'm sure I should have an opinion (though I'm not quite sure what it is)
>> as I've got a substantial rewrite of the entire mmal/Pi modules here,
>> which I intended to upstream when it was a bit closer to finished (it is
>> currently in use as the default shipped with Pis but there is still work
>> to be done before I want it set in anything like stone).
>Oh ! Then ours is never used unless people build it from scratch 
>themselves ? If it's a complete rewrite I guess we can ditch the old code ?

Yes, you can ditch the old code.  My code replaces it.  It is based, in
part, on the old code but it is substantially new (by now).

>In any case since that's the one currently used by most people it would 
>be good to merge even if not perfect. The one we have in 4.0 won't even 
>compile as it is given all the changes in 4.0.
>> There are a number of buffer types that I now pass around - all
>> currently declared as h/w though some have a plausible existence in CPU
>> memory (though not all).  All end up as having their actual allocation
>> done at source (decoder or filter) though the picture_t is allocated by
>> the "display"
>That's how it's supposed to work in 3.0. In 4.0 things will be radically 
>different. MMAL is the last part I'm looking at for this redesign. It's 
>a good example of SoC use of VLC, compared to the other more desktop 
>with GPU oriented display/decoders.
>Since you seem to know a bit more about MMAL than me, is it possible to 
>allocate memory in heap and then wrap it inside a MMAL_BUFFER_HEADER_T ? 
>It doesn't look like it from what I see in mmal_buffer.h [1]. If not we 
>have the problem described in my original post (to avoid a copy we need 
>software decoders to be able to use these hardware buffers directly).

As with many things MMAL the answer is both yes & no.  The easy way of
writing the code is to let MMAL deal with the allocs - but if you like
pain you can do it all yourself (I appear to like pain a lot).

You can have ARM-side allocated buffers, but if you do then they will be
copied into contiguous "GPU-side" buffers by MMAL before use by the
hardware so you don't save anything.  Also note that I420 buffers
must be in a single contiguous chunk rather than 3 separate ones.
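
To make the single-chunk point concrete, here is a rough sketch of how the three I420 planes could be laid out inside one allocation. The struct and function names are hypothetical, and the sketch assumes no stride or height alignment padding, which real hardware will almost certainly require.

```c
#include <stddef.h>

/* Hypothetical layout descriptor; not an MMAL or VLC type. */
typedef struct i420_layout {
    size_t y_offset, u_offset, v_offset;  /* plane starts within the chunk */
    size_t y_stride, c_stride;            /* bytes per row: luma, chroma */
    size_t total_size;                    /* size of the single allocation */
} i420_layout;

/* Place Y, U and V back to back in one buffer.
 * Assumption: no alignment padding on rows or planes. */
static i420_layout i420_contiguous_layout(size_t width, size_t height)
{
    i420_layout l;
    size_t c_width  = (width + 1) / 2;   /* chroma is subsampled 2x2 */
    size_t c_height = (height + 1) / 2;

    l.y_stride   = width;
    l.c_stride   = c_width;
    l.y_offset   = 0;
    l.u_offset   = l.y_stride * height;
    l.v_offset   = l.u_offset + l.c_stride * c_height;
    l.total_size = l.v_offset + l.c_stride * c_height;
    return l;
}
```

A decoder that allocates total_size bytes once and points its three planes at base + y_offset, base + u_offset and base + v_offset produces a picture that fits the single-chunk requirement.
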

It is possible to allocate buffers that can be used by the ARM & the GPU
without copy and I'm working on making that work with VLC's avcodec right
now (needed for Pi4 H265 decode), though at this precise point in time I
have issues with getting an ARM-cachable lump of memory, which somewhat
impacts decode speed :-(

For my info (I haven't looked at 4.x yet) how do you deal with:

decoder -> filter -> filter -> display in the new world?   In particular
what happens if the filter wants a "h/w" buffer as input (MMAL has
deinterlace and rescale filters that want this)?


