[vlc-devel] [RFC 0/1] Let decoders decide over DPB size.

Mon Aug 26 12:03:09 CEST 2013

On 26.08.2013 11:42, Martin Storsjö wrote:
> On Fri, 23 Aug 2013, Rémi Denis-Courmont wrote:
>
>> On Friday 23 August 2013 17:38:45, Julian Scheel wrote:
>>
>>> While working with the omx modules I ran into this problem on Tegra 2 as
>>> well as Raspberry Pi platforms, because both did not have enough memory to
>>> store 24 or more full 1080p frames in the GPU memory. But as they do not
>>> require the dpb to be stored in the picture pool, but deal with it
>>> internally it is in fact possible to remove the dpb_size from the picture
>>> pool and run with a much smaller picture pool without any issues.
>>
>> I believe that is a problem within the OMX decoder. It would seem to
>> perform indirect rendering. This is slow, and indeed wasteful of
>> memory space, especially on low-end systems.
>
> Yes, it does (more or less) indirect rendering. With how things are done
> right now, we copy data from the decoder-allocated buffer into an output
> buffer (from a picture pool or ideally from the vout directly), but
> Julian is trying to use the mode where we provide the buffer to write
> into, hopefully sparing at least one memcpy.
>
> As far as I understand the OMX specs, I'm not sure if you actually can
> do real full direct rendering where the external buffers are used as
> internal DPB at all.
>
>
> So on the Raspberry Pi, the codec internally allocates all the DPB
> buffers regardless of what buffer modes you use, and then there's not
> enough memory to allocate a full set of DPB buffers in the vout, and
> even if there was enough memory, you couldn't really use this as
> internal buffers for the codec as in avcodec.

I did some homework over the last few days and came to the following
results: When AllocateBuffer is called on an OMX component which is not
part of a tunnel, the memory is allocated on the CPU side. If the
obtained buffer is then passed to a UseBuffer call of another component,
it is filled by a DMA transfer on OMX_FillThisBuffer. The other way
round, the buffer is transferred back to the GPU on OMX_EmptyThisBuffer.
So this is indeed not real direct rendering, but DMA-accelerated
indirect rendering. The only way to get a fully direct rendering
pipeline in OMX seems to be using OMX tunnels, which would not fit very
well into the VLC module concept.

Regarding the DPB, it's as Martin says. The DPB list is taken care of in
GPU memory by the OMX decoder, and the buffers handed to the user
through AllocateBuffer are not part of the DPB. So if the videocore
forces the video output to allocate enough buffers to hold the full DPB,
this is a waste of memory, and it wastes enough of it to make allocation
fail in many scenarios (1080p H.264 on RPi, for example).
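
To put a rough number on that waste, here is a small sketch of the
arithmetic. It assumes 8-bit 4:2:0 frames (12 bits per pixel), dimensions
padded to 16-pixel macroblocks and no extra stride padding; the helper
names are made up for illustration and are not taken from VLC or OMX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round v up to the next multiple of a (a must be a power of two). */
static inline uint32_t align_up(uint32_t v, uint32_t a)
{
    assert(a != 0 && (a & (a - 1)) == 0);
    return (v + a - 1) & ~(a - 1);
}

/* Bytes for one 8-bit 4:2:0 frame (assumption: 12 bits per pixel,
 * both dimensions padded to 16-pixel macroblocks, no stride padding). */
static inline size_t frame_bytes_420(uint32_t width, uint32_t height)
{
    size_t w = align_up(width, 16);
    size_t h = align_up(height, 16);
    return w * h * 3 / 2;
}

/* Bytes for a picture pool holding `count` such frames. */
static inline size_t pool_bytes_420(uint32_t width, uint32_t height,
                                    unsigned count)
{
    return frame_bytes_420(width, height) * count;
}
```

Under these assumptions pool_bytes_420(1920, 1080, 24) comes to 75202560
bytes, i.e. roughly 72 MiB for the picture pool alone, on top of the DPB
the decoder already keeps internally.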

So my proposal would indeed be to move the DPB size requirements into
the decoder modules. I will try to walk through the existing decoders
and extend my patch, so that all decoders that might use the buffers
for the DPB actually get them allocated.
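
Just to illustrate the idea (this is not the patch itself), the decision
could look roughly like the sketch below. The struct, its fields and the
function name are hypothetical and do not exist in the VLC tree; the real
change has to go through the existing decoder and vout interfaces.

```c
#include <stdbool.h>

/* Hypothetical description of what a decoder needs from the picture
 * pool. None of these names are taken from the VLC sources. */
struct decoder_pool_req {
    bool     dpb_in_pool; /* true if reference frames live in pool pictures */
    unsigned dpb_size;    /* reference frames the stream may need */
    unsigned extra;       /* pictures in flight between decoder and vout */
};

/* Number of pictures the pool should hold for this decoder: the DPB is
 * only added when the decoder really keeps its references in the pool. */
static inline unsigned pool_picture_count(const struct decoder_pool_req *req)
{
    unsigned count = req->extra;
    if (req->dpb_in_pool)
        count += req->dpb_size;
    return count;
}
```

With dpb_size = 16 and extra = 3, a decoder that keeps its references in
pool pictures would ask for 19 pictures, while one that manages the DPB
internally (like the OMX components discussed here) would ask for only 3.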

I will post an updated patch then.

-Julian