[vlc-devel] [PATCH 3/8] picture_pool: use the internal condition/lock indirectly

Steve Lhomme robux4 at ycbcr.xyz
Wed Jul 29 07:48:48 CEST 2020


On 2020-07-28 17:02, Rémi Denis-Courmont wrote:
> Le tiistaina 28. heinäkuuta 2020, 8.33.49 EEST Steve Lhomme a écrit :
>>>> In pull model it was easy. The pool belonged to the vout so it was not
>>>> tied to such lock. It was possible to cancel the pool regardless of who
>>>> is waiting. Now that pools are in many decoders (and in many forms) it
>>>> becomes tricky to do this. But we need a solution anyway.
>>>
>>> You can't expect to be able to pass a "cancellable" to each and every
>>> place
>>> that might sleep. Experience has shown that this just doesn't work.
>>
>> Experience is a personal thing. I have never experienced anyone trying
>> this before.
>> My intuition tells me it's ugly, I agree. The "cancellable" objet can
>> only be used by one pool (maybe it would need some attach/detach to
>> ensure this). It could be recycled between pools of the same decoder.
> 
> I am referring to the project's interrupt framework, what led to that and what
> was there before. Specifically, you can't impose a specific mean of waiting, nor
> a specific instance of it. And once you fix that, you end up with exactly the
> same thing as the interrupt framework, including with all its intrinsic
> problems. In particular, you can only really interrupt for final stop/destroy,
> even though there are typically other wait scenarii that would need to be
> interruptible.
> 
> Besides, the picture pool used to be a mandatory interface between the core
> and the video output. But nowadays, it's just a convenience for allocating and
> recycling main RAM picture buffers. So if it does not suit a given use case,
> then there are no reasons to use it. And indeed, opaque decoder surfaces have
> always been managed by ad-hoc means, since the picture pool cannot allocate
> them. It makes even less sense to use the picture pool for that now than
> before.

I agree. The solution I propose would put the picture pool and its wait 
mechanism in the decoder owner. The decoder would never have to know 
about it nor wait for available buffers.

>>> If a module has a private object upon which it sleeps, it is responsible
>>> for not locking itself up. For instance, access and access_demux modules
>>> are responsible for stopping any asynchronous or threaded activities as
>>> needed. A contrario, stream filter and demuxer modules rely on the
>>> interrupt framework - because the sleeping happens within the underlying
>>> stream that is given to them.
>>
>> The picture pool isn't locked during picture_pool_Wait(). So it works as
>> designed.
>>
>> The problematic lock is `p_sys->lock` in avcodec/video.c.
> 
> In other words, it's a bug in the avcodec plugin. The decoder core and the
> pool helpers have nothing to do with it.

I don't think it's a bug but a design problem. And I can't think of an 
internal redesign that could solve this.

My understanding is that get_format cannot be used concurrently in lavc 
so the p_va cannot change in multiple threads at the same time (nor 
during a get_frame, the format of new frame shouldn't have changes that 
require a new VA). So between decoder thread(s) if there's one waiting 
for a p_va surface, there shouldn't be other threads (including the 
DecoderThread) trying to change the p_va at the same time.

The problem is adding the input thread in the mix. To tell the p_va it 
needs to stop waiting it needs to lock it, because p_va may be changing 
at the time of the call (added or removed during a get_format).

Maybe it could be possible to tell the decoder to stop waiting via 
DecoderThread which may not need a (conflicting) lock to do that. But it 
doesn't seem correct in the case of a single thread decoder 
(--avcodec-threads=1) which is waiting in the actual thread we want to stop.

Again, I don't see a solution around this. If the "object" to cancel is 
not directly accessible by the DecoderThread, in other threads, we can't 
safely cancel the wait.

>> When the picture_pool_Wait() is called, that lock is held. I don't think
>> there's a way around this.
> 
> Well the obvious way around is not tot use picture_pool_Wait() or even
> picture_pool_t if it does not suit the given use case.

va_surface doesn't use picture_pool and yet it waits. mediacodec and 
VDPAU also have wait loops not using picture pools. All of these are not 
cancelable.

> And a decoder cannot block the ES output or the video output on pause anyway.
> It's not only about pause then stop; it's pause then anything (other than
> resume).

What is the decoder supposed to do on pause ?

It seems pause+seek still works. And closing after that, still in pause, 
also works. So there might just be a bug in the order we do things on 
Stop/Close that creates the deadlock.

> (...)
>> That would mean, contrary to what we said, decoders cannot handle their
>> own picture pool.
> 
> I don't think we meant that decoders must use picture_pool_t. picture_pool_t
> is potentially convenient for software decoders. It never was convenient and
> probably never will be for hardware decoders whereby it's just a glorified
> free-list. You could just as well use a vlc_queue_t or a list and a lock to
> implement the picture pool as a *concept*.
> 
> If there was a problem with locking in the decoder core, then we'd have a
> problem. But this looks like a problem within avcodec here.

For decoders using a single thread (I suppose that includes mediacodec) 
a callback called from the input thread may be sufficient to tell it to 
stop waiting. For decoders using multiple threads it can lead to a race 
condition.


More information about the vlc-devel mailing list