[vlc-devel] [PATCH 3/8] picture_pool: use the internal condition/lock indirectly

Tue Jul 28 07:33:49 CEST 2020

On 2020-07-27 20:51, Rémi Denis-Courmont wrote:
> Le maanantaina 27. heinäkuuta 2020, 20.40.49 EEST Steve Lhomme a écrit :
>> I don't think I said there's no deadlock anymore.
> 
> You removed the code that prevented that class of deadlock. And I specifically
> asked if that simplification wouldn't expose the decoder to deadlocks...

I'm curious to know what commit you are referring to, because this 
problem is new because of the push design. It cannot be solved with the 
old code that was removed.

Neither 1fe351861a6681d3bf03c9ae08f76a600aa84de1 or 
6fc5e2bc5f33c6c2ffef13fe6f8408e1c5877070 would solve this.

>> In pull model it was easy. The pool belonged to the vout so it was not
>> tied to such lock. It was possible to cancel the pool regardless of who
>> is waiting. Now that pools are in many decoders (and in many forms) it
>> becomes tricky to do this. But we need a solution anyway.
> 
> You can't expect to be able to pass a "cancellable" to each and every place
> that might sleep. Experience has shown that this just doesn't work.

Experience is a personal thing. I have never experienced anyone trying 
this before.
My intuition tells me it's ugly, I agree. The "cancellable" objet can 
only be used by one pool (maybe it would need some attach/detach to 
ensure this). It could be recycled between pools of the same decoder.

> If a module has a private object upon which it sleeps, it is responsible for
> not locking itself up. For instance, access and access_demux modules are
> responsible for stopping any asynchronous or threaded activities as needed. A
> contrario, stream filter and demuxer modules rely on the interrupt framework -
> because the sleeping happens within the underlying stream that is given to
> them.

The picture pool isn't locked during picture_pool_Wait(). So it works as 
designed.

The problematic lock is `p_sys->lock` in avcodec/video.c. When the 
picture_pool_Wait() is called, that lock is held. I don't think there's 
a way around this. And that's why other solutions I tried don't work. To 
be able to access the pool that needs to be unlocked from the decoder 
thread, you need to have this lock as well. In particular to access p_va 
which is the object with a pool.

> If a decoder is waiting on a pool through the decoder owner from the video
> output, it makes sense, or at least it's hardly avoidable, that the decoder
> owner and the video output are partly responsible for unblocking. But if it's
> a private pool of the decoder, it is the decoder's responsibility to do the
> needful upon flush or destroy.

That's exactly what I'm trying to solve here. But as explained, there is 
a lock that prevents doing anything (on destroy in this case).

So the solution has to be something that doesn't require this lock. 
Either by using something stable provided by the decoder_t (my proposed 
solution), or something that the decoder/va would provide the decoder_t 
(one type or dynamically) to cancel waiting operations on close.

BTW va_surface (d3d11va, dxva2, vaapi) doesn't use a condition when 
waiting for buffers so I had to adapt so they also became cancelable.

For now the issue is with hardware decoders. We could have a picture 
pool exactly the size of the surface pool and only wait for pictures 
(and then attach the surface to it). If we have a picture available that 
means we have a matching surface available, no need for a blocking call.
The problem is to use the proper pool size for decoder_t->out_pool. It 
may even be dynamic. The pool size required could be passed via the 
format_update() callback.

That would mean, contrary to what we said, decoders cannot handle their 
own picture pool. They have to rely on the owner to provide the 
pictures. Right now only nvdec has its own picture pool, so it's not a 
deal breaker.