[vlc-devel] The big one : Frame threading regressions

Rémi Denis-Courmont remi at remlab.net
Tue Sep 17 17:56:01 CEST 2019


On Sunday 15 September 2019, 22:17:52 EEST, Francois Cartegnie wrote:
> Hi,
> 
> Users have been complaining for too long about regressions in VLC, and
> we usually can't pinpoint the regression and end up blaming hardware
> decoding.
> 
> We already know that low frame rates are an issue in vlc. But it got worse.
> 
> The unpleasant truth is that my shiny new 6-core, 12-thread system is
> unable to display raw 30 fps HEVC video in software, whereas my old
> 2-core one could. All pictures are *late* by a fixed amount.
> 
> Let's start by describing a few things.
> -----------------------------
> 
> The Core:
> - The picture display date is computed by adding the buffering delay,
> the pts delay and the pcr delay to the timestamp converted to system
> time.
> - If the current system time exceeds that date, the picture is
> considered late and is not displayed.
> - The pcr delay is "extended" by detecting the delay on the first
> timestamp conversion (which happens at decoder output).
> - We can only extend delays (otherwise we would need to implement a
> temporary rate change).
> 
> The Packetizer delay:
> - There's an unavoidable delay when we need to wait for the next picture
> to know where the current access unit ends.
> 
> The Decoder delay:
> - We usually have a push/pull model. Today it is asynchronous.
> - When we pull, it is always triggered by the next incoming block.
> - The next incoming block might also be paced by the clock, or by the
> stream itself.
> 
> All those delays were compensated for by the >= 300ms pts-delay (which
> also acts as buffering) and by the pcr delay extension done by the
> core.
> 
> 
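The timing rules quoted above can be modelled roughly as follows. This is a hypothetical sketch, not VLC's clock code: the struct, the function names and the microsecond units are assumptions for illustration, and "late" follows the usual convention that a picture is dropped once the current time has passed its display date.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the delays added to a converted timestamp
 * (all values in microseconds). Not VLC's real API. */
struct play_delays {
    int64_t buffering; /* buffering delay */
    int64_t pts_delay; /* configured pts-delay, e.g. 300 ms */
    int64_t pcr_delay; /* extension detected by the core */
};

/* Display date = timestamp converted to system time + all delays. */
static int64_t display_date(int64_t system_ts, const struct play_delays *d)
{
    return system_ts + d->buffering + d->pts_delay + d->pcr_delay;
}

/* A picture whose display date is already in the past is not shown. */
static bool is_late(int64_t date, int64_t now)
{
    return now > date;
}

/* The pcr delay can only grow: a smaller request is ignored, since
 * shrinking it would require a temporary rate change. */
static void extend_pcr_delay(struct play_delays *d, int64_t wanted)
{
    assert(wanted >= 0);
    if (wanted > d->pcr_delay)
        d->pcr_delay = wanted;
}
```

In this model the extension can only ever grow; according to the description above, it is in practice detected once, at the first timestamp conversion.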
> In my 30fps hevc case
> --------------------
> 
> The number of avcodec threads creates a global frame output delay which
> depends directly on the number of threads (and of course on the GOP
> references between the frames). With 10 threads, the 25fps video is now
> unable to play back, the total output delay being > 300ms.
> Why does the pcr delay extension fail? Because the first (I-frame)
> picture is output faster (it has no references).
> 
> 
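The arithmetic behind this case can be sketched as below. It assumes the rule from libavcodec's documentation of `AVCodecContext.thread_type` that frame threading increases decoding delay by one frame per thread; the helper names are hypothetical, and the GOP reordering delay mentioned above is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extra output latency, in microseconds, added by frame threading,
 * assuming one frame of delay per decoding thread. Reordering delay
 * from the GOP structure would come on top of this. */
static int64_t frame_threading_latency_us(int threads, int fps)
{
    assert(threads > 0 && fps > 0);
    return (int64_t)threads * 1000000 / fps;
}

/* True if that latency alone already exceeds the pts-delay budget. */
static bool exceeds_budget(int threads, int fps, int64_t pts_delay_us)
{
    return frame_threading_latency_us(threads, fps) > pts_delay_us;
}
```

With 10 threads at 25 fps this gives 400 ms (about 333 ms at 30 fps), already above a 300 ms pts-delay before any reordering delay; with 2 threads at 30 fps it is about 67 ms.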
> Why was it working before ?
> -------------------------
> 
> In 2.x and 3.0 we gradually introduced changes:
> 
> * We changed the synchronous avcodec decoder push/pull-wait model to an
> asynchronous push/pull one.
> This potentially increased the delay when the source is paced. It also
> varies with the fps and the PTS<->PCR delay.
> If you have an audio stream, you also have a race for the first
> converted timestamp, which is then guaranteed to set up the lowest, too
> small, extended delay.
> 
> * We enabled threading in avcodec for H264: frame threading, which
> creates delay (this was documented!).
> Slice threading is also available and has less delay overhead, but we
> did not enable it because it is not hardware-decoding friendly.
> 
> 
> Low delay considerations (specific case)
> -----------------------
> The opposite case of the described problem is when you try to achieve
> low delay (your GOP is usually intra in that case).
> Your first picture will always be output later than every other picture,
> because of the decoder startup time.
> 
> 
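Both failure modes described above (a fast first picture or audio block locking in too small an extension, and the low-delay case where a slow first picture locks in too large one) can be illustrated with a small hypothetical model. It assumes the extension is computed once as the amount by which the first output overshoots the pts-delay, and is never reduced; the function names are made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: the delay extension is measured once, from the
 * first picture (or audio block) that reaches timestamp conversion,
 * and can never be reduced afterwards. Times in microseconds. */
static int64_t detect_extension(int64_t first_latency, int64_t pts_delay)
{
    assert(first_latency >= 0 && pts_delay >= 0);
    return first_latency > pts_delay ? first_latency - pts_delay : 0;
}

/* How late a later picture is (0 if on time), given the steady-state
 * decoder output latency and the extension locked in earlier. */
static int64_t lateness(int64_t steady_latency, int64_t pts_delay,
                        int64_t extension)
{
    int64_t budget = pts_delay + extension;
    return steady_latency > budget ? steady_latency - budget : 0;
}
```

With made-up numbers: a 300 ms pts-delay and a first I-frame out after 40 ms yield no extension, so if steady-state output takes 400 ms every later picture is 100 ms late, the "late by a fixed amount" symptom. In the low-delay case the roles swap: with a 20 ms pts-delay and a 150 ms startup, a 130 ms extension is locked in that later 40 ms pictures never need, and since delays can only be extended it is never given back.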
> So what ?
> ---------
> 
> There are a few ways I can think of to fix the main issue:
> - Have a way to report decoder delay... but that would mean no playback
> until data decodes (what if there is only bogus data? multiple decoders?...).

VLC has been doing that for years. It waits for the first data out of the decoder. 
Of course, that breaks in a number of corner cases, not the least of which is 
asynchronous/threaded decoding.

We have to live with it until the 5.0 buffer rework. But I don't see how this 
solves your problem, TBH.

> - Bump default pts-delay for now.

That's "easy" now that there are only 4 PTS delay settings instead of one per 
access module. But AFAIK, that's really meant as a kludge for input jitter. 
AFAIU, increasing PTS delay will only make your problem slightly less likely; 
I doubt that we can find an acceptable tradeoff here.

> - Adapt pts-delay based on number of threads.

I don't know. Is it linear? How do you compute the correct value?
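For the "adapt pts-delay based on number of threads" option, a naive sketch could look like this. It assumes the delay is linear, one frame per frame-threading thread as libavcodec documents, plus an optional reordering depth; whether that holds in practice is exactly the open question here. The function, the `reorder_frames` input and the safety margin are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: pick a pts-delay large enough to cover frame-threading
 * latency (assumed linear: one frame per thread) plus the stream's
 * reordering depth, never going below the configured value.
 * margin_us is an arbitrary safety margin. Times in microseconds. */
static int64_t adapted_pts_delay_us(int64_t configured_us, int threads,
                                    int reorder_frames, int fps,
                                    int64_t margin_us)
{
    assert(threads > 0 && reorder_frames >= 0 && fps > 0);
    int64_t frames = (int64_t)threads + reorder_frames;
    int64_t needed = frames * 1000000 / fps + margin_us;
    return needed > configured_us ? needed : configured_us;
}
```

The `max()` against the configured value keeps the input-jitter role of pts-delay intact; the sketch only ever raises it.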

> - Rewrite the core to be able to add delay without rebuffering. I don't
> see how that's doable: that's similar to implementing delay reduction.
> - Kill frame threading based on a number of threads and fps.

Hypothetically, what about implementing hardware slice decoders natively, like 
the proposed NVDEC plugin, and only using libavcodec for software decoding?
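The last option in the quoted list (killing frame threading based on the number of threads and the fps) could look roughly like this. The enum and function are hypothetical stand-ins kept self-contained; in libavcodec the actual knob is `AVCodecContext.thread_type` with the `FF_THREAD_FRAME` and `FF_THREAD_SLICE` flags. The one-frame-per-thread latency is again an assumption taken from libavcodec's documentation, and the share of pts-delay allowed is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for libavcodec's FF_THREAD_FRAME /
 * FF_THREAD_SLICE choice, to keep this example self-contained. */
enum threading_mode { THREADING_FRAME, THREADING_SLICE };

/* Keep frame threading only if its latency (assumed to be one frame
 * per thread) fits within budget_percent of the pts-delay. */
static enum threading_mode choose_threading(int threads, int fps,
                                            int64_t pts_delay_us,
                                            int budget_percent)
{
    assert(threads > 0 && fps > 0);
    assert(budget_percent > 0 && budget_percent <= 100);
    int64_t latency = (int64_t)threads * 1000000 / fps;
    int64_t budget = pts_delay_us * budget_percent / 100;
    return latency <= budget ? THREADING_FRAME : THREADING_SLICE;
}
```

A variant would cap the number of frame threads to fit the budget instead of switching modes; either way the trade-off noted earlier remains, since slice threading was avoided for not being hardware-decoding friendly.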

-- 
レミ・デニ-クールモン
http://www.remlab.net/