[vlc-devel] The big one : Frame threading regressions

Steve Lhomme robux4 at ycbcr.xyz
Mon Sep 16 08:42:14 CEST 2019


On 2019-09-15 21:17, Francois Cartegnie wrote:
> Hi,
> 
> We have lots of users complaining for too long about regressions with
> VLC, and usually we can't pinpoint that regression and usually blame
> hardware decoding.
> 
> We already know low frame rate is an issue in vlc. But it got worse.
> 
> The unpleasant truth is that my shiny new 6-core, 12-thread system is
> unable to display raw 30 fps hevc video in software, when my old 2-core
> one could. All pictures are *late* by a fixed amount.
> 
> Let's start by describing a few things.
> -----------------------------
> 
> The Core:
> - The picture display date is computed by adding the buffering delay,
> the pts delay, and the pcr delay to the system-converted timestamp.
> - If the current system time exceeds that date, the picture is
> considered late, and not displayed.

Actually there are two types: late frames, which are still displayed but
not on time, and very late frames, which are really dropped.
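
To make the distinction concrete, here is a minimal Python sketch of that
classification. The function names, the very_late_threshold_us parameter and
its default value are made up for illustration; they are not the core's
actual names or values. Timestamps are in microseconds.

```python
def display_date(system_ts_us, buffering_delay_us, pts_delay_us, pcr_delay_us):
    # Display date as described above: converted timestamp plus all delays.
    return system_ts_us + buffering_delay_us + pts_delay_us + pcr_delay_us


def classify_picture(display_date_us, now_us, very_late_threshold_us=20_000):
    """Classify a decoded picture against the current system time.

    very_late_threshold_us is a hypothetical value, not the core's.
    """
    lateness_us = now_us - display_date_us
    if lateness_us <= 0:
        return "on_time"    # display date is now or still in the future
    if lateness_us <= very_late_threshold_us:
        return "late"       # displayed anyway, but after its date
    return "very_late"      # too far behind: dropped
```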

> - The pcr delay is "extended" by detecting the delay on first timestamp
> conversion (which is decoder output).
> - We can only extend delays. (otherwise we need to implement temporary
> rate change)
> 
> The Packetizer delay:
> - There's an unavoidable delay when we need to wait for the next picture
> to know the limit of the current access unit.
> 
> The Decoder delay:
> - We usually have a push/pull model. Today it is asynchronous.
> - When we pull, it is always triggered by the next incoming block.
> - The next incoming block might also be paced by the clock, or the
> stream itself
> 
> All those delays were compensated by the >= 300ms delay we set as
> pts-delay which is also buffering and the pcr delay extension done by
> the core.
> 
> 
> In my 30fps hevc case
> --------------------
> 
> The number of avcodec threads creates a global frame output delay which
> depends directly on the number of threads (and of course on the GOP
> references between the frames). With 10 threads, the 25fps video is now
> unable to play back, the total output delay being > 300ms.
> Why does the pcr delay extension fail? The first (I-frame) picture
> is output faster (no refs).
> 
> 
> Why was it working before ?
> -------------------------
> 
> In 2.x and 3.0 we gradually introduced changes:
> 
> * We changed the synchronous avcodec decoder push/pullwait model to an
> asynchronous push/pull.
> Potentially we increased the delay when the source is paced. This is
> also the case depending on fps and PTS<->PCR delay.
> If you have an audio stream, you also have a race with the first
> converted timestamp, which is then guaranteed to set up the lowest
> possible, and too small, extended delay.
> 
> * We enabled threading in avcodec for H264: Frame Threading, which
> creates delay (this was documented !).
> Slice threading is also available and has less delay overhead but we did
> not enable it because this is not hardware decoding friendly.
> 
> 
> Low delay considerations (specific case)
> -----------------------
> The opposite case of the described problem is when you try to do low
> delay. (Your GOP is usually intra in that case).
> Your first picture will always output later than every other picture,
> because of the decoder startup time.
> 
> 
> So what ?
> ---------
> 
> There are a few ways I can think of to fix the main issue:
> - Have a way to report decoder delay.. but that would mean no playback
> until data decodes (what if only bogus data ? multiple decs ?..).

That seems doable. This is the *max* decoder delay. If the source has
bogus data, that will increase this delay, but it's not really a problem.
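
A rough sketch of what I mean, in Python for brevity. Nothing here is an
existing API: the class, its methods and the decoder ids are hypothetical.
Each decoder reports its delay, we only keep the maximum seen so far (delays
can only be extended), and with multiple decoders the slowest one decides.

```python
class DecoderDelayTracker:
    """Track the worst-case delay reported by each decoder (hypothetical)."""

    def __init__(self):
        self._max_delay_ms = {}

    def report(self, decoder_id, delay_ms):
        # Delays can only be extended, so keep the maximum seen so far.
        previous = self._max_delay_ms.get(decoder_id, 0)
        self._max_delay_ms[decoder_id] = max(previous, delay_ms)

    def required_delay_ms(self):
        # With multiple decoders, the slowest one dictates the delay.
        return max(self._max_delay_ms.values(), default=0)


tracker = DecoderDelayTracker()
tracker.report("video", 40)    # first I-frame comes out quickly
tracker.report("video", 360)   # later frames wait on frame threading
tracker.report("audio", 25)
```

A fast first picture no longer fixes the delay too low, since a later, larger
report replaces it.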

> - Bump default pts-delay for now.

Not friendly to low delay, AFAIK.

> - Adapt pts-delay based on number of threads.

If the pts-delay is decoder-based, why not. Not every codec uses
threads, nor does every decoder.
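
As a back-of-the-envelope illustration (Python, hypothetical names), assume
frame threading holds back roughly one frame per thread beyond the first.
That is an assumption, and the real figure also depends on the GOP
references. The 300 ms base is the pts-delay mentioned above; margin_ms is
made up.

```python
def frame_threading_delay_ms(threads, fps):
    # Assumption: each thread beyond the first holds back one frame.
    frame_duration_ms = 1000.0 / fps
    return max(threads - 1, 0) * frame_duration_ms


def decoder_based_pts_delay_ms(threads, fps, base_ms=300.0, margin_ms=0.0):
    # Never go below the base value; only extend when the decoder needs more.
    return max(base_ms, frame_threading_delay_ms(threads, fps) + margin_ms)
```

With 10 threads at 25 fps this gives 9 x 40 ms = 360 ms, above the 300 ms
base, which is consistent with the figures quoted above. A decoder that does
not use threads simply keeps the base value.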

> - Rewrite the core to be able to add delay without rebuffering. I don't
> see how that's doable: that's similar to implementing delay reduction.
> - Kill frame threading based on a number of threads and fps.

We want to use all the threads available/needed. Not using them would be 
bad in general.

