[vlc-devel] The big one : Frame threading regressions

Francois Cartegnie fcartegnie at free.fr
Sun Sep 15 21:17:52 CEST 2019


Users have been complaining for a long time about regressions in VLC.
Usually we can't pinpoint the regression, and we end up blaming
hardware decoding.

We already know that low frame rate is an issue in VLC. But it got worse.

The unpleasant truth is that my shiny new 6-core, 12-thread system is
unable to display raw 30 fps HEVC video in software, while my old 2-core
one could. All pictures are *late* by a fixed amount.

Let's start by describing a few things.

The Core:
- The picture display date is computed by adding the buffering delay,
the pts delay and the pcr delay to the timestamp converted to system
time.
- If the current system time is already past that date, the picture is
considered late and is not displayed.
- The pcr delay is "extended" by detecting the delay on the first
timestamp conversion (which happens at decoder output).
- We can only extend delays (otherwise we would need to implement a
temporary rate change).
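
The rules above can be sketched as follows. This is only a minimal
model, not the actual core code: the class name `OutputClock`, its
fields and the use of integer microseconds are assumptions made for
illustration.

```python
from dataclasses import dataclass


@dataclass
class OutputClock:
    """Toy model of the display-date rules (hypothetical names, microseconds)."""
    buffering_delay: int
    pts_delay: int
    pcr_delay: int = 0

    def display_date(self, system_ts: int) -> int:
        # Display date = converted timestamp + all three delays.
        return system_ts + self.buffering_delay + self.pts_delay + self.pcr_delay

    def is_late(self, system_ts: int, now: int) -> bool:
        # A picture is late when the current time is already past its date.
        return now > self.display_date(system_ts)

    def extend_pcr_delay(self, observed_delay: int) -> None:
        # Delays can only grow; a smaller observation is ignored.
        self.pcr_delay = max(self.pcr_delay, observed_delay)


clock = OutputClock(buffering_delay=0, pts_delay=300_000)
# Picture arrives 350 ms after its converted timestamp: past the 300 ms budget.
print(clock.is_late(system_ts=1_000_000, now=1_350_000))
clock.extend_pcr_delay(100_000)
# With the delay extended to 400 ms in total, the same picture is on time.
print(clock.is_late(system_ts=1_000_000, now=1_350_000))
```

The `max()` in `extend_pcr_delay` is the "extend only" rule: once a
delay has been set, a later and smaller measurement cannot reduce it.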

The Packetizer delay:
- There's an unavoidable delay when we need to wait for the next picture
to know the limit of the current access unit.

The Decoder delay:
- We usually have a push/pull model. Today it is asynchronous.
- When we pull, it is always triggered by the next incoming block.
- The next incoming block might also be paced by the clock, or by the
stream itself.

All those delays were compensated by the >= 300ms delay we set as
pts-delay, which also serves as buffering, and by the pcr delay
extension done by the core.

In my 30fps hevc case:

The number of avcodec threads creates a global frame output delay which
depends directly on the number of threads (and of course on the GOP
references between the frames). With 10 threads, the video can no
longer play back, the total output delay being > 300ms.
Why does the pcr delay extension fail ? The first picture (an I-frame)
is output faster (no refs), so the delay detected on it is too small.
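
To put rough numbers on this, here is a small sketch. It assumes frame
threading adds about one frame of delay per thread, and it ignores
reordering and packetizer delay, so the real figure is likely higher.
The function name and the 300 ms constant are illustrative, not VLC
code.

```python
PTS_DELAY_MS = 300  # the default budget mentioned above


def frame_threading_delay_ms(threads: int, fps: float) -> float:
    """Approximate output delay, assuming one frame of delay per thread."""
    return 1000.0 * threads / fps


for threads in (2, 4, 10):
    delay = frame_threading_delay_ms(threads, fps=30)
    status = "over budget" if delay > PTS_DELAY_MS else "ok"
    print(f"{threads:2d} threads @ 30 fps: ~{delay:.0f} ms ({status})")
```

Under that assumption, 10 threads at 30 fps already need about 333 ms,
which is more than the 300 ms the core has available.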

Why was it working before ?

In 2.x and 3.0 we gradually introduced changes:

* We changed the synchronous avcodec decoder push/pullwait model to an
asynchronous push/pull.
Potentially we increased the delay when the source is paced. This can
also happen depending on fps and on the PTS<->PCR delay.
If you have an audio stream, you also have a race for the first
converted timestamp, which is then guaranteed to set up the lowest, and
too small, extended delay.

* We enabled threading in avcodec for H264: Frame Threading, which
creates delay (this was documented !).
Slice threading is also available and has less delay overhead but we did
not enable it because this is not hardware decoding friendly.

Low delay considerations (specific case):
The opposite case of the described problem is when you try to achieve
low delay (your GOP is usually intra in that case).
Your first picture will always output later than every other picture,
because of the decoder startup time.

So what ?

There are a few ways I can think of to fix the main issue:
- Have a way to report decoder delay.. but that would mean no playback
until data decodes (what if there is only bogus data ? multiple
decoders ?..).
- Bump default pts-delay for now.
- Adapt pts-delay based on the number of threads.
- Rewrite the core to be able to add delay without rebuffering. I don't
see how that's doable: that's similar to implementing delay reduction.
- Kill frame threading based on the number of threads and fps.
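
The two thread-based options could look something like the sketch
below. Both helpers are hypothetical and rest on the same rough
assumption of one frame of delay per thread; a real fix would also
have to account for packetizer and reordering delay.

```python
import math


def adapted_pts_delay_ms(threads: int, fps: float, base_ms: int = 300) -> int:
    """Hypothetical: grow pts-delay by the estimated frame threading delay."""
    return base_ms + math.ceil(1000.0 * threads / fps)


def max_threads_within(budget_ms: int, fps: float) -> int:
    """Hypothetical: largest thread count whose estimated delay fits the budget."""
    return max(1, math.floor(budget_ms * fps / 1000.0))


print(adapted_pts_delay_ms(threads=10, fps=30))
print(max_threads_within(budget_ms=300, fps=30))
print(max_threads_within(budget_ms=300, fps=25))
```

The first helper trades startup latency for smooth playback, the
second trades decoding throughput for latency. Neither removes the
need to know the decoder delay before the first picture is displayed.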

That's not fun at all.

