[vlc-devel] The big one : Frame threading regressions
remi at remlab.net
Tue Sep 17 17:56:01 CEST 2019
On Sunday, 15 September 2019, 22:17:52 EEST, Francois Cartegnie wrote:
> We have lots of users complaining for too long about regressions with
> VLC, and usually we can't pinpoint that regression and usually blame
> hardware decoding.
> We already know that low frame rate is an issue in VLC, but it got worse.
> The unpleasant truth is that my shiny new 6-core, 12-thread system is
> unable to display raw 30 fps HEVC video in software, whereas my old 2-core
> one could. All pictures are *late* by a fixed amount.
> Let's start by describing a few things.
> The Core:
> - The picture display date is computed by adding the buffering delay,
> the pts delay, and the pcr delay to the timestamp converted to system time.
> - If the current system time exceeds that date, the picture is considered
> late and is not displayed.
> - The pcr delay is "extended" by detecting the delay on first timestamp
> conversion (which is decoder output).
> - We can only extend delays. (otherwise we need to implement temporary
> rate change)
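
The core rules listed above can be sketched roughly as follows. This is only an illustration of the arithmetic, not VLC code: the names `Delays`, `display_date_us`, `is_late` and `extend_pcr_delay` are hypothetical, and the sketch assumes all values are microseconds on the same system clock.

```python
from dataclasses import dataclass


@dataclass
class Delays:
    """Hypothetical container for the three delays named above (microseconds)."""
    buffering_us: int
    pts_delay_us: int
    pcr_delay_us: int


def display_date_us(system_ts_us: int, d: Delays) -> int:
    """Display date = timestamp converted to system time + all three delays."""
    return system_ts_us + d.buffering_us + d.pts_delay_us + d.pcr_delay_us


def is_late(display_us: int, now_us: int) -> bool:
    """A picture is late when its display date is already in the past."""
    return display_us < now_us


def extend_pcr_delay(d: Delays, observed_delay_us: int) -> None:
    """Delays can only be extended, never reduced."""
    d.pcr_delay_us = max(d.pcr_delay_us, observed_delay_us)
```

With a 300 ms pts delay and nothing else, a picture stamped at t = 1.0 s is due at 1.3 s; if the decoder only delivers it at 1.4 s it is dropped as late.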
> The Packetizer delay:
> - There's an unavoidable delay when we need to wait for the next picture
> to know the limit of the current access unit.
> The Decoder delay:
> - We usually have a push/pull model. Today it is asynchronous.
> - When we pull, it is always triggered by the next incoming block.
> - The next incoming block might also be paced by the clock, or the
> stream itself
> All those delays were compensated by the >= 300ms delay we set as
> pts-delay which is also buffering and the pcr delay extension done by
> the core.
> In my 30fps hevc case
> The number of avcodec threads creates a global frame output delay which
> depends directly on the number of threads (and of course on the GOP
> references between the frames). With 10 threads, the 25fps video can no
> longer play back, the total output delay being > 300ms.
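
A first-order estimate of that output delay can be sketched as below. It assumes that frame threading holds back roughly one frame per extra thread and ignores reordering from GOP references and the packetizer delay, so the real figure is likely higher. The function name `frame_threading_delay_ms` is hypothetical.

```python
def frame_threading_delay_ms(threads: int, fps: float) -> float:
    """Rough output delay from frame threading, in milliseconds.

    Assumption: about (threads - 1) frames are held inside the decoder
    before the first picture comes out; reordering delay is not counted.
    """
    frames_in_flight = max(threads - 1, 0)
    return frames_in_flight * 1000.0 / fps


PTS_DELAY_MS = 300.0  # the default pts-delay mentioned above

for fps in (25.0, 30.0):
    delay = frame_threading_delay_ms(10, fps)
    print(f"10 threads at {fps:g} fps: {delay:.0f} ms, "
          f"exceeds pts-delay: {delay > PTS_DELAY_MS}")
```

Under this assumption, 10 threads at 25 fps give 9 × 40 ms = 360 ms, which is already above the 300 ms budget before any other delay is added.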
> Why does the pcr delay extension fail? The first picture (an I-frame)
> is output faster (no refs).
> Why was it working before ?
> In 2.x and 3.0 we gradually introduced changes:
> * We changed the synchronous avcodec decoder push/pullwait model to an
> asynchronous push/pull.
> Potentially we increased the delay when the source is paced. This is
> also the case depending on fps and PTS<->PCR delay.
> If you have an audio stream, you also have a race with the first
> converted timestamp, which is then guaranteed to set up the lowest
> possible, and too small, extended delay.
> * We enabled threading in avcodec for H264: Frame Threading, which
> creates delay (this was documented !).
> Slice threading is also available and has less delay overhead but we did
> not enable it because this is not hardware decoding friendly.
> Low delay considerations (specific case)
> The opposite case of the described problem is when you try to do low
> delay. (Your GOP is usually intra in that case).
> Your first picture will always output later than every other picture,
> because of the decoder startup time.
> So what ?
> There are a few ways I can think of to fix the main issue:
> - Have a way to report decoder delay... but that would mean no playback
> until data decodes (what if there is only bogus data? multiple decoders?...).
VLC has been doing that for years. It waits for the first data of the decoder.
Of course, that breaks in a number of corner cases, not the least of which is
We have to live with it until 5.0 buffer rework. But I don't see how this
solves your problem, TBH.
> - Bump default pts-delay for now.
That's "easy" now that there are only 4 PTS delay settings instead of one per
access. But AFAIK, that's really meant as a kludge for input jitter. AFAIU,
increasing PTS delay will only make your problem slightly less likely; I doubt
that we can find an acceptable tradeoff here.
> - Adapt pts-delay based on number of threads.
I don't know. Is it linear? How do you compute the correct value?
> - Rewrite the core to be able to add delay without rebuffering. I don't
> see how that's doable: that's similar to implementing delay reduction.
> - Kill frame threading based on a number of threads and fps.
Hypothetically, what about implementing hardware slice decoders natively, like
the proposed NVDEC plugin, and only using libavcodec for software decoding?