[vlc-devel] The big one : Frame threading regressions
thomas at gllm.fr
Wed Sep 25 17:11:34 CEST 2019
For the record, Android with mediacodec has exactly the same issue: https://code.videolan.org/tguillem/vlc-android/blob/master/libvlc/src/org/videolan/libvlc/Media.java#L761
On Tue, Sep 17, 2019, at 17:56, Rémi Denis-Courmont wrote:
> On Sunday, 15 September 2019, 22:17:52 EEST, Francois Cartegnie wrote:
> > Hi,
> > Users have been complaining for far too long about regressions in
> > VLC. Usually we can't pinpoint the regression, and we end up blaming
> > hardware decoding.
> > We already know low frame rate is an issue in VLC, but it got worse.
> > The unpleasant truth is that my shiny new 6-core, 12-thread system is
> > unable to display raw 30 fps hevc video in software, whereas my old
> > 2-core one could. All pictures are *late* by a fixed amount.
> > Let's start by describing a few things.
> > -----------------------------
> > The Core:
> > - The picture display date is computed by adding the buffering delay,
> > the pts delay and the pcr delay to the timestamp converted to system time.
> > - If the current system time exceeds that date, the picture is
> > considered late and is not displayed.
> > - The pcr delay is "extended" by detecting the delay on first timestamp
> > conversion (which is decoder output).
> > - We can only extend delays. (otherwise we need to implement temporary
> > rate change)
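To make the rules listed above concrete, here is a minimal sketch of the display-date computation and the late check. It is not VLC code: the names `Delays`, `display_date`, `is_late` and `extend_pcr` are hypothetical, all values are in microseconds, and it assumes a picture counts as late once the system clock has passed its display date.

```python
from dataclasses import dataclass


@dataclass
class Delays:
    """Hypothetical container for the three delays, in microseconds."""
    buffering: int
    pts_delay: int
    pcr_delay: int

    def extend_pcr(self, observed: int) -> None:
        # Delays can only be extended: a smaller observation is ignored.
        self.pcr_delay = max(self.pcr_delay, observed)


def display_date(system_ts: int, d: Delays) -> int:
    """Display date = timestamp converted to system time + all delays."""
    return system_ts + d.buffering + d.pts_delay + d.pcr_delay


def is_late(date: int, now: int) -> bool:
    """Assumption: a picture is late once 'now' has passed its display date."""
    return now > date
```

For instance, with `Delays(buffering=0, pts_delay=300_000, pcr_delay=0)` a picture whose converted timestamp is 1 000 000 is due at 1 300 000; it is still on time at 1 250 000 and late at 1 400 000.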
> > The Packetizer delay:
> > - There's an unavoidable delay when we need to wait for the next picture
> > to know the limit of the current access unit.
> > The Decoder delay:
> > - We usually have a push/pull model. Today it is asynchronous.
> > - When we pull, it is always triggered by the next incoming block.
> > - The next incoming block might also be paced by the clock, or the
> > stream itself
> > All those delays were compensated by the >= 300ms delay we set as
> > pts-delay (which is also the buffering delay) and by the pcr delay
> > extension done by the core.
> > In my 30fps hevc case
> > --------------------
> > The number of avcodec threads creates a global frame output delay which
> > depends directly on the number of threads (and of course on the GOP
> > references between the frames). With 10 threads, the 25fps video can no
> > longer play back, the total output delay being > 300ms.
> > Why does the pcr delay extension fail? The first (I-frame) picture
> > is output faster (no refs).
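The arithmetic behind the 10-thread case can be sketched as follows. This is only an illustration under a stated assumption: it models frame threading as holding back one frame per extra thread, so the output delay is (threads - 1) frame durations. The real figure also depends on the GOP references, and the function names are hypothetical.

```python
PTS_DELAY_MS = 300.0  # the >= 300ms budget mentioned in the thread


def frame_thread_delay_ms(threads: int, fps: float) -> float:
    """Assumed model: each extra decoding thread delays output by one frame."""
    return max(threads - 1, 0) * 1000.0 / fps


def exceeds_budget(threads: int, fps: float,
                   budget_ms: float = PTS_DELAY_MS) -> bool:
    """True when the modelled decoder delay alone is larger than the budget."""
    return frame_thread_delay_ms(threads, fps) > budget_ms
```

Under this model, 10 threads at 25 fps give 9 x 40 ms = 360 ms, which is above the 300 ms budget, while 4 threads give 120 ms. A delay extension sized from the first picture would miss this, since that picture leaves the decoder sooner than the ones that follow.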
> > Why was it working before ?
> > -------------------------
> > In 2.x and 3.0 we gradually introduced changes:
> > * We changed the synchronous avcodec decoder push/pullwait model to an
> > asynchronous push/pull.
> > Potentially we increased the delay when the source is paced. This is
> > also the case depending on fps and PTS<->PCR delay.
> > If you have an audio stream, you also have a race with the first
> > converted timestamp, which is then guaranteed to set up a lower
> > extended delay that is too small.
> > * We enabled threading in avcodec for H264: Frame Threading, which
> > creates delay (this was documented !).
> > Slice threading is also available and has less delay overhead but we did
> > not enable it because this is not hardware decoding friendly.
> > Low delay considerations (specific case)
> > -----------------------
> > The opposite case of the described problem is when you try to do low
> > delay. (Your GOP is usually intra in that case).
> > Your first picture will always output later than every other picture,
> > because of the decoder startup time.
> > So what ?
> > ---------
> > There are a few ways I can think of to fix the main issue:
> > - Have a way to report decoder delay... but that would mean no playback
> > until data decodes (what if there is only bogus data? multiple decoders?...).
> VLC has been doing that for years. It waits for the first data of the decoder.
> Of course, that breaks in a number of corner cases, not the least of which is
> asynchronous/threaded decoding.
> We have to live with it until 5.0 buffer rework. But I don't see how this
> solves your problem, TBH.
> > - Bump default pts-delay for now.
> That's "easy" now that there are only 4 PTS delay settings instead of one per
> access. But AFAIK, that's really meant as a kludge for input jitter. AFAIU,
> increasing PTS delay will only make your problem slightly less likely; I doubt
> that we can find an acceptable tradeoff here.
> > - Adapt pts-delay based on number of threads.
> I don't know. Is it linear? How do you compute the correct value?
> > - Rewrite the core to be able to add delay without rebuffering. I don't
> > see how that's doable: that's similar to implementing delay reduction.
> > - Kill frame threading based on a number of threads and fps.
> Hypothetically, what about implementing hardware slice decoders natively, like
> the proposed NVDEC plugin, and only use libavcodec for software decoding?