[vlc-devel] issue arising from OggDirac muxing

Mon Oct 27 22:50:58 CET 2008

I'm currently trying to implement the OggDirac mapping, and fairly
easily have something that works.  However, I've come up against a
shortcoming in the ogg muxing.

The OggDirac mapping goes to great effort to allow the regeneration
of both DTS and PTS from the granulepos.  The numbers are munged a
bit into units of pictures rather than 90kHz.

Unfortunately, Ogg doesn't help matters, as it lacks any real
synchronisation primitives.

I'll start with an example of what a stream should look like, then
how it all goes wrong:

So a correct OggDirac stream has pt (pts, but in units of pictures)
and dt (similarly, dts) as follows (picture types are arbitrary
examples):
     I  P  P  P  P  P
pt: [5, 0, 1, 2, 3, 4, ...]
dt: [-1,0, 1, 2, 3, 4, ...]

Now, let's assume I'm transcoding some video that ends up with
the following at the encoder output:
Audio-pts: [ -, 10, 11, 12, 13, 14, 15 ...]
Video-pts: [15, 10, 11, 12, 13, 14, ...]
Video-dts: [ 9, 10, 11, 12, 13, 14, ...]

The first thing the oggmuxer does is to stash a copy of the first dts;
call it dts[0]. The timestamps in the stream are now all normalised
by subtracting dts[0] from them.

It is important to note that this is done *separately* for each stream:
Audio-pt: [-, 0, 1, 2, 3, 4, ...]
Video-pt: [6, 1, 2, 3, 4, ...]
Video-dt: [0, 1, 2, 3, 4, ...]

Notice that the 'pt' for the first audio packet (pt=0) and the first
output video picture (pt=1) are no longer identical.  In this example,
that results in a one-frame a/v sync error.

This all happened because of that renormalisation.  It isn't an issue
for theora, due to its lack of out-of-order pictures (i.e., pts == dts
for each theora picture).
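
To make the arithmetic concrete, here is a small sketch of the
per-stream renormalisation using the numbers above.  It is not the
actual muxer code; normalise_by_first_dts is a made-up helper, the
timestamps are in units of pictures, and I'm assuming audio has
pts == dts.

```python
def normalise_by_first_dts(pts, dts):
    """Subtract this stream's own first dts from all of its timestamps."""
    base = dts[0]
    return [p - base for p in pts], [d - base for d in dts]


# Encoder output from the example above (units of pictures).
audio_ts = [10, 11, 12, 13, 14, 15]   # assumption: audio pts == dts
video_pts = [15, 10, 11, 12, 13, 14]
video_dts = [9, 10, 11, 12, 13, 14]

# Each stream is normalised separately, against its own dts[0].
audio_pt, _ = normalise_by_first_dts(audio_ts, audio_ts)
video_pt, video_dt = normalise_by_first_dts(video_pts, video_dts)

# Both streams started presenting at 10, so the earliest presentation
# times should still agree after normalisation.  They do not.
av_offset = min(video_pt) - min(audio_pt)

print("Audio-pt:", audio_pt)
print("Video-pt:", video_pt)
print("Video-dt:", video_dt)
print("a/v offset in pictures:", av_offset)
```

Feeding the same helper a stream with pts == dts throughout (the
theora case) leaves the earliest pt at 0, which is why the problem
never showed up there.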

I can see two ways of fixing this problem; only one of them actually
works:
 - For each video elementary stream, hold an extra piece of metadata
   to signal the timestamp of PTS(0), i.e. the first picture to emerge
   from a decoder.

   PTS(0) is then subtracted from all timestamps in that elementary
   stream, much as dts[0] is subtracted at present.

   Note, I was quite deliberate in making the above example start
   with a reordered frame: you can't guess the correct offset by
   staring at the timestamps (there could be a higher nesting depth
   still to come after the first 6 pictures that would require the
   first dt to be -2).

 - One could buffer a load of blocks and try to guess the numbers
   (by looking for the point at which pts == dts).  However, it
   isn't a bounded problem: you only gain more confidence the more
   blocks you inspect.
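
As a rough sketch of the first option, assuming PTS(0) is signalled
alongside each elementary stream (the pts0 argument below is
hypothetical and simply stands in for that metadata), the same example
numbers come out matching the correct OggDirac stream shown earlier:

```python
def normalise_by_pts0(pts, dts, pts0):
    """Subtract the signalled PTS(0) from all timestamps of one stream."""
    return [p - pts0 for p in pts], [d - pts0 for d in dts]


# Encoder output from the example (units of pictures).
audio_ts = [10, 11, 12, 13, 14, 15]   # assumption: audio pts == dts
video_pts = [15, 10, 11, 12, 13, 14]
video_dts = [9, 10, 11, 12, 13, 14]

# Assumption: the decoder/packetizer tells us the first presented
# timestamp.  It is hard-coded here; a muxer could not derive it from
# the first few blocks alone.
AUDIO_PTS0 = 10
VIDEO_PTS0 = 10

audio_pt, _ = normalise_by_pts0(audio_ts, audio_ts, AUDIO_PTS0)
video_pt, video_dt = normalise_by_pts0(video_pts, video_dts, VIDEO_PTS0)

av_offset = min(video_pt) - min(audio_pt)

print("Audio-pt:", audio_pt)
print("Video-pt:", video_pt)   # first dt goes negative, as in the correct stream
print("Video-dt:", video_dt)
print("a/v offset in pictures:", av_offset)
```

The point of passing pts0 in explicitly is that the muxer never has to
infer it; whoever knows the reordering depth supplies it once per
stream.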

So I propose adding an extra field to es_format_t to store the first
pts output from a decoder (or packetizer, if it can work it out).
Perhaps i_time_pts0?  Any other suggestions?

..david