[vlc-devel] issue arising from OggDirac muxing
David Flynn
davidf+nntp at woaf.net
Mon Oct 27 22:50:58 CET 2008
I'm currently trying to implement the OggDirac mapping, and fairly
easily have something that works. However, I've come up against a
shortcoming in the ogg muxing.
The OggDirac mapping goes to great effort to allow the regeneration
of both DTS and PTS from the granulepos. The numbers are munged a
bit into units of pictures rather than 90kHz.
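As a rough illustration of what "units of pictures" means, here is a minimal sketch of converting a 90kHz timestamp into a picture count. It assumes a constant frame rate given as a rational number; the function name and parameters are made up for the example, and this is not the actual granulepos packing used by the mapping.

```python
from fractions import Fraction

CLOCK_HZ = 90_000  # the 90kHz timestamp clock mentioned above


def ticks_to_pictures(ticks, fps_num, fps_den):
    """Convert a 90kHz timestamp to a whole number of pictures.

    Assumes a constant frame rate of fps_num/fps_den pictures per second.
    Exact rational arithmetic avoids float rounding drift for rates such
    as 30000/1001.
    """
    pictures = Fraction(ticks * fps_num, CLOCK_HZ * fps_den)
    return round(pictures)


one_picture_25fps = ticks_to_pictures(3600, 25, 1)
five_pictures_25fps = ticks_to_pictures(18000, 25, 1)
one_picture_ntsc = ticks_to_pictures(3003, 30000, 1001)
```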
Unfortunately Ogg doesn't really help matters with a lack of any real
synchronisation primitives.
I'll start with an example of what a stream should look like, then
how it all goes wrong:
So a correct OggDirac stream has pt (pts, but in units of pictures)
and dt (similarly, dts) as follows (picture types are arbitrary examples):
      I   P  P  P  P  P
pt: [ 5,  0, 1, 2, 3, 4, ...]
dt: [-1,  0, 1, 2, 3, 4, ...]
Now, let's assume I'm transcoding some video that ends up with
the following at the encoder output:
Audio-pts: [ -, 10, 11, 12, 13, 14, 15 ...]
Video-pts: [15, 10, 11, 12, 13, 14, ...]
Video-dts: [ 9, 10, 11, 12, 13, 14, ...]
The first thing the oggmuxer does is to stash a copy of the first dts;
call it dts[0]. The timestamps in the stream are now all normalised
by subtracting dts[0] from them.
It is important to note that this is done *separately* for each stream:
Audio-pt: [-, 0, 1, 2, 3, 4, ...]
Video-pt: [6, 1, 2, 3, 4, ...]
Video-dt: [0, 1, 2, 3, 4, ...]
Notice that the 'pt' for the first audio packet (pt=0) and the first output
video picture (pt=1) are no longer identical? In this example, that
results in a one-frame a/v sync error.
This all happened because of that renormalisation. It isn't an issue for
Theora, which has no out-of-order pictures (i.e. pts == dts for each Theora
picture).
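The effect can be reproduced with the numbers from the example. This is only a sketch of the per-stream renormalisation as described in this mail, not the muxer's actual code; the helper name is made up, and the audio stream is assumed to have dts equal to pts.

```python
def normalise(timestamps, base):
    """Subtract a base value from every timestamp of one stream."""
    return [t - base for t in timestamps]


# Encoder output from the example (the '-' placeholder slot is omitted;
# audio dts is assumed equal to audio pts).
audio_pts = [10, 11, 12, 13, 14, 15]
audio_dts = audio_pts
video_pts = [15, 10, 11, 12, 13, 14]
video_dts = [9, 10, 11, 12, 13, 14]

# Each stream is renormalised separately by its own first dts.
audio_pt = normalise(audio_pts, audio_dts[0])
video_pt = normalise(video_pts, video_dts[0])
video_dt = normalise(video_dts, video_dts[0])

# The first presented audio packet and video picture were aligned before
# (both at 10); afterwards they differ by one picture.
sync_error_before = min(video_pts) - min(audio_pts)
sync_error_after = min(video_pt) - min(audio_pt)
```

Here sync_error_before is 0 and sync_error_after is 1, which is the one-frame offset described above.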
I can see two ways of fixing this problem; only one of them actually
works:
- For each video elementary stream, hold an extra piece of metadata
to signal the timestamp of PTS(0), ie the first picture to emerge
from a decoder.
PTS(0) is then subtracted from all timestamps in that elementary
stream, much like it currently happens.
Note, I was quite deliberate in making the above example start
with a reordered frame -- you can't guess the correct offset by
staring at the timestamps (there could be a higher nesting depth
still to come after the first 6 pictures that would require the
first dt to be -2).
- One could buffer a load of blocks and try to guess the numbers
(by looking for the point at which pts == dts); however, it
isn't a bounded problem (you only gain more confidence the more
blocks you inspect).
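A minimal sketch of the first option, using the same example numbers. The PTS(0) values are written in by hand here; in the proposal they would be signalled as per-stream metadata by the decoder (or packetizer). The variable and function names are illustrative only.

```python
def normalise(timestamps, base):
    """Subtract a base value from every timestamp of one stream."""
    return [t - base for t in timestamps]


audio_pts = [10, 11, 12, 13, 14, 15]
video_pts = [15, 10, 11, 12, 13, 14]
video_dts = [9, 10, 11, 12, 13, 14]

# Assumed metadata: timestamp of the first picture (or audio packet)
# to emerge from the decoder, i.e. PTS(0) for each stream.
audio_pts0 = 10
video_pts0 = 10

# Subtract PTS(0) rather than the first dts.
audio_pt = normalise(audio_pts, audio_pts0)
video_pt = normalise(video_pts, video_pts0)
video_dt = normalise(video_dts, video_pts0)

sync_error = min(video_pt) - min(audio_pt)
```

With PTS(0) as the base, video_pt and video_dt come out as [5, 0, 1, 2, 3, 4] and [-1, 0, 1, 2, 3, 4], matching the correct stream shown earlier, and the first audio packet and first output picture both sit at 0.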
So I propose adding an extra field to es_format_t to store the first
pts output from a decoder (or packetizer, if it can work it out).
i_time_pts0? Any other suggestions?
..david