[vlc-devel] Updated patch for ogg/theora seeking

Gregory Maxwell gmaxwell at gmail.com
Wed Jun 16 08:03:31 CEST 2010


On Wed, Jun 9, 2010 at 7:04 PM, salsaman <salsaman at gmail.com> wrote:
> While I am waiting for a reply, I just thought I would explain the search
> algorithm used, so you verify it is very efficient.
>
> We create an index on the fly, this index is basically:
> offset in file -> maximum frame number
>
> When asked to seek to a particular frame, we first check the index to see if
> we have approximate boundaries for the seek, otherwise we will seek in the
> whole file, from data_start until the end.
>
> The area to be searched is divided into two halves. We first check the upper
> half, and get the highest and lowest granulepos. (The granulepos is
> basically keyframe * keyframe_offset + frame offset). If our target frame
> lies in this region we subdivide it into two halves, otherwise we check the
> lower half from earlier. We stop when we have found the keyframe for our
> target frame, or the search region is < minimum_page_size.


This sounds like a plain bisection search. You can get faster results
using an interpolated bisection.

If the current range is 0........10,000 seconds. And 0....100Mbytes
and you want 2500 seconds, you seek to ~25mbytes (minus the average or
maximum page size) because  2500 seconds is 25% of 10000 and 25 is 25%
of 100.

This is guaranteed to converge just like a bisection search.  You can
optimize this further, becoming closer to bisection as the ranges get
smaller and the variability becomes greater, and biasing more downward
because backward motion is more expensive that forward motion... but
simply using an interpolation in the outer most step can cut several
seeks out of a large file. If the bitrate is very constant (like
audio) it can get your average down to something like three no matter
what the file size is.  (libvorbisfile does this)

Regarding keyframes.  Your text above is concerning me.  You do not
know the granpos of the keyframe covering a particular frame until you
have visited that frame.    The granshift implies a _maximum_ keyframe
interval, but the keyframes can be placed sooner than that as the
encoder desires, and they often are.    It wasn't quite clear to me
that your approach was dealing with this.  I should probably go read
the code.

Normally, In order to do a fully synchronized seek you must first seek
to the desired frame, extract the granpos of its keyframe, seek back
to that point, then decode forward.  If you cache your seeking
requests along the way and use them to hot-start the second search
then you should be able to hit the key frame very quickly in one seek
or so.


I'd recommend adding both keyframes and non-keyframes to your index
(or keep two indexes in memory, if you're minimizing memory usage and
don't want non-keyframes polluting a finite index) since both can help
constrain the bounds of your future searches, not just ones belonging
to a common keyframe offset. Every seek provides information which
might be useful in the future.

The indexing scheme in ogg index isn't precisely compatible with your
approach, so you won't just be able to load its values into your
index. This is because instead of mapping time->bytes it maps time to
a byte location which is equal to or less than the byte location of
the requested time,  such that the expressed byte-location is
sufficient to perfectly reconstruct the requested time.    It is done
this way so that it will work uniformly across formats with
drastically different reference structures (dirac, theora, vorbis)
without requiring too more per-codec smarts in the applications.
But this also means that 1:00-> byte 1, 2:00->byte 2, 3:00->byte 3 is
a perfectly valid, if useless, index. (since all those byte locations
are sufficient starting points for finding those times.  In the common
(theora keyframe) case the byte location will exactly match the page
which starts the keyframe that you want, but the promise is that it
will only be equal to or less than.

So instead of copying the ogg skeleton index into your indexes, you
should use it to hot start (choose the position of your first seek)
your existing algorithm.  When everything works well the combination
will give you the absolute minimum number of seeks possible, and when
the index is missing or corrupt you would still always produce correct
results.

(the skeleton index also isn't guaranteed to store all (or even most)
possible seek points, otherwise it would grow to enormous sizes, so
you must still have regular seeking logic to get acceptable
granularity)


Cheers!



More information about the vlc-devel mailing list