[vlc-devel] IVTC: this week: thinking outside the (temporal) box

Wed Feb 23 00:19:21 CET 2011

Hi all,

In this week's dev update, we'll be breaking video frame boundaries and
introducing a statistical density estimator into the deinterlacer module :)

When I began the work on the updated IVTC patch, I noticed that, since
Phosphor requires the same ComposeFrame() function as IVTC does, it
would make sense to update that patch first.

So, I started working on that. I have pretty much completed it, and also
upgraded ComposeFrame() into a general-use helper function with several
chroma handling modes for 4:2:0 input. I did this with the upcoming
split in mind, which will move such general-use functions into a
separate file. I'll post an updated Phosphor soon, as a base on which I
can then build the updated IVTC.

As sometimes happens during development, I got a new idea. Since
Phosphor is a field renderer, its input can be thought of not only as
the traditional stream of frames, but also as a stream of fields. The
output of the Phosphor filter is basically a composition of the latest
two input fields, plus the luma decay filtering. What happens if we
switch the darkening filter off?

Consider a stream of input frames:

11, 22, 33, 44, ...

where each two-digit entry is one frame: the first digit says which
source frame the first field (in time) comes from, and the second digit
does the same for the second field (in time).

Seeing this input stream, Phosphor (now with darkening off) will render
the following frames:

11, 12, 22, 23, 33, 34, 44, ...

where each frame now takes half the original frame display time (hence
the term framerate doubler).

The pattern looks somewhat familiar. Consider a telecined stream:

11, 12, 23, 33, 44, aa, ...

where the digits now indicate the original film frames, and "aa" starts
the next pulldown cycle (i.e. it is the next "11"). Phosphor will render

11, 11, 12, 22, 23, 33, 33, 34, 44, 4a, aa, ...

This stream has a nice property the original does not: each film frame
can be extracted from it as-is (including the problematic "22"). Note
that again this stream has been framerate-doubled.
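
The two renderings above can be reproduced with a minimal sketch. It
assumes frames are written as two-character strings, as in the examples,
and the name render_field_pairs() is hypothetical (it is not a function
from the patch). Each output frame is simply the pair of the two most
recent fields:

```python
def render_field_pairs(frames):
    """Render a frame-doubled stream from two-character frame labels.

    Each label holds (first field, second field). Every output frame is
    composed of the two most recent input fields, which is what Phosphor
    does once the darkening filter is switched off.
    """
    # Flatten the frames into a single stream of fields, in time order.
    fields = [field for frame in frames for field in frame]
    # Pair every field with its successor: 2n fields give 2n - 1 frames.
    return [a + b for a, b in zip(fields, fields[1:])]


progressive = ["11", "22", "33", "44"]
telecined = ["11", "12", "23", "33", "44", "aa"]

print(", ".join(render_field_pairs(progressive)))
print(", ".join(render_field_pairs(telecined)))
```

The second printed line contains "22" as a clean frame, even though the
telecined input only carries that film frame split across "12" and "23".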

Even if we do no further deinterlacing, watching a telecined signal
through this filter gives a much more even look than the 3 progressive
frames, 2 interlaced frames, 3 progressive frames, ... look that one
gets without any filtering. The framerate doubling really helps.

Developing the idea a bit further, we render only those frames that are
judged progressive, and otherwise keep (re-render) the previous output
picture. This approach does not convert the framerate back down to the
film rate, so the output still runs at the doubled rate (60fps for NTSC
material).

Assume, for the sake of argument, an ideal interlace detector. Let's see
what happens to the above example stream. The mixed frames are rejected,
and we get the following output stream:

11, 11, (11), 22, (22), 33, 33, (33), 44, (44), aa, ...

where the parentheses mean that the previous output has been
re-rendered. The display times alternate between 3 and 2 *fields* for
the original film frames - just like in the original telecined signal.

The critical difference is, of course, that this stream is progressive.
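
The hold-the-previous-picture rule can be sketched as follows. The
"ideal detector" here is a stand-in that simply compares the two digits
of a frame label, and both function names are hypothetical. Passing the
very first frame through even if it is judged interlaced is an
assumption of this sketch, since there is nothing to re-render yet:

```python
def hold_progressive(rendered, is_progressive):
    """Keep frames judged progressive; otherwise repeat the last kept one."""
    out, last = [], None
    for frame in rendered:
        if is_progressive(frame) or last is None:
            last = frame
            out.append(frame)
        else:
            # Parentheses mark a re-render of the previous output.
            out.append("(" + last + ")")
    return out


def ideal_detector(frame):
    """Stand-in detector: progressive iff both fields share a film frame."""
    return frame[0] == frame[1]


rendered = ["11", "11", "12", "22", "23", "33", "33", "34", "44", "4a", "aa"]
print(", ".join(hold_progressive(rendered, ideal_detector)))
```

Counting how long each film frame stays on screen in the result gives
3, 2, 3, 2 fields for frames 1 to 4.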

Now, in the real world, what can we do to detect interlaced vs. progressive?

Interlace detection is based on a comb metric, which is used to compute
a score for each frame, ranging from 0 to some large positive number.
We already know that the scores for *progressive* frames vary wildly
depending on the actual content of the video frame. Anime is especially
difficult, because it contains one-pixel-thick outlines, which easily
produce false positives. Thus, an adaptive strategy is needed.
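
To make the false-positive problem concrete, here is a minimal sketch of
one possible comb metric. This is an assumption for illustration, not
necessarily the metric used in the patch: a pixel counts as combed when
it differs from both its vertical neighbours in the same direction. The
function name and the threshold value are hypothetical.

```python
def comb_score(luma, threshold=100):
    """Count pixels that stick out from both vertical neighbours.

    luma is a list of rows of integer luma values. A pixel is counted
    when (above - cur) * (below - cur) exceeds the threshold, i.e. when
    it is brighter or darker than both the line above and the line below.
    """
    score = 0
    for y in range(1, len(luma) - 1):
        for above, cur, below in zip(luma[y - 1], luma[y], luma[y + 1]):
            if (above - cur) * (below - cur) > threshold:
                score += 1
    return score


flat = [[50] * 4 for _ in range(6)]
# Alternating lines from two different fields: classic combing.
combed = [[50] * 4 if y % 2 == 0 else [90] * 4 for y in range(6)]
# A progressive picture with a single one-pixel-thick horizontal outline.
outline = [[50] * 4 for _ in range(6)]
outline[3] = [200] * 4

print(comb_score(flat), comb_score(combed), comb_score(outline))
```

The outline picture is fully progressive, yet it gets a nonzero score,
which is why a fixed threshold on the score is not enough.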

It is fairly obvious that in a telecined stream, assuming no scene cut,
the interlace scores of the Phosphor output frames cluster around two
(unknown) values, corresponding to progressive and interlaced frames.
The problem transforms into: what can we do to cluster the scores?

Let's assume that the data is stochastic. If we can construct an
estimate for its probability density, and find out which of its modes
(local maxima) each data point belongs to, we have a data clustering method.

Kernel density estimation (a.k.a. the Parzen-Rosenblatt window method)
is a popular technique for estimating unknown probability densities in
a nonparametric manner. Now the problem transforms into determining an
optimal kernel bandwidth, which neither over- nor undersmooths the data.
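
The clustering idea can be sketched in a few lines, under some loud
assumptions: the kernel is Gaussian, the bandwidth h is picked by hand
rather than estimated, and each data point is sent to its mode by
mean-shift iteration, which is one standard way to climb a Gaussian
kernel density estimate and is not necessarily what the prototype does.
All names and the example scores are made up for illustration.

```python
import math


def kde(data, h):
    """Return the Gaussian kernel density estimate as a function of x."""
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return lambda x: norm * sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data)


def mean_shift_mode(x, data, h, iters=200, tol=1e-6):
    """Climb from x to a nearby local maximum of the Gaussian KDE."""
    for _ in range(iters):
        weights = [math.exp(-0.5 * ((x - d) / h) ** 2) for d in data]
        new_x = sum(w * d for w, d in zip(weights, data)) / sum(weights)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


def cluster_by_modes(data, h):
    """Label each data point by the density mode it converges to."""
    labels, centers = [], []
    for d in data:
        mode = mean_shift_mode(d, data, h)
        for k, center in enumerate(centers):
            if abs(mode - center) < 0.5 * h:  # same mode, up to tolerance
                labels.append(k)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return labels, centers


# Made-up interlace scores: low values progressive, high values combed.
scores = [12.0, 15.0, 11.0, 240.0, 14.0, 255.0, 13.0, 16.0, 250.0]
labels, centers = cluster_by_modes(scores, h=20.0)
print(labels, centers)
```

With a bandwidth that is too large the two modes merge into one, and
with one that is too small every point becomes its own mode, which is
why the choice of bandwidth matters so much here.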

For this purpose, the study of Sheather & Jones (1991) provides a set of
formulas which can be used fairly easily, and which (based on my tests)
seem to converge sufficiently fast for the purposes of a realtime field
renderer. The only problem is that the method starts working properly at
n = 50 or so - and we only have a handful of data points. But it seems
that triplicating the data (entering each data point three times)
sharpens the optimal bandwidth estimate enough, so that it becomes
usable for this particular application.
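
For some intuition on why entering each point three times narrows the
bandwidth, here is a sketch using the much simpler normal reference rule
h = 1.06 * sigma * n^(-1/5) as a stand-in. This is explicitly not the
Sheather-Jones selector, and the scores are made up. Replicating the
data leaves the spread unchanged but triples n, so the bandwidth
shrinks:

```python
import statistics


def normal_reference_bandwidth(data):
    """Rule-of-thumb bandwidth 1.06 * sigma * n^(-1/5) (not Sheather-Jones)."""
    sigma = statistics.pstdev(data)  # population sd: unchanged by replication
    return 1.06 * sigma * len(data) ** (-1.0 / 5.0)


scores = [12.0, 15.0, 11.0, 240.0, 14.0, 255.0, 13.0, 16.0, 250.0]
h_plain = normal_reference_bandwidth(scores)
h_tripled = normal_reference_bandwidth(scores * 3)  # each point three times
print(h_tripled / h_plain)
```

Under this stand-in rule the ratio is 3^(-1/5), i.e. the bandwidth drops
to about 80% of its original value. How the Sheather-Jones estimate
reacts in detail may differ.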

So, about 500 lines of statistics later, out of all this we get
Phosphate/IVTC2x: a framerate-doubling IVTC filter based on Phosphor.
After running some tests, my personal opinion is that it's not as good
as the traditional IVTC, but on the other hand it's just a prototype so
I'm sure it could be tuned a bit. I'll post the code soon after the
updated Phosphor patch :)

That's all for now,

 -J

References:

Sheather, S. J. & Jones, M. C. (1991). A Reliable Data-Based Bandwidth
Selection Method for Kernel Density Estimation. Journal of the Royal
Statistical Society, Series B (Methodological), 53(3), 683-690.