[vlc-devel] IVTC accuracy improvement idea

Wed Jan 12 10:20:47 CET 2011

Hi all,

I did some further R&D on the IVTC filter. Current results below.

On 01/10/2011 02:12 PM, Juha Jeronen wrote:
> I suddenly got one idea for the improvement of IVTC accuracy. Keep the 
> frame composer running all the time (even when no telecine is 
> detected), and only handle the framerate conversion (and frame 
> dropping) with the cadence tracking mechanism.
This turned out not to be such a bright idea after all. Its only effect 
was that it reintroduced the subtitle flicker problem that the Transcode 
composition strategy had completely gotten rid of. So, the current 
posted version is better.

During those talking scenes that tend not to work correctly, it's not 
that the filter exits film mode (as I thought at first) - it doesn't - 
but rather that the Transcode comb metric ranks the field pairs 
incorrectly in such cases.

The cadence detector isn't that good for general input. This explains 
why the Transcode strategy works best for frame composition - it 
doesn't need to know the cadence position. (This is another aspect that 
makes it ingenious.)

The detector, as of the posted version, already works rather well in 
cases with horizontal motion, but making it better in general is a hard 
problem. I tried to make it more reliable by judging how much of an 
outlier the winning score is (by computing the variance of 
pi_ivtc_scores with and without the winner included). Theoretically, 
this should indicate how certain the detection result is, but in 
practice, it didn't really help.
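
As a rough illustration of the outlier test described above, here is a
minimal Python sketch (not the filter's C code). It assumes that lower
scores are better, so the winner is the minimum, and it expresses the
"with and without the winner" comparison as a variance ratio. The
function name and the choice of population variance are assumptions
made for this example only.

```python
from statistics import pvariance


def winner_outlier_ratio(scores):
    """Return (winner_index, ratio) for a list of detection scores.

    ratio = variance of all scores / variance of the scores with the
    winner removed. Assumption: lower score = better match.
    """
    scores = list(scores)
    winner = min(range(len(scores)), key=scores.__getitem__)
    rest = scores[:winner] + scores[winner + 1:]

    var_all = pvariance(scores)
    var_rest = pvariance(rest)

    if var_rest == 0:
        # The other scores are all identical; avoid dividing by zero.
        return winner, (float("inf") if var_all > 0 else 1.0)
    return winner, var_all / var_rest
```

A ratio well above 1 means the winner stands apart from the other
scores. Any threshold applied to it would need tuning on real material.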

For horizontal camera pans, including the winner raises the variance by 
50-70% (i.e. to 1.5x..1.7x of the variance without it), indicating that 
the other scores are indeed much higher (worse), as expected. But if 
there is only a little horizontal motion (e.g. one small-ish moving 
object), the difference decreases to ~20%, which already gives false 
positives rather often. Also, by this metric, the stencil positions 
"abc" and "bcd" seem consistently harder to detect than the other three.

These results don't change even if I define the score as 
(pi_best_field_pairs - pi_worst_field_pairs), so my earlier observation 
stands: both strategies do the same thing.

The best this kind of strategy can do is to lock on when there is a pan, 
and then keep the detected cadence until a new reliable detection is 
possible. This stops working when there is a bad cut in the film. I 
tried detecting cuts ("i_blocks_with_motion > 5000" seems pretty 
reliable while still avoiding false positives) and switching the frame 
composition strategy while the detector acquires a new lock-on, but this 
didn't work very well either.
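
The lock-on behaviour described above can be sketched roughly as
follows. This is an illustration in Python, not the filter's code. The
class name, the "reliable" flag supplied by the caller, and the
assumption that the cadence position advances by one (modulo five) per
frame while locked are all hypothetical. Only the 5000-block cut
threshold comes from the text.

```python
CUT_THRESHOLD = 5000   # "i_blocks_with_motion > 5000" from the post
NUM_POSITIONS = 5      # assumption: five cadence positions per cycle


class CadenceLock:
    """Keeps the last reliable cadence position until a cut is seen."""

    def __init__(self):
        self.position = None  # None means "no lock"

    def update(self, detected_position, reliable, blocks_with_motion):
        if blocks_with_motion > CUT_THRESHOLD:
            # Probable cut: the old cadence can no longer be trusted.
            self.position = None

        if reliable:
            # New reliable detection (e.g. during a pan): lock on.
            self.position = detected_position
        elif self.position is not None:
            # No reliable detection: extrapolate from the old lock.
            self.position = (self.position + 1) % NUM_POSITIONS

        return self.position
```

While update() returns None, a caller would fall back to a composition
strategy that does not need the cadence position.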

I also played around with progressive material detection (check if 
position "dea" comes up 3 times in a row, while there is motion so the 
frames are not identical), to see if the reliability-estimating detector 
could do that better. Mostly it worked, but false positives caused some 
previously easy cases to break, so I think it can't be included after all.
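
For reference, the progressive check described above amounts to
something like the following Python sketch. The history representation
(a list of (position, has_motion) tuples, newest last) and the function
name are assumptions made for this example.

```python
def looks_progressive(history, needed=3):
    """True if the last `needed` detections were all "dea" with motion.

    history: list of (position_name, has_motion) tuples, newest last.
    Motion is required so that runs of identical frames don't count.
    """
    if len(history) < needed:
        return False
    return all(pos == "dea" and motion for pos, motion in history[-needed:])
```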

I now have another idea, which I haven't tested yet. Since there is a 
block-based motion detector, it is possible to include only the blocks 
that have motion in the interlace detection. The scores can then be 
normalized by the number of blocks with motion, at least theoretically 
making them comparable.
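
A minimal Python sketch of that normalization, assuming per-block
interlace scores and a per-block motion flag are available as parallel
lists (both names are hypothetical, and this is not the filter's code):

```python
def normalized_score(block_scores, motion_mask):
    """Average interlace score over the blocks that have motion.

    Returns None when no block has motion, since the score is then
    undefined rather than zero.
    """
    total = 0
    count = 0
    for score, has_motion in zip(block_scores, motion_mask):
        if has_motion:
            total += score
            count += 1
    return total / count if count else None
```

Dividing by the number of moving blocks is what should make scores
from frames with different amounts of motion comparable.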

The difficult part with this approach is what to do with the "trivial" 
combinations, where both fields come from the same frame, since by 
definition they have no motion between them. I'm thinking that for P (N 
respectively), it's possible to use the motion detection result between 
P and C (C and N respectively), and include only those blocks, so that 
the same blocks take part in the detection as for the combinations 
TPBC/TCBP (TCBN/TNBC respectively). For C, one can probably include the 
union (in the sense of logical OR) of those motion detection results, 
and the normalization should take care of the rest.
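
The block selection proposed above could be written down like this. It
is a Python sketch of the idea only: the function name and the
list-of-booleans mask format are assumptions, and the keys simply reuse
the combination names from the text.

```python
def masks_for_candidates(motion_pc, motion_cn):
    """Pick the motion mask to use for each field combination.

    motion_pc: per-block motion flags between frames P and C.
    motion_cn: per-block motion flags between frames C and N.
    """
    # For C, take the logical OR of both motion detection results.
    union = [a or b for a, b in zip(motion_pc, motion_cn)]
    return {
        "P": motion_pc,     # same blocks as TPBC/TCBP
        "TPBC": motion_pc,
        "TCBP": motion_pc,
        "C": union,
        "TCBN": motion_cn,
        "TNBC": motion_cn,
        "N": motion_cn,     # same blocks as TCBN/TNBC
    }
```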

Even if this strategy is used only in the frame composer (and not in the 
cadence detector, since I'm not sure how the score for C would compare 
to the rest), it should make the frame construction more accurate. 
Emphasis on "should" - IVTC for hard telecine is more of an art than a 
science.

I'll test this when I have time.

Oh, and I built a soft telecine remover. It was simple (based on 
i_nb_fields of the frames in the three-frame cache) and works fine.
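
The post does not say what the check looks like, so the following is
only a guess at its general shape, written in Python. It assumes that
soft telecine shows up as field counts alternating between 3 and 2
across the three cached frames (frames flagged to repeat their first
field report 3 fields). The function name and the exact patterns
tested are assumptions.

```python
def is_soft_telecine(nb_fields_p, nb_fields_c, nb_fields_n):
    """Guess whether the cached frames P, C, N are soft-telecined.

    Assumption: soft telecine alternates 3-field and 2-field frames,
    so three consecutive frames read 3-2-3 or 2-3-2.
    """
    pattern = (nb_fields_p, nb_fields_c, nb_fields_n)
    return pattern in ((3, 2, 3), (2, 3, 2))
```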

That's all for today :)

  -J