[vlc-devel] [PATCH] Added IVTC deinterlacer (NTSC film mode)

Wed Jan 5 22:04:40 CET 2011

Hi,

Thanks for the review.

Comments below.

On 01/05/2011 01:13 AM, Laurent Aimar wrote:
> On Mon, Jan 03, 2011 at 02:33:52AM +0200, Juha Jeronen wrote:
>    
>> Anyway, here is the updated IVTC patch (in the message body this time).
>>      
>   It seems that the patch got broken by that.
>    

Yes, it seems spurious newlines were inserted. Sorry about that. I'll 
send the next version as an attachment.

>   You can use a shorter way:
> typedef struct
> {
>   [blah]
> } ivtc_sys_t
>
>    
Will fix.

> Using
>   ivtc_sys_t ivtc;
> here would simplify a bit.
>    
Good suggestion. Will fix.

>> +/* Helper function: compose frame from given field pair.
>> +
>> +   The inputs are full pictures (frames); only one field will be used
>> from each.
>> +   Caller must manage allocation/deallocation of p_outpic.
>> +   Pitches of the inputs must match!
>> +*/
>>      
>   Using /** will make it doxygen compatible.
>    
Good point. Will fix.

On which note, what do you think, should I make the lengthy explanation 
of the filter doxygen compatible, or should it be moved into a separate 
file?

I think the explanation is necessary as documentation - to make it 
easier to understand how the code works, what the filter does and for 
which kind of material it is useful, but I also think that a 300-line 
block of English doesn't really belong in a source file. I attempted to 
make it shorter, but that's the shortest I could do (while still 
including everything that I think is relevant).

>> +static void ComposeFrame( filter_t *p_filter, picture_t *p_outpic,
>>      
---8<---[snip]---8<---
>> subsampled chroma... -JJ */
>>      
>   If I correctly understand your code, I would say it probably depends on
> how the telecine frames where generated. And I bet they are usually created
> by using one line of every two from the original I420 progressive source. If
> so, you simply revert it without any loss.
>    
Yes, to my understanding that's the correct way to do telecine. And 
reverting it without loss (if possible) is indeed the idea.

So, I suppose the I420 refers to the chroma format inside each field, 
and not to the whole frame? This would explain why it works the way it does.

(When the progressive frame is reconstructed correctly, it obviously 
does not matter which of its fields the I420 chroma information is taken 
from. Thus the line-alternating copy - effectively switching the chroma 
source field every two luma lines - will produce the correct result.)

>> +                for( ; p_out<  p_out_end ; )
>> +                {
>> +                    vlc_memcpy( p_out, p_in_top,
>> p_inpic_top->p[i_plane].i_pitch );
>> +                    p_out += p_outpic->p[i_plane].i_pitch;
>> +                    vlc_memcpy( p_out, p_in_bottom,
>> p_inpic_bottom->p[i_plane].i_pitch );
>> +                    p_out += p_outpic->p[i_plane].i_pitch;
>> +
>> +                    p_in_top    += 2 * p_inpic_top->p[i_plane].i_pitch;
>> +                    p_in_bottom += 2 * p_inpic_bottom->p[i_plane].i_pitch;
>> +                }
>> +                break;
>>      
>   I may have missed something, but I believe the switch() is in fact useless.
>    
True. Thanks for catching. It was left over from the old test and I 
missed it. Will fix.

>> +                } /* end of disabled development version */
>>      
>   You should simply remove it as it is dead code.
>    
Ok. Will fix.

>> +static int CalculateInterlaceScore(filter_t *p_filter, picture_t* p_pic)
>> +{
>> +    assert( p_filter != NULL );
>> +    assert( p_pic != NULL );
>> +
>> +    /* This is based on the comb detector used in the IVTC filter of
>> Transcode 1.1.5. */
>>      
>   Is it your own reimplementation of the algorithm?
>    
The comb detector? Yes, the implementation is my own, I just used the 
existing comb metric. The important part is the (P-C)*(N-C) and the 
threshold of 100.

There are some implementation differences: Transcode never explicitly 
constructs the temporary frames, but performs the comb detection 
directly on the original data. I implemented it this way to keep the 
function reusable, and because I thought it was conceptually cleaner 
(although it does add some runtime cost).

For frame construction, Transcode pairs TCBP, TCBC and TCBN (using a 
unique top field from the input stream for each output frame - this is 
ingenious), while my code pairs TNBN, TNBC and TCBN. My code also keeps 
track of the scores of all seven possible combinations in the stencil, 
because it's required for the cadence detector.

As far as the code is concerned, I wrote the filter from scratch. Parts 
of VLC's own deinterlace.c were useful as a template (RenderBob, 
RenderX, RenderYadif, and some MMX code), but that's it.

About the ideas used, I had independently come up with the idea of the 
lowest interlace score winning (for the triplets in the cadence 
detector), but then I saw the same idea used in Transcode in another way 
(output frame selection). Currently my implementation uses the idea for 
both purposes, and Transcode's is acknowledged in the comment at the 
frame construction phase. The cadence detector is original - I haven't 
seen this strategy used before.

(Note that Transcode's ivtc filter does no cadence detection - instead, 
it works by rejecting the most likely duplicate in each input frame 
group of 5, doing this in a separate "decimate" filter.)

Cadence tracking in my filter borrows some very general ideas from the 
IVTC filter in TVTime (in xine-lib 1.1.19: 
src/post/deinterlace/pulldown.c, tvtime.c) - mainly, actively looking 
for cadence breaks by keeping a counter and comparing to detected 
position at every frame. Although the necessity for this is rather 
obvious, after reading about how badly anime is often edited (no one 
makes cuts that don't break the telecine)...

The part marked "scratch paper" in pulldown.c was somewhat useful to get 
started with, but in the end I made my own cadence tables and thought 
through the output frame timings myself. Compared to TVTime, the 
implementation differences are pretty big: TVTime uses the TFF flag, 
whereas my filter doesn't. TVTime uses only two input frames, while my 
filter uses three.

All the logic based on the three-frame stencil is original. The basic 
detection strategy follows pretty easily from the cadence information, 
if one starts looking what could be done viewing an NTSC telecine 
through a three-frame stencil (hence the cadence tables in the lengthy 
comment; these are original, although easily produced). Also the voting 
for TFF/BFF telecine in cadence analysis is original.

Finally, about the documentation contained in the lengthy comment. The 
cadence detection section is completely original. The explanation of 
NTSC telecine is original, but the same information can be easily found 
in several online sources. Explanation of frame reconstruction is short, 
but original. Timestamp mangling follows easily from the explanation of 
telecine, so I've just worked it out and written it down. The lock-on 
delay consideration for my detector is original (although trivial).

(On the timestamp note - I noticed now that I've removed something 
critical from the final version of the explanation, while I was making 
it shorter. Namely: it is clear that the dropped frame must be "e"; this 
can be seen from the timings and the cadence tables. Timings: if we have 
only one future frame, "e" is the only one whose PTS, comparing to the 
film frames, allows dropping it safely (to see this, consider which film 
frame needs to be rendered as each new input frame arrives). Cadence 
tables: it is ok to drop "e", because the same film frame "1" is 
available also at the next PCN position "eab". I'll add a note to this 
effect.)

Anyway - maybe "This is based on..." is too strong, and "This uses ideas 
from..." would be better? I'm not sure a completely accurate note any 
shorter than what I just wrote is necessarily possible ;)

If you want to see Transcode's implementation of comb detection, look at 
lines 207-252 in filter/filter_ivtc.c. You can get the sources to 
version 1.1.5 here:
http://developer.berlios.de/project/showfiles.php?group_id=10094

On a historical note, I'm not sure who wrote the original code that 
ended up in Transcode. Transcode's ivtc filter is by T. Tsiodras, but 
those lines are marked as a "blatant copy" with no source indicated. The 
most likely source is MPlayer - they got an IVTC filter around the same 
time, and I think MPlayer has a "decimate", too. Unless it was 
originally written for something else. At least it's not TVTime, as that 
implementation is completely different. It could be the DScaler 
DirectShow filter or AviSynth, but now I'm just guessing as I haven't 
looked at those sources.

Now, with this sidetrack out of the way...

>> +    int i_plane;
>> +    int i_score = 0;
>> +    int y, x;
>>      
>    Moving x/y inside the loop is a tiny bit better.
>    
Will fix.

>> +    for( i_plane = 0 ; i_plane<  p_pic->i_planes ; i_plane++ )
>> +                    /* Low-order part: */
>> +                    movq_r2r( mm2, mm0 );
>> +                    punpckldq_r2r( mm7, mm0 );
>> +                    int32_t result3;
>> +                    movq_r2m( mm0, result3 );
>> +                    /* High-order part: */
>> +                    movq_r2r( mm2, mm0 );
>> +                    punpckhdq_r2r( mm7, mm0 );
>> +                    int32_t result4;
>> +                    movq_r2m( mm0, result4 );
>>      
>   movq (for the 4 assignments) cannot do what you want, making the whole
> function always returning 0 (at least for me)... If you also end up with
> a non valid value, I fear that some of the (quality) tests you did are not
> meaningful.
>    
It worked for me, so I didn't notice this until now. Thanks.

What I wanted to do there was to move the doubleword result into memory. 
So indeed, the instruction movq is wrong - it should be movd instead.

When fixing this, I found one other serious bug in that function - 
punpcklbw instead of punpcklwd (and same for hbw/hwd) when unpacking the 
multiplication results. Oops. I wonder how that worked at all. I guess I 
got lucky.

So, you're right, I need to redo my output quality tests. I'll also need 
to re-tune the residual interlace threshold :)

While at it, I converted the whole inner loop into MMX, as I had 
originally planned. I had misread the MMX reference - didn't notice at 
first that MMX actually does provide a signed packed comparison (I 
usually work in high-level languages... excuses, excuses ;) ). With 
that, this is easy.

The score is kept in a register. We need five fewer memory accesses per 
iteration of the inner loop, since i_score and result1-4 no longer need 
to be accessed. We only need three 32-bit fetches, and three pointer 
updates, which cannot be avoided.

Details will follow in the next version of the patch.

>> +enum ivtc_field_pair { FIELD_PAIR_TPBP = 0, FIELD_PAIR_TPBC = 1,
>> +                       FIELD_PAIR_TCBP = 2, FIELD_PAIR_TCBC = 3,
>> +                       FIELD_PAIR_TCBN = 4, FIELD_PAIR_TNBC = 5,
>> +                       FIELD_PAIR_TNBN = 6 };
>> +typedef enum ivtc_field_pair ivtc_field_pair;
>>      
> typedef enum {
>   [blah]
> } ivtc_field_pair;
> is enough.
>    
Will fix.

>> +    assert( p_next != NULL );
>> +    int i_chroma = p_filter->fmt_in.video.i_chroma;
>> +    int i_size_x = p_next->p[Y_PLANE].i_pitch;
>> +    int i_size_y = p_next->p[Y_PLANE].i_visible_lines;
>> +    /* Note that both frames in the working area are always allocated
>> at the same time,
>> +       so it is enough to check [0]. */
>>      
>   As you fill p_history with pictures created by picture_NewFromFormat()
> and as you can consider that the video size will not dynamically change,
> it is not needed to support size changes here. It will remove a lot of code.
>    
On this I have a question... can we be sure it doesn't change even if 
the user switches to another video source? Say, from a DVD to a file, 
which didn't come from an NTSC source. Does the core always restart the 
filter chain in such a case? If so, can we rely on it doing so in the 
future, too?

(My main future concern is that there have been posts asking about 
seamless playback on the forums. If there are plans for that in 1.2.x 
(as I understood), will it affect the filter chain restart?)

>> +            char temp[1000];
>> +            sprintf(temp, "IVTC: removing residual interlacing (score
>> %d>  %d)", i_result_interlace_score, RENDERIVTC_INTERLACE_THRESHOLD );
>> +            msg_Dbg( p_filter, temp );
>>      
>   Why using a temporary buffer? msg_Dbg() support printf like arguments.
>    
Because I hadn't noticed that ;)

Thanks for the tip. Will fix.

>> +        if( p_ivtc->pi_interlace_scores[FIELD_PAIR_TNBN]>
>> RENDERIVTC_INTERLACE_THRESHOLD)
>> +            RenderX( p_dst, p_next );
>> +        else
>> +            picture_Copy( p_dst, p_next );
>> +
>> +        return VLC_SUCCESS;
>> +    }
>> +    else /* now the only possibility is (!p_prev&&  p_curr&&  p_next) */
>>      
>   It's better to add an assert() than a comment in such cases.
>    
True. Will fix.

Or maybe even actually check the case, and error out in a final else 
block if none of the conditions match. It's not like it costs too many 
CPU cycles ;)

>> +
>> +    /* IVTC */
>> +    for( int i = 0; i<  IVTC_WORKING_AREA_SIZE; i++ )
>> +    {
>> +        if( p_ivtc->pp_ivtc_working_area[i] )
>> +            picture_Release( p_ivtc->pp_ivtc_working_area[i] );
>> +        p_ivtc->pp_ivtc_working_area[i] = NULL;
>> +    }
>> +    p_ivtc->b_possible_cadence_break_detected = false;
>> +    p_ivtc->i_cadence_pos = CADENCE_POS_INVALID;
>> +    p_ivtc->i_telecine_field_dominance = TFD_INVALID;
>> +    p_ivtc->i_ivtc_filter_mode = IVTC_MODE_DETECTING;
>> +    for( int i = 0; i<  IVTC_NUM_FIELD_PAIRS; i++ )
>> +        p_ivtc->pi_interlace_scores[i] = 0;
>> +    for( int i = 0; i<  IVTC_DETECTION_HISTORY_SIZE; i++ )
>> +        p_ivtc->pi_cadence_pos_history[i] = CADENCE_POS_INVALID; /*
>> detected positions */
>> +    p_ivtc->i_old_chroma = -1;
>> +    p_ivtc->i_old_size_x = -1;
>> +    p_ivtc->i_old_size_y = -1;
>>      
>   It seems that some part could be factorized with the init code.
>    
Yes. I'll make a function...

Before I fix this, I would like your comments on the video size change 
issue and what to do with the lengthy comment (filter documentation).

Thanks again for the review :)

  -J