[vlc-devel] [PATCH] Added IVTC deinterlacer (NTSC film mode)
Juha Jeronen
juha.jeronen at jyu.fi
Wed Jan 5 22:04:40 CET 2011
Hi,
Thanks for the review.
Comments below.
On 01/05/2011 01:13 AM, Laurent Aimar wrote:
> On Mon, Jan 03, 2011 at 02:33:52AM +0200, Juha Jeronen wrote:
>
>> Anyway, here is the updated IVTC patch (in the message body this time).
>>
> It seems that the patch got broken by that.
>
Yes, it seems spurious newlines were inserted. Sorry about that. I'll
send the next version as an attachment.
> You can use a shorter way:
> typedef struct
> {
> [blah]
> } ivtc_sys_t
>
>
Will fix.
> Using
> ivtc_sys_t ivtc;
> here would simplify a bit.
>
Good suggestion. Will fix.
>> +/* Helper function: compose frame from given field pair.
>> +
>> + The inputs are full pictures (frames); only one field will be used
>> from each.
>> + Caller must manage allocation/deallocation of p_outpic.
>> + Pitches of the inputs must match!
>> +*/
>>
> Using /** will make it doxygen compatible.
>
Good point. Will fix.
On which note, what do you think, should I make the lengthy explanation
of the filter doxygen compatible, or should it be moved into a separate
file?
I think the explanation is necessary as documentation - to make it
easier to understand how the code works, what the filter does and for
which kind of material it is useful, but I also think that a 300-line
block of English doesn't really belong in a source file. I attempted to
make it shorter, but that's the shortest I could do (while still
including everything that I think is relevant).
>> +static void ComposeFrame( filter_t *p_filter, picture_t *p_outpic,
>>
---8<---[snip]---8<---
>> subsampled chroma... -JJ */
>>
> If I correctly understand your code, I would say it probably depends on
> how the telecine frames where generated. And I bet they are usually created
> by using one line of every two from the original I420 progressive source. If
> so, you simply revert it without any loss.
>
Yes, to my understanding that's the correct way to do telecine. And
reverting it without loss (if possible) is indeed the idea.
So, I suppose the I420 refers to the chroma format inside each field,
and not to the whole frame? This would explain why it works the way it does.
(When the progressive frame is reconstructed correctly, it obviously
does not matter which of its fields the I420 chroma information is taken
from. Thus the line-alternating copy - effectively switching the chroma
source field every two luma lines - will produce the correct result.)
>> + for( ; p_out< p_out_end ; )
>> + {
>> + vlc_memcpy( p_out, p_in_top,
>> p_inpic_top->p[i_plane].i_pitch );
>> + p_out += p_outpic->p[i_plane].i_pitch;
>> + vlc_memcpy( p_out, p_in_bottom,
>> p_inpic_bottom->p[i_plane].i_pitch );
>> + p_out += p_outpic->p[i_plane].i_pitch;
>> +
>> + p_in_top += 2 * p_inpic_top->p[i_plane].i_pitch;
>> + p_in_bottom += 2 * p_inpic_bottom->p[i_plane].i_pitch;
>> + }
>> + break;
>>
> I may have missed something, but I believe the switch() is in fact useless.
>
True. Thanks for catching. It was left over from the old test and I
missed it. Will fix.
>> + } /* end of disabled development version */
>>
> You should simply remove it as it is dead code.
>
Ok. Will fix.
>> +static int CalculateInterlaceScore(filter_t *p_filter, picture_t* p_pic)
>> +{
>> + assert( p_filter != NULL );
>> + assert( p_pic != NULL );
>> +
>> + /* This is based on the comb detector used in the IVTC filter of
>> Transcode 1.1.5. */
>>
> Is it your own reimplementation of the algorithm?
>
The comb detector? Yes, the implementation is my own, I just used the
existing comb metric. The important part is the (P-C)*(N-C) and the
threshold of 100.
There are some implementation differences: Transcode never explicitly
constructs the temporary frames, but performs the comb detection
directly on the original data. I implemented it this way to keep the
function reusable, and because I thought it was conceptually cleaner
(although it does add some runtime cost).
For frame construction, Transcode pairs TCBP, TCBC and TCBN (using a
unique top field from the input stream for each output frame - this is
ingenious), while my code pairs TNBN, TNBC and TCBN. My code also keeps
track of the scores of all seven possible combinations in the stencil,
because it's required for the cadence detector.
As far as the code is concerned, I wrote the filter from scratch. Parts
of VLC's own deinterlace.c were useful as a template (RenderBob,
RenderX, RenderYadif, and some MMX code), but that's it.
About the ideas used, I had independently come up with the idea of the
lowest interlace score winning (for the triplets in the cadence
detector), but then I saw the same idea used in Transcode in another way
(output frame selection). Currently my implementation uses the idea for
both purposes, and Transcode's is acknowledged in the comment at the
frame construction phase. The cadence detector is original - I haven't
seen this strategy used before.
(Note that Transcode's ivtc filter does no cadence detection - instead,
it works by rejecting the most likely duplicate in each input frame
group of 5, doing this in a separate "decimate" filter.)
Cadence tracking in my filter borrows some very general ideas from the
IVTC filter in TVTime (in xine-lib 1.1.19:
src/post/deinterlace/pulldown.c, tvtime.c) - mainly, actively looking
for cadence breaks by keeping a counter and comparing to detected
position at every frame. Although the necessity for this is rather
obvious, after reading about how badly anime is often edited (no one
makes cuts that don't break the telecine)...
The part marked "scratch paper" in pulldown.c was somewhat useful to get
started with, but in the end I made my own cadence tables and thought
through the output frame timings myself. Compared to TVTime, the
implementation differences are pretty big: TVTime uses the TFF flag,
whereas my filter doesn't. TVTime uses only two input frames, while my
filter uses three.
All the logic based on the three-frame stencil is original. The basic
detection strategy follows pretty easily from the cadence information,
if one starts looking what could be done viewing an NTSC telecine
through a three-frame stencil (hence the cadence tables in the lengthy
comment; these are original, although easily produced). Also the voting
for TFF/BFF telecine in cadence analysis is original.
Finally, about the documentation contained in the lengthy comment. The
cadence detection section is completely original. The explanation of
NTSC telecine is original, but the same information can be easily found
in several online sources. Explanation of frame reconstruction is short,
but original. Timestamp mangling follows easily from the explanation of
telecine, so I've just worked it out and written it down. The lock-on
delay consideration for my detector is original (although trivial).
(On the timestamp note - I noticed now that I've removed something
critical from the final version of the explanation, while I was making
it shorter. Namely: it is clear that the dropped frame must be "e"; this
can be seen from the timings and the cadence tables. Timings: if we have
only one future frame, "e" is the only one whose PTS, comparing to the
film frames, allows dropping it safely (to see this, consider which film
frame needs to be rendered as each new input frame arrives). Cadence
tables: it is ok to drop "e", because the same film frame "1" is
available also at the next PCN position "eab". I'll add a note to this
effect.)
Anyway - maybe "This is based on..." is too strong, and "This uses ideas
from..." would be better? I'm not sure a completely accurate note any
shorter than what I just wrote is necessarily possible ;)
If you want to see Transcode's implementation of comb detection, look at
lines 207-252 in filter/filter_ivtc.c. You can get the sources to
version 1.1.5 here:
http://developer.berlios.de/project/showfiles.php?group_id=10094
On a historical note, I'm not sure who wrote the original code that
ended up in Transcode. Transcode's ivtc filter is by T. Tsiodras, but
those lines are marked as a "blatant copy" with no source indicated. The
most likely source is MPlayer - they got an IVTC filter around the same
time, and I think MPlayer has a "decimate", too. Unless it was
originally written for something else. At least it's not TVTime, as that
implementation is completely different. It could be the DScaler
DirectShow filter or AviSynth, but now I'm just guessing as I haven't
looked at those sources.
Now, with this sidetrack out of the way...
>> + int i_plane;
>> + int i_score = 0;
>> + int y, x;
>>
> Moving x/y inside the loop is a tiny bit better.
>
Will fix.
>> + for( i_plane = 0 ; i_plane< p_pic->i_planes ; i_plane++ )
>> + /* Low-order part: */
>> + movq_r2r( mm2, mm0 );
>> + punpckldq_r2r( mm7, mm0 );
>> + int32_t result3;
>> + movq_r2m( mm0, result3 );
>> + /* High-order part: */
>> + movq_r2r( mm2, mm0 );
>> + punpckhdq_r2r( mm7, mm0 );
>> + int32_t result4;
>> + movq_r2m( mm0, result4 );
>>
> movq (for the 4 assignments) cannot do what you want, making the whole
> function always returning 0 (at least for me)... If you also end up with
> a non valid value, I fear that some of the (quality) tests you did are not
> meaningful.
>
It worked for me, so I didn't notice this until now. Thanks.
What I wanted to do there was to move the doubleword result into memory.
So indeed, the instruction movq is wrong - it should be movd instead.
When fixing this, I found one other serious bug in that function -
punpcklbw instead of punpcklwd (and same for hbw/hwd) when unpacking the
multiplication results. Oops. I wonder how that worked at all. I guess I
got lucky.
So, you're right, I need to redo my output quality tests. I'll also need
to re-tune the residual interlace threshold :)
While at it, I converted the whole inner loop into MMX, as I had
originally planned. I had misread the MMX reference - didn't notice at
first that MMX actually does provide a signed packed comparison (I
usually work in high-level languages... excuses, excuses ;) ). With
that, this is easy.
The score is kept in a register. We need five fewer memory accesses per
iteration of the inner loop, since i_score and result1-4 no longer need
to be accessed. We only need three 32-bit fetches, and three pointer
updates, which cannot be avoided.
Details will follow in the next version of the patch.
>> +enum ivtc_field_pair { FIELD_PAIR_TPBP = 0, FIELD_PAIR_TPBC = 1,
>> + FIELD_PAIR_TCBP = 2, FIELD_PAIR_TCBC = 3,
>> + FIELD_PAIR_TCBN = 4, FIELD_PAIR_TNBC = 5,
>> + FIELD_PAIR_TNBN = 6 };
>> +typedef enum ivtc_field_pair ivtc_field_pair;
>>
> typedef enum {
> [blah]
> } ivtc_field_pair;
> is enough.
>
Will fix.
>> + assert( p_next != NULL );
>> + int i_chroma = p_filter->fmt_in.video.i_chroma;
>> + int i_size_x = p_next->p[Y_PLANE].i_pitch;
>> + int i_size_y = p_next->p[Y_PLANE].i_visible_lines;
>> + /* Note that both frames in the working area are always allocated
>> at the same time,
>> + so it is enough to check [0]. */
>>
> As you fill p_history with pictures created by picture_NewFromFormat()
> and as you can consider that the video size will not dynamically change,
> it is not needed to support size changes here. It will remove a lot of code.
>
On this I have a question... can we be sure it doesn't change even if
the user switches to another video source? Say, from a DVD to a file,
which didn't come from an NTSC source. Does the core always restart the
filter chain in such a case? If so, can we rely on it doing so in the
future, too?
(My main future concern is that there have been posts asking about
seamless playback on the forums. If there are plans for that in 1.2.x
(as I understood), will it affect the filter chain restart?)
>> + char temp[1000];
>> + sprintf(temp, "IVTC: removing residual interlacing (score
>> %d> %d)", i_result_interlace_score, RENDERIVTC_INTERLACE_THRESHOLD );
>> + msg_Dbg( p_filter, temp );
>>
> Why using a temporary buffer? msg_Dbg() support printf like arguments.
>
Because I hadn't noticed that ;)
Thanks for the tip. Will fix.
>> + if( p_ivtc->pi_interlace_scores[FIELD_PAIR_TNBN]>
>> RENDERIVTC_INTERLACE_THRESHOLD)
>> + RenderX( p_dst, p_next );
>> + else
>> + picture_Copy( p_dst, p_next );
>> +
>> + return VLC_SUCCESS;
>> + }
>> + else /* now the only possibility is (!p_prev&& p_curr&& p_next) */
>>
> It's better to add an assert() than a comment in such cases.
>
True. Will fix.
Or maybe even actually check the case, and error out in a final else
block if none of the conditions match. It's not like it costs too many
CPU cycles ;)
>> +
>> + /* IVTC */
>> + for( int i = 0; i< IVTC_WORKING_AREA_SIZE; i++ )
>> + {
>> + if( p_ivtc->pp_ivtc_working_area[i] )
>> + picture_Release( p_ivtc->pp_ivtc_working_area[i] );
>> + p_ivtc->pp_ivtc_working_area[i] = NULL;
>> + }
>> + p_ivtc->b_possible_cadence_break_detected = false;
>> + p_ivtc->i_cadence_pos = CADENCE_POS_INVALID;
>> + p_ivtc->i_telecine_field_dominance = TFD_INVALID;
>> + p_ivtc->i_ivtc_filter_mode = IVTC_MODE_DETECTING;
>> + for( int i = 0; i< IVTC_NUM_FIELD_PAIRS; i++ )
>> + p_ivtc->pi_interlace_scores[i] = 0;
>> + for( int i = 0; i< IVTC_DETECTION_HISTORY_SIZE; i++ )
>> + p_ivtc->pi_cadence_pos_history[i] = CADENCE_POS_INVALID; /*
>> detected positions */
>> + p_ivtc->i_old_chroma = -1;
>> + p_ivtc->i_old_size_x = -1;
>> + p_ivtc->i_old_size_y = -1;
>>
> It seems that some part could be factorized with the init code.
>
Yes. I'll make a function...
Before I fix this, I would like your comments on the video size change
issue and what to do with the lengthy comment (filter documentation).
Thanks again for the review :)
-J
More information about the vlc-devel
mailing list