[vlc-devel] [PATCH] Phosphor deinterlacer, reworked += 4

Juha Jeronen juha.jeronen at jyu.fi
Mon Mar 28 00:34:01 CEST 2011


Hi,

Thanks for the comments and for taking the time to review.

I'll prepare an updated patch and post when done. Some specific comments
below.


On 03/27/2011 07:05 PM, Laurent Aimar wrote:
> Hi,
>
> On Thu, Mar 24, 2011 at 08:56:18PM +0200, Juha Jeronen wrote:
>> >From 1388f52b8734e180f7dd439b2e9d7f9581d02a73 Mon Sep 17 00:00:00 2001
>> From: Juha Jeronen <juha.jeronen at jyu.fi>
>> Date: Thu, 24 Mar 2011 20:33:35 +0200
>> Subject: [PATCH 1/2] RenderX(): cache vlc_CPU()
>  No remarks, I will apply it.

Ok. Thanks :)


>> >From e3f851bb9822379052ec2071a63b6c7896048e3c Mon Sep 17 00:00:00 2001
>> From: Juha Jeronen <juha.jeronen at jyu.fi>
>> Date: Thu, 24 Mar 2011 20:50:52 +0200
>> Subject: [PATCH 2/2] Phosphor deinterlacer
>   
>> +/* Converts a full-frame plane_t to a field plane_t */
>> +static void PlaneToFieldPlane( plane_t *p_dst, const plane_t *p_src,
>> +                               int i_field );
>  The order of the parameter and the function name does not seem
> to match (FieldFromPlane or swaping dst/src maybe). Not important.

Oops. :)

The order of dst/src is consistent with the rest of deinterlace.c.
FieldFromPlane sounds good. Will rename.


>>  static const char *const ppsz_filter_options[] = {
>> -    "mode", NULL
>> +    "mode", "phosphor-chroma", "phosphor-dimmer", NULL
>>  };
>  Could you put NULL on an independant line? (it would simplify
> future changes if any). Not important.

Yes. Will change.


>> +static void DarkenField( picture_t *p_dst, int i_field, int i_strength )
>> +{
>> +    assert( p_dst != NULL );
>> +    assert( i_field == 0 || i_field == 1 );
>> +    assert( i_strength >= 1 && i_strength <= 3 );
>> +
>> +    unsigned ui_cpu = vlc_CPU();
>> +
>> +    /* Bitwise ANDing with this clears the i_strength highest bits
>> +       of each byte */
>> +    uint64_t remove_high_u64;
>> +#ifdef CAN_COMPILE_MMXEXT
>> +    uint64_t i_strength_u64 = i_strength; /* for MMX version (needs to know
>> +                                             number of bits) */
>> +#endif
>> +    int_fast16_t div; /* for C version of 4:2:2 chroma handler */
>> +    if( i_strength == 1 )
>> +    {
>> +        remove_high_u64 = INT64_C(0x7F7F7F7F7F7F7F7F);
>> +        div = 2;
>> +    }
>> +    else if( i_strength == 2 )
>> +    {
>> +        remove_high_u64 = INT64_C(0x3F3F3F3F3F3F3F3F);
>> +        div = 4;
>> +    }
>> +    else /* i_strength == 3 */
>> +    {
>> +        remove_high_u64 = INT64_C(0x1F1F1F1F1F1F1F1F);
>> +        div = 8;
>> +    }
>  div = 1 << i_strength;
>  remove_high_u8 = 0xFF >> i_strength;
>  remove_high_u64 = remove_high_u8 * INT64_C(0x0101010101010101);
> Would be simpler and more generic.

Thanks. For some reason I didn't think of the *INT64_C(0x01...01)
possibility when I was coding that, though I do similar things all the
time in MATLAB ( some_scalar * ones(m,n) ).

Will fix.


>> +#ifdef CAN_COMPILE_MMXEXT
>> +                /* See also easy-to-read C version below. */
>> +                if( ui_cpu & CPU_CAPABILITY_MMXEXT )
>> +                {
>> +                    static const mmx_t b128 = { .uq = 0x8080808080808080ULL };
>> +                    movq_m2r( b128, mm5 );
>> +                    movq_m2r( i_strength_u64,  mm6 );
>> +                    movq_m2r( remove_high_u64, mm7 );
>> +
>> +                    uint64_t *po = (uint64_t *)p_out;
>> +                    for( int x = 0 ; x < w8; x += 8 )
>> +                    {
>> +                        movq_m2r( (*po), mm0 );
>> +
>> +                        movq_r2r( mm5, mm2 ); /* 128 */
>> +                        movq_r2r( mm0, mm1 ); /* copy of data */
>> +                        psubusb_r2r( mm2, mm1 ); /* mm1 = max(data - 128, 0) */
>> +                        psubusb_r2r( mm0, mm2 ); /* mm2 = max(128 - data, 0) */
>> +                        /* mm0 no longer needed */
> Once here, I think you can simply do:
>  psrlq_r2r(i_strength_u64(mm6), mm1);
>  psrlq_r2r(i_strength_u64(mm6), mm2);
>  pand_r2r(remove_high_u64(mm7), mm1);
>  pand_r2r(remove_high_u64(mm7), mm2);
>  psubb(mm2, mm1);
>  padd(b128, mm1);
>  movq(mm1, *po);
> No ?

Mm... looking at this solution, yes. Taking advantage of the two's
complement representation that way (with the psubb/padd) didn't occur to
me. Your MMX skills rock, as usual ;)

23 vs. 12 instructions per loop. This just might make it faster,
assuming it's not memory bound :)

Will change (and then re-time to find out the effect on speed).


>  Another way would be to use the fact that x/2^n can be optimized
> into (x  + (x < 0) * (2^n - 1)) >> n, but because psrab does not
> exist, I end up using more instructions.

(Ah, I hadn't run into this optimization. Thanks.)


>> +                    /* handle the width remainder */
>> +                    if( wm8 )
>> +                    {
>> +                        uint8_t *po8 = (uint8_t *)po;
>> +                        int_fast16_t I; /* must be able to handle -128..255 */
>  I am not sure why you need this temporary variable.
> C ensures that all operands to an operator are first upconverted to 'int' type
> if they are smaller.

Thanks, I wasn't aware of that. Is that behaviour new in C99, or did I
just miss it back in the day? :)

Oh, well, if I understand section 6.3.1.1, item 2, of C99 correctly, it
should convert uint8 into signed int (plain "int").

If so, this can indeed be made slightly simpler. I suggest:

if( wm8 )
{
    uint8_t *po8 = (uint8_t *)po;
    for( int x = 0 ; x < wm8 ; ++x, ++po8)
        (*po8) = 128 + ( ((*po8) - 128) / div );
}

...and then replacing the div by 1 << i_strength, as suggested below.

Basically, I wanted to ensure signed arithmetic, and thought that inline
casting looked messy.


>> +                    /* 4:2:2 chroma handler, C version */
>> +                    uint8_t *po = p_out;
>> +                    int_fast16_t I; /* must be able to handle -128..255 */
>> +                    for( int x = 0 ; x < w; ++x )
>> +                    {
>> +                        I = (*po);
>> +                        /* the output is closer to 128 than the input;
>> +                            the result always fits in uint8 */
>> +                        (*po++) = (uint8_t)( 128 + ( (I - 128) / div ) );
>  I wonder if using directly (1 << i_strength) would not be better for the
> compiler. (A sign division by a multiple of 2 is easily optimized).

Hmm, I suppose the parameter i_strength could be made const, and then do
that.

Will change.


>> +    if( p_sys->phosphor.i_dimmer_strength > 0 )
>> +        DarkenField( p_dst, !i_field, p_sys->phosphor.i_dimmer_strength );
>  If you want to make things faster, doing the copy done in composeframe
> and the field darkening in one pass instead of 2, could be done (or at least
> tested), or even simply doing the copy on 1 line and then the darkening on it
> just after (less cache pressure).
>  But this will wait until the current way is commited.

I would like to make it fast, yes :)

Good point about cache pressure. I hadn't considered that. Passing twice
over each line in turn is probably the sane way to do it. Not
speed-optimal, but should be faster than the current version, and is
easy to keep bug-free.

I thought about the more complicated way (everything in one pass), but
decided at that point that the implementation would become too complex.

A full one-pass approach should be doable by giving ComposeFrame() a
function pointer to a user-defined line processor, and an opaque user
data parameter to go along with that (for passing through state from
caller that is not seen by ComposeFrame()).

ComposeFrame() would decide where the data for each line comes from. The
role of the line processor would be to copy pixels, and optionally
modify them while doing that.

There still be some dragons in how the chroma handling mode should
affect how the line processor is called. Maybe the user routine should
be defined as a field processor instead - in that case it may be
possible (but probably not very simple) to figure out in a general way
when the additional processing (on top of a simple copy) is desired and
when not.

(As for why, here is the particular example use case for Phosphor. Luma
should always be processed. For chroma, if input is 4:2:0, then only the
CC_UPCONVERT mode should process the chroma, but if input is 4:2:2, then
CC_ALTLINE should do it (because it is the only mode that is used for
4:2:2 input). In all cases, which field should be dimmed depends on the
parameters i_order and i_field given to RenderPhosphor() - which are not
(and I think should not be) seen by ComposeFrame().)

On a related note, if we use plane_CopyPixels(), I think the current
two-pass strategy is the only possibility, because the deinterlace
plugin no longer controls the copy loop. Thus inserting custom
operations at every line during the copy (even for the simpler approach)
is not possible. For changing this, I see three options: 1) Define our
own "memcpy" routine for this (but that approach probably won't
accommodate passing through the caller state), 2) Make a new version of
plane_CopyPixels that allows for a callback (plane_ProcessPixels()?),
and export that from the core... or 3) Don't use plane_CopyPixels(), and
do the copying manually, running each line through the line processor.

Regarding making things faster, I suppose processing 16 pixels at once
(using SSE2 when available instead of MMX) might also help. But for
that, I'll need to learn SSE2, and test if it's any faster in practice
(i.e. find out whether the MMX version is CPU or memory bound).

So yeah, I agree, all of this can wait until the next version :)

(Scheduled at some point of time after IVTC and the big refactor...)

(I'll also probably experiment with the "future Phosphor" idea at that
point.)


>>      /* Set output timestamps, if the algorithm didn't request CUSTOM_PTS for this frame. */
>> @@ -1948,6 +2696,7 @@ static int Open( vlc_object_t *p_this )
>>      p_sys->i_frame_offset = 0; /* start with default value (first-ever frame cannot have offset) */
>>      for( int i = 0; i < HISTORY_SIZE; i++ )
>>          p_sys->pp_history[i] = NULL;
>> +    p_sys->phosphor.i_chroma_for_420 = PC_ALTLINE;
>  Not sure why you do that here (as you also init the whole p_sys->phosphor
> just below).

Umm... me neither? :)

Will fix.


>>  #if defined(CAN_COMPILE_C_ALTIVEC)
>>      if( vlc_CPU() & CPU_CAPABILITY_ALTIVEC )
>> @@ -2002,6 +2751,52 @@ static int Open( vlc_object_t *p_this )
>>      SetFilterMethod( p_filter, psz_mode, p_filter->fmt_in.video.i_chroma );
>>      free( psz_mode );
>>  
>> +    if( p_sys->i_mode == DEINTERLACE_PHOSPHOR )
>> +    {
>> +        int i_c420 = var_GetInteger( p_filter,
>> +                                     FILTER_CFG_PREFIX "phosphor-chroma" );
>> +        if( !i_c420 ) /* not set */
>> +        {
>> +            msg_Dbg( p_filter, "Phosphor 4:2:0 input chroma mode not set, "\
>> +                               "using default" );
>> +            i_c420 = PC_ALTLINE;
>> +        }
>  I don't think that the value 0 needs a special case.

Ah, true, the general out-of-range check will catch it. Will fix.


>> +        if( i_c420 != PC_LATEST  &&  i_c420 != PC_ALTLINE  &&
>> +            i_c420 != PC_BLEND   && i_c420 != PC_UPCONVERT )
>> +        {
>> +            msg_Dbg( p_filter, "Phosphor 4:2:0 input chroma mode out of "\
>> +                               "range (valid: 1, 2, 3 or 4), using default" );
>> +            i_c420 = PC_ALTLINE;
>> +        }
>  I am not sure if it is really needed as you use change_integer_list() on it.
> Anyone know if a variable can be set outside the values defined by
> change_integer_list?

Maybe far-fetched, but someone modifying the config file by hand and
making a typo (or not knowing the valid range)?


>> +        msg_Dbg( p_filter, "using Phosphor 4:2:0 input chroma mode %d",
>> +                           i_c420 );
>> +        /* This maps directly to the phosphor_chroma_t enum. */
>> +        p_sys->phosphor.i_chroma_for_420 = i_c420;
>> +
>> +        int i_dimmer = var_GetInteger( p_filter,
>> +                                       FILTER_CFG_PREFIX "phosphor-dimmer" );
>> +        if( !i_dimmer )
>> +        {
>> +            msg_Dbg( p_filter, "Phosphor dimmer strength not set, "\
>> +                               "using default" );
>> +            i_dimmer = 2; /* low */
>> +        }
>  Idem than for phosphor-chroma.
>> +        if( i_dimmer < 1  ||  i_dimmer > 4 )
>> +        {
>> +            msg_Dbg( p_filter, "Phosphor dimmer strength out of range "\
>> +                               "(valid: 1, 2, 3 or 4), using default" );
>> +            i_dimmer = 2; /* low */
>> +        }
>  Idem than for phosphor-chroma.

Ok (x2).


 -J




More information about the vlc-devel mailing list