[vlc-devel] [PATCH] Do adjust filter in SSE2 and SSE4.1

Jean-Baptiste Kempf jb at videolan.org
Mon Jul 18 18:31:29 CEST 2011


Hello Martin,

First, thanks a lot for the work.

On Fri, Jul 15, 2011 at 09:08:18PM +0200, gamajun at seznam.cz wrote :
> +    *p_out++ = clip_uint8_vlc( (( ((i_u * i_cos + i_v * i_sin - i_x) >> 8) \
> +                           * i_sat) >> 8) + 128); \
Not sure you really need this many parentheses...
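
To illustrate the point, here is a rough sketch of the same arithmetic written with named intermediates instead of nested parentheses. It is only a sketch: clip_u8() is a local stand-in for clip_uint8_vlc(), the function names and the sample values in the usage note are made up for the example, and I am assuming the first shift finishes the hue rotation and the second one the saturation scaling.

```c
#include <stdint.h>

/* Local stand-in for VLC's clip_uint8_vlc(): clamps to 0..255. */
static inline uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* The expression with the same grouping as in the quoted patch. */
static inline uint8_t adjust_u_quoted(int i_u, int i_v, int i_cos, int i_sin,
                                      int i_x, int i_sat)
{
    return clip_u8( (( ((i_u * i_cos + i_v * i_sin - i_x) >> 8)
                           * i_sat) >> 8) + 128);
}

/* Same arithmetic, one step per line (hypothetical helper name).
 * Note: >> on a negative int is implementation-defined in C; common
 * compilers use an arithmetic shift, and both versions rely on that
 * in exactly the same way. */
static inline uint8_t adjust_u_steps(int i_u, int i_v, int i_cos, int i_sin,
                                     int i_x, int i_sat)
{
    int rotated   = (i_u * i_cos + i_v * i_sin - i_x) >> 8; /* assumed: hue step */
    int saturated = (rotated * i_sat) >> 8;                 /* assumed: saturation step */
    return clip_u8(saturated + 128);                        /* re-centre around 128 */
}
```

With i_cos = 256, i_sin = 0 and i_x = 32768 (sample values corresponding to no hue change), both versions leave a chroma sample untouched at i_sat = 256 and clip to 255 once the saturation pushes it past the range.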

> +                WRITE_UV_CLIP_PLANAR_SSE4_1();
> +              //  WRITE_UV_CLIP_PLANAR_SSE4_1();
Why keep the second, commented-out line?

> +            p_in += p_pic->p[U_PLANE].i_pitch
> +                  - p_pic->p[U_PLANE].i_visible_pitch;
> +            p_in_v += p_pic->p[V_PLANE].i_pitch
> +                    - p_pic->p[V_PLANE].i_visible_pitch;
> +            p_out += p_outpic->p[U_PLANE].i_pitch
> +                   - p_outpic->p[U_PLANE].i_visible_pitch;
> +            p_out_v += p_outpic->p[V_PLANE].i_pitch
> +                     - p_outpic->p[V_PLANE].i_visible_pitch;
Aligning these lines would improve readability.

> +#elif defined(CAN_COMPILE_SSE2)
Maybe this should be in a different function?
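
Something along these lines is what I have in mind: one function per variant, selected once at runtime, rather than #if/#elif branches inside a single body. This is a sketch under assumptions, not a proposal for the actual code: the CAP_* bits, adjust_fn and pick_adjust() are hypothetical names (the patch itself tests vlc_CPU() against CPU_CAPABILITY_SSE4_1), the fallback order is my guess, and the two SIMD functions are placeholders that just call the scalar version.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits, standing in for the vlc_CPU() flags. */
enum { CAP_SSE2 = 1 << 0, CAP_SSE4_1 = 1 << 1 };

typedef void (*adjust_fn)(uint8_t *p_out, const uint8_t *p_in,
                          size_t n, int i_sat);

static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Portable reference: saturation scaling only, no hue rotation. */
static void adjust_c(uint8_t *p_out, const uint8_t *p_in, size_t n, int i_sat)
{
    for (size_t i = 0; i < n; i++)
        p_out[i] = clip_u8(((((int)p_in[i] - 128) * i_sat) >> 8) + 128);
}

/* Placeholders: the real bodies would contain the SSE2 / SSE4.1 code. */
static void adjust_sse2(uint8_t *p_out, const uint8_t *p_in, size_t n, int i_sat)
{
    adjust_c(p_out, p_in, n, i_sat);
}

static void adjust_sse4_1(uint8_t *p_out, const uint8_t *p_in, size_t n, int i_sat)
{
    adjust_c(p_out, p_in, n, i_sat);
}

/* Pick a variant once; the SSE4.1 path mirrors the patch's
 * "SSE4.1 available and i_sat > 256" condition. */
static adjust_fn pick_adjust(unsigned caps, int i_sat)
{
    if ((caps & CAP_SSE4_1) && i_sat > 256)
        return adjust_sse4_1;
    if (caps & CAP_SSE2)
        return adjust_sse2;
    return adjust_c;
}
```

The caller would then do something like pick_adjust(caps, i_sat)(p_out, p_in, n, i_sat), and each variant can be read and tested on its own.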

> +#if defined(CAN_COMPILE_SSE4_1)
> +    if ( vlc_CPU() & CPU_CAPABILITY_SSE4_1 && i_sat > 256 )
> +    {
> +#define WRITE_UV_CLIP() \
> +    i_u = *p_in; p_in += 4; i_v = *p_in_v; p_in_v += 4; \
> +    *p_out = clip_uint8_vlc( (( ((i_u * i_cos + i_v * i_sin - i_x) >> 8) \
> +                           * i_sat) >> 8) + 128); \
> +    p_out += 4; \
> +    *p_out_v = clip_uint8_vlc( (( ((i_v * i_cos - i_u * i_sin - i_y) >> 8) \
> +                           * i_sat) >> 8) + 128); \
> +    p_out_v += 4
> +
> +        uint8_t i_u, i_v;
> +
> +        WRITE_UV_CLIP_PACKED_PREPARE;
> +
> +        for( ; p_in < p_in_end ; )
> +        {
> +            p_line_end = p_in + i_visible_pitch - 8 * 4;
> +
> +            for( ; p_in < p_line_end ; )
> +            {
> +                /* Do 8 pixels at a time */
> +                WRITE_UV_CLIP_PACKED_SSE4_1();
> +            }
> +
> +            p_line_end += 8 * 4;
> +
> +            for( ; p_in < p_line_end ; )
> +            {
> +                WRITE_UV_CLIP();
> +            }
> +
> +            p_in += i_pitch - i_visible_pitch;
> +            p_in_v += i_pitch - i_visible_pitch;
> +            p_out += i_pitch - i_visible_pitch;
> +            p_out_v += i_pitch - i_visible_pitch;
Mostly same remarks as above.

No opinion about the ASM. Do you have speedup numbers?
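
If it helps, a very small timing harness would be enough to get a first number. The sketch below is only an illustration: filter_fn, filter_scalar(), seconds_for() and speedup() are hypothetical names, the workload is a simplified saturation-only loop rather than the real filter, and clock() measures CPU time with coarse resolution, so the repetition count needs to be large.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef void (*filter_fn)(uint8_t *p_out, const uint8_t *p_in, size_t n);

/* Stand-in workload: scale chroma around 128 by 384/256 (1.5x) and clip. */
static void filter_scalar(uint8_t *p_out, const uint8_t *p_in, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int v = ((((int)p_in[i] - 128) * 384) >> 8) + 128;
        p_out[i] = v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
    }
}

/* CPU seconds spent running fn `reps` times over the same buffer. */
static double seconds_for(filter_fn fn, uint8_t *p_out, const uint8_t *p_in,
                          size_t n, int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        fn(p_out, p_in, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Baseline time divided by optimised time; returns 0 when the
 * optimised run was too short for clock() to measure. */
static double speedup(double baseline_s, double optimised_s)
{
    return optimised_s > 0.0 ? baseline_s / optimised_s : 0.0;
}
```

Running seconds_for() once with the C path and once with each SSE path on the same frame-sized buffer, then passing both results to speedup(), gives the kind of figure I am asking about.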

Best Regards,

-- 
Jean-Baptiste Kempf
http://www.jbkempf.com/ - +33 672 704 734
Sent from my Electronic Device


