[vlc-devel] [PATCH] Do adjust filter in SSE2 and SSE4.1
Martin Briza
Gamajun at seznam.cz
Mon Jul 18 19:06:13 CEST 2011
Hi,
> On Fri, Jul 15, 2011 at 09:08:18PM +0200, gamajun at seznam.cz wrote :
>> + *p_out++ = clip_uint8_vlc( (( ((i_u * i_cos + i_v * i_sin - i_x) >> 8) \
>> + * i_sat) >> 8) + 128); \
> Not sure you really need this many parentheses...
This is just a recycled macro that was already in place before I started
working on it, but I think it's OK.
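
For reference, here is a scalar sketch of what the quoted macro computes
for one U sample. It assumes the operands are fixed-point values scaled
by 256, as the two shifts suggest. The names clip_u8 and adjust_u are
hypothetical stand-ins (clip_u8 plays the role of clip_uint8_vlc), and
the sample values mentioned afterwards are illustrative only.

```c
#include <stdint.h>

/* Hypothetical stand-in for clip_uint8_vlc(): clamp an int to 0..255. */
static uint8_t clip_u8(int v)
{
    if (v < 0)   return 0;
    if (v > 255) return 255;
    return (uint8_t)v;
}

/* One U sample, using the same arithmetic as the quoted macro.
 * Assumptions: i_cos, i_sin and i_sat are fixed-point values scaled by 256,
 * and >> on a negative int is an arithmetic shift (implementation-defined
 * in C, but what common compilers do). */
static uint8_t adjust_u(int i_u, int i_v, int i_cos, int i_sin,
                        int i_x, int i_sat)
{
    /* Hue rotation, brought back from the x256 scale. */
    int rotated = (i_u * i_cos + i_v * i_sin - i_x) >> 8;

    /* Saturation scaling, then re-centre around 128 and clamp. */
    return clip_u8(((rotated * i_sat) >> 8) + 128);
}
```

With i_cos = 256, i_sin = 0, i_x = 32768 and i_sat = 256 (no rotation,
unit saturation under the assumptions above), a sample passes through
unchanged; doubling i_sat pushes bright samples into the clamp.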
>> + WRITE_UV_CLIP_PLANAR_SSE4_1();
>> + // WRITE_UV_CLIP_PLANAR_SSE4_1();
> Why keep the second line?
Uh, that's just a leftover from testing that I didn't notice.
>> + p_in += p_pic->p[U_PLANE].i_pitch
>> + - p_pic->p[U_PLANE].i_visible_pitch;
>> + p_in_v += p_pic->p[V_PLANE].i_pitch
>> + - p_pic->p[V_PLANE].i_visible_pitch;
>> + p_out += p_outpic->p[U_PLANE].i_pitch
>> + - p_outpic->p[U_PLANE].i_visible_pitch;
>> + p_out_v += p_outpic->p[V_PLANE].i_pitch
>> + - p_outpic->p[V_PLANE].i_visible_pitch;
> Some alignment could improve the readability.
Agreed.
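
To make the pointer arithmetic above concrete, here is a minimal sketch
of a per-plane loop that walks only the visible samples and then skips
the row padding (pitch minus visible pitch). The function invert_plane
and its parameters are hypothetical; they only mirror the i_pitch /
i_visible_pitch naming, and the sketch assumes input and output planes
share the same pitch.

```c
#include <stdint.h>

/* Invert the visible samples of one plane and leave the padding alone.
 * Hypothetical helper: assumes both planes use the same i_pitch and
 * i_visible_pitch, and that i_pitch >= i_visible_pitch. */
static void invert_plane(uint8_t *p_out, const uint8_t *p_in,
                         int i_pitch, int i_visible_pitch,
                         int i_visible_lines)
{
    for (int y = 0; y < i_visible_lines; y++)
    {
        const uint8_t *p_line_end = p_in + i_visible_pitch;

        /* Process only the visible part of the row. */
        while (p_in < p_line_end)
            *p_out++ = (uint8_t)(255 - *p_in++);

        /* Skip the padding between the visible width and the full pitch. */
        p_in  += i_pitch - i_visible_pitch;
        p_out += i_pitch - i_visible_pitch;
    }
}
```

In the patch the same skip is done separately for the U and V planes,
each with its own pitch, which is why there are four pointer updates.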
>> +#elif defined(CAN_COMPILE_SSE2)
> Maybe this should be in a different function?
A different function? Do you mean splitting static picture_t *FilterPlanar
into several functions?
> No opinion about the ASM. Speedups numbers?
It's about 50% on my system, but highly dependent on the size of the
processed image (better on bigger images).
I'll wait for your reviews, and once you've told me everything you think
about it, I'll send another patch that fixes what you noticed.
Regards,
Martin Briza