[vlc-devel] [WIP] sharpen: reimplement with SKIPSM
Felix Abecassis
felix.abecassis at gmail.com
Fri May 23 11:25:29 CEST 2014
2014-05-23 10:12 GMT+02:00 Tristan Matthews <le.businessman at gmail.com>:
> On Thu, May 22, 2014 at 3:42 PM, Felix Abecassis
> <felix.abecassis at gmail.com> wrote:
> >
> > Interesting.
> >
> > Did you benchmark the two implementations?
>
> Yup, somewhat crudely thus far though (top, rdtsc, clock(),
> kcachegrind). It seems that the new implementation consistently
> performs faster/with fewer instructions so far. If you have any tips
> for useful metrics/results, I could post them as well.
>
This should be fine for a first approximation of the performance. Please
share your results :).
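A crude clock()-based comparison along those lines could look like the sketch below. It is only an illustration: time_runs, bench_fn and count_call are hypothetical names, not part of VLC, and clock() measures CPU time at coarse resolution, so many iterations are needed for a meaningful number.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical callback type: one invocation of the code under test. */
typedef void (*bench_fn)(void *ctx);

/* Run fn(ctx) `iterations` times and return the CPU time used, in seconds,
 * as reported by clock(). */
static double time_runs(bench_fn fn, void *ctx, unsigned iterations)
{
    assert(fn != NULL);
    const clock_t start = clock();
    for (unsigned i = 0; i < iterations; ++i)
        fn(ctx);
    const clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Trivial stand-in workload: counts how often it was called.
 * A real comparison would wrap one call to each filter here. */
static void count_call(void *ctx)
{
    ++*(unsigned *)ctx;
}
```

Timing both filters on the same input picture with the same iteration count, and reporting the ratio rather than absolute seconds, should make the numbers easier to compare across machines.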
I'm wondering how this SKIPSM implementation compares to a simple but
vectorized implementation. The inner loop of the original filter might be
automatically vectorized if we help the compiler a bit.
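To make the idea concrete, here is a minimal sketch of a direct 3x3 loop written so that a compiler has a fair chance of vectorizing it. It is not taken from sharpen.c: highpass_row, mix_row, clip_u8 and the 8.8 fixed-point sigma_q8 parameter are assumptions for illustration, and the clamp to [-255, 255] is only meant to roughly match the range covered by the 512-entry table in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 3x3 high-pass response 8*center - sum(neighbours) for the interior pixels
 * of one row. up/mid/down are the rows above, at and below the output row.
 * `restrict` tells the compiler that resp does not alias the inputs. */
static void highpass_row(const uint8_t *restrict up,
                         const uint8_t *restrict mid,
                         const uint8_t *restrict down,
                         int16_t *restrict resp, size_t width)
{
    assert(width >= 3);
    for (size_t x = 1; x + 1 < width; ++x) {
        int sum = up[x - 1]   + up[x]   + up[x + 1]
                + mid[x - 1]  + mid[x]  + mid[x + 1]
                + down[x - 1] + down[x] + down[x + 1];
        resp[x] = (int16_t)(9 * mid[x] - sum);   /* fits: -2040..2040 */
    }
}

static inline uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;
}

/* Mix the clamped response back into the source row.
 * sigma_q8 is the strength in 8.8 fixed point (256 == 1.0); this replaces
 * the float-built lookup table with integer arithmetic (an assumption). */
static void mix_row(const uint8_t *restrict mid,
                    const int16_t *restrict resp,
                    uint8_t *restrict out, size_t width, int sigma_q8)
{
    for (size_t x = 1; x + 1 < width; ++x) {
        int r = resp[x];
        r = r < -255 ? -255 : r > 255 ? 255 : r;
        out[x] = clip_u8(mid[x] + r * sigma_q8 / 256);
    }
}
```

The loops have no table lookup and no data-dependent branch beyond clamping, which compilers can usually turn into min/max instructions. Whether vectorization actually happens depends on the compiler and flags, so it is worth checking the vectorizer report or the generated assembly.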
> >
> > Can the implementation be easily extended to a larger kernel width?
>
> Possibly, but the paper I referenced was specifically for 3x3 kernels
> and since this is the current behaviour of the sharpen filter I didn't
> dig much further. In another paper, "Efficient algorithm for Gaussian
> blur using finite-state machines", the same author does discuss 3x5
> and 5x5 gaussian blur implementations, and compares the algorithmic
> complexity of these vs. an NxN SKIPSM.
For large kernels, it is probably better to use a horizontal pass and a
vertical pass if the filter is separable. These two passes can also be
vectorized. But you are right that this is outside the scope of this patch.
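As an illustration of the two-pass approach, the sketch below computes a (2r+1)x(2r+1) box sum with a horizontal pass followed by a vertical pass. The names box_h and box_v, the tightly packed layout (pitch equal to width) and the uint32_t intermediate buffer are assumptions made for the example, and borders are simply left untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Horizontal pass: tmp[y][x] = sum of src[y][x-r .. x+r] for every row,
 * interior columns only. Images are w*h, tightly packed. */
static void box_h(const uint8_t *src, uint32_t *tmp,
                  size_t w, size_t h, size_t r)
{
    assert(w > 2 * r);
    for (size_t y = 0; y < h; ++y)
        for (size_t x = r; x + r < w; ++x) {
            uint32_t s = 0;
            for (size_t k = x - r; k <= x + r; ++k)
                s += src[y * w + k];
            tmp[y * w + x] = s;
        }
}

/* Vertical pass: out[y][x] = sum of tmp[y-r .. y+r][x], interior only.
 * After both passes out holds the full (2r+1)x(2r+1) box sum. */
static void box_v(const uint32_t *tmp, uint32_t *out,
                  size_t w, size_t h, size_t r)
{
    assert(h > 2 * r);
    for (size_t y = r; y + r < h; ++y)
        for (size_t x = r; x + r < w; ++x) {
            uint32_t s = 0;
            for (size_t k = y - r; k <= y + r; ++k)
                s += tmp[k * w + x];
            out[y * w + x] = s;
        }
}
```

Each pass costs about 2r+1 additions per pixel instead of (2r+1)^2 for the direct 2D sum. For r = 1 this ties back to the sharpen kernel, since the 3x3 response is 9 times the center pixel minus the 3x3 box sum.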
> He also mentions that an NxN
> SKIPSM can be decomposed into several 3x3 SKIPSMs for comparable
> performance.
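For reference, the 3x3 building block can be exercised on its own. The sketch below follows the row and column machines listed in the comment above Filter() in the quoted patch, but it is not the patch's code: skipsm_highpass3x3 is a hypothetical name, the image is assumed tightly packed, the machines are fed from row 0 and column 0, and an output is written only once three rows and three columns have been seen.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 3x3 response 8*center - sum(neighbours) via SKIPSM.
 * src and resp are w*h, tightly packed; cs0 and cs1 are caller-provided
 * column-state buffers of w elements each. Border entries of resp are
 * left untouched. */
static void skipsm_highpass3x3(const uint8_t *src, int16_t *resp,
                               size_t w, size_t h,
                               int16_t *cs0, int16_t *cs1)
{
    assert(w >= 3 && h >= 3);
    memset(cs0, 0, w * sizeof *cs0);
    memset(cs1, 0, w * sizeof *cs1);

    for (size_t y = 0; y < h; ++y) {
        int rs0 = 0, rs1 = 0;           /* row state, reset for every row */
        for (size_t x = 0; x < w; ++x) {
            /* row machine */
            const int in   = src[y * w + x];
            const int tmp2 = in + rs1;  /* sum of the last 3 pixels in row */
            const int tmp3 = 9 * rs0;   /* 9 * previous pixel */
            rs1 = rs0 + in;
            rs0 = in;
            const int tmp1 = tmp3 - tmp2;

            /* column machine: the result belongs to pixel (y-1, x-1) and is
             * valid once 3 rows and 3 columns have been consumed */
            if (y >= 2 && x >= 2)
                resp[(y - 1) * w + (x - 1)] = (int16_t)(cs1[x] - tmp2);
            cs1[x] = (int16_t)(tmp1 - cs0[x]);
            cs0[x] = (int16_t)tmp2;
        }
    }
}
```

Every source pixel is read once, and the only per-column memory is the two state arrays. Comparing resp against a brute-force 3x3 sum on a small test image is a quick way to check the state handling at the borders.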
>
> >
> >
> >
> > 2014-05-22 17:36 GMT+02:00 Tristan Matthews <le.businessman at gmail.com>:
> >>
> >> On Thu, May 22, 2014 at 11:30 AM, Tristan Matthews <le.businessman at gmail.com> wrote:
> >>>
> >>> SKIPSM (Separated-Kernel Image Processing using finite-State Machines)
> >>> allows sharpening with fewer repeated operations. Two finite-state
> >>> machines (a 2-element row FSM and a width-element column FSM) are used
> >>> to avoid duplicate reads/arithmetic.
> >>>
> >>> This is a WIP. sharpen2 is meant to replace sharpen but both are
> >>> included here for ease of live comparison.
> >>>
> >>> Reference:
> >>> http://www-personal.engin.umd.umich.edu/~jwvm/ece488588/Papers/skipsm/17_Misc3x3.pdf
> >>>
> >>> Maybe refs #9458
> >>> ---
> >>> modules/MODULES_LIST | 1 +
> >>> modules/gui/qt4/components/extended_panels.cpp | 3 +
> >>> modules/gui/qt4/ui/video_effects.ui | 46 ++++
> >>> modules/video_filter/Modules.am | 2 +
> >>> modules/video_filter/sharpen2.c | 298
> +++++++++++++++++++++++++
> >>> 5 files changed, 350 insertions(+)
> >>> create mode 100644 modules/video_filter/sharpen2.c
> >>>
> >>> diff --git a/modules/MODULES_LIST b/modules/MODULES_LIST
> >>> index 61ad62b..bc60143 100644
> >>> --- a/modules/MODULES_LIST
> >>> +++ b/modules/MODULES_LIST
> >>> @@ -309,6 +309,7 @@ $Id$
> >>> * sepia: Sepia video filter
> >>> * sftp: SFTP network access module
> >>> * sharpen: Sharpen video filter
> >>> + * sharpen2: Sharpen2 video filter
> >>> * shine: MP3 encoder using Shine, a fixed point implementation
> >>> * shm: Shared memory framebuffer access module
> >>> * sid: Sidplay demuxer
> >>> diff --git a/modules/gui/qt4/components/extended_panels.cpp
> b/modules/gui/qt4/components/extended_panels.cpp
> >>> index 84d16ae..9583196 100644
> >>> --- a/modules/gui/qt4/components/extended_panels.cpp
> >>> +++ b/modules/gui/qt4/components/extended_panels.cpp
> >>> @@ -150,6 +150,9 @@ ExtVideo::ExtVideo( intf_thread_t *_p_intf,
> QTabWidget *_parent ) :
> >>> SETUP_VFILTER( sharpen )
> >>> SETUP_VFILTER_OPTION( sharpenSigmaSlider, valueChanged( int ) )
> >>>
> >>> + SETUP_VFILTER( sharpen2 )
> >>> + SETUP_VFILTER_OPTION( sharpen2SigmaSlider, valueChanged( int ) )
> >>> +
> >>> SETUP_VFILTER( ripple )
> >>>
> >>> SETUP_VFILTER( wave )
> >>> diff --git a/modules/gui/qt4/ui/video_effects.ui
> b/modules/gui/qt4/ui/video_effects.ui
> >>> index 6284e22..a6564d7 100644
> >>> --- a/modules/gui/qt4/ui/video_effects.ui
> >>> +++ b/modules/gui/qt4/ui/video_effects.ui
> >>> @@ -316,6 +316,50 @@
> >>> </layout>
> >>> </widget>
> >>> </item>
> >>> + <item row="3" column="1">
> >>> + <widget class="QGroupBox" name="sharpen2Enable">
> >>> + <property name="title">
> >>> + <string>Sharpen2</string>
> >>> + </property>
> >>> + <property name="checkable">
> >>> + <bool>true</bool>
> >>> + </property>
> >>> + <property name="checked">
> >>> + <bool>false</bool>
> >>> + </property>
> >>> + <layout class="QGridLayout">
> >>> + <item row="0" column="0">
> >>> + <widget class="QLabel" name="label_29">
> >>> + <property name="text">
> >>> + <string>Sigma</string>
> >>> + </property>
> >>> + <property name="buddy">
> >>> + <cstring>sharpen2SigmaSlider</cstring>
> >>> + </property>
> >>> + </widget>
> >>> + </item>
> >>> + <item row="0" column="1">
> >>> + <widget class="QSlider" name="sharpen2SigmaSlider">
> >>> + <property name="maximum">
> >>> + <number>200</number>
> >>> + </property>
> >>> + <property name="pageStep">
> >>> + <number>10</number>
> >>> + </property>
> >>> + <property name="orientation">
> >>> + <enum>Qt::Horizontal</enum>
> >>> + </property>
> >>> + <property name="tickPosition">
> >>> + <enum>QSlider::TicksBelow</enum>
> >>> + </property>
> >>> + <property name="tickInterval">
> >>> + <number>50</number>
> >>> + </property>
> >>> + </widget>
> >>> + </item>
> >>> + </layout>
> >>> + </widget>
> >>> + </item>
> >>> </layout>
> >>> </widget>
> >>> <widget class="QWidget" name="tab_3">
> >>> @@ -1950,6 +1994,8 @@
> >>> <tabstop>gradfunRadiusSlider</tabstop>
> >>> <tabstop>grainEnable</tabstop>
> >>> <tabstop>grainVarianceSlider</tabstop>
> >>> + <tabstop>sharpen2Enable</tabstop>
> >>> + <tabstop>sharpen2SigmaSlider</tabstop>
> >>> <tabstop>cropTopPx</tabstop>
> >>> <tabstop>cropBotPx</tabstop>
> >>> <tabstop>topBotCropSync</tabstop>
> >>> diff --git a/modules/video_filter/Modules.am
> b/modules/video_filter/Modules.am
> >>> index 3bb8cdb..ae0b63c 100644
> >>> --- a/modules/video_filter/Modules.am
> >>> +++ b/modules/video_filter/Modules.am
> >>> @@ -78,6 +78,7 @@ video_filter_LTLIBRARIES += librotate_plugin.la
> >>> SOURCES_colorthres = colorthres.c
> >>> SOURCES_extract = extract.c
> >>> SOURCES_sharpen = sharpen.c
> >>> +SOURCES_sharpen2 = sharpen2.c
> >>> SOURCES_erase = erase.c
> >>> SOURCES_bluescreen = bluescreen.c
> >>> SOURCES_alphamask = alphamask.c
> >>> @@ -153,6 +154,7 @@ video_filter_LTLIBRARIES += \
> >>> libscene_plugin.la \
> >>> libsepia_plugin.la \
> >>> libsharpen_plugin.la \
> >>> + libsharpen2_plugin.la \
> >>> libsubsdelay_plugin.la \
> >>> libtransform_plugin.la \
> >>> libwave_plugin.la \
> >>> diff --git a/modules/video_filter/sharpen2.c
> b/modules/video_filter/sharpen2.c
> >>> new file mode 100644
> >>> index 0000000..cdabc20
> >>> --- /dev/null
> >>> +++ b/modules/video_filter/sharpen2.c
> >>> @@ -0,0 +1,298 @@
> >>>
> +/*****************************************************************************
> >>> + * sharpen2.c: Sharpen video filter
> >>> +
> *****************************************************************************
> >>> + * Copyright (C) 2003-2007 VLC authors and VideoLAN
> >>> + * $Id$
> >>> + *
> >>> + * Author: Jérémy DEMEULE <dj_mulder at djduron dot no-ip dot org>
> >>> + * Jean-Baptiste Kempf <jb at videolan dot org>
> >>> + *
> >>> + * This program is free software; you can redistribute it and/or
> modify it
> >>> + * under the terms of the GNU Lesser General Public License as
> published by
> >>> + * the Free Software Foundation; either version 2.1 of the License, or
> >>> + * (at your option) any later version.
> >>> + *
> >>> + * This program is distributed in the hope that it will be useful,
> >>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> >>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> >>> + * GNU Lesser General Public License for more details.
> >>> + *
> >>> + * You should have received a copy of the GNU Lesser General Public
> License
> >>> + * along with this program; if not, write to the Free Software
> Foundation,
> >>> + * Inc., 51 Franklin Street, Fifth Floor, Boston MA 02110-1301, USA.
> >>> +
> *****************************************************************************/
> >>> +
> >>> +/* The sharpen filter. */
> >>> +/*
> >>> + * static int filter[] = { -1, -1, -1,
> >>> + * -1, 8, -1,
> >>> + * -1, -1, -1 };
> >>> + */
> >>> +
> >>>
> +/*****************************************************************************
> >>> + * Preamble
> >>> +
> *****************************************************************************/
> >>> +
> >>> +#ifdef HAVE_CONFIG_H
> >>> +# include "config.h"
> >>> +#endif
> >>> +
> >>> +#include <vlc_common.h>
> >>> +#include <vlc_plugin.h>
> >>> +
> >>> +#include <vlc_filter.h>
> >>> +#include "filter_picture.h"
> >>> +
> >>> +#define SIG_TEXT N_("Sharpen strength (0-2)")
> >>> +#define SIG_LONGTEXT N_("Set the Sharpen strength, between 0 and 2.
> Defaults to 0.05.")
> >>> +
> >>>
> +/*****************************************************************************
> >>> + * Local prototypes
> >>> +
> *****************************************************************************/
> >>> +static int Create ( vlc_object_t * );
> >>> +static void Destroy ( vlc_object_t * );
> >>> +
> >>> +static picture_t *Filter( filter_t *, picture_t * );
> >>> +static int SharpenCallback( vlc_object_t *, char const *,
> >>> + vlc_value_t, vlc_value_t, void * );
> >>> +
> >>> +#define SHARPEN2_HELP N_("Augment contrast between contours.")
> >>> +#define FILTER_PREFIX "sharpen2-"
> >>> +
> >>>
> +/*****************************************************************************
> >>> + * Module descriptor
> >>> +
> *****************************************************************************/
> >>> +vlc_module_begin ()
> >>> + set_description( N_("Sharpen2 video filter") )
> >>> + set_shortname( N_("Sharpen2") )
> >>> + set_help(SHARPEN2_HELP)
> >>> + set_category( CAT_VIDEO )
> >>> + set_subcategory( SUBCAT_VIDEO_VFILTER )
> >>> + set_capability( "video filter2", 0 )
> >>> + add_float_with_range( "sharpen2-sigma", 0.05, 0.0, 2.0,
> >>> + SIG_TEXT, SIG_LONGTEXT, false )
> >>> + add_shortcut( "sharpen2" )
> >>> + set_callbacks( Create, Destroy )
> >>> +vlc_module_end ()
> >>> +
> >>> +static const char *const ppsz_filter_options[] = {
> >>> + "sigma", NULL
> >>> +};
> >>> +
> >>>
> +/*****************************************************************************
> >>> + * filter_sys_t: Sharpen video filter descriptor
> >>> +
> *****************************************************************************
> >>> + * This structure is part of the video output thread descriptor.
> >>> + * It describes the Sharpen specific properties of an output thread.
> >>> +
> *****************************************************************************/
> >>> +
> >>> +struct filter_sys_t
> >>> +{
> >>> + vlc_mutex_t lock;
> >>> + int tab_precalc[512];
> >>> + int16_t *column_state[2];
> >>> +};
> >>> +
> >>>
> +/*****************************************************************************
> >>> + * clip: avoid negative value and value > 255
> >>> +
> *****************************************************************************/
> >>> +inline static uint8_t clip( int32_t a )
> >>> +{
> >>> + return (a > 255) ? 255 : (a < 0) ? 0 : a;
> >>> +}
> >>> +
> >>> +static void init_precalc_table(filter_sys_t *p_filter, float sigma)
> >>> +{
> >>> + for(int i = 0; i < 512; ++i)
> >>> + {
> >>> + p_filter->tab_precalc[i] = (i - 256) * sigma;
> >>> + }
> >>> +}
> >>> +
> >>>
> +/*****************************************************************************
> >>> + * Create: allocates Sharpen video thread output method
> >>> +
> *****************************************************************************
> >>> + * This function allocates and initializes a Sharpen vout method.
> >>> +
> *****************************************************************************/
> >>> +static int Create( vlc_object_t *p_this )
> >>> +{
> >>> + filter_t *p_filter = (filter_t *)p_this;
> >>> +
> >>> + const vlc_fourcc_t fourcc = p_filter->fmt_in.video.i_chroma;
> >>> + const vlc_chroma_description_t *p_chroma =
> vlc_fourcc_GetChromaDescription( fourcc );
> >>> + if( !p_chroma || p_chroma->plane_count != 3 ||
> p_chroma->pixel_size != 1 ) {
> >>> + msg_Err( p_filter, "Unsupported chroma (%4.4s)",
> (char*)&fourcc );
> >>> + return VLC_EGENERIC;
> >>> + }
> >>> +
> >>> + /* Allocate structure */
> >>> + p_filter->p_sys = malloc( sizeof( filter_sys_t ) );
> >>> + if( p_filter->p_sys == NULL )
> >>> + return VLC_ENOMEM;
> >>> +
> >>> + for( int i = 0; i < 2; ++i) {
> >>> + p_filter->p_sys->column_state[i] = malloc(
> sizeof(*p_filter->p_sys->column_state[i]) *
> >>> +
> p_filter->fmt_in.video.i_visible_width );
> >>> + if( p_filter->p_sys->column_state[i] == NULL )
> >>> + return VLC_ENOMEM;
> >>> + }
> >>> +
> >>> + p_filter->pf_video_filter = Filter;
> >>> +
> >>> + config_ChainParse( p_filter, FILTER_PREFIX, ppsz_filter_options,
> >>> + p_filter->p_cfg );
> >>> +
> >>> + float sigma = var_CreateGetFloatCommand( p_filter, FILTER_PREFIX
> "sigma" );
> >>> + init_precalc_table(p_filter->p_sys, sigma);
> >>> +
> >>> + vlc_mutex_init( &p_filter->p_sys->lock );
> >>> + var_AddCallback( p_filter, FILTER_PREFIX "sigma",
> >>> + SharpenCallback, p_filter->p_sys );
> >>> +
> >>> + return VLC_SUCCESS;
> >>> +}
> >>> +
> >>> +
> >>>
> +/*****************************************************************************
> >>> + * Destroy: destroy Sharpen video thread output method
> >>> +
> *****************************************************************************
> >>> + * Terminate an output method created by SharpenCreateOutputMethod
> >>> +
> *****************************************************************************/
> >>> +static void Destroy( vlc_object_t *p_this )
> >>> +{
> >>> + filter_t *p_filter = (filter_t *)p_this;
> >>> + filter_sys_t *p_sys = p_filter->p_sys;
> >>> +
> >>> + var_DelCallback( p_filter, FILTER_PREFIX "sigma",
> SharpenCallback, p_sys );
> >>> + vlc_mutex_destroy( &p_sys->lock );
> >>> + for (int i = 0; i < 2; ++i)
> >>> + free( p_sys->column_state[i] );
> >>> + free( p_sys );
> >>> +}
> >>> +
> >>>
> +/*****************************************************************************
> >>> + * Render: displays previously rendered output
> >>> +
> *****************************************************************************
> >>> + * This function send the currently rendered image to Invert image,
> waits
> >>> + * until it is displayed and switch the two rendering buffers,
> preparing next
> >>> + * frame.
> >>> + *
> >>> + * Reference:
> >>> + *
> http://www-personal.engin.umd.umich.edu/~jwvm/ece488588/Papers/skipsm/17_Misc3x3.pdf
> >>> + *
> >>> + * Row Machine
> >>> + * 1 Tmp1 = Input[row j][col i];
> >>> + * 2 Tmp2 = Tmp1 + RS1;
> >>> + * 3 Tmp3 = 9*RS0;
> >>> + * 4 RS1 = RS0 + Tmp1;
> >>> + * 5 RS0 = Tmp1;
> >>> + * 6 Tmp1 = Tmp3 - Tmp2;
> >>> +
> >>> + * Column Machine
> >>> + * (Division by 8 omitted to get same behaviour as current sharpen
> filter)
> >>> + * Out[row j-1][col i-1]) = (CS1[col i] - Tmp2)/8;
> >>> + * CS1[col i] = Tmp1 - CS0[col i];
> >>> + * CS0[col i] = Tmp2
> >>> + *
> >>> +
> *****************************************************************************/
> >>> +static picture_t *Filter( filter_t *p_filter, picture_t *p_pic )
> >>> +{
> >>> + picture_t *p_outpic;
> >>> + unsigned i, j;
> >>> + uint8_t *p_src = NULL;
> >>> + uint8_t *p_out = NULL;
> >>> + int i_src_pitch;
> >>> + int i_out_pitch;
> >>> + uint8_t pix;
> >>> + int16_t row_state0, row_state1;
> >>> + filter_sys_t *sys = p_filter->p_sys;
> >>> + int16_t *column_state[2] = {sys->column_state[0],
> sys->column_state[1]};
> >>> + if( !p_pic ) return NULL;
> >>> +
> >>> + const unsigned i_visible_lines = p_pic->p[Y_PLANE].i_visible_lines;
> >>> + const unsigned i_visible_pitch = p_pic->p[Y_PLANE].i_visible_pitch;
> >>> +
> >>> + p_outpic = filter_NewPicture( p_filter );
> >>> + if( !p_outpic )
> >>> + {
> >>> + picture_Release( p_pic );
> >>> + return NULL;
> >>> + }
> >>> +
> >>> + /* process the Y plane */
> >>> + p_src = p_pic->p[Y_PLANE].p_pixels;
> >>> + p_out = p_outpic->p[Y_PLANE].p_pixels;
> >>> + i_src_pitch = p_pic->p[Y_PLANE].i_pitch;
> >>> + i_out_pitch = p_outpic->p[Y_PLANE].i_pitch;
> >>> +
> >>> + /* reset column state at beginning of operation */
> >>> + for (unsigned c = 0; c < 2; ++c)
> >>> + memset(column_state[c], 0, sizeof(*column_state[c]) *
> >>> + p_filter->fmt_in.video.i_visible_width);
> >>> +
> >>> + /* perform convolution only on Y plane. Avoid border line. */
> >>> + vlc_mutex_lock( &p_filter->p_sys->lock );
> >>> +
> >>> + /* copy first row */
> >>> + memcpy(p_out, p_src, i_visible_pitch);
> >>> +
> >>> + for( i = 1; i < i_visible_lines - 1; i++ )
> >>> + {
> >>> + /* row state must be initialized for each row */
> >>> + row_state0 = row_state1 = 0;
> >>> +
> >>> + /* copy first pixel in row */
> >>> + p_out[i * i_out_pitch] = p_src[i * i_src_pitch];
> >>> +
> >>> + for( j = 1; j < i_visible_pitch - 1; j++ )
> >>> + {
> >>> + /* row machine */
> >>> + int16_t tmp1 = p_src[i * i_src_pitch + j];
> >>> + const int16_t tmp2 = tmp1 + row_state1;
> >>> + const int16_t tmp3 = 9 * row_state0;
> >>> + row_state1 = row_state0 + tmp1;
> >>> + row_state0 = tmp1;
> >>> + tmp1 = tmp3 - tmp2;
> >>> +
> >>> + /* column machine */
> >>> + pix = clip(column_state[1][j] - tmp2);
> >>> +
> >>> + /* mix with original signal and write to output */
> >>> + p_out[(i - 1) * i_out_pitch + j - 1] =
> >>> + clip( p_src[(i - 1) * i_src_pitch + j - 1] +
> >>> + p_filter->p_sys->tab_precalc[pix + 256]);
> >>> +
> >>> + column_state[1][j] = tmp1 - column_state[0][j];
> >>> + column_state[0][j] = tmp2;
> >>> + }
> >>> +
> >>> + /* copy last pixel */
> >>> + p_out[i * i_out_pitch + i_visible_pitch - 1] =
> >>> + p_src[i * i_src_pitch + i_visible_pitch - 1];
> >>> + }
> >>> +
> >>> + /* copy last row */
> >>> + for( j = 0; j < i_visible_pitch; j++ )
> >>> + p_out[(i_visible_lines - 1) * i_out_pitch + j] =
> >>> + p_src[(i_visible_lines - 1) * i_src_pitch + j];
> >>> +
> >>> + vlc_mutex_unlock( &p_filter->p_sys->lock );
> >>> +
> >>> + plane_CopyPixels( &p_outpic->p[U_PLANE], &p_pic->p[U_PLANE] );
> >>> + plane_CopyPixels( &p_outpic->p[V_PLANE], &p_pic->p[V_PLANE] );
> >>> +
> >>> + return CopyInfoAndRelease( p_outpic, p_pic );
> >>> +}
> >>> +
> >>> +static int SharpenCallback( vlc_object_t *p_this, char const *psz_var,
> >>> + vlc_value_t oldval, vlc_value_t newval,
> >>> + void *p_data )
> >>> +{
> >>> + VLC_UNUSED(p_this); VLC_UNUSED(oldval); VLC_UNUSED(psz_var);
> >>> + filter_sys_t *p_sys = (filter_sys_t *)p_data;
> >>> +
> >>> + vlc_mutex_lock( &p_sys->lock );
> >>> + init_precalc_table( p_sys, VLC_CLIP( newval.f_float, 0., 2. ) );
> >>> + vlc_mutex_unlock( &p_sys->lock );
> >>> + return VLC_SUCCESS;
> >>> +}
> >>> --
> >>> 1.9.0
> >>>
> >>
> >> Just to clarify, if the new algo is ok, a proper patch will be sent
> >> that will only modify sharpen.c. The GUI won't change.
> >> This was sent together just for testing/comparison purposes.
> >>
> >> -t
> >>
> >>
> >> _______________________________________________
> >> vlc-devel mailing list
> >> To unsubscribe or modify your subscription options:
> >> https://mailman.videolan.org/listinfo/vlc-devel
> >>
> >
> >
> >
> > --
> > Félix Abecassis
> > http://felix.abecassis.me
> >
--
Félix Abecassis
http://felix.abecassis.me