[vlc-devel] [RFC, PATCH 1/6] demux/mkv: Added EbmlTypeDispatcher (including changes to Makefile.am)

Hugo Beauzée-Luyssen hugo at beauzee.fr
Thu Mar 3 17:39:01 CET 2016


Hi,

On 03/03/2016 01:30 PM, Filip Roséen wrote:
> The EbmlTypeDispatcher is a type used to "dispatch" an object of type
> EbmlElement* to an appropriate callback depending on the dynamic type of
> the referred to object.
>
> It can effectivelly replace the massive if-else branches found
> throughout the module by not only making the code easier to understand
> and maintain, but also by making it _a lot_ faster (benchmarks show a
> speed increase between 450 and 700% in terms of lookup).
>
> --
>
> -----------------------------------------------------------------------
> dispatcher.hpp
> -----------------------------------------------------------------------
>
> This header declares a bunch of helpers to easily interact and register
> callbacks to a certain dispatcher (such as EbmlTypeDispatcher), as well
> as thread-safely creating and initializing such dispatcher.
>
> Since this module is stuck in C++03 land, there are some quirks in the
> language specification that does not allow us to write as clean code as
> we would want.
>
> In C++11 we could use a lookup-table filled with lambdas, but since the
> feature (obviously) is not available in C++03 we have two alternatives.
>
>    - Write the function declaration + definition by itself, and then
>      manually register the function in the appropriate instance of the
>      dispatcher type.
>
>    - Introduce MACROs that does this automatically through some clever
>      loop holes in the language (that are required to exists according to
>      the ISO/IEC 14882:2003(E) C++ Standard).
>
> `dispatcher.hpp` choose the latter route, because the first alternative
> is even more ugly than a massive if-else tree (and very error prone)--
> plus:
>
>    - The functions would be distant by definition so by looking at the
>      code responsible for registering the callback will not tell you what
>      will actually be executed.
>
>    - Implement the function but forget to include it in the dispatcher
>      will yield undesirable behavior.
>
>    - Removing the code responsible for registrering the callback will
>      create lingering function definitions, which will affect
>      maintenance.
>
>    - Introducing two handlers for the same type would not yield a compile
>      time diagnostic.
>
> The MACRO approach (even though I try to avoid MACROs at all cost)
> solves all of the issues listed above, and more.
>
> See the comments in `module/mkv/demux/dispatcher.hpp` for more
> information regarding its contents.
>
> -----------------------------------------------------------------------
> Ebml_dispatcher.hpp
> -----------------------------------------------------------------------
>
> This header makes use of the functionality presented by
> `dispatcher.hpp`, and effectivelly allows a developer to write code as
> the following.
>
>      struct EbmlPayload {
>        matroska_segment_c * const parent;
>        mkv_track_t        * const   p_tk;
>        unsigned int                level;
>      } captures = { this, tk, 3 };
>
>      MKV_SWITCH_CREATE( EbmlTypeDispatcher, EbmlHandler, EbmlPayload, vars )
>      {
>        MKV_SWITCH_INIT();
>
>        /* ... */
>
>        E_CASE( KaxSeekPreRoll, spr ) {
>          vars.tk->i_seek_preroll = uint64_t( spr ) / 1000;
>        }
>
>        E_CASE( KaxTrackAudio, tka) {
>          vars.p_tk->fmt.audio.i_channels = 1;
>          vars.p_tk->fmt.audio_i_rate = 8000;
>
>          vars.level += 1;
>
>          dispatcher.iterate( tka.begin(), tka.end(), Payload( vars ));
>
>          vars.level -= 1;
>        }
>
>        E_CASE( KaxAudioSamplingFreq, afreq ) {
>          vars.p_tk->i_original_rate = vars.p_tk->fmt.audio.i_rate = (int)float( afreq );
>        }
>
>        E_CASE( KaxAudioOutputSamplingFreq, afreq ) {
>          vars.p_tk->fmt.audio.i_rate  = (int)float( afreq );
>        }
>
>        E_CASE( KaxAudioChannels, achan ) {
>          vars.p_tk->fmt.audio.i_channels = uint8( chan );
>        }
>
>        E_CASE( KaxAudioBitDepth, abits ) {
>          vars.p_tk->fmt.audio.i_bitspersample = uint8( abits );
>        }
>
>        E_CASE( EbmlVoid, ) { // ignore
>          VLC_UNUSED( vars );
>        }
>
>        E_CASE_DEFAULT(element) {
>          MkvTree( *vars.p_demuxer, vars.level, "Unknown (%s)", typeid(element).name() );
>        }
>      };
>
>      EbmlHandler::Dispatcher().iterate(
>        m->begin(), m->end(), EbmlHandler::Payload( captures )
>      );
>
> Effictivelly replacing the old massive nested if-else trees, such as the
> below taken from `matroska_segment_parse.cpp`.
>
>      for( size_t i = 0; i < m->ListSize(); i++ )
>      {
>          EbmlElement *l = (*m)[i];
>
>          if /* ... */
>          else if( MKV_IS_ID( l, KaxSeekPreRoll ) )
>          {
>              KaxSeekPreRoll &spr = *(KaxSeekPreRoll*)l;
>              tk->i_seek_preroll = uint64_t(spr) / 1000;
>          }
>          else  if( MKV_IS_ID( l, KaxTrackAudio ) )
>          {
>              EbmlMaster *tka = static_cast<EbmlMaster*>(l);
>
>              /* Initialize default values */
>              tk->fmt.audio.i_channels = 1;
>              tk->fmt.audio.i_rate = 8000;
>
>              for( unsigned int j = 0; j < tka->ListSize(); j++ )
>              {
>                  EbmlElement *l = (*tka)[j];
>
>                  if( MKV_IS_ID( l, KaxAudioSamplingFreq ) )
>                  {
>                      KaxAudioSamplingFreq &afreq = *(KaxAudioSamplingFreq*)l;
>
>                      tk->i_original_rate = tk->fmt.audio.i_rate = (int)float( afreq );
>                  }
>                  else if( MKV_IS_ID( l, KaxAudioOutputSamplingFreq ) )
>                  {
>                      KaxAudioOutputSamplingFreq &afreq = *(KaxAudioOutputSamplingFreq*)l;
>
>                      tk->fmt.audio.i_rate = (int)float( afreq );
>                  }
>                  else if( MKV_IS_ID( l, KaxAudioChannels ) )
>                  {
>                      KaxAudioChannels &achan = *(KaxAudioChannels*)l;
>
>                      tk->fmt.audio.i_channels = uint8( achan );
>                      msg_Dbg( &sys.demuxer, "|   |   |   |   + achan=%u", uint8( achan ) );
>                  }
>                  else if( MKV_IS_ID( l, KaxAudioBitDepth ) )
>                  {
>                      KaxAudioBitDepth &abits = *(KaxAudioBitDepth*)l;
>
>                      tk->fmt.audio.i_bitspersample = uint8( abits );
>                  }
>                  else if ( !MKV_IS_ID( l, EbmlVoid ) )
>                  {
>                      MkvTree( sys.demuxer, 4, "Unknown (%s)", typeid(*l).name() );
>                  }
>              }
>          }
>          else if ( !MKV_IS_ID( l, EbmlVoid ) )
>          {
>              MkvTree( sys.demuxer, 3, "Unknown (%s)", typeid(*l).name() );
>          }
>      }
>
> -----------------------------------------------------------------------
> How does it work?
> -----------------------------------------------------------------------
>
> The MACROs will result in a local struct (`EbmlHandler` in the previous
> example) being declared, that has each of the case-handlers inside of
> it.
>
> This type will have a static function named `Dispatcher` that will yield
> a constant reference to the desired dispatcher.
>
> If this static instance of the dispatcher (one dispatcher per
> MKV_SWITCH_CREATE)  has not yet been initialized, all of the
> case-handlers will have their constructor executed effectivelly
> making themselves available to the dispatcher.
>
> If the static instance has already been constructed, we simply return a
> reference to that one.
>
> The comments in `dispatcher.hpp` and `Ebml_dispatcher.hpp` should be
> explain things in more detail.
>
> -----------------------------------------------------------------------
> Is it faster?
> -----------------------------------------------------------------------
>
> The benchmarks shows that using the EbmlTypeDispatcher increases the
> performance of the lookup (since it is using a lookup table instead of
> if-else) by about 450% - 700% (sometimes more).
>
> -----------------------------------------------------------------------
> Will it ease code maintenance?
> -----------------------------------------------------------------------
>
> Hopefully.
>
> -----------------------------------------------------------------------
> Why not put everything in `Ebml_dispatcher.hpp`?
> -----------------------------------------------------------------------
>
> There are other massive if-else trees besides those related to
> EbmlElement, with a separate "base" it will be easy to implement other
> type of dispatchers.
>
> -----------------------------------------------------------------------
>
> Feedback is more than welcome, thanks!
>
> Yours truly,
> Filip Roséen
> ---
>   modules/demux/Makefile.am             |   2 +
>   modules/demux/mkv/Ebml_dispatcher.hpp | 161 ++++++++++++++++++++++++++++++++++
>   modules/demux/mkv/dispatcher.hpp      | 150 +++++++++++++++++++++++++++++++
>   3 files changed, 313 insertions(+)
>   create mode 100644 modules/demux/mkv/Ebml_dispatcher.hpp
>   create mode 100644 modules/demux/mkv/dispatcher.hpp
>
> diff --git a/modules/demux/Makefile.am b/modules/demux/Makefile.am
> index 11518fe..caf298c 100644
> --- a/modules/demux/Makefile.am
> +++ b/modules/demux/Makefile.am
> @@ -173,7 +173,9 @@ libmkv_plugin_la_SOURCES = \
>   	demux/mkv/matroska_segment.hpp demux/mkv/matroska_segment.cpp \
>   	demux/mkv/matroska_segment_parse.cpp \
>   	demux/mkv/demux.hpp demux/mkv/demux.cpp \
> +	demux/mkv/dispatcher.hpp \
>   	demux/mkv/Ebml_parser.hpp demux/mkv/Ebml_parser.cpp \
> +	demux/mkv/Ebml_dispatcher.hpp \
>   	demux/mkv/chapters.hpp demux/mkv/chapters.cpp \
>   	demux/mkv/chapter_command.hpp demux/mkv/chapter_command.cpp \
>   	demux/mkv/stream_io_callback.hpp demux/mkv/stream_io_callback.cpp \
> diff --git a/modules/demux/mkv/Ebml_dispatcher.hpp b/modules/demux/mkv/Ebml_dispatcher.hpp
> new file mode 100644
> index 0000000..bb0c56a
> --- /dev/null
> +++ b/modules/demux/mkv/Ebml_dispatcher.hpp
> @@ -0,0 +1,161 @@
> +/*****************************************************************************
> + * Ebml_dispatcher.hpp : matroska demuxer
> + *****************************************************************************
> + * Copyright (C) 2016 Videolabs
> + * $Id$
> + *
> + * Authors: Filip Roseen <filip at videolabs.io>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU Lesser General Public License as published by
> + * the Free Software Foundation; either version 2.1 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public License
> + * along with this program; if not, write to the Free Software Foundation,
> + * Inc., 51 Franklin Street, Fifth Floor, Boston MA 02110-1301, USA.
> + *****************************************************************************/
> +#ifndef VLC_MKV_EBML_DISPATCHER_HPP_
> +#define VLC_MKV_EBML_DISPATCHER_HPP_
> +
> +#include "dispatcher.hpp"
> +
> +#include "ebml/EbmlElement.h"
> +#include "ebml/EbmlId.h"
> +
> +#include <vlc_threads.h>
> +
> +#include <algorithm>
> +#include <typeinfo>
> +#include <vector>
> +
> +namespace {
> +  namespace detail {
> +    template<class T>
> +    static std::type_info const* typeid_ptr () {
> +      static std::type_info const& ti = typeid (T);
> +      return &ti;
> +    }
> +  }
> +

Is the anonymous namespace needed/useful in a header file?
And doesn't it imply "static" linkage for the functions it contains?

> +  struct EbmlProcessorEntry {
> +    typedef void (*EbmlProcessor) (EbmlElement*, void*);
> +
> +    EbmlId         const* p_ebmlid;
> +    std::type_info const* p_typeid;
> +    EbmlProcessor         callback;
> +
> +    EbmlProcessorEntry (EbmlId const& id, std::type_info const* ti, EbmlProcessor cb)
> +      : p_ebmlid (&id), p_typeid (ti), callback (cb)
> +    { }
> +  };
> +
> +  struct ProcessorEntrySorter {
> +    typedef EbmlProcessorEntry value_type;
> +
> +    bool operator() (value_type const& lhs, value_type const& rhs) const {
> +      EbmlId const& lid = *lhs.p_ebmlid;
> +      EbmlId const& rid = *rhs.p_ebmlid;
> +
> +      return lid.GetLength() < rid.GetLength() || (
> +        !( rid.GetLength() < lid.GetLength() ) && lid.GetValue() < rid.GetValue()
> +      );
> +    }
> +  };
> +
> +  class EbmlTypeDispatcher : public Dispatcher<EbmlTypeDispatcher, EbmlProcessorEntry::EbmlProcessor> {
> +    protected:
> +      typedef std::vector<EbmlProcessorEntry> ProcessorContainer;
> +
> +    public:
> +      void insert (EbmlProcessorEntry const& data) {
> +        _processors.push_back (data);
> +      }
> +
> +      void on_create () {
> +        std::sort (_processors.begin(), _processors.end(), _ebml_sorter);
> +      }
> +
> +      bool send (EbmlElement * const& element, void* payload) const
> +      {
> +        EbmlProcessorEntry eb = EbmlProcessorEntry (
> +          static_cast<EbmlId const&> (*element), NULL, NULL
> +        );
> +
> +        // --------------------------------------------------------------
> +        // Find the appropriate callback for the received EbmlElement
> +        // --------------------------------------------------------------
> +
> +        ProcessorContainer::const_iterator cit_end = _processors.end();
> +        ProcessorContainer::const_iterator cit     = std::lower_bound (
> +            _processors.begin(), cit_end, eb, _ebml_sorter
> +        );
> +
> +        if (element && cit != cit_end)
> +        {
> +          // --------------------------------------------------------------
> +          // normally we only need to compare the addresses of the EbmlId
> +          // since libebml returns a reference to a _static_ instance.
> +          // --------------------------------------------------------------
> +
> +          while (cit != cit_end && (cit->p_ebmlid == eb.p_ebmlid || (*cit->p_ebmlid == *eb.p_ebmlid))) {
> +            std::type_info const& ti = typeid (*element);
> +
> +            // --------------------------------------------------------------
> +            // even though the EbmlId are equivalent, we still need to make
> +            // sure that the typeid also matches.
> +            // --------------------------------------------------------------
> +
> +            if (*(cit->p_typeid) == ti) {
> +              cit->callback (element, payload);
> +              return true;
> +            }
> +
> +            ++cit;
> +          }
> +        }
> +
> +        if (_default_handler == NULL)
> +            return false;
> +
> +        _default_handler (element, payload);
> +        return true;
> +      }
> +
> +    public:
> +      ProcessorContainer           _processors;
> +      static ProcessorEntrySorter _ebml_sorter;
> +  };
> +
> +} /* end-of-namespace */
> +
> +#define EBML_ELEMENT_CASE_DEF(EbmlType_, ClassName_, VariableName_, InitializationExpr_) \
> +    MKV_SWITCH_CASE_DEFINITION( ClassName_, EbmlType_, EbmlElement*, VariableName_, vars,          \
> +      InitializationExpr_, static_cast<EbmlType_&> (*data)                            \
> +    )
> +
> +// -----------------------------------------------------------------------------------
> +// The use of `detail::typeid_ptr` below is so that we do not have to invoke "typeid"
> +// every time we are requested to do a lookup. `std::type_info` cannot be copied, so
> +// we cannot pass it by value.
> +//
> +// In C++11 you could use the hash value present inside std::type_info to do lookup,
> +// but we are stuck in C++03 and have to use the below instead.
> +// -----------------------------------------------------------------------------------
> +
> +#define E_CASE(EbmlType_, VariableName_)            \
> +    EBML_ELEMENT_CASE_DEF(EbmlType_, EbmlType_, VariableName_, \
> +      (dispatcher.insert( EbmlProcessorEntry( EbmlType_ ::ClassInfos.ClassId(), detail::typeid_ptr<EbmlType_>(), &EbmlType_ ## _callback) ) ) \
> +    )
> +
> +#define E_CASE_DEFAULT(VariableName_)                    \
> +    EBML_ELEMENT_CASE_DEF(EbmlElement, ebml_default, VariableName_, \
> +      dispatcher.set_default_handler (&ebml_default_callback)    \
> +    )
> +
> +#endif
> diff --git a/modules/demux/mkv/dispatcher.hpp b/modules/demux/mkv/dispatcher.hpp
> new file mode 100644
> index 0000000..2f6169e
> --- /dev/null
> +++ b/modules/demux/mkv/dispatcher.hpp
> @@ -0,0 +1,150 @@
> +/*****************************************************************************
> + * dispatcher.hpp : matroska demuxer
> + *****************************************************************************
> + * Copyright (C) 2016 Videolabs
> + * $Id$
> + *
> + * Authors: Filip Roseen <filip at videolabs.io>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of the GNU Lesser General Public License as published by
> + * the Free Software Foundation; either version 2.1 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> + * GNU Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public License
> + * along with this program; if not, write to the Free Software Foundation,
> + * Inc., 51 Franklin Street, Fifth Floor, Boston MA 02110-1301, USA.
> + *****************************************************************************/
> +#ifndef VLC_MKV_DISPATCHER_HPP_
> +#define VLC_MKV_DISPATCHER_HPP_
> +
> +// ----------------------------------------------------------------------------
> +// This header contains helpers to simulate lambdas in C++03.
> +//
> +// It will be used to create "dispatchers" which can be thought of like a
> +// `switch` but only more dynamic.
> +// ----------------------------------------------------------------------------
> +
> +namespace {
> +
> +  template<class Impl, class Processor>
> +  class Dispatcher {
> +    protected:
> +      Dispatcher() : _default_handler (NULL) { }

For the sake of nitpicking, wouldn't it be "more generic" to default 
construct _default_handler, so that is doesn't have to be a pointer type?

> +    public:
> +      template<class It>
> +      void iterate (It beg, It end, void* const& payload) const {
> +        for (; beg != end; ++beg)
> +          static_cast<Impl const*> (this)->Impl::send (*beg, payload);
> +      }
> +
> +      void set_default_handler (Processor const& callback) {
> +        _default_handler = callback;
> +      }
> +
> +      void on_create () {
> +        /* empty default implementation */
> +      }
> +
> +      Processor _default_handler;
> +  };
> +
> +  struct DispatcherTag { };
> +
> +  template<DispatcherTag*, class DispatcherType>
> +  class DispatchContainer {
> +    public:    static DispatcherType dispatcher;
> +    protected: static vlc_mutex_t   _dispatcher_lock;
> +  };
> +
> +  template<DispatcherTag* P, class DT>
> +  DT DispatchContainer<P, DT>::dispatcher;
> +
> +  template<DispatcherTag* P, class DT>
> +  vlc_mutex_t DispatchContainer<P, DT>::_dispatcher_lock = VLC_STATIC_MUTEX;
> +}
> +
> +// ----------------------------------------------------------------------------
> +//   * `GroupName_ ## _tag` is used so that we can refer to a static dispatcher
> +//      of the correct type without instantiating DispatchContainer with a
> +//      locally declared type (since it is illegal in C++03).
> +//
> +//      We are however allowed to pass the variable of a variable declared
> +//      `extern`, and this will effectively be our handle to the "foreign"
> +//      static dispatcher.
> +//
> +//   *  `GroupName_ ## _base` is used to declare a bunch of helpers; names that
> +//       must be available in our fake "lambdas" (C++03 is a pain).
> +// ----------------------------------------------------------------------------
> +
> +#define MKV_SWITCH_CREATE(DispatchType_, GroupName_, PayloadType_) \
> +  extern DispatcherTag GroupName_ ## _tag;                         \
> +  struct GroupName_ ## _base : DispatchContainer<&GroupName_ ## _tag, DispatchType_> { \
> +      typedef      PayloadType_ payload_t;                         \
> +      typedef     DispatchType_ dispatch_t;                        \
> +      typedef struct GroupName_ handler_t;                         \
> +      static void* Payload (payload_t& data) {                     \
> +          return static_cast<void*> (&data);                       \

Honest question: isn't reinterpret_cast more suited for cast to/from 
void* ? It feels to me like there are basically 2 religions, but I'm 
interested in your point of view on the matter :)

> +      }                                                            \
> +  };                                                               \
> +  struct GroupName_ : GroupName_ ## _base
> +
> +// ----------------------------------------------------------------------------
> +//   * `Dispatcher` is a static function used to access the dispatcher in a
> +//      thread-safe manner. We only want _one_ thread to actually construct
> +//      and initialize it, hence the lock.
> +// ----------------------------------------------------------------------------
> +
> +#define MKV_SWITCH_INIT()                     \
> +  static dispatch_t& Dispatcher () {          \
> +      static handler_t * p_handler = NULL;    \
> +      vlc_mutex_lock( &_dispatcher_lock );    \

There is a vlc_mutex_locker helper to simulate std::lock_guard you could 
use here.

> +      if (unlikely( p_handler == NULL) ) {    \
> +          static handler_t handler;           \
> +          p_handler = &handler;               \
> +          p_handler->dispatcher.on_create (); \
> +      }                                       \
> +      vlc_mutex_unlock( &_dispatcher_lock );  \
> +      return p_handler->dispatcher;           \
> +  } struct PleaseAddSemicolon {}

Also, would it make sense to have another implementation when C++11 is 
enabled that takes advantage of the new memory model by only 
initializing a static handler_t instead of a pointer, and getting rid of 
the lock?

> +
> +// ----------------------------------------------------------------------------
> +//   * The following is to be used inside `struct GroupName_`, effectivelly
> +//     declaring a local struct and a data-member, `ClassName_ ## _processor`,
> +//     of that type.
> +//
> +//     When the data-member is constructed it will run `InitializationExpr_`,
> +//     meaning that it can access the static dispatcher and register itself (or
> +//     whatever is desired).
> +//
> +//   * Since we need to do type-erasure, once again, because of the fact that
> +//     C++03 does not support locally declared types as template-arguments, we
> +//     declare a static function `ClassName_ ## _callback` that will cast our
> +//     payload from `void*` to the appropriate type.
> +//
> +//   * The body of `ClassName_ ## _handler` will be written by the user of the
> +//     MACRO, and the implementation will effectively be invoked whenever the
> +//     dispatcher decides to (through `ClassName_ ## _callback`).
> +//
> +//   * Since the type of the `VariableName_` argument to `ClassName_ ## _handler`
> +//     might not necessarily be the same as the type passed from the dispatcher
> +//     (because of the type-erasure), the macro provides a way to change the
> +//     type with a `UnwrapExpr_` (see the function call within `ClassName_ ##
> +//     _callback`).
> +// ----------------------------------------------------------------------------
> +
> +#define MKV_SWITCH_CASE_DEFINITION(ClassName_, RealType_, Type_, VariableName_, PayloadName_, InitializationExpr_, UnwrapExpr_) \
> +  struct ClassName_ ## _processor {                                             \
> +    ClassName_ ## _processor () { InitializationExpr_; }                        \
> +  } ClassName_ ## __wrapper;                                                    \
> +  static inline void ClassName_ ## _callback (Type_ data, void* PayloadName_) { \
> +    ClassName_ ## _handler (UnwrapExpr_, *static_cast<payload_t*> (vars));      \
> +  }                                                                             \
> +  static inline void ClassName_ ## _handler (RealType_& VariableName_, payload_t& PayloadName_)
> +
> +#endif
>

I don't think this addresses the logic of the patch, which I still have 
a hard time to wrap my head around :)
I'll probably give it another go soonish

Regards,



More information about the vlc-devel mailing list