[vlc-devel] [PATCH] Detecting language of subtitles using its filename

Walter Cacau waltercacau at gmail.com
Sat Feb 2 13:40:05 CET 2013


Hi Rémi,

I needed the name of the video file, which is why I used the input. Do you
know of an alternative way to get that name? Or do you think it is better
to use a strategy that does not require it? Any thoughts?

My first draft did not actually use the input; it only looked for the
string between the last two dots in the file name. So, "video.x.avi" would
output "x". The problem with that is the fact that many video files use
dots instead of spaces as word separator, so the function might end up
detecting something that is not actually the subtitle language. For
example, in "cool.video.avi" that naive strategy would yield "video".
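
To make that concrete, here is a rough standalone sketch of the naive
strategy. It is only an illustration, not the code from the patch; the
helper name naive_language_from_name is made up for this example, and it
works on a plain file name rather than on a URL.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of the patch): returns a newly allocated
 * copy of the text between the last two dots of psz_name, or NULL if the
 * name has fewer than two dots or the text between them is empty.
 * The caller must free() the result. */
static char *naive_language_from_name( const char *psz_name )
{
    const char *psz_last = strrchr( psz_name, '.' );
    if( !psz_last || psz_last == psz_name )
        return NULL;

    /* Walk backwards from the last dot to find the previous one */
    const char *psz_prev = psz_last - 1;
    while( psz_prev > psz_name && *psz_prev != '.' )
        psz_prev--;
    if( *psz_prev != '.' )
        return NULL; /* only one dot in the name: no candidate */

    size_t i_len = (size_t)( psz_last - psz_prev - 1 );
    if( i_len == 0 )
        return NULL; /* two consecutive dots: nothing in between */

    char *psz_ret = malloc( i_len + 1 );
    if( !psz_ret )
        return NULL;
    memcpy( psz_ret, psz_prev + 1, i_len );
    psz_ret[i_len] = '\0';
    return psz_ret;
}
```

With this helper, "video.en.srt" gives "en" as intended, but
"cool.video.avi" gives "video", which is the false positive described
above.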

If I do need to go through the input, could you point me to a part of the
code that I can use as a reference for the necessary locking and reference
counting?


Thanks

On Sat, Feb 2, 2013 at 3:32 AM, Rémi Denis-Courmont <remi at remlab.net> wrote:

> Le vendredi 1 février 2013 22:46:16, Walter Cacau a écrit :
> > A user who has many subtitles for a single video can now distinguish
> > between them by the suffix containing language information.
> >
> > So, for example, suppose the user has a video file named "video.mp4".
> > If he names a subtitle as "video.en.srt", the English language will
> > be detected. Similarly, he can name other subtitles with different
> > languages, ex: "video.pt.srt", "video.pt-BR.srt", ...
> >
> > If the subtitle name is equal to the video name (except for the
> > file extension), then no language is detected.
> >
> > This change has the positive side effect of making the subtitle menu
> > show the detected language instead of only "Track X".
> > ---
> >  modules/demux/subtitle.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 57 insertions(+)
> >
> > diff --git a/modules/demux/subtitle.c b/modules/demux/subtitle.c
> > index aa1c7e6..dd91dbe 100644
> > --- a/modules/demux/subtitle.c
> > +++ b/modules/demux/subtitle.c
> > @@ -35,6 +35,7 @@
> >  #include <vlc_plugin.h>
> >  #include <vlc_input.h>
> >  #include <vlc_memory.h>
> > +#include <vlc_url.h>
> >
> >  #include <ctype.h>
> >
> > @@ -223,6 +224,7 @@ static int Demux( demux_t * );
> >  static int Control( demux_t *, int, va_list );
> >
> >  static void Fix( demux_t * );
> > +static char * get_language_from_filename( const char *, const char * );
> >
> >
> >  /*****************************************************************************
> >   * Module initializer
> > @@ -529,6 +531,19 @@ static int Open ( vlc_object_t *p_this )
> >      }
> >      else
> >          es_format_Init( &fmt, SPU_ES, VLC_CODEC_SUBT );
> > +
> > +    /* Detecting subtitle language using its filename */
> > +    char * psz_language = get_language_from_filename(
> > +        p_demux->psz_location,
> > +        input_GetItem(p_demux->p_input)->psz_uri
> > +    );
>
> Please leave the input alone; you don't need it to know your location. AFAIK,
> there might not even be an input. Also that code looks like it lacks reference
> counting and locking.
>
> > +    if( psz_language )
> > +    {
> > +        fmt.psz_language = psz_language;
> > +        msg_Dbg( p_demux, "detected language %s of subtitle: %s", psz_language,
> > +                 p_demux->psz_location );
> > +    }
> > +
> >      if( unicode )
> >          fmt.subs.psz_encoding = strdup( "UTF-8" );
> >      char *psz_description = var_InheritString( p_demux, "sub-description" );
> > @@ -2079,3 +2094,45 @@ static int ParseSubViewer1( demux_t *p_demux, subtitle_t *p_subtitle, int i_idx
> >      return VLC_SUCCESS;
> >  }
> >
> > +static char * get_language_from_filename( const char * psz_sub_location,
> > +                                          const char * psz_video_url )
> > +{
> > +    char * psz_ret = NULL;
> > +    char * psz_video_file = NULL;
> > +    char * psz_sub_file = NULL;
> > +    char * psz_tmp;
> > +    char * psz_sub_suffix;
> > +    char * ps_language_end;
> > +
> > +    psz_video_file = strrchr( psz_video_url, '/' );
> > +    if( !psz_video_file ) goto end;
> > +    psz_video_file++;
> > +    psz_video_file = decode_URI_duplicate(psz_video_file);
> > +    if( !psz_video_file ) goto end;
> > +
> > +    psz_sub_file = strrchr( psz_sub_location, '/' );
> > +    if( !psz_sub_file ) goto end;
> > +    psz_sub_file++;
> > +    psz_sub_file = decode_URI_duplicate(psz_sub_file);
> > +    if( !psz_sub_file ) goto end;
> > +
> > +    /* Removing extension, but leaving the dot */
> > +    psz_tmp = strrchr( psz_video_file, '.' );
> > +    if( !psz_tmp ) goto end;
> > +    psz_tmp[1] = '\0';
> > +
> > +    /* Checking the video name prefix and extracting the sub file suffix */
> > +    if( strstr(psz_sub_file, psz_video_file) != psz_sub_file ) goto end;
> > +    psz_sub_suffix = psz_sub_file + strlen(psz_video_file);
> > +
> > +    ps_language_end = strrchr( psz_sub_suffix, '.' );
> > +    if( !ps_language_end ) goto end;
> > +    *ps_language_end = '\0';
> > +
> > +    psz_ret = strdup(psz_sub_suffix);
> > +
> > +end:
> > +    FREENULL(psz_video_file);
> > +    FREENULL(psz_sub_file);
> > +    return psz_ret;
> > +}
>
> --
> Rémi Denis-Courmont
> http://www.remlab.net/
>



-- 
Walter Carlos P. Cacau Filho
ITA T12 - Engenharia de Computação
waltercacau at gmail.com