[vlc-devel] Re: Non-western character encoding

Rémi Denis-Courmont rem at videolan.org
Sun Mar 12 19:09:05 CET 2006

On Sunday 12 March 2006 17:54, Måns Rullgård wrote:
> Rémi Denis-Courmont <rem at videolan.org> writes:
> > It comes from LC_ALL, LC_CTYPE or LANG. The mapping is
> > in /usr/share/i18n/SUPPORTED.
> No such file on my system.

Debian-specific, I believe.
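For illustration, here is a minimal sketch of how a program can ask the C library for the locale's character set on a POSIX system. setlocale(LC_CTYPE, "") resolves the locale from LC_ALL, then LC_CTYPE, then LANG, and nl_langinfo(CODESET) returns the resulting charset name, so no distribution-specific file is needed. The helper name locale_charset_name() is hypothetical and is not a VLC function.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: return the character set of the user's locale,
 * e.g. "UTF-8" for sv_SE.UTF-8 or "ISO-8859-1" for de_DE on glibc.
 * setlocale(LC_CTYPE, "") picks the locale from LC_ALL, then LC_CTYPE,
 * then LANG; note that it changes the process-wide LC_CTYPE setting. */
const char *locale_charset_name(void)
{
    /* If the environment names an unknown locale, setlocale() returns NULL
     * and the previous (by default "C") locale stays in effect. */
    setlocale(LC_CTYPE, "");

    const char *codeset = nl_langinfo(CODESET);
    assert(codeset != NULL); /* POSIX: never NULL, at worst "" */
    return codeset;
}
```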

> > I have to disagree here. I don't believe Japanese subtitles
> > automagically change from Shift-JIS to EUC-JP as they are
> > downloaded on a Linux system.

> No, but the user might convert them manually.  It only takes a few
> seconds.

Most users don't. Most users don't even know how to. And when something
can be done automatically, it should be done automatically.

Actually, I would not be surprised if not even half of the vlc-devel
subscribers, however computer-knowledgeable they may be, knew how to
do that.

> I see what you are getting at.  The thing is, with no other
> indication of encoding (e.g. specified in an HTTP header) the best
> guess is still to use the locale settings.  I usually convert any
> files I intend to access often to utf-8, unless they have some
> built-in means of indicating what encoding they use.  Having all
> files in the same encoding generally simplifies things.

Does it make sense to try to use the file as UTF-8 when iconv tells you
it is not possible? Does it make sense to force the user to open the
advanced file dialog and go through the advanced subtitle settings to
set the subtitle encoding manually, when we could simply try UTF-8 and
fall back to CP1252, given that we know his/her locale belongs to a
fairly short list of western languages?
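A minimal sketch of that "try UTF-8, fall back to CP1252" default, using POSIX iconv(). It assumes the whole subtitle file is already in memory and that the iconv implementation accepts the names "UTF-8" and "CP1252" (glibc and GNU libiconv do). decode_subtitle() and convert_to_utf8() are hypothetical helpers, not VLC's actual code.

```c
#include <assert.h>
#include <iconv.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert len bytes of `in` from charset `from` to a
 * malloc'd, NUL-terminated UTF-8 string. Returns NULL if `from` is not
 * supported or if the input is not valid in that charset. */
static char *convert_to_utf8(const char *in, size_t len, const char *from)
{
    assert(in != NULL || len == 0);
    /* CP1252 -> UTF-8 needs at most 3 output bytes per input byte (e.g. the
     * euro sign 0x80 -> U+20AC); UTF-8 -> UTF-8 never grows. */
    if (len > (SIZE_MAX - 1) / 3)
        return NULL;

    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return NULL;

    size_t cap = 3 * len + 1;
    char *out = malloc(cap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *inp = (char *)in;   /* iconv() takes char **, but does not write */
    size_t inleft = len;
    char *outp = out;
    size_t outleft = cap - 1; /* keep one byte for the terminator */

    /* Fails with EILSEQ on an invalid byte sequence, or EINVAL on a
     * truncated one at the end of the buffer. */
    size_t r = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (r != (size_t)-1)
        r = iconv(cd, NULL, NULL, &outp, &outleft); /* flush any shift state */
    iconv_close(cd);

    if (r == (size_t)-1) {
        free(out);
        return NULL;
    }
    *outp = '\0';
    return out;
}

/* Hypothetical decoding policy for western locales: accept the file as
 * UTF-8 if iconv says it is valid UTF-8, otherwise read it as CP1252. */
char *decode_subtitle(const char *buf, size_t len)
{
    char *text = convert_to_utf8(buf, len, "UTF-8");
    if (text == NULL)
        text = convert_to_utf8(buf, len, "CP1252");
    return text; /* NULL if neither charset fits; caller must free() */
}
```

The order matters: a CP1252 decoder accepts almost any byte sequence, whereas legacy 8-bit text is very unlikely to pass strict UTF-8 validation by accident, so UTF-8 is tried first.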

The average user expects his/her subtitles to work provided they are in
his/her language and the subtitle file name differs from the video's
only by its extension. That's how it works on Windows!

> The problem is that there is little correlation between the encoding
> of the files and the system locale setting. 

This is untrue. Most, if not all, text files will be encoded either
according to the locale setting or according to the Windows ANSI code
page (ACP) for their language (or to whatever standard that ACP is
derived from and compatible with).
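Along the same lines, the fallback can be chosen from a table mapping the locale's language to the Windows ANSI code page commonly used for it. The table below is a hypothetical and deliberately incomplete sketch. Languages whose ACP depends on the region, such as Chinese (CP936 vs CP950), are left out, and fallback_charset_for_locale() is an illustrative name, not an existing API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct lang_acp {
    char lang[3];      /* ISO 639-1 code, NUL-terminated */
    const char *acp;   /* iconv name of the Windows ANSI code page */
};

/* Hypothetical, incomplete table: language -> typical legacy Windows ACP. */
static const struct lang_acp acp_table[] = {
    { "cs", "CP1250" }, { "hu", "CP1250" }, { "pl", "CP1250" },
    { "bg", "CP1251" }, { "ru", "CP1251" }, { "uk", "CP1251" },
    { "de", "CP1252" }, { "en", "CP1252" }, { "es", "CP1252" },
    { "fr", "CP1252" }, { "it", "CP1252" }, { "nl", "CP1252" },
    { "pt", "CP1252" }, { "sv", "CP1252" },
    { "el", "CP1253" }, { "tr", "CP1254" }, { "he", "CP1255" },
    { "ar", "CP1256" }, { "ja", "CP932" },  { "ko", "CP949" },
};

/* Return the fallback charset for a locale name such as "sv_SE.UTF-8",
 * or NULL if the language is not in the table. */
const char *fallback_charset_for_locale(const char *locale)
{
    assert(locale != NULL);
    /* "sv_SE.UTF-8" -> "sv": the language is what precedes '_', '.' or '@'. */
    size_t n = strcspn(locale, "_.@");
    if (n != 2)
        return NULL; /* "C", "POSIX", three-letter codes, ... */
    for (size_t i = 0; i < sizeof acp_table / sizeof acp_table[0]; i++)
        if (strncmp(locale, acp_table[i].lang, 2) == 0)
            return acp_table[i].acp;
    return NULL;
}
```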

> Your method will still fail if my locale is en_US (or sv_SE or de_DE)
> and I try to watch a movie with sjis subtitles (not that I'm very
> likely to do that).

In that rare case, you can go through the complicated encoding
setting. Is that a reason to force you to do it in the most common
case, though?

> A better idea might be to guess the encoding based on the language of
> the subtitles if this is known.

Provided we have an AI for language recognition...

> And an override option should be present, whatever other methods are
> used.

An override option *is* present. Did I ever say I wanted to remove it?
I'm only arguing that we should have smarter defaults.

> > And then, we might also consider UTF-8 autodetection (à la
> > irssi>=0.8.10), though I'm yet to find any UTF-8 subtitle file.

> Why don't you just create one with iconv?

Because VLC can use iconv() automatically for me and for all the other
users who don't know about “iconv -f cp1252 < myMovie.srt > myMovie.srt.utf8
&& mv -f myMovie.srt{.utf8,}”.
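On the UTF-8 autodetection mentioned above: one common way to implement it is to check whether the bytes form well-formed UTF-8 before choosing a decoder. The following is a standalone validator that follows the RFC 3629 rules. It is not irssi's or VLC's actual code, and is_valid_utf8() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if buf[0..len) is well-formed UTF-8 (RFC 3629): rejects
 * overlong forms, surrogates (U+D800..U+DFFF) and code points > U+10FFFF. */
bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t n;          /* number of continuation bytes */
        unsigned long cp;  /* decoded code point */

        if (c < 0x80) { i++; continue; }              /* ASCII */
        else if (c >= 0xC2 && c <= 0xDF) { n = 1; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { n = 2; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { n = 3; cp = c & 0x07; }
        else return false; /* 0x80..0xC1 or 0xF5..0xFF cannot start one */

        if (len - i <= n)
            return false;  /* sequence truncated by end of buffer */
        for (size_t k = 1; k <= n; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;  /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000))
            return false;  /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;  /* UTF-16 surrogate */
        if (cp > 0x10FFFF)
            return false;  /* beyond Unicode range */
        i += n + 1;
    }
    return true;
}
```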

Rémi Denis-Courmont