[vlc-devel] Re: Non-western character encoding

Rémi Denis-Courmont rem at videolan.org
Sun Mar 12 16:45:04 CET 2006


On Sunday 12 March 2006 15:49, Måns Rullgård wrote:
> How is "the local character encoding" determined?

It comes from LC_ALL, LC_CTYPE or LANG. The mapping is 
in /usr/share/i18n/SUPPORTED.
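
To make the lookup concrete, here is a minimal sketch in Python (not VLC code). It assumes the usual POSIX precedence, where LC_ALL overrides LC_CTYPE, which overrides LANG, and the usual glibc name layout language_TERRITORY.codeset@modifier. The function names locale_name and explicit_codeset are hypothetical. When the name carries no codeset, the sketch returns None rather than guessing; that is the case where the SUPPORTED mapping would have to be consulted.

```python
import os


def locale_name(environ=None):
    """Return the first non-empty value of LC_ALL, LC_CTYPE, LANG, else None."""
    env = os.environ if environ is None else environ
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:  # unset and empty variables are both skipped
            return value
    return None


def explicit_codeset(name):
    """Return the codeset part of 'lang_TERRITORY.codeset@modifier', or None."""
    if not name or "." not in name:
        return None  # e.g. "fr_FR": the charset is implied, not stated
    codeset = name.split(".", 1)[1]
    # Drop a trailing modifier such as "@euro".
    return codeset.split("@", 1)[0] or None
```

For example, with LANG=fr_FR and LC_CTYPE=fr_FR.UTF-8 the sketch picks fr_FR.UTF-8 and reports UTF-8 as the codeset.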

> If LC_ALL, LC_CTYPE or LANG (checked in that order) specifies an
> encoding, that should be used.  If none is specified, the best that
> can be done is to choose a default for each locale.  The user should
> always have an option to override the default should s/he wish to.

I have to disagree here. I don't believe Japanese subtitles
automagically change from Shift-JIS to EUC-JP when they are downloaded
onto a Linux system. Japanese Windows users use the CP932 variant of
Shift-JIS, so Japanese subtitles are in Shift-JIS/CP932. There is no
point in trying to decode these as EUC-JP, even if that is the encoding
for the ja_JP C library locale on Linux.

And I *know* that French subtitles don't automagically get converted
from Latin-1/CP1252 to UTF-8 just because my system's LC_CTYPE is
fr_FR.UTF-8 instead of fr_FR.

What I do believe is that we get a much higher rate of matching
encodings by looking at the local system language (i.e. the first part
of LC_ALL or LANG), rather than by using the local system charset. In fact, it
makes almost all subtitles work, while they would otherwise almost 
always fail. If you don't believe me, just try to use subtitles from 
some western language that has lots of accents (French, German, 
Swedish...) with a pre-[14724] VLC on a Linux system using a UTF-8 (as 
in LANG=??_??.UTF-8) locale variant for said language. And feel the 
pain.
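
As a rough illustration of the language-based default, here is a minimal Python sketch (not the actual VLC change). The table LANGUAGE_DEFAULTS, the helper names, and the cp1252 fallback for unlisted languages are all assumptions made for the example; only the French and Japanese entries come straight from the cases above, and the German and Swedish ones reflect the Windows Western European code page.

```python
# Illustrative table only: language code -> encoding subtitles most likely use.
# It is an assumption for this sketch, not VLC's real table.
LANGUAGE_DEFAULTS = {
    "fr": "cp1252",  # French: Latin-1/CP1252
    "de": "cp1252",  # German
    "sv": "cp1252",  # Swedish
    "ja": "cp932",   # Japanese: Windows variant of Shift-JIS
}


def language_of(locale_name):
    """Extract the language part of 'lang_TERRITORY.codeset@modifier'."""
    base = locale_name.split(".", 1)[0].split("@", 1)[0]
    return base.split("_", 1)[0].lower()


def default_subtitle_encoding(locale_name, fallback="cp1252"):
    """Pick a subtitle encoding from the language, ignoring the locale charset."""
    return LANGUAGE_DEFAULTS.get(language_of(locale_name), fallback)
```

With this, fr_FR.UTF-8 and fr_FR both yield cp1252, and ja_JP.eucJP yields cp932, so the charset suffix of the locale no longer decides how the subtitle file is decoded.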

The current approach is just utterly broken (except on Windows). The
proposed approach brings VLC subtitle decoding on Linux and similar
systems to the same, much higher, “success” rate as VLC on Windows.

And then, we might also consider UTF-8 autodetection (à la
irssi>=0.8.10), though I have yet to find any UTF-8 subtitle file.
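
One simple way such autodetection could work, sketched in Python under stated assumptions: try a strict UTF-8 decode first and fall back to a legacy encoding only if that fails. This is not irssi's or VLC's code; the function name decode_subtitles and the cp1252 default fallback are hypothetical.

```python
def decode_subtitles(raw, fallback_encoding="cp1252"):
    """Decode subtitle bytes, preferring UTF-8 when the bytes are valid UTF-8.

    Returns a (text, encoding_used) tuple.
    """
    try:
        # Strict decoding: accented text in a legacy 8-bit encoding almost
        # never forms valid UTF-8 sequences, so a success is a strong hint.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Not UTF-8: use the language-based default and substitute
        # U+FFFD for any byte the fallback encoding cannot map.
        return raw.decode(fallback_encoding, errors="replace"), fallback_encoding
```

Pure ASCII input is reported as UTF-8, which is harmless because ASCII decodes identically in both encodings.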

-- 
Rémi Denis-Courmont
http://www.simphalempin.com/home/