[vlc-devel] Re: Non-western character encoding
Måns Rullgård
mru at inprovide.com
Sun Mar 12 14:26:12 CET 2006
Rémi Denis-Courmont <rem at videolan.org> writes:
> Le Samedi 11 Mars 2006 20:35, Subversion daemon a écrit :
>> r14724 | courmisch | 2006-03-11 20:35:22 +0100 (Sat, 11 Mar 2006) | 3 lines
>> Changed paths:
>> M /trunk/modules/codec/subsdec.c
>>
>> * Use run-time detection of UTF-8 as current charset instead of
>> hard-coding to be if and only if Mac OS X
>
> Character encoding autoselection is way too simplistic on Unix variants;
> it simply assumes it is the same as that of the C library.
>
> For one thing, we should use CP1252 instead of Latin-1, and probably
> even Latin-9. Nobody uses Latin-9 in the real^H^H^H^Hdominant^WWindows
> world.
>
> We should probably do the same kind of mapping toward the other CP125x
> variants depending on the ISO-8859-x that is locally used. However I
> personally know no non-Western languages.
KOI-8 is commonly used for Russian. It might be what Windows uses,
though I'm not sure.
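
For illustration, the ISO-8859-x to CP125x mapping suggested above could be sketched roughly as follows. This is Python purely for brevity, not VLC code; the table and the function name subtitle_charset_guess are hypothetical. The pairs are chosen by script coverage, not byte compatibility, and the KOI8-R entry assumes the Russian Windows ANSI code page, which is CP1251:

```python
import codecs

# Hypothetical table: Unix locale charset -> Windows code page for the same
# script. The pairing is by script, not by byte-for-byte compatibility.
_RAW_TABLE = {
    "ISO-8859-1": "cp1252",   # Western European
    "ISO-8859-15": "cp1252",  # Latin-9
    "ISO-8859-2": "cp1250",   # Central European
    "ISO-8859-5": "cp1251",   # Cyrillic
    "KOI8-R": "cp1251",       # Russian
    "ISO-8859-7": "cp1253",   # Greek
    "ISO-8859-9": "cp1254",   # Turkish
    "ISO-8859-8": "cp1255",   # Hebrew
    "ISO-8859-6": "cp1256",   # Arabic
    "ISO-8859-13": "cp1257",  # Baltic
}

# Normalise the keys so that aliases such as "latin1" resolve to the same entry.
WINDOWS_COUNTERPART = {
    codecs.lookup(name).name: codepage for name, codepage in _RAW_TABLE.items()
}


def subtitle_charset_guess(locale_charset):
    """Return the Windows counterpart of a locale charset, if the table has one.

    Charsets without an entry (UTF-8, for instance) are returned unchanged,
    in their normalised spelling.
    """
    name = codecs.lookup(locale_charset).name
    return WINDOWS_COUNTERPART.get(name, name)
```

With this table, subtitle_charset_guess("ISO-8859-15") yields "cp1252", while a UTF-8 locale is left alone.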
> Worse yet, we select EUC variants for Asian countries, while they are
> hardly used at all outside the Unix world.
Korean Windows uses EUC-KR if possible. If you mix ISO-8859-1 and
Hangul in the same file it uses UTF-16LE. SJIS seems to be the most
common encoding of Japanese.
> Unfortunately, I don't really know what we should use there though: the
> local ISO-2022 variant, the local Windows codepage, or
> who-knows-whatever.
Unfortunately, the Asian scripts all commonly use several encodings.
Deciding on a single encoding to use would be a mistake.
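
One possible alternative to picking a single encoding is to try a short list of candidates per language. A minimal sketch of that idea, again in Python for brevity only; the function name decode_with_candidates and the candidate lists are assumptions, not anything VLC does. It is only a heuristic: the same bytes can decode without error under several encodings, so the order of the list matters.

```python
# Hypothetical per-language candidate lists, most likely encoding first.
JAPANESE = ("shift_jis", "euc_jp", "iso2022_jp")
KOREAN = ("euc_kr",)


def decode_with_candidates(data, candidates):
    """Return (text, charset) for the first charset that decodes `data`.

    UTF-8 is tried first because non-UTF-8 text rarely happens to be valid
    UTF-8. A clean decode does not prove the guess is right.
    """
    for charset in ("utf-8", *candidates):
        try:
            return data.decode(charset), charset
        except UnicodeDecodeError:
            continue
    # Last resort: Latin-1 assigns a character to every byte, so it never fails.
    return data.decode("latin-1"), "latin-1"
```

For example, decode_with_candidates(raw_bytes, JAPANESE) falls through UTF-8 to Shift_JIS for typical SJIS subtitle data.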
--
Måns Rullgård
mru at inprovide.com