[vlc-devel] Re: Non-western character encoding

Sun Mar 12 14:26:12 CET 2006

Rémi Denis-Courmont <rem at videolan.org> writes:

> On Saturday 11 March 2006 20:35, Subversion daemon wrote:
>> r14724 | courmisch | 2006-03-11 20:35:22 +0100 (Sat, 11 Mar 2006) | 3 lines
>> Changed paths:
>>    M /trunk/modules/codec/subsdec.c
>>
>>  * Use run-time detection of UTF-8 as the current charset instead of
>> hard-coding that it is UTF-8 if and only if on Mac OS X
>
> Character encoding autoselection is way too simplistic on Unix variants; 
> it simply assumes it is the same as that of the C library.
>
> For one thing, we should use CP1252 instead of Latin-1, and probably 
> instead of Latin-9 too. Nobody uses Latin-9 in the
> real^H^H^H^Hdominant^WWindows world.
>
> We should probably do the same kind of mapping toward the other CP125x 
> variants depending on the ISO-8859-x that is locally used. However, I 
> personally do not know any non-Western languages.

KOI8-R is commonly used for Russian.  Russian Windows uses CP1251
(Windows-1251) instead.
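A minimal sketch of the CP125x mapping proposed above, in C. It assumes the locale charset comes from nl_langinfo(CODESET), which is presumably the C library charset the current autoselection relies on. windows_charset_for() and windows_charset_for_locale() are hypothetical names, not VLC functions, and the table only covers the well-known ISO-8859-x/CP125x pairs plus KOI8-R.

```c
#include <assert.h>
#include <ctype.h>
#include <langinfo.h>
#include <stddef.h>
#include <string.h>

/* Normalise a charset name for comparison: keep letters and digits only,
 * upper-cased, so "ISO-8859-1", "iso8859-1" and "ISO_8859-1" all become
 * "ISO88591". Output is truncated to fit in `size` bytes. */
static void normalize_charset(const char *in, char *out, size_t size)
{
    size_t n = 0;

    assert(in != NULL && out != NULL && size > 0);
    for (; *in != '\0' && n + 1 < size; in++)
        if (isalnum((unsigned char)*in))
            out[n++] = (char)toupper((unsigned char)*in);
    out[n] = '\0';
}

/* Hypothetical helper: map a Unix locale charset to the Windows code page
 * used for the same languages. Unknown names (e.g. UTF-8) are returned
 * unchanged. */
const char *windows_charset_for(const char *unix_charset)
{
    static const struct {
        const char *unix_name;    /* normalised form */
        const char *windows_name;
    } map[] = {
        { "ISO88591",  "CP1252" }, /* Western European, Latin-1 */
        { "ISO885915", "CP1252" }, /* Western European, Latin-9 */
        { "ISO88592",  "CP1250" }, /* Central European */
        { "ISO88595",  "CP1251" }, /* Cyrillic */
        { "KOI8R",     "CP1251" }, /* Russian */
        { "ISO88596",  "CP1256" }, /* Arabic */
        { "ISO88597",  "CP1253" }, /* Greek */
        { "ISO88598",  "CP1255" }, /* Hebrew */
        { "ISO88599",  "CP1254" }, /* Turkish */
        { "ISO885913", "CP1257" }, /* Baltic */
    };
    char key[32];

    normalize_charset(unix_charset, key, sizeof key);
    for (size_t i = 0; i < sizeof map / sizeof map[0]; i++)
        if (strcmp(key, map[i].unix_name) == 0)
            return map[i].windows_name;
    return unix_charset;
}

/* Same mapping applied to the C library's current charset. Assumes the
 * program has already called setlocale(LC_ALL, ""). An unmapped result
 * may point into nl_langinfo's static buffer. */
const char *windows_charset_for_locale(void)
{
    return windows_charset_for(nl_langinfo(CODESET));
}
```

Names are normalised before lookup because C libraries spell the same charset differently (glibc reports "ISO-8859-1", some BSDs "ISO8859-1"); UTF-8 and anything unrecognised pass through untouched.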

> Worse yet, we select EUC variants for Asian countries, while they are 
> hardly used at all outside the Unix world.

Korean Windows uses EUC-KR if possible (strictly, code page 949, a
superset of EUC-KR).  If you mix ISO-8859-1 and Hangul in the same
file, it uses UTF-16LE.  Shift JIS (SJIS) seems to be the most common
encoding for Japanese.
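Files that Korean Windows saves as UTF-16LE can at least be recognised without guessing, assuming they start with a byte-order mark as Notepad's "Unicode" files do. A sketch in C; charset_from_bom() is a hypothetical helper, and it says nothing about BOM-less EUC-KR or Shift JIS text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Identify a leading byte-order mark (BOM). Returns the implied charset
 * and stores the BOM length in *bom_len so the caller can skip it, or
 * returns NULL (and stores 0) when there is no recognised BOM.
 * Note: a UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16LE here. */
const char *charset_from_bom(const unsigned char *buf, size_t len,
                             size_t *bom_len)
{
    static const struct {
        const char *bytes;
        size_t      len;
        const char *charset;
    } boms[] = {
        { "\xEF\xBB\xBF", 3, "UTF-8"    },
        { "\xFF\xFE",     2, "UTF-16LE" },
        { "\xFE\xFF",     2, "UTF-16BE" },
    };

    assert(bom_len != NULL && (buf != NULL || len == 0));
    for (size_t i = 0; i < sizeof boms / sizeof boms[0]; i++) {
        if (len >= boms[i].len && memcmp(buf, boms[i].bytes, boms[i].len) == 0) {
            *bom_len = boms[i].len;
            return boms[i].charset;
        }
    }
    *bom_len = 0;
    return NULL;
}
```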

> Unfortunately, I don't really know what we should use there, though: the
> local ISO-2022 variant, the local Windows codepage, or
> who-knows-what.

Unfortunately, the Asian scripts all commonly use several encodings.
Deciding on a single encoding to use would be a mistake.
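One way to avoid committing to a single encoding is to try an ordered list of candidates and keep the first one the bytes are structurally valid in. The C sketch below is hypothetical (guess_charset() is not a VLC function): UTF-8 is checked strictly, while the EUC and Shift JIS checks only look at lead/trail byte ranges and ignore EUC-JP's SS2/SS3 sequences, so they can reject some valid text and accept some wrong text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Strict UTF-8 check: rejects overlong forms, surrogates and code points
 * above U+10FFFF. */
static bool valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t n = 0;                  /* number of continuation bytes */
        unsigned lo = 0x80, hi = 0xBF; /* range for first continuation byte */
        if (c < 0x80) { i++; continue; }
        else if (c >= 0xC2 && c <= 0xDF) n = 1;
        else if (c == 0xE0) { n = 2; lo = 0xA0; }
        else if (c >= 0xE1 && c <= 0xEC) n = 2;
        else if (c == 0xED) { n = 2; hi = 0x9F; }
        else if (c >= 0xEE && c <= 0xEF) n = 2;
        else if (c == 0xF0) { n = 3; lo = 0x90; }
        else if (c >= 0xF1 && c <= 0xF3) n = 3;
        else if (c == 0xF4) { n = 3; hi = 0x8F; }
        else return false;
        if (len - i <= n) return false;  /* truncated sequence */
        if (s[i + 1] < lo || s[i + 1] > hi) return false;
        for (size_t k = 2; k <= n; k++)
            if (s[i + k] < 0x80 || s[i + k] > 0xBF) return false;
        i += n + 1;
    }
    return true;
}

/* Two-byte EUC form only (same test for EUC-KR and EUC-JP): every
 * non-ASCII byte pair must lie in 0xA1..0xFE. */
static bool plausible_euc(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (s[i] < 0x80) continue;
        if (s[i] < 0xA1 || s[i] > 0xFE || i + 1 >= len
            || s[i + 1] < 0xA1 || s[i + 1] > 0xFE)
            return false;
        i++; /* skip trail byte */
    }
    return true;
}

/* Shift JIS: ASCII, half-width katakana, or a lead/trail byte pair. */
static bool plausible_sjis(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = s[i];
        if (c < 0x80 || (c >= 0xA1 && c <= 0xDF)) continue;
        if (!((c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC)) || i + 1 >= len)
            return false;
        unsigned char t = s[i + 1];
        if (t < 0x40 || t == 0x7F || t > 0xFC)
            return false;
        i++; /* skip trail byte */
    }
    return true;
}

/* Return the first candidate whose byte structure fits the buffer,
 * or NULL if none does. Unknown candidate names are skipped. */
const char *guess_charset(const unsigned char *s, size_t len,
                          const char *const candidates[], size_t count)
{
    assert(s != NULL || len == 0);
    for (size_t i = 0; i < count; i++) {
        const char *c = candidates[i];
        if ((strcmp(c, "UTF-8") == 0 && valid_utf8(s, len))
            || ((strcmp(c, "EUC-KR") == 0 || strcmp(c, "EUC-JP") == 0)
                && plausible_euc(s, len))
            || (strcmp(c, "SHIFT_JIS") == 0 && plausible_sjis(s, len)))
            return c;
    }
    return NULL;
}
```

The structural checks overlap: the EUC-JP bytes for 日本 (C6 FC CB DC) also pass the Shift JIS check, so the candidate order, presumably chosen from the user's locale, decides the result. Listing EUC-JP before SHIFT_JIS works for that pair because the Shift JIS bytes (93 FA 96 7B) fail the EUC check.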

-- 
Måns Rullgård
mru at inprovide.com

-- 
This is the vlc-devel mailing-list, see http://www.videolan.org/vlc/