[Translators] Translation errors for subtitle files

Dean Lee xslidian at gmail.com
Thu May 13 05:27:50 CEST 2010


Hi,

Here is what I know:

CP950 "BIG5" Chinese Traditional, mostly used in Hong Kong, Macau & Taiwan
CP950 "BIG5-HKSCS" Chinese Traditional, extended with the Hong Kong
Supplementary Character Set

CP936 "GB2312" "GBK" Chinese Simplified
CP54936 (Windows-54936) "GB18030" Chinese Simplified, compatible with
"GB2312" and including all characters in CP936
(all three names start with 'GB', short for 'Guobiao', which means
'national standard' in Mandarin Chinese)

Besides UTF-8, CP950 and CP936 are the most widely used charsets for
Traditional and Simplified Chinese subtitles, respectively.
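
As a rough illustration of how such a charset name can act as a fallback,
here is a minimal Python sketch. It is not VLC's actual code: the function
name decode_subtitle and the sample strings are made up for this example,
and only the codec names (cp950, cp936, gb18030, cp1252) come from Python's
standard library.

```python
def decode_subtitle(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode subtitle bytes: try UTF-8 first, then a per-language fallback.

    Hypothetical helper for illustration only.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # errors="replace" substitutes U+FFFD for invalid bytes instead of failing
        return raw.decode(fallback, errors="replace")


# Sample data: the same kind of line stored in two legacy Chinese charsets
trad = "繁體中文字幕".encode("cp950")   # Traditional Chinese, Big5 code page
simp = "简体中文字幕".encode("cp936")   # Simplified Chinese, GBK code page

print(decode_subtitle(trad, fallback="cp950"))
print(decode_subtitle(simp, fallback="gb18030"))
```

Big5 and GBK use overlapping byte ranges, so guessing between them from the
bytes alone is unreliable. That is why the sketch takes the fallback as an
explicit parameter rather than trying both.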

Best wishes,
Dean


2010/5/13 Rémi Denis-Courmont <remi at remlab.net>:
>        Hello,
>
> VLC 1.1.0 includes a very special translated string "CP1252". This is not
> shown to the user. Instead, VLC tries to use it as the default character
> encoding for subtitle files.
>
> If it is not translated correctly, subtitle files will be interpreted as
> Western European Latin alphabet text. This will show mojibake, in other words
> complete nonsense, if the subtitles are not in a Western European language.
>
> If in doubt, please ask and we will try to clarify what the correct
> translation is for your specific language. As a general rule, the value
> depends on your alphabet/script. For the Latin alphabet, it depends on the
> accented characters your language uses.
>
> I think the most common values are as follows. However, I do not speak any
> non-Latin language, so it is possible that I am wrong:
>
> "BIG5"          Chinese Traditional, used in Taiwan
> "BIG5-HKSCS"    Chinese Traditional, used in Hong Kong (and Macau??)
> "CP949"         Korean (or maybe "EUC-KR"??)
>
> "CP1250"        Latin for Slavic languages
> "CP1251"        Cyrillic
> "CP1252"        Latin for western European languages
> "CP1253"        Hellenic
> "CP1254"        Latin for Turkish
> "ISO-8859-8"    Hebrew (or maybe "CP1255" ??)
> "CP1256"        Arabic
> "CP1257"        Latin for Baltic countries (or maybe "CP1251" ??)
> "CP1258"        Latin for Vietnamese
>
> "GB18030"       Chinese Simplified, used in mainland China and in Singapore
> "PT154"         Kazakh (??)
> "SHIFT-JIS"     Japanese
> "TIS-620"       Thai
>
> I know some entries are missing, for instance Indian scripts, because I have
> absolutely no clue about them.
>
> --
> Rémi Denis-Courmont
> http://www.remlab.net/
> http://fi.linkedin.com/in/remidenis
>
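
The mojibake described in the quoted message is easy to reproduce. The
following Python sketch uses a made-up sample string and assumes the file
was saved as CP936; it shows what happens when those bytes are read with
the CP1252 default, and that the damage is reversible as long as no bytes
were dropped along the way.

```python
text = "中文字幕"                    # sample subtitle text ("Chinese subtitles")
raw = text.encode("cp936")          # bytes as they would sit in a GBK-encoded file

# Wrong guess: interpret the bytes as Western European text
garbled = raw.decode("cp1252")
print(garbled)                      # accented Latin letters and symbols, not Chinese

# Undo the wrong guess, then decode with the right charset
repaired = garbled.encode("cp1252").decode("cp936")
print(repaired == text)
```

Each Chinese character takes two bytes in CP936, so the garbled string is
twice as long as the original.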

