[vlc-devel] [RFC] EIT character sets conversion

Sigmund Augdal sigmund.augdal at gmail.com
Fri Aug 31 14:08:35 CEST 2007


On 8/30/07, Rémi Denis-Courmont <rem at videolan.org> wrote:
>
>         Hello,
>
> I have a few doubts concerning EITConvertToUTF8 (from
> modules/demux/ts.c). I have no access to the relevant specifications,
> neither to real-life streams using that.
>
> First, if the "string" starts with \x10\x00, it appears we assume the
> third byte codes the number of an ISO_8859 character set. Is there any
> reason why this is limited to the range 1-15? As of now, there is also
> ISO_8859-16 (a.k.a. "Latin-10"), and who knows if more will not be
> added.

The spec says strings starting with \x10 are "ISO/IEC 8859", and in this
case the two following bytes are an unsigned big-endian 16-bit integer
used as an index into a table listed in the spec. This table has 16
entries, where 0 is listed as reserved and 1-15 are listed as meaning the
corresponding 8859 variant. No entries are given above this. I assume
this is to allow backwards-compatible extensions to the spec in the
future, but no such extension exists today to the best of my knowledge.
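
As a rough illustration of the lookup described above, here is a minimal
Python sketch. The function name iso8859_from_prefix is hypothetical and is
not taken from ts.c; it only mirrors the table as described here (0 reserved,
1-15 mapped, anything else treated as unknown).

```python
import struct

def iso8859_from_prefix(data: bytes):
    """Return (charset_name, payload) for a string starting with 0x10.

    Hypothetical helper: returns (None, data) when the prefix is absent,
    truncated, or the index is outside the 1-15 range listed in the table.
    """
    if len(data) < 3 or data[0] != 0x10:
        return None, data
    # The two bytes after 0x10 form an unsigned big-endian 16-bit index.
    (index,) = struct.unpack(">H", data[1:3])
    if not 1 <= index <= 15:
        return None, data  # 0 is reserved; nothing is listed above 15
    return "ISO-8859-%d" % index, data[3:]
```

For example, iso8859_from_prefix(b"\x10\x00\x0fabc") yields
("ISO-8859-15", b"abc"), while an index of 0 or 16 yields None as the name,
so the caller can fall back to some default.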


> Second, if the string starts with \x11, we assume the rest is a sequence
> of UTF-16. That being noted, iconv recognizes three different kinds of
> UTF-16. I am not sure, but I believe "UTF-16" needs a Byte-Order-Mark at
> the beginning, otherwise "UTF-16LE" and "UTF-16BE" must be used when
> the byte endianness is arbitrarily specified.


Strings starting with \x11 are specified to mean ISO/IEC 10646-1 - Basic
Multilingual Plane. I don't know what that means, but apparently I
interpreted it earlier as meaning UTF-16. Feel free to correct it if my
assumption was wrong.
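
To make the byte-order question concrete, here is a small Python sketch
showing that the same two bytes decode to different characters depending on
which UTF-16 variant is chosen. Whether a BOM-less payload in these strings
should be read as big-endian is an assumption here, not something confirmed
in this thread.

```python
# Two bytes with no byte-order mark: 0x00 0x41.
payload = b"\x00\x41"

# Explicit byte order: no BOM is expected or consumed.
as_be = payload.decode("utf-16-be")   # U+0041, LATIN CAPITAL LETTER A
as_le = payload.decode("utf-16-le")   # U+4100, a CJK ideograph

# With a leading BOM (FE FF = big-endian), the generic "utf-16" codec
# reads the BOM to pick the byte order and strips it from the result.
with_bom = (b"\xfe\xff" + payload).decode("utf-16")
```

So picking the wrong explicit variant does not fail, it silently produces
different text, which is why the choice matters.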

Furthermore, the spec defines strings starting with values all the way up
to \x15. These are as follows:
\x12: KSC5601-1987 - Korean Character Set
\x13: GB2312-1980 - Simplified Chinese Character
\x14: Big5 subset of ISO/IEC 10646-1 - Traditional Chinese
\x15: UTF-8 encoding of ISO/IEC 10646-1 - Basic Multilingual Plane
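
The list above could be turned into a lookup table along these lines. This
is only a sketch: the names PREFIX_CODECS and decode_prefixed are made up
for illustration, and the codec chosen for each entry is my guess at the
closest match among Python's stock codecs, not something the spec names.

```python
# First byte -> Python codec name. The codec choices are assumptions:
# "utf-16-be" for 0x11 follows the UTF-16 interpretation discussed above,
# and "euc_kr", "gb2312", "big5" are the closest stock Python codecs.
PREFIX_CODECS = {
    0x11: "utf-16-be",
    0x12: "euc_kr",
    0x13: "gb2312",
    0x14: "big5",
    0x15: "utf-8",
}

def decode_prefixed(data: bytes):
    """Decode a string whose first byte is 0x11-0x15; None otherwise."""
    if not data or data[0] not in PREFIX_CODECS:
        return None
    # errors="replace" keeps one bad byte from aborting the whole string.
    return data[1:].decode(PREFIX_CODECS[data[0]], errors="replace")
```

A table keeps the per-prefix handling in one place, so adding an entry from
a newer revision of the spec is a one-line change.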

I think I may not have had the most recent spec when I wrote that code, so
some of these might be missing.

If interested, you can download the spec from www.etsi.org. It's called
EN 300 468, and the relevant section is in Annex A.

Regards

Sigmund

> Help wanted.
>
> --
> Rémi Denis-Courmont
> http://www.remlab.net/