[vlc-devel] [RFC] EIT character sets conversion

Fri Aug 31 21:30:44 CEST 2007

On Friday 31 August 2007, Sigmund Augdal wrote:
> The specs say strings starting with \x10 are "ISO/IEC 8859", and in
> this case the two following bytes are said to be an unsigned
> big-endian 16-bit integer used as an index into a table listed in the
> spec. This table has 16 entries where 0 is listed as reserved and
> 1-15 is listed as meaning the corresponding 8859 variant. No entries
> are given above this. I assume this is in order to make backwards
> compatible extensions to the spec possible in the future, but no such
> thing exists today to the best of my knowledge. 

So basically, Latin-10/ISO_8859-16 is currently not allowed. Not that 
anybody uses it in real life anyway ;-)
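
A minimal sketch of the \x10 rule as described in the quoted text above: read the two bytes after the marker as an unsigned big-endian 16-bit index and accept only 1-15. The function name and the "ISO-8859-N" naming are assumptions for illustration, not taken from the spec or from VLC, and a converter is not guaranteed to know every name this produces.

```python
import struct

def iso8859_charset(data: bytes):
    """Return (charset_name, payload) for a string starting with 0x10.

    Returns None if the marker is absent, the header is truncated, or
    the table index is outside 1..15 (0 is reserved, and the quoted
    description defines nothing above 15, so ISO-8859-16 is rejected).
    """
    if len(data) < 3 or data[0] != 0x10:
        return None
    # Two bytes after the marker: unsigned big-endian 16-bit table index.
    (index,) = struct.unpack(">H", data[1:3])
    if not 1 <= index <= 15:
        return None
    # Hypothetical naming scheme; adjust to whatever the converter expects.
    return f"ISO-8859-{index}", data[3:]
```

The returned name would then be handed to the character set converter together with the payload bytes.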

(...)
> Strings starting with \x11 are specified to mean ISO/IEC 10646-1 - Basic
> Multilingual Plane. I don't know what that means, but apparently I
> have earlier interpreted it as meaning UTF-16, feel free to correct
> if my assumption was wrong.

BMP is the subset of Unicode code points that fit into a single 16-bit 
code unit when using UTF-16. The encoding restricted to that subset is 
commonly known as "UCS-2" (but some people say UCS-2 = UTF-16). 
However, that does not say what the endianness is 
supposed to be, which is what I am interested in. Maybe the spec 
assumes Big Endian all over the place? In that case, we should 
pass "UTF-16BE" rather than "UTF-16" to iconv.
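
To illustrate why the byte order needs to be explicit, here is a small sketch that decodes a \x11 string under the assumption, not confirmed above, that the spec means big endian. The function name is hypothetical. Python's "utf-16-be" codec plays the role that "UTF-16BE" would play with iconv.

```python
def decode_x11_string(data: bytes) -> str:
    """Decode a string whose first byte is 0x11.

    Assumption: the payload is big-endian UTF-16. A plain "utf-16"
    decoder with no byte order mark has to guess the byte order,
    so the endianness is named explicitly here.
    """
    if not data or data[0] != 0x11:
        raise ValueError("not a 0x11-prefixed string")
    return data[1:].decode("utf-16-be")
```

For example, the payload bytes 00 41 decode to "A" only when read as big endian; read as little endian they would be the unrelated code point U+4100.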

(...)
> \x15: UTF-8 encoding of ISO/IEC 10646-1 - Basic Multilingual Plane
                                            ^^^^^^^^^^^^^^^^^^^^^^^^
BMP again. When reading, it's ok to use UTF-8, of which UTF-8 BMP is a 
subset. In theory, if we were writing such a string, we would have to 
discard non-BMP code points.
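
The asymmetry between reading and writing can be sketched as follows. The helper names are hypothetical, and silently dropping non-BMP code points is only one possible policy (replacing them with a placeholder would be another).

```python
def decode_x15_string(data: bytes) -> str:
    """Reading: plain UTF-8 is enough, since BMP-only UTF-8 is a subset."""
    if not data or data[0] != 0x15:
        raise ValueError("not a 0x15-prefixed string")
    return data[1:].decode("utf-8")

def encode_x15_string(text: str) -> bytes:
    """Writing: keep only code points inside the BMP (U+0000..U+FFFF)."""
    bmp_only = "".join(ch for ch in text if ord(ch) <= 0xFFFF)
    return b"\x15" + bmp_only.encode("utf-8")
```

With these helpers, a code point such as U+1F600 is dropped when writing, while anything in the BMP round-trips unchanged.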

-- 
Rémi Denis-Courmont
http://www.remlab.net/