[vlc] Re: I'm implementing subtitle charset-detector by using firefox's library

Tsai Dung-Bang dbtsai at gmail.com
Tue Feb 20 18:58:41 CET 2007


Rémi Denis-Courmont wrote:
> Well, I don't know. It's enough to discriminate UTF-8 from the local 
> character encoding in any case that I've dealt with. The underlying 
> assumption is that valid UTF-8 byte sequences are indeed UTF-8 - and in 
> particular valid US-ASCII sequences are US-ASCII (let's assume nobody 
> uses QP, base64 or UTF-7 subtitles).
>   
It seems that some characters in the local charset are also valid UTF-8 
two-byte sequences (110yyyyy 10zzzzzz, which cover 1920 code points, 
U+0080 to U+07FF). So if we do not have enough data, we cannot tell 
text in a local charset apart from UTF-8.

And in this architecture, how can we add support for the BOM (Byte Order 
Mark)? Lots of Unicode subtitle files have one.

As I understand it, the "UTF-8 subtitles autodetection" option in the 
GUI ("Input/Codec->Other Codec->Subtitles") should work like this: if 
you set the subtitle text encoding to Big5 but then load a UTF-8 
subtitle file, VLC should detect the UTF-8 automatically using the 
code-page range algorithm. But it does not seem to behave the way I 
expected.



> I think this works fine for UTF-8 against the entire ISO-8859 series, 
> and it definitely works for UTF-8 against latin character sets. It also 
> seems quite good for Shift-JIS, EUC-KR, GB18030 and Big5 on Asian side.
>
> I would avoid reading more than one line at a time, because it makes 
> the whole thing A LOT more complicated (you have to probe multiple 
> lines ahead of what you need). Moreover, we have to support subtitles 
> for streaming content too.
>
> I would start by trying to fix the GetFallbackLanguage() implementation 
> rather than starting a big gas factory if I were you. You can always 
> build the factory later if it is really required.

Big gas factory?? Sorry for my poor English, I do not really understand 
this. There is a library wrapper for the Mozilla detector, and I would 
like to integrate it into VLC.

Thanks for your help

Tsai Dung-Bang

-- 
This is the vlc mailing-list, see http://www.videolan.org/vlc/
To unsubscribe, please read http://www.videolan.org/support/lists.html


