[vlc] Re: I'm implementing subtitle charset-detector by using firefox's library
Tsai Dung-Bang
dbtsai at gmail.com
Tue Feb 20 18:58:41 CET 2007
Rémi Denis-Courmont wrote:
> Well, I don't know. It's enough to discriminate UTF-8 from the local
> character encoding in any case that I've dealt with. The underlying
> assumption is that valid UTF-8 byte sequences are indeed UTF-8 - and in
> particular valid US-ASCII sequences are US-ASCII (let's assume nobody
> uses QP, base64 or UTF-7 subtitles).
>
It seems that some byte pairs of local charsets coincide with the 1920
two-byte UTF-8 sequences (110yyyyy 10zzzzzz). So if we do not have
enough data, we cannot tell from the raw bytes whether they are in a
local charset or in UTF-8.
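To illustrate the ambiguity, here is a minimal C sketch of a strict UTF-8 validator following RFC 3629. It is not VLC code, and `is_valid_utf8` is a hypothetical helper name. The pair 0xC3 0xA9 passes it (it encodes U+00E9), yet the same two bytes read as "Ã©" in ISO-8859-1 and also fall inside the Big5 lead/trail byte ranges, so a short line alone cannot settle which encoding was meant.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if buf[0..len) is well-formed UTF-8
 * per RFC 3629 (rejects overlong forms, UTF-16 surrogates and code
 * points above U+10FFFF). */
static bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t n = 0;                        /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of 2nd byte */

        if (c < 0x80) { i++; continue; }               /* US-ASCII */
        else if (c >= 0xC2 && c <= 0xDF) n = 1;        /* 110yyyyy 10zzzzzz */
        else if (c == 0xE0) { n = 2; lo = 0xA0; }      /* reject overlongs */
        else if (c >= 0xE1 && c <= 0xEC) n = 2;
        else if (c == 0xED) { n = 2; hi = 0x9F; }      /* reject surrogates */
        else if (c >= 0xEE && c <= 0xEF) n = 2;
        else if (c == 0xF0) { n = 3; lo = 0x90; }      /* reject overlongs */
        else if (c >= 0xF1 && c <= 0xF3) n = 3;
        else if (c == 0xF4) { n = 3; hi = 0x8F; }      /* cap at U+10FFFF */
        else return false;  /* 0x80-0xC1 and 0xF5-0xFF never start a sequence */

        if (len - i <= n) return false;                /* truncated sequence */
        if (buf[i + 1] < lo || buf[i + 1] > hi) return false;
        for (size_t k = 2; k <= n; k++)
            if ((buf[i + k] & 0xC0) != 0x80) return false;
        i += n + 1;
    }
    return true;
}
```

For example, {0xC3, 0xA9} validates, while a lone Latin-1 "é" (0xE9) fails because it must be followed by two continuation bytes, and a Big5 pair starting with 0xA4 fails because 0xA4 is itself a continuation byte.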
And in this architecture, how can we add support for the BOM (Byte Order
Mark)? Lots of Unicode subtitles have one.
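One possible way to handle a BOM is to check for it once at the start of the file, before any per-line detection, and skip it. The sketch below is an assumption on my part rather than existing VLC code (`detect_bom` is a hypothetical name); only the byte signatures themselves are the standard Unicode ones.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: identify a Unicode byte order mark at the start of
 * a subtitle buffer. Returns the charset name and stores the BOM length in
 * *bom_len so the caller can skip it; returns NULL (and 0) if none found. */
static const char *detect_bom(const unsigned char *buf, size_t len,
                              size_t *bom_len)
{
    /* Longer signatures first: the UTF-32LE BOM begins with the UTF-16LE one. */
    static const struct {
        const char *charset;
        unsigned char bytes[4];
        size_t size;
    } boms[] = {
        { "UTF-32LE", { 0xFF, 0xFE, 0x00, 0x00 }, 4 },
        { "UTF-32BE", { 0x00, 0x00, 0xFE, 0xFF }, 4 },
        { "UTF-8",    { 0xEF, 0xBB, 0xBF },       3 },
        { "UTF-16LE", { 0xFF, 0xFE },             2 },
        { "UTF-16BE", { 0xFE, 0xFF },             2 },
    };

    for (size_t i = 0; i < sizeof boms / sizeof boms[0]; i++) {
        if (len >= boms[i].size &&
            memcmp(buf, boms[i].bytes, boms[i].size) == 0) {
            *bom_len = boms[i].size;
            return boms[i].charset;
        }
    }
    *bom_len = 0;
    return NULL;
}
```

Because FF FE 00 00 is tested before FF FE, a UTF-16LE file whose first character is U+0000 would be reported as UTF-32LE; subtitle files should not start with a NUL character, so this ordering seems acceptable.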
I thought the "UTF-8 subtitles autodetection" option in the GUI
(Input/Codec->Other Codec->Subtitles) was meant to work like this: when
you set the subtitle text encoding to Big5 but load a UTF-8 subtitle
file, VLC would autodetect it using the codepage range algorithm. But it
does not seem to behave the way I thought.
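The behaviour I expected could be sketched per line as follows. This is only my assumption of how such an option might work, not VLC's actual implementation; `pick_line_charset` and `is_valid_utf8` are hypothetical names, and "BIG5" stands for whatever encoding the user selected in the preferences.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Strict UTF-8 check (RFC 3629): no overlongs, surrogates or > U+10FFFF. */
static bool is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = buf[i];
        size_t n = 0;                        /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of 2nd byte */

        if (c < 0x80) { i++; continue; }               /* US-ASCII */
        else if (c >= 0xC2 && c <= 0xDF) n = 1;
        else if (c == 0xE0) { n = 2; lo = 0xA0; }
        else if (c >= 0xE1 && c <= 0xEC) n = 2;
        else if (c == 0xED) { n = 2; hi = 0x9F; }
        else if (c >= 0xEE && c <= 0xEF) n = 2;
        else if (c == 0xF0) { n = 3; lo = 0x90; }
        else if (c >= 0xF1 && c <= 0xF3) n = 3;
        else if (c == 0xF4) { n = 3; hi = 0x8F; }
        else return false;                   /* invalid lead byte */

        if (len - i <= n) return false;      /* truncated sequence */
        if (buf[i + 1] < lo || buf[i + 1] > hi) return false;
        for (size_t k = 2; k <= n; k++)
            if ((buf[i + k] & 0xC0) != 0x80) return false;
        i += n + 1;
    }
    return true;
}

/* Hypothetical per-line decision: a well-formed UTF-8 line is taken as
 * UTF-8; anything else falls back to the user-selected encoding. */
static const char *pick_line_charset(const char *line,
                                     const char *user_charset)
{
    return is_valid_utf8((const unsigned char *)line, strlen(line))
           ? "UTF-8" : user_charset;
}
```

Pure US-ASCII lines come out as "UTF-8", which is harmless because ASCII decodes identically in UTF-8 and Big5. The weak spot is the one described earlier: a short Big5 line whose bytes happen to form valid two-byte UTF-8 sequences would be misclassified.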
> I think this works fine for UTF-8 against the entire ISO-8859 series,
> and it definitely works for UTF-8 against Latin character sets. It also
> seems quite good for Shift-JIS, EUC-KR, GB18030 and Big5 on Asian side.
>
> I would avoid reading more than one line at a time, because it makes
> the whole thing A LOT more complicated (you have to probe multiple lines
> ahead of what you need). Moreover, we have to support subtitles for
> streaming content too.
>
> I would start trying to fix the GetFallbackLanguage() implementation
> rather than start a big gas factory if I were you. You can always build
> the factory later if it is really required.
Big gas factory? Sorry for my poor English, I do not really understand
this.
There is a library wrapper for the Mozilla detector, and I would like to
integrate it into VLC.
Thanks for your help
Tsai Dung-Bang
--
This is the vlc mailing-list, see http://www.videolan.org/vlc/
To unsubscribe, please read http://www.videolan.org/support/lists.html