[vlc-devel] [PATCH 2/2] taglib: detect charset when ID3v2 Latin-1 parser finds invalid character

Rémi Denis-Courmont remi at remlab.net
Fri Oct 23 16:59:46 CEST 2020


On Friday, 23 October 2020, 13:46:56 EEST, sojulibra at gmail.com wrote:
> From: Souju TANAKA <sojulibra at gmail.com>
> 
> Changed TagLib Latin-1 parser to check whether an ISO 8859-1 encoded ID3v2
> tag is a valid byte sequence. If an invalid Latin-1 character is found,

Well, considering that any octet sequence is valid ISO 8859-1, that sentence 
makes no sense.
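
To illustrate: ISO 8859-1 assigns a code point (U+0000 to U+00FF) to each of
the 256 octet values, so a conversion to UTF-8 has no failure path. A minimal
sketch, assuming a caller-provided output buffer; latin1_to_utf8() is a
hypothetical name, not an existing VLC or TagLib function:

```c
#include <stddef.h>

/* Hypothetical helper, not part of VLC or TagLib.
 * Converts len octets of ISO 8859-1 to UTF-8.
 * out must have room for 2 * len + 1 bytes.
 * Returns the number of bytes written, excluding the terminating NUL. */
static size_t latin1_to_utf8(const unsigned char *in, size_t len, char *out)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];

        if (c < 0x80) {
            /* U+0000..U+007F: one byte, unchanged */
            out[n++] = (char)c;
        } else {
            /* U+0080..U+00FF: always exactly two bytes */
            out[n++] = (char)(0xC0 | (c >> 6));
            out[n++] = (char)(0x80 | (c & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

There is no branch that rejects an octet, which is why "invalid Latin-1" can
only ever mean "unlikely to be Latin-1 text", i.e. a heuristic.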

> try to
> detect charset and convert the tag into UTF-8 to avoid Mojibake.
> 
> Some encoders embed ID3v2 tags in an unexpected charset, though it is
> against the spec. TagLib allows overriding
> TagLib::ID3v2::Latin1StringHandler::parse() to deal with this practical
> situation.
> ---
>  include/vlc_charset.h           | 20 +++++++++++
>  modules/meta_engine/Makefile.am |  2 +-
>  modules/meta_engine/taglib.cpp  | 63 +++++++++++++++++++++++++++++++++
>  3 files changed, 84 insertions(+), 1 deletion(-)
> 
> diff --git a/include/vlc_charset.h b/include/vlc_charset.h
> index 0ec1734dc9..311856913e 100644
> --- a/include/vlc_charset.h
> +++ b/include/vlc_charset.h
> @@ -93,6 +93,26 @@ VLC_USED static inline const char *IsASCII(const char *str)
>      return str;
>  }
> 
> +/**
> + * Checks ISO/IEC 8859-1 validity.
> + *
> + * Checks whether a null-terminated string is a valid ISO/IEC 8859-1 bytes sequence
> + *
> + * \param str string to check
> + *
> + * \retval str the string is a valid null-terminated ISO/IEC 8859-1 sequence
> + * \retval NULL the string is not an ISO/IEC 8859-1 sequence
> + */
> +VLC_USED static inline const char *IsLatin1(const char *str)
> +{
> +    unsigned char c;
> +
> +    for (const char *p = str; (c = *p) != '\0'; p++)
> +        if (unlikely(c < 0x20 || (c > 0x7e && c < 0xa0)))
> +            return NULL;
> +    return str;

Oh come on. Eliminating C1 codes is questionable, but you obviously can't just 
blindly reject all C0 codes.
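
For illustration only, a less blunt variant of the quoted check could let
through the C0 codes that legitimately occur in text. The sketch below assumes
that TAB, LF and CR are the ones worth keeping; that whitelist, the decision
to still flag DEL and the C1 range, and both function names are assumptions,
not something the patch or this thread settles:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not from the patch.
 * Returns true if the octet is unusual in Latin-1 text. This is a heuristic
 * only: every octet is valid ISO 8859-1. */
static bool is_suspicious_latin1_byte(unsigned char c)
{
    /* Assumed whitelist of C0 codes that plausibly appear in tag text */
    if (c == '\t' || c == '\n' || c == '\r')
        return false;

    /* Remaining C0 codes, DEL, and the C1 range 0x80..0x9F */
    return c < 0x20 || c == 0x7f || (c >= 0x80 && c <= 0x9f);
}

/* Hypothetical counterpart of the quoted IsLatin1(): returns str if no
 * suspicious octet is found, NULL otherwise. */
static const char *looks_like_latin1_text(const char *str)
{
    for (const unsigned char *p = (const unsigned char *)str; *p != '\0'; p++)
        if (is_suspicious_latin1_byte(*p))
            return NULL;
    return str;
}
```

With this, a multi-line comment tag such as "line one\nline two" is no longer
sent to charset detection, while a string containing octets in 0x80..0x9F
still is.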

-- 
レミ・デニ-クールモン
http://www.remlab.net/




