[vlc-devel] [PATCH 3/8] stream_ReadLine: properly discard incomplete UTF-16 sequences at EOF
Rémi Denis-Courmont
remi at remlab.net
Wed Oct 21 12:36:15 CEST 2020
On Wednesday, 21 October 2020, 6.24.40 EEST, Pierre Ynard via vlc-devel
wrote:
> Lone-byte incomplete UTF-16 sequences before EOF, in some cases such as
> a final line consisting only of it, would never get actually consumed
> from the stream, preventing it from ever properly reaching EOF.
>
> This also avoids flooding the logs with one warning per stream line
> towards the end of the stream, and then printing an unspecific
> conversion error: those are replaced by one clear and explicit error
> message.
>
>
> diff --git a/src/input/stream.c b/src/input/stream.c
> index 15d52da..c34b2ee 100644
> --- a/src/input/stream.c
> +++ b/src/input/stream.c
> @@ -245,15 +245,22 @@ char *vlc_stream_ReadLine( stream_t *s )
> }
> }
>
> - if( i_data % priv->text.char_width )
> + /* Deal here with incomplete multibyte sequences at EOF
> + that we won't be able to process anyway */
> + if( i_data < priv->text.char_width )
I don't know if it's only a problem with the comment or also with the code,
but this won't actually handle incomplete sequences ending on a non-BMP code
point.
And I'm not sure why this is needed at all. In fact, it seems wrong as EOF is
not necessarily at a fixed offset. The missing byte(s) could show up on the next
read attempt.
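To illustrate both points, here is a rough sketch. It is not VLC code, and utf16le_complete_prefix is a hypothetical helper that assumes little-endian input. It counts how many leading bytes of a buffer end on a complete code point, so that a trailing lone byte or a trailing high surrogate (the first half of a non-BMP code point) is left pending instead of being discarded. It only inspects the last code unit and does not validate the rest of the buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of VLC: returns how many leading bytes of a
 * UTF-16LE buffer can be decoded without cutting a sequence in half.
 * Only the tail is inspected; earlier code units are not validated. */
static size_t utf16le_complete_prefix(const uint8_t *buf, size_t len)
{
    /* Drop a trailing lone byte (half of a 16-bit code unit). */
    size_t usable = len - (len % 2);

    if (usable >= 2)
    {
        /* Last complete code unit, assembled as little-endian. */
        uint16_t last = (uint16_t)(buf[usable - 2] | (buf[usable - 1] << 8));

        /* A trailing high surrogate (0xD800..0xDBFF) still needs its low
         * surrogate, so it is incomplete as well. */
        if (last >= 0xD800 && last <= 0xDBFF)
            usable -= 2;
    }
    return usable;
}
```

Under this assumption, a caller would consume only the returned number of bytes and leave the remainder in the stream, so that bytes arriving on a later read can still complete the sequence.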
> {
> - /* keep i_char_width boundary */
> - i_data = i_data - ( i_data % priv->text.char_width );
> - msg_Warn( s, "the read is not i_char_width compatible");
> + assert( priv->text.char_width == 2 );
> + uint8_t inc;
> + ssize_t i_inc = vlc_stream_Read( s, &inc, priv->text.char_width );
> + assert( i_inc == i_data );
> + if( i_inc > 0 )
> + msg_Err( s, "discarding incomplete UTF-16 sequence at EOF: 0x%02x", inc );
> + break;
> }
>
> - if( i_data == 0 )
> - break;
> + /* Keep to text encoding character width boundary */
> + if( i_data % priv->text.char_width )
> + i_data = i_data - ( i_data % priv->text.char_width );
>
> /* Check if there is an EOL */
> if( priv->text.char_width == 1 )
> @@ -313,10 +320,10 @@ char *vlc_stream_ReadLine( stream_t *s )
>
> /* Read data (+1 for easy \0 append) */
> p_line = realloc_or_free( p_line,
> - i_line + STREAM_PROBE_LINE + priv->text.char_width );
> + i_line + i_data + priv->text.char_width );
> if( !p_line )
> goto error;
> - i_data = vlc_stream_Read( s, &p_line[i_line], STREAM_PROBE_LINE );
> + i_data = vlc_stream_Read( s, &p_line[i_line], i_data );
> if( i_data <= 0 ) break; /* Hmmm */
> i_line += i_data;
> i_read += i_data;
--
Реми Дёни-Курмон
http://www.remlab.net/