[vlc-devel] [PATCH 3/8] stream_ReadLine: properly discard incomplete UTF-16 sequences at EOF

Rémi Denis-Courmont remi at remlab.net
Wed Oct 21 12:36:15 CEST 2020


On Wednesday, 21 October 2020, 6.24.40 EEST, Pierre Ynard via vlc-devel
wrote:
> Lone-byte incomplete UTF-16 sequences before EOF, in some cases such as
> a final line consisting only of it, would never get actually consumed
> from the stream, preventing it from ever properly reaching EOF.
> 
> This also avoids flooding the logs with one warning per stream line
> towards the end of the stream, and then printing an unspecific
> conversion error: those are replaced by one clear and explicit error
> message.
> 
> 
> diff --git a/src/input/stream.c b/src/input/stream.c
> index 15d52da..c34b2ee 100644
> --- a/src/input/stream.c
> +++ b/src/input/stream.c
> @@ -245,15 +245,22 @@ char *vlc_stream_ReadLine( stream_t *s )
>              }
>          }
> 
> -        if( i_data % priv->text.char_width )
> +        /* Deal here with incomplete multibyte sequences at EOF
> +           that we won't be able to process anyway */
> +        if( i_data < priv->text.char_width )

I don't know if it's only a problem with the comment or also with the code, 
but this won't actually handle incomplete sequences ending on a non-BMP code 
point.
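
To illustrate the non-BMP case: a code point above U+FFFF takes two 16-bit units (a surrogate pair), so a buffer can hold a whole number of units and still end in the middle of a sequence. The following is a minimal sketch, not VLC code. The helper name utf16_incomplete_tail and its signature are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of VLC: returns how many bytes at the end
 * of buf do not form a complete UTF-16 code point (0, 1, 2 or 3). */
static size_t utf16_incomplete_tail(const uint8_t *buf, size_t len,
                                    bool big_endian)
{
    assert(buf != NULL || len == 0);

    size_t odd = len % 2;      /* lone trailing byte, if any */
    size_t whole = len - odd;  /* bytes forming complete 16-bit units */

    if (whole >= 2)
    {
        const uint8_t *p = buf + whole - 2;
        uint16_t unit = big_endian ? (uint16_t)((p[0] << 8) | p[1])
                                   : (uint16_t)((p[1] << 8) | p[0]);

        /* A high (lead) surrogate needs a low surrogate after it, so a
         * buffer ending on one is still incomplete. */
        if (unit >= 0xD800 && unit <= 0xDBFF)
            return odd + 2;
    }
    return odd;
}
```

A check such as i_data < char_width only catches the lone-byte case. With the sketch above, a buffer ending in the two bytes of a lead surrogate also reports a non-zero tail.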

And I'm not sure why this is needed at all. In fact, it seems wrong as EOF is 
not necessarily at a fixed offset. The missing byte(s) could show up on the next 
read attempt.
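
As a rough sketch of the alternative, assuming the caller may retry after a short read: keep the odd byte pending instead of discarding it, and pair it with the first byte of the next chunk. The struct pending and feed_chunk names are hypothetical and only serve this example. This is not how vlc_stream_ReadLine is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical carry-over state; not a VLC structure. */
struct pending
{
    uint8_t byte;     /* the odd byte left over from the previous chunk */
    int     has_byte; /* non-zero if 'byte' is valid */
};

/* Feeds one chunk of raw bytes. Complete 16-bit units are copied to out
 * (which must have room for len + 1 bytes); a trailing odd byte is kept
 * in st for the next call. Returns the number of bytes written to out. */
static size_t feed_chunk(struct pending *st, const uint8_t *chunk,
                         size_t len, uint8_t *out)
{
    assert(st != NULL && out != NULL);
    assert(chunk != NULL || len == 0);

    size_t n = 0, i = 0;

    /* Complete the unit left unfinished by the previous chunk. */
    if (st->has_byte && len > 0)
    {
        out[n++] = st->byte;
        out[n++] = chunk[i++];
        st->has_byte = 0;
    }

    size_t rem = len - i;
    size_t whole = rem - rem % 2;
    if (whole > 0)
    {
        memcpy(out + n, chunk + i, whole);
        n += whole;
        i += whole;
    }

    /* At most one byte can remain; remember it rather than drop it. */
    if (i < len)
    {
        st->byte = chunk[i];
        st->has_byte = 1;
    }
    return n;
}
```

Only once the stream is known to be finished for good would a still-pending byte count as a truncated sequence.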

>          {
> -            /* keep i_char_width boundary */
> -            i_data = i_data - ( i_data % priv->text.char_width );
> -            msg_Warn( s, "the read is not i_char_width compatible");
> +            assert( priv->text.char_width == 2 );
> +            uint8_t inc;
> +            ssize_t i_inc = vlc_stream_Read( s, &inc, priv->text.char_width );
> +            assert( i_inc == i_data );
> +            if( i_inc > 0 )
> +                msg_Err( s, "discarding incomplete UTF-16 sequence at EOF: 0x%02x", inc );
> +            break;
>          }
> 
> -        if( i_data == 0 )
> -            break;
> +        /* Keep to text encoding character width boundary */
> +        if( i_data % priv->text.char_width )
> +            i_data = i_data - ( i_data % priv->text.char_width );
> 
>          /* Check if there is an EOL */
>          if( priv->text.char_width == 1 )
> @@ -313,10 +320,10 @@ char *vlc_stream_ReadLine( stream_t *s )
> 
>          /* Read data (+1 for easy \0 append) */
>          p_line = realloc_or_free( p_line,
> -                          i_line + STREAM_PROBE_LINE + priv->text.char_width );
> +                        i_line + i_data + priv->text.char_width );
>          if( !p_line )
>              goto error;
> -        i_data = vlc_stream_Read( s, &p_line[i_line], STREAM_PROBE_LINE );
> +        i_data = vlc_stream_Read( s, &p_line[i_line], i_data );
>          if( i_data <= 0 ) break; /* Hmmm */
>          i_line += i_data;
>          i_read += i_data;


-- 
Реми Дёни-Курмон
http://www.remlab.net/




