[vlc-devel] [PATCH 5a/5] stream_ReadLine: control maximum length with object variable
Pierre Ynard
linkfanel at yahoo.fr
Mon Sep 7 13:48:31 CEST 2020
> Which is already insanely high. For comparison, Kate limits lines at
> 4,096 characters by default...
Great! Then with these patches, something like Kate can now actually
lower the limit to a more appropriate 4096 bytes if it wants to, so it
goes in the direction you want.
> There are no valid use cases for doing this in Lua. You have not
> exhibited any line-based file formats that needs larger lines. As
> François and I already reminded you multiples times, HTML is NOT
> line-based.
You're invoking the semantic argument here, that HTML is not lines so
stream_ReadLine() shouldn't be used. But you're trying to have it both
ways: the current API you defend is semantically broken.
It doesn't even honor what it says it does - it's called ReadLine(),
not ReadBoundLine(). The hardcoded limit isn't documented, neither its
existence nor its length. As I think we've already established, the
limit's concern for memory starvation is irrelevant at that order of
magnitude, and its concern for what the consumer can handle is out of
place. If we're talking proper semantics, the API should read and return
a line, any line, that's all.
Rejection semantics are completely wrong too. The API doesn't
differentiate between actual failure, purposefully rejecting lines over
the limit, and EOF. It doesn't let the caller know which one it is. It
doesn't even print any error message about invalid input and instead
fails silently, I have to add one! Come to think of it, maybe the caller
should keep retrying on NULL an indeterminate number of times, because
it can't know which it is between failure, EOF and line too long: I've
actually seen out-of-tree scripts doing that.
It also consumes the stream data and then trashes it if the line is
incomplete. The caller can never recover that data, it's gone. So it
prevents any kind of fallback to sized reads (which it sounds like you
might favor), and precludes any kind of mixed content scenario (which I
hope you won't qualify as stupid anyway).
Proper semantics should never consume any data from the stream that
isn't successfully returned as a line; all the more that doing otherwise
is irrecoverable. What about a patch that does that? The only difference
would be that data would be buffered within the stream layer facilities
instead of one big continuous reallocating string buffer, which sounds
actually even better.
Proper semantics should also let the caller know whether a line was too
long. Printing an error log isn't something the caller can use. Proper
semantics imply either removing the length limit, or taking it from the
caller; adding a maximum length argument is also a step towards proper
management of oversized lines.
So HTML may not be line-based. But it's not because of that or the
validity of lua features and use cases if the current stream_ReadLine()
API is broken and unsuitable.
Also, you can make that point about line-based formats all you want, and
it is indeed a valid and interesting design point, but that's not going
to change its relevance or relation to reality. We've discussed this
before, but I hadn't even got to the real reason why what you suggest is
a bad idea; so I'm getting to it now.
Pretty much all lua playlist scripts use readline(), because it's
the only usable interface available to parse pages. They rely on the
line-based model to parse such things as HTML; the rare ones that don't,
instead just slurp the whole file before beginning to parse it - which
is really not going in the right direction and makes for some really
problematic code, so I hope you won't feel satisfied with it saying HTML
is indeed a file-based format.
Due to the pervasiveness of the readline() and line-based model, we're
going to have to keep and maintain it anyway. You're pushing for also
investing a lot of time and efforts trying to implement something that
won't even be done right or solve any problem because it's intractable
to really do properly. Even if that happens and lua scripts are provided
with a second API, what's going to happen?
People aren't going to start using it. Inertia is one factor. But
what I believe is that people will simply prefer to continue using
readline(). People generally don't want to use XML parsing APIs, what
with constantly checking the node type for start, leaf, end, or going
through a chained list of node attributes... People press Ctrl+U in
their web browser to view the source code of the website, they see lines
with their eyes containing the URLs and bits of information they seek,
they understand that. The lua playlist script feature is a popular
success, and I believe the simplicity and immediateness of the model I
just expressed is a big part of it.
All the more that switching to the new model doesn't actually solve
any real problem. The only problem it solves is the one artificially
created by keeping the stream_ReadLine() API broken. And even then,
people already have ways to work around it, and it makes for hacky,
poorly designed and inefficient code, but they'll probably keep doing
it.
So for all these efforts and deflections, what will we have? Instead of
one broken API, we'll have one broken API and one broken and unused API.
And that is the real reason why pushing for lua scripts not to use
readline() to read HTML is but a bad idea.
--
Pierre Ynard
"Une âme dans un corps, c'est comme un dessin sur une feuille de papier."
More information about the vlc-devel
mailing list