[vlc-devel] [RFC 0/3] Improving vlc_array_t

Romain Vimont rom1v at videolabs.io
Sat Jul 21 13:35:23 CEST 2018


On Sat, Jul 21, 2018 at 08:18:25AM +0300, Rémi Denis-Courmont wrote:
> Le vendredi 20 juillet 2018, 14:29:07 EEST Romain Vimont a écrit :
> > On Fri, Jul 20, 2018 at 01:44:25PM +0300, Rémi Denis-Courmont wrote:
> > > Le jeudi 19 juillet 2018, 18:57:15 EEST Romain Vimont a écrit :
> > > > There are 3 arrays API in vlc_arrays.h:
> > > >  1. TAB_* macros
> > > >  2. ARRAY_* macros
> > > >  3. vlc_array_t + functions
> > > > 
> > > > When we want a dynamic array of pointers, the vlc_array_t seems more
> > > > appropriate.
> > > 
> > > The benefit of the TAB_* macros is that they are type-agnostic.
> > > vlc_array_t
> > > does not accomodate all the TAB_* use cases, and even when it does, it
> > > sacrifices type-safety.
> > > 
> > > That is not to say that TAB_* is good. Not only it lacks error handling as
> > > you noted, but in many cases, another structure, e.g. singly linked list,
> > > doubly linked list, or binary tree would work better than an array.
> > > 
> > > > Moreover, the macro versions abort() on allocation failure.
> > > 
> > > Yes but vlc_array_t comes with its own issues, such as type safety. And
> > > handling arrays manually is often easy enough that we have a lot of
> > > hand-coded type-safe _and_ error-safe implementation (not only from me).
> > 
> > In short, they all have benefits and drawbacks. We agree.
> 
> We agree that the helpers all have drawbacks.
> 
> In higher level languages with templates/generics, one can write nice vector 
> data structures (though that would probably come with the standard library).

I agree that templates/generics allow _one_ version to match all cases,
which is not possible in C.

> But in C, the normal way to handle vectors is a typed pointer and an unsigned 
> counter (or an error-value terminator). And it turns out that VLC actually 
> does that a lot - just not with the hopeless helpers.

I don't think that the helpers are hopeless. Each one may cover many
cases. For example, vlc_array_t is suitable when we want to store an
array of pointers (at the cost of using void* instead of the real type,
but providing other benefits). For non-pointers types, ARRAY_* or TAB_*
help.
 
> IMO, what we really should have are helpers for allocating and reallocating an 
> array (using mul_overflow) while exposing errors where unavoidable, and not 
> losing type safety.

In addition, we can still have an API which expose common functions
(find(), append(), remove(), swap_remove(), shrink(), etc., having
suitable behavior), so that we don't need to reimplement them
everywhere.

> > > > To make it more user-friendly, I suggest to rename:
> > > >  - vlc_array_item_at_index() by vlc_array_get() (patch 1)
> > > >  - vlc_array_index_of_item() by vlc_array_find() (patch 2)
> > > 
> > > TBH, if you need to look-up an item by index, you are probably using the
> > > wrong data structure. Binary tree is probably better suited (logarithmic
> > > time). Or if the lookup is only for random deletion, then doubly linked
> > > list is probably even better suited (constant time).
> > 
> > It depends. For example, if you often use random access in O(1), and
> > rarely need to find an item, it may be appropriate. There are often
> > trade-offs depending on the specific case.
> 
> If you often look up by random index-based access, and rarely insert/delete, 
> then by all means use a table. My point is that many (probably most) of the 
> TAB_* and vlc_array_t use cases fall outside of that scenario.
>
> But TBH, it is quite seldom that we do index-based random access. Usually, 
> it's either sequential access or look-up by some key.

I start working on some playlist/shuffle stuff, and my first choice
is to use a vector-like tool. Maybe in the end, for some reason, I will
switch to another structure, but a vector-like API would still be useful
IMO.

> > But at least the basic structures and common functions must be exposed,
> > so that we can choose what is more appropriate.
> 
> And that's basically never vlc_array_t because it's not type safe.

In some cases, ARRAY_* and TAB_* are useful. In some others, vlc_array_t
is useful. IMO, it would be usable in even more cases with some changes.

> > > > To make it more general (suitable for more cases), I suggest to change
> > > > its growth strategy so that we can append items in amortized constant
> > > > time (similar to what ARRAY_* macros use, or, FWIW, C++ std::vector,
> > > > Java ArrayList, Rust std::vec, etc.). (patch 3)
> > > 
> > > For potential large datasets, we already have trees supporting logarithmic
> > > time for insertion, look-up, replacement and deletion, and linear time for
> > > enumeration.
> > > 
> > > All that with only bound object sizes.
> > > 
> > > So I'm really unconvinced that we should push vlc_array_t.
> > 
> > IIUC, you are saying that since there are trees, we don't need vectors?
> 
> No. I am saying that if you are concerned about element insertion and/or 
> deletion time, you really should not use an array to begin with, because 
> performance with an array will definitely suck.

I disagree. Other structures have their own drawbacks too. It depends on
the concrete case.

> Insertion and deletion in an array will always be linear (assuming that heap 
> allocation is linear or less). The only thing you can hope to optimize is 
> appending and tail deletion - and even then a LIFO is usually a better choice.

If order need not be preserved, deletion may be O(1) (swap_remove()).


More information about the vlc-devel mailing list