[vlc-devel] [PATCH 1/3] vlc_list: helpers for doubly linked lists

Romain Vimont rom1v at videolabs.io
Tue Jun 12 13:35:02 CEST 2018


On Mon, Jun 11, 2018 at 08:14:41PM +0300, Rémi Denis-Courmont wrote:
> Le maanantaina 11. kesäkuuta 2018, 18.14.47 EEST Romain Vimont a écrit :
> > On Mon, Jun 11, 2018 at 01:27:12PM +0300, Rémi Denis-Courmont wrote:
> > > I am storing the head so that the head expression is expansion-safe. In
> > > practice, the compiler should be able to optimize it away. Likewise, it
> > > should be able to optimize the constant offset away.
> > Since the offset is compile-time constant, I would not store it at all.
> > 
> > > Point is that an iterator is unavoidable in the general case. With that in
> > > mind, I'd rather leverage the iterator to make the code as safe as
> > > possible.
> > > 
> > > I already replaced goal with head to make the iteration safe against
> > > removal.
> > OK, so you want the safe foreach version only.
> 
> As expected, GCC is perfectly able to optimize the head and the offset:
> 
> ----8<--------8<--------8<--------8<--------8<--------8<--------8<----
> % cat test.c
> #include "include/vlc_list.h"
> 
> struct datum {
>         int i;
>         struct vlc_list node;
> };
> 
> extern void process(int d);
> 
> int card(struct vlc_list *head)
> {
>         struct vlc_list_it it;
>         struct datum *d;
>         int count = 0;
> 
>         vlc_list_foreach(d, it, head, struct datum, node)
>                 count++;
> 
>         return count;
> }
> 
> int sum(struct vlc_list *head)
> {
>         struct vlc_list_it it;
>         struct datum *d;
>         int sum = 0;
> 
>         vlc_list_foreach(d, it, head, struct datum, node)
>                 sum += d->i;
> 
>         return sum;
> }
> 
> void iterate(struct vlc_list *head)
> {
>         struct vlc_list_it it;
>         struct datum *d;
> 
>         vlc_list_foreach(d, it, head, struct datum, node)
>                 process(d->i);
> }
> % aarch64-linux-gnu-gcc -ffunction-sections -O3 -c test.c -o test.o
> % aarch64-linux-gnu-objdump -d test.o
> 
> test.o:     format de fichier elf64-littleaarch64
> 
> 
> Déassemblage de la section .text.card :
> 
> 0000000000000000 <card>:
>    0:   f9400402        ldr     x2, [x0, #8]
>    4:   52800000        mov     w0, #0x0                        // #0
>    8:   f9400441        ldr     x1, [x2, #8]
>    c:   eb01005f        cmp     x2, x1
>   10:   540000a0        b.eq    24 <card+0x24>  // b.none
>   14:   f9400421        ldr     x1, [x1, #8]
>   18:   11000400        add     w0, w0, #0x1
>   1c:   eb01005f        cmp     x2, x1
>   20:   54ffffa1        b.ne    14 <card+0x14>  // b.any
>   24:   d65f03c0        ret
> 
> Déassemblage de la section .text.sum :
> 
> 0000000000000000 <sum>:
>    0:   f9400404        ldr     x4, [x0, #8]
>    4:   52800000        mov     w0, #0x0                        // #0
>    8:   d1002082        sub     x2, x4, #0x8
>    c:   f9400481        ldr     x1, [x4, #8]
>   10:   eb01009f        cmp     x4, x1
>   14:   540000e0        b.eq    30 <sum+0x30>  // b.none
>   18:   b9400043        ldr     w3, [x2]
>   1c:   d1002022        sub     x2, x1, #0x8
>   20:   f9400421        ldr     x1, [x1, #8]
>   24:   0b030000        add     w0, w0, w3
>   28:   eb01009f        cmp     x4, x1
>   2c:   54ffff61        b.ne    18 <sum+0x18>  // b.any
>   30:   d65f03c0        ret
> 
> Déassemblage de la section .text.iterate :
> 
> 0000000000000000 <iterate>:
>    0:   a9be7bfd        stp     x29, x30, [sp, #-32]!
>    4:   910003fd        mov     x29, sp
>    8:   f9400401        ldr     x1, [x0, #8]
>    c:   f9000bf3        str     x19, [sp, #16]
>   10:   f9400433        ldr     x19, [x1, #8]
>   14:   eb13003f        cmp     x1, x19
>   18:   54000180        b.eq    48 <iterate+0x48>  // b.none
>   1c:   f9000fb4        str     x20, [x29, #24]
>   20:   aa0003f4        mov     x20, x0
>   24:   d1002020        sub     x0, x1, #0x8
>   28:   b9400000        ldr     w0, [x0]
>   2c:   94000000        bl      0 <process>
>   30:   d1002260        sub     x0, x19, #0x8
>   34:   f9400681        ldr     x1, [x20, #8]
>   38:   f9400673        ldr     x19, [x19, #8]
>   3c:   eb01027f        cmp     x19, x1
>   40:   54ffff41        b.ne    28 <iterate+0x28>  // b.any
>   44:   f9400fb4        ldr     x20, [x29, #24]
>   48:   f9400bf3        ldr     x19, [sp, #16]
>   4c:   a8c27bfd        ldp     x29, x30, [sp], #32
>   50:   d65f03c0        ret
> 
> ---->8-------->8-------->8-------->8-------->8-------->8-------->8----
> 
> Or if you are not fluent in AArch64 (wut mate?), the first two correspond to:
> 
> int card(struct list_node *x0)
> {
>     int w0;
>     struct vlc_list *x1, *x2;
>     
>     x2 = x0->next;
>     w0 = 0;
>     x1 = x2->next;
>     if (x2 == x1)
>         goto card_24;

Thank you for the decoding.

Here, it compares x0->next (x2) with x0->next->next (x1).

This made me realize that, unless the vlc_list head is embedded into the
same struct type as the nodes (e.g in your patch 3/3, the vlc_list
infos of info_t is embedded into info_category_t), and if the list is
empty, then vlc_list_it_start() returns a pointer to random memory.

If the list is empty, then head->next == head (by definition). So this
function:

> static inline void *vlc_list_it_start(struct vlc_list_it *restrict it,
>                                       const struct vlc_list *head, size_t offset)
> {
>     struct vlc_list *first = head->next;
> 
>     it->head = head;
>     it->next = first->next;
>     it->node_offset = -offset;
>     return ((char *)first) + it->node_offset;
> }

returns container_of(head, type, member). But head (contrary to the
other nodes) is typically not embedded into "type".

This probably have no consequences, at least in vlc_list_foreach(),
since it tests vlc_list_it_done(), which will return true, since
(head->next == head->next->next) implies (head == head->next).

> card_14:
>     x1 = x1->next;
>     w0++;
>     if (x2 != x1)
>         goto card_14;
> card_24:
>     return w0;
> }
> 
> int sum(struct list_node *x0)
> {
>     int w0, w3;
>     struct vlc_list *x1, *x4;
>     struct data *x2;
> 
>     x4 = x0->next;
>     w0 = 0;
>     x2 = container_of(x4, struct data, node);
>     x1 = x4->next;
>     if (x4 == x1)
>         goto _sum30;
> _sum18:
>     w3 = x2->i;
>     x2 = container_of(x1, struct data, node);
>     x1 = x1->next;
>     w0 += w3;
>     if (x4 != x1)
>         goto _sum18;
> _sum30:
>     return w0;
> }
> And the last one is left as an exercise.

I am not fluent in AArch64 at all :) But just for comparison, the
"unsafe" version (using the macro I posted earlier) generates 2 ldr
less.

0000000000000000 <iterate>:
   0:	a9be7bfd 	stp	x29, x30, [sp, #-32]!
   4:	910003fd 	mov	x29, sp
   8:	f9000ff4 	str	x20, [sp, #24]
   c:	aa0003f4 	mov	x20, x0
  10:	f9400400 	ldr	x0, [x0, #8]
  14:	eb00029f 	cmp	x20, x0
  18:	54000140 	b.eq	40 <iterate+0x40>  // b.none
  1c:	f9000bb3 	str	x19, [x29, #16]
  20:	d1002013 	sub	x19, x0, #0x8
  24:	b9400260 	ldr	w0, [x19]
  28:	94000000 	bl	0 <process>
  2c:	f9400a60 	ldr	x0, [x19, #16]
  30:	d1002013 	sub	x19, x0, #0x8
  34:	eb00029f 	cmp	x20, x0
  38:	54ffff61 	b.ne	24 <iterate+0x24>  // b.any
  3c:	f9400bb3 	ldr	x19, [x29, #16]
  40:	f9400ff4 	ldr	x20, [sp, #24]
  44:	a8c27bfd 	ldp	x29, x30, [sp], #32
  48:	d65f03c0 	ret

Anyway, my remark about the safe/unsafe version was not about
performance, but more about passing an additional argument that could be
avoided in most cases (a foreach with 5 arguments is quite complex to
use).

But still, LGTM.

> 
> The offset is optimized away. The head (actually head->next) is retained in a 
> register, not copied into the iterator. In the first case, even the cursor is 
> optimized away. The iterator is never written to memory.
> 
> With that noted, I don't really see the point in adding an unsafe iterator 
> variant, except for developers to shoot themselves in the foot. Likewise, I 
> don't see the point in removing the head copy from the iterator, except to 
> make the head expression no longer expansion-safe.
> 
> With -O2, the iterator functions are put out-of-line(!), and the iterator is 
> fully written to/read from memory, of course.
> 
> -- 
> Реми Дёни-Курмон
> http://www.remlab.net/
> 
> 
> _______________________________________________
> vlc-devel mailing list
> To unsubscribe or modify your subscription options:
> https://mailman.videolan.org/listinfo/vlc-devel


More information about the vlc-devel mailing list