https://lwn.net/SubscriberLink/887097/7ca69c6bfa3584c0/

LWN.net Logo LWN
.net News from the source LWN

  * Content
      + Weekly Edition
      + Archives
      + Search
      + Kernel
      + Security
      + Distributions
      + Events calendar
      + Unread comments
      + -------------------------------------------------------------
      + LWN FAQ
      + Write for us

User: [        ] Password: [        ] [Log in]
|
[Subscribe]
|
[Register]
Subscribe / Log in / New account

Toward a better list iterator for the kernel

[LWN subscriber-only content]

    Welcome to LWN.net

    The following subscription-only content has been made available
    to you by an LWN subscriber. Thousands of subscribers depend on
    LWN for the best news from the Linux and free software
    communities. If you enjoy this article, please consider
    subscribing to LWN. Thank you for visiting LWN.net!

By Jonathan Corbet
March 10, 2022
Linked lists are conceptually straightforward; they tend to be taught
toward the beginning of entry-level data-structures classes. It might
thus be surprising that the kernel community is concerned about its
longstanding linked-list implementation and is not only looking for
ways to solve some problems, but has been struggling to find that
solution. It now appears that some improvements might be at hand:
after more than 30 years, the kernel developers may have found a
better way to safely iterate through a linked list.

Kernel linked lists

C, of course, makes the creation of linked lists relatively easy.
What it does not do, though, is help in the creation of generic
linked lists that can contain any type of structure. By its nature, C
lends itself to the creation of ad hoc linked lists in every
situation where they are needed, resulting in boilerplate code and
duplicated definitions. Every linked-list implementation must be
reviewed for correctness. It would be far nicer to have a single
implementation that was known to work so that kernel developers could
more profitably use their time introducing bugs into code that is
truly unique to their problem area.

The kernel, naturally, has a few solutions for linked lists, but the
most commonly used is struct list_head:

    struct list_head {
        struct list_head *next, *prev;
    };

This structure can be employed in the obvious way to create doubly
linked lists; a portion of such a list might look like:

    [list-head]

struct list_head can represent a linked list nicely, but has one
significant disadvantage: it cannot hold any other information.
Usually, this kind of data structure is needed to link some other
data of interest; the list structure by itself isn't the point.
C does not make it easy to create a linked list with an arbitrary
payload, but it is easy to embed struct list_head inside the
structure that the developer actually wants to organize into a list:

    [list-head2]

This is how linked lists are typically constructed in the kernel.
Macros like container_of() can be used to turn a pointer to a
list_head structure into a pointer to the containing structure. Code
that works with linked lists will almost always use this macro (often
indirectly) to gain access to the larger payload.

One final detail that is worthy of note is that the actual head of
the list tends to be a list_head structure that is not embedded
within the structure type of interest:

    [list-head3]

For a real-world example of how this infrastructure is used, consider
struct inode, which represents a file within a filesystem. Inodes can
be on a lot of lists simultaneously, so struct inode contains no less
than five separate list_head structures; unfortunately, your editor's
meager artistic skills are not up to the task of showing what the
resulting data structure looks like. One of those list_head
structures, i_sb_list, is used to associate the inode with the
superblock of the filesystem it belongs to. The list_head structure
that anchors this list is the s_inodes field of struct super_block.
That is the one list_head structure in this particular list that is
not embedded within an instance of struct inode.

Traversal of a linked list will typically begin at the anchor and
follow the next pointers until the head is found again. One can, of
course, open-code this traversal, but the kernel also provides a long
list of functions and macros for this purpose. One of those is
list_for_each_entry(), which will go through the entire list,
providing a pointer to the containing structure at each node. Typical
code using this macro looks like this:

    struct inode *inode;

    /* ... */
    list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
        /* Process each inode here */
    }
    /* Should not use the iterator here */

Within the loop, the macro uses container_of() to point inode at the
containing inode structure for each list entry. The problem is: what
is the value of inode on exit from the loop? If code exited the loop
with a break statement, inode will point to the element under
consideration at that time. If, however, execution passes through the
entire list, inode will be the result of using container_of() on the
separate list head, which is not contained within an inode structure.
That puts the kernel deeply into undefined-behavior territory and
could lead to any of a number of bad things.

For this reason, the rule for macros like list_for_each_entry() is
that the iterator variable should not be used outside of the loop. If
a value needs to be accessed after the loop, it should be saved in a
separate variable for that purpose. It's an implicit rule, though;
nobody felt the need to actually note this restriction in the
documentation for the macros themselves. Unsurprisingly, this rule is
thus more of a guideline at best; the kernel is full of code that
does, indeed, use the iterator variable after the loop.

The search for a safer iterator

When we last looked at this issue, Jakob Koschel had posted patches
fixing some of these sites; he continued this project afterward.
Linus Torvalds, however, thought that this approach was inadequate
because it did nothing to prevent future problems from being
introduced:

    So if we have the basic rule being "don't use the loop iterator
    after the loop has finished, because it can cause all kinds of
    subtle issues", then in _addition_ to fixing the existing code
    paths that have this issue, I really would want to (a) get a
    compiler warning for future cases and (b) make it not actually
    _work_ for future cases.

    Because otherwise it will just happen again.

Along the way, the developers came to the realization that moving to
a newer version of the C standard might help, since it would allow
the declaration of the iterator variable within the loop itself (thus
making it invisible outside of the loop). Torvalds made an initial
attempt at a solution that looked like this:

    #define list_for_each_entry(pos, head, member)                              \
        for (typeof(pos) __iter = list_first_entry(head, typeof(*pos), member); \
             !list_entry_is_head(__iter, head, member) && (((pos)=__iter),1);   \
             __iter = list_next_entry(__iter, member))

This version of the macro still accepts the iterator variable as an
argument, keeping the same prototype as before; this is important,
since there are thousands of instances of this macro in the kernel
source. But it declares a new variable to do the actually iteration,
and only sets the passed-in iterator within the loop itself. Since
the loop itself may never be executed (if the list is empty), the
possibility exists that it will not set the iterator, so it could be
uninitialized afterward.

This version was quickly followed by a second attempt, described as "
a work of art":

    #define list_for_each_entry(pos, head, member)                              \
        for (typeof(pos) pos = list_first_entry(head, typeof(*pos), member);    \
             !list_entry_is_head(pos, head, member);                            \
             pos = list_next_entry(pos, member))

Now the loop-scope iteration variable is declared with the same name
as the outer variable, shadowing it. With this version, the iterator
variable declared in the outer scope will never be used within the
loop at all.

Torvalds's hope with both of these attempts was that this would cause
the compiler to generate warnings if the (outer) iterator was used
outside the loop, since it will no longer have been initialized by
the loop itself. That did not work, though; there are places in the
code that explicitly initialize the iterator now and, in any case,
the "use of uninitialized variable" warning is disabled in the kernel
due to excessive false positives.

James Bottomley suggested a different approach:

    #define list_for_each_entry(pos, head, member)                              \
        for (pos = list_first_entry(head, typeof(*pos), member);                \
             !list_entry_is_head(pos, head, member) && ((pos = NULL) == NULL;   \
             pos = list_next_entry(pos, member))

This version would explicitly set the iterator variable to NULL on
exit from the loop, causing any code that uses it to (presumably)
fail. Torvalds pointed out the obvious problem with this attempt: it
changes the semantics of a macro that is widely used throughout the
kernel and would likely introduce bugs. It would also make life
difficult for developers backporting patches to stable kernels that
didn't have the newer semantics.

Yet another approach was proposed by Xiaomeng Tong:

    #define list_for_each_entry_inside(pos, type, head, member)         \
        for (type * pos = list_first_entry(head, type, member);         \
             !list_entry_is_head(pos, head, member);                    \
             pos = list_next_entry(pos, member))

Tong's patch set created a new set of macros, with new names, with
the idea that existing code could be converted over one usage at a
time. There would be no externally declared iterator at all; instead,
the name and type of the iterator are passed as arguments, and the
iterator is declared within the scope of the loop itself. Torvalds,
however, disliked this approach as well. Its use leads to long,
difficult-to-read lines of code in almost every use and, he said,
puts the pain in the wrong place: "We should strive for the *bad*
cases to have to do extra work, and even there we should really
strive for legibility".

A solution at last?

After having rejected various solutions, Torvalds went off to think
about what a good solution might look like. Part of the problem, he
concluded, is that the type of the containing structure is separate
from the list_head structure, making the writing of iterator macros
harder. If those two types could be joined somehow, things would be
easier. Shortly thereafter, he came up with a solution that
implements this idea. It starts with a new declaration macro:

     #define list_traversal_head(type,name,target_member) \
        union { struct list_head name; type *name##_traversal_type; }

This macro would be used to declare the real head of the list -- not
the list_head entries contained within other structures.
Specifically, it declares a variable of this new union type
containing a list_head structure called name, and a pointer to the
containing structure type called name_traversal_type. The pointer is
never used as such; it is just a way of tying the type of the
containing structure to the list_head variable.

Then, there is a new iterator:

    #define list_traverse(pos, head, member) \
        for (typeof(*head##_traversal_type) pos = list_first_entry(head, typeof(*pos), member);\
            !list_entry_is_head(pos, head, member);     \
            pos = list_next_entry(pos, member))

Code can walk through a list by using list_traverse() instead of
list_for_each_entry(). The iterator variable will be pos; it will
only exist within the loop itself. The anchor of the list is passed
as head, while member is the name of the list_head structure within
the containing structure. The patch includes a couple of conversions
to show what the usage would look like.

This, Torvalds thinks, is "the way forward". Making this change is
probably a years-long project; there are over 15,000 uses of
list_for_each_entry() (and variants) within the kernel. Each of those
will eventually need to be changed, and the declaration of the list
anchor must also change at the same time. So it is not a quick fix,
but it could lead to a safer linked-list implementation in the kernel
in the long run.

One might argue that all of this is self-inflicted pain caused by the
continued use of C in the kernel. That may be true, but better
alternatives are in somewhat short supply. For example, since the
Rust language, for all of its merits, does not make life easy for
anybody wanting to implement a linked list, a switch to that language
would not automatically solve the problem. So kernel developers seem
likely to have to get by with this kind of tricky infrastructure for
some time yet.

[Send a free link]


-----------------------------------------
(Log in to post comments)

Toward a better list iterator for the kernel

Posted Mar 10, 2022 16:06 UTC (Thu) by pj (subscriber, #4506) [Link]

I read part of that thread and what came to mind to me was that the
devs were fighting the C preprocessor, and I was reminded of Zig
trying to basically be (arguably) C without the cpp. Maybe once it's
done Zig will manage to make it into the kernel - it's a better
candidate than Rust, IMO, being much more C-like and having full C
binary compatibility.

[Reply to this comment]
Toward a better list iterator for the kernel

Posted Mar 10, 2022 20:04 UTC (Thu) by pankyraghv (subscriber, #
153275) [Link]

I know many people in the kernel community hates cpp but I would
recommend everyone to take a look at SerenityOS. They have done a
fantastic job of rolling out an in-house standard library that
actually makes coding in cpp enjoyable (mostly) while reaping the
benefits of all the modern features that can catch bugs at compile
time. IMO a subset of cpp with a good custom standard library would
be a better fit to writing kernel drivers than use something like
Rust which has a very different programming paradigm. I also feel
something light like Zig might also be a good option.
[Reply to this comment]
Toward a better list iterator for the kernel

Posted Mar 10, 2022 21:47 UTC (Thu) by bartoc (subscriber, #124262) [
Link]

I'm concerned about how long zig tends to defer typechecking of code.
Maybe it doesn't matter in the real world though.

I really love how zig does generics; you write type constructors as
just .... normal functions that return a type!
[Reply to this comment]
Toward a better list iterator for the kernel

Posted Mar 10, 2022 22:47 UTC (Thu) by roc (subscriber, #30627) [Link
]

The main point of Rust in the kernel is to get much-improved safety
properties. Zig doesn't provide that.
[Reply to this comment]
My linked lists

Posted Mar 10, 2022 16:15 UTC (Thu) by abatters ( supporter , #
6932) [Link]

For non-circular linked lists in my userspace code, my approach is to
define a dedicated macro for each specific list, defining the name of
its head, tail, forward, and back pointers. Then that specific macro
can be used in combination with one of ~25 generic doubly-linked list
macros I have defined. Works well for having a single object in
multiple lists, and having multiple lists in a single container. No
iteration macro is necessary, since you have pointers to the real
types (no container_of needed) and it terminates with NULL.

// Generic doubly-linked list macro (one of ~25)
#define DLIST_ADD_TAIL(head, tail, forw, back, item) \
   do { \
      if ((tail) == NULL) { \
         (head) = (item); \
      } else { \
         (item)->back = (tail); \
         (tail)->forw = (item); \
      } \
      (tail) = (item); \
   } while (0)

// Example linked list structs
struct item {
  struct item *iforw, *iback;
};
struct container {
  struct item *ihead, *itail;
};

// Example macro defining a specific list
#define CONTAINER_ITEM_LIST(container, func, args...) \
  func((container)->ihead, (container)->itail, iforw, iback , ## args)

void container_add_item_tail(struct container *container, struct *item)
{
   CONTAINER_ITEM_LIST(container, DLIST_ADD_TAIL, item);
}

[Reply to this comment]
Toward a better list iterator for the kernel

Posted Mar 10, 2022 17:08 UTC (Thu) by jengelh (subscriber, #33263) [
Link]

>One might argue that all of this is self-inflicted pain caused by
the continued use of C in the kernel. That may be true, but better
alternatives are in somewhat short supply.

I disagree; alternatives are not in short supply, but they have
tradeoffs that some people may be unwilling to go with. The Linux
kernel linked list implementation has two properties:

- the linked list metadata (prev/next) pointers is not separated from
the struct => layering violation
- an object can be part of multiple linked list => who's the owner
responsible for cleanup?
(- the iterator interface of list_head is just a consequence of the
two)

And these are approached with

- inode is, in terms of responsibility, strictly a child of the list
and/or its internal metadata
- only one owner is allowed so there is no doubt who has to clean up

Using C++ as a concrete example now, one would arrive at:

1. Use std::list<std::shared_ptr<inode>>. Safe from leaks, safe from
dangling pointers, but it needs refcounting to do the job, and some
people may not like the performance characteristics of that
refcounting.
2. Use std::list<inode> for the main list, and std::list<inode *> for
the secondaries. Cleanup is still guaranteed, but you traded
refcounting for the potential danger of dangling pointers.
(3. Keep on allocating your inodes and lists manually like before...
as C++ is almost source-compatible with C.)
[Reply to this comment]
Toward a better list iterator for the kernel

Posted Mar 10, 2022 17:52 UTC (Thu) by EnigmaticSeraph (subscriber, #
50582) [Link]

The kernel generally tolerates no performance hits, esp. with such a
widely-used structure. And while there exists e.g.:
https://www.boost.org/doc/libs/1_78_0/doc/html/intrusive/...
, which can be configured to be as would've been at the C level, but
generic. However, kernel devs tend to be wary of C++ in the kernel,
for not-unfounded reasons.
[Reply to this comment]
Toward a better list iterator for the kernel

Posted Mar 10, 2022 21:55 UTC (Thu) by bartoc (subscriber, #124262) [
Link]

Note that what the list iterator does is pretty far from how C++
iterators work. This style of iterator is "I write the loop, and I'll
insert your body with the right item all set up", where C++ is "you
write the loop, and ask me for each item"

There's nothing really preventing you from doing the C++ approach in
C, although you need to be careful about inlining, you just write an
"iterator init from structure" "iterator next" and "is iterator done"
function and call them in the appropriate places in a for loop.
However I think the more intrusive way of doing things is quite a lot
clearer when reading the iterator implementation, esp for more
complicated iterators.
[Reply to this comment]
Toward a better list iterator for the kernel

Posted Mar 10, 2022 22:13 UTC (Thu) by ppisa (subscriber, #67307) [
Link]

I have need for poor STL relative in C year ago and uLUt library was
a result. Because it has been intended even for use in kernel at the
times of RTLinux I have decide to be compatible/reuse kernel list but
provide type-safe interfaces for each relation

https://sourceforge.net/p/ulan/ulut/ci/master/tree/ulut/u...

This way you never use container_of directly and risk to use
incorrect offset between list node location and start of containing
structure is suppressed. For AVL, hashes and sorted arrays I have
used other structures than kernel. The library provides even
iterators which works over all of these basic types.

But macros which expands to the static inline functions are really
huge. But reasonable compilers generate code better for these
variants with CUSTom prefix than when generic functions with void
pointers are used.

The library is base in many project, our RS485 multimaster project
http://ulan.sourceforge.net/ , CAN and CANopen http://
ortcan.sourceforge.net/ more instruments using inhouse our
SmallToolKit graphic library https://gitlab.com/pikron/sw-base/suitk/
-/wikis/home which has been used even is safety sensitive medical
devices etc.
[Reply to this comment]
Toward a better list iterator for the kernel

Posted Mar 10, 2022 22:50 UTC (Thu) by roc (subscriber, #30627) [Link
]

> the "use of uninitialized variable" warning is disabled in the
kernel due to excessive false positives.

Wow, this is terrible. I had no idea.

Given there are hardening patches to force zeroing of local
variables, why not just initialize them in the source as necessary to
eliminate the existing warnings and turn that warning on?
[Reply to this comment]

                  Copyright (c) 2022, Eklektix, Inc.
   Comments and public postings are copyrighted by their creators.
          Linux is a registered trademark of Linus Torvalds