https://lwn.net/SubscriberLink/887097/7ca69c6bfa3584c0/ LWN.net Logo LWN .net News from the source LWN * Content + Weekly Edition + Archives + Search + Kernel + Security + Distributions + Events calendar + Unread comments + ------------------------------------------------------------- + LWN FAQ + Write for us User: [ ] Password: [ ] [Log in] | [Subscribe] | [Register] Subscribe / Log in / New account Toward a better list iterator for the kernel [LWN subscriber-only content] Welcome to LWN.net The following subscription-only content has been made available to you by an LWN subscriber. Thousands of subscribers depend on LWN for the best news from the Linux and free software communities. If you enjoy this article, please consider subscribing to LWN. Thank you for visiting LWN.net! By Jonathan Corbet March 10, 2022 Linked lists are conceptually straightforward; they tend to be taught toward the beginning of entry-level data-structures classes. It might thus be surprising that the kernel community is concerned about its longstanding linked-list implementation and is not only looking for ways to solve some problems, but has been struggling to find that solution. It now appears that some improvements might be at hand: after more than 30 years, the kernel developers may have found a better way to safely iterate through a linked list. Kernel linked lists C, of course, makes the creation of linked lists relatively easy. What it does not do, though, is help in the creation of generic linked lists that can contain any type of structure. By its nature, C lends itself to the creation of ad hoc linked lists in every situation where they are needed, resulting in boilerplate code and duplicated definitions. Every linked-list implementation must be reviewed for correctness. It would be far nicer to have a single implementation that was known to work so that kernel developers could more profitably use their time introducing bugs into code that is truly unique to their problem area. The kernel, naturally, has a few solutions for linked lists, but the most commonly used is struct list_head: struct list_head { struct list_head *next, *prev; }; This structure can be employed in the obvious way to create doubly linked lists; a portion of such a list might look like: [list-head] struct list_head can represent a linked list nicely, but has one significant disadvantage: it cannot hold any other information. Usually, this kind of data structure is needed to link some other data of interest; the list structure by itself isn't the point. C does not make it easy to create a linked list with an arbitrary payload, but it is easy to embed struct list_head inside the structure that the developer actually wants to organize into a list: [list-head2] This is how linked lists are typically constructed in the kernel. Macros like container_of() can be used to turn a pointer to a list_head structure into a pointer to the containing structure. Code that works with linked lists will almost always use this macro (often indirectly) to gain access to the larger payload. One final detail that is worthy of note is that the actual head of the list tends to be a list_head structure that is not embedded within the structure type of interest: [list-head3] For a real-world example of how this infrastructure is used, consider struct inode, which represents a file within a filesystem. Inodes can be on a lot of lists simultaneously, so struct inode contains no less than five separate list_head structures; unfortunately, your editor's meager artistic skills are not up to the task of showing what the resulting data structure looks like. One of those list_head structures, i_sb_list, is used to associate the inode with the superblock of the filesystem it belongs to. The list_head structure that anchors this list is the s_inodes field of struct super_block. That is the one list_head structure in this particular list that is not embedded within an instance of struct inode. Traversal of a linked list will typically begin at the anchor and follow the next pointers until the head is found again. One can, of course, open-code this traversal, but the kernel also provides a long list of functions and macros for this purpose. One of those is list_for_each_entry(), which will go through the entire list, providing a pointer to the containing structure at each node. Typical code using this macro looks like this: struct inode *inode; /* ... */ list_for_each_entry(inode, &sb->s_inodes, i_sb_list) { /* Process each inode here */ } /* Should not use the iterator here */ Within the loop, the macro uses container_of() to point inode at the containing inode structure for each list entry. The problem is: what is the value of inode on exit from the loop? If code exited the loop with a break statement, inode will point to the element under consideration at that time. If, however, execution passes through the entire list, inode will be the result of using container_of() on the separate list head, which is not contained within an inode structure. That puts the kernel deeply into undefined-behavior territory and could lead to any of a number of bad things. For this reason, the rule for macros like list_for_each_entry() is that the iterator variable should not be used outside of the loop. If a value needs to be accessed after the loop, it should be saved in a separate variable for that purpose. It's an implicit rule, though; nobody felt the need to actually note this restriction in the documentation for the macros themselves. Unsurprisingly, this rule is thus more of a guideline at best; the kernel is full of code that does, indeed, use the iterator variable after the loop. The search for a safer iterator When we last looked at this issue, Jakob Koschel had posted patches fixing some of these sites; he continued this project afterward. Linus Torvalds, however, thought that this approach was inadequate because it did nothing to prevent future problems from being introduced: So if we have the basic rule being "don't use the loop iterator after the loop has finished, because it can cause all kinds of subtle issues", then in _addition_ to fixing the existing code paths that have this issue, I really would want to (a) get a compiler warning for future cases and (b) make it not actually _work_ for future cases. Because otherwise it will just happen again. Along the way, the developers came to the realization that moving to a newer version of the C standard might help, since it would allow the declaration of the iterator variable within the loop itself (thus making it invisible outside of the loop). Torvalds made an initial attempt at a solution that looked like this: #define list_for_each_entry(pos, head, member) \ for (typeof(pos) __iter = list_first_entry(head, typeof(*pos), member); \ !list_entry_is_head(__iter, head, member) && (((pos)=__iter),1); \ __iter = list_next_entry(__iter, member)) This version of the macro still accepts the iterator variable as an argument, keeping the same prototype as before; this is important, since there are thousands of instances of this macro in the kernel source. But it declares a new variable to do the actually iteration, and only sets the passed-in iterator within the loop itself. Since the loop itself may never be executed (if the list is empty), the possibility exists that it will not set the iterator, so it could be uninitialized afterward. This version was quickly followed by a second attempt, described as " a work of art": #define list_for_each_entry(pos, head, member) \ for (typeof(pos) pos = list_first_entry(head, typeof(*pos), member); \ !list_entry_is_head(pos, head, member); \ pos = list_next_entry(pos, member)) Now the loop-scope iteration variable is declared with the same name as the outer variable, shadowing it. With this version, the iterator variable declared in the outer scope will never be used within the loop at all. Torvalds's hope with both of these attempts was that this would cause the compiler to generate warnings if the (outer) iterator was used outside the loop, since it will no longer have been initialized by the loop itself. That did not work, though; there are places in the code that explicitly initialize the iterator now and, in any case, the "use of uninitialized variable" warning is disabled in the kernel due to excessive false positives. James Bottomley suggested a different approach: #define list_for_each_entry(pos, head, member) \ for (pos = list_first_entry(head, typeof(*pos), member); \ !list_entry_is_head(pos, head, member) && ((pos = NULL) == NULL; \ pos = list_next_entry(pos, member)) This version would explicitly set the iterator variable to NULL on exit from the loop, causing any code that uses it to (presumably) fail. Torvalds pointed out the obvious problem with this attempt: it changes the semantics of a macro that is widely used throughout the kernel and would likely introduce bugs. It would also make life difficult for developers backporting patches to stable kernels that didn't have the newer semantics. Yet another approach was proposed by Xiaomeng Tong: #define list_for_each_entry_inside(pos, type, head, member) \ for (type * pos = list_first_entry(head, type, member); \ !list_entry_is_head(pos, head, member); \ pos = list_next_entry(pos, member)) Tong's patch set created a new set of macros, with new names, with the idea that existing code could be converted over one usage at a time. There would be no externally declared iterator at all; instead, the name and type of the iterator are passed as arguments, and the iterator is declared within the scope of the loop itself. Torvalds, however, disliked this approach as well. Its use leads to long, difficult-to-read lines of code in almost every use and, he said, puts the pain in the wrong place: "We should strive for the *bad* cases to have to do extra work, and even there we should really strive for legibility". A solution at last? After having rejected various solutions, Torvalds went off to think about what a good solution might look like. Part of the problem, he concluded, is that the type of the containing structure is separate from the list_head structure, making the writing of iterator macros harder. If those two types could be joined somehow, things would be easier. Shortly thereafter, he came up with a solution that implements this idea. It starts with a new declaration macro: #define list_traversal_head(type,name,target_member) \ union { struct list_head name; type *name##_traversal_type; } This macro would be used to declare the real head of the list -- not the list_head entries contained within other structures. Specifically, it declares a variable of this new union type containing a list_head structure called name, and a pointer to the containing structure type called name_traversal_type. The pointer is never used as such; it is just a way of tying the type of the containing structure to the list_head variable. Then, there is a new iterator: #define list_traverse(pos, head, member) \ for (typeof(*head##_traversal_type) pos = list_first_entry(head, typeof(*pos), member);\ !list_entry_is_head(pos, head, member); \ pos = list_next_entry(pos, member)) Code can walk through a list by using list_traverse() instead of list_for_each_entry(). The iterator variable will be pos; it will only exist within the loop itself. The anchor of the list is passed as head, while member is the name of the list_head structure within the containing structure. The patch includes a couple of conversions to show what the usage would look like. This, Torvalds thinks, is "the way forward". Making this change is probably a years-long project; there are over 15,000 uses of list_for_each_entry() (and variants) within the kernel. Each of those will eventually need to be changed, and the declaration of the list anchor must also change at the same time. So it is not a quick fix, but it could lead to a safer linked-list implementation in the kernel in the long run. One might argue that all of this is self-inflicted pain caused by the continued use of C in the kernel. That may be true, but better alternatives are in somewhat short supply. For example, since the Rust language, for all of its merits, does not make life easy for anybody wanting to implement a linked list, a switch to that language would not automatically solve the problem. So kernel developers seem likely to have to get by with this kind of tricky infrastructure for some time yet. [Send a free link] ----------------------------------------- (Log in to post comments) Toward a better list iterator for the kernel Posted Mar 10, 2022 16:06 UTC (Thu) by pj (subscriber, #4506) [Link] I read part of that thread and what came to mind to me was that the devs were fighting the C preprocessor, and I was reminded of Zig trying to basically be (arguably) C without the cpp. Maybe once it's done Zig will manage to make it into the kernel - it's a better candidate than Rust, IMO, being much more C-like and having full C binary compatibility. [Reply to this comment] Toward a better list iterator for the kernel Posted Mar 10, 2022 20:04 UTC (Thu) by pankyraghv (subscriber, # 153275) [Link] I know many people in the kernel community hates cpp but I would recommend everyone to take a look at SerenityOS. They have done a fantastic job of rolling out an in-house standard library that actually makes coding in cpp enjoyable (mostly) while reaping the benefits of all the modern features that can catch bugs at compile time. IMO a subset of cpp with a good custom standard library would be a better fit to writing kernel drivers than use something like Rust which has a very different programming paradigm. I also feel something light like Zig might also be a good option. [Reply to this comment] Toward a better list iterator for the kernel Posted Mar 10, 2022 21:47 UTC (Thu) by bartoc (subscriber, #124262) [ Link] I'm concerned about how long zig tends to defer typechecking of code. Maybe it doesn't matter in the real world though. I really love how zig does generics; you write type constructors as just .... normal functions that return a type! [Reply to this comment] Toward a better list iterator for the kernel Posted Mar 10, 2022 22:47 UTC (Thu) by roc (subscriber, #30627) [Link ] The main point of Rust in the kernel is to get much-improved safety properties. Zig doesn't provide that. [Reply to this comment] My linked lists Posted Mar 10, 2022 16:15 UTC (Thu) by abatters ( supporter , # 6932) [Link] For non-circular linked lists in my userspace code, my approach is to define a dedicated macro for each specific list, defining the name of its head, tail, forward, and back pointers. Then that specific macro can be used in combination with one of ~25 generic doubly-linked list macros I have defined. Works well for having a single object in multiple lists, and having multiple lists in a single container. No iteration macro is necessary, since you have pointers to the real types (no container_of needed) and it terminates with NULL. // Generic doubly-linked list macro (one of ~25) #define DLIST_ADD_TAIL(head, tail, forw, back, item) \ do { \ if ((tail) == NULL) { \ (head) = (item); \ } else { \ (item)->back = (tail); \ (tail)->forw = (item); \ } \ (tail) = (item); \ } while (0) // Example linked list structs struct item { struct item *iforw, *iback; }; struct container { struct item *ihead, *itail; }; // Example macro defining a specific list #define CONTAINER_ITEM_LIST(container, func, args...) \ func((container)->ihead, (container)->itail, iforw, iback , ## args) void container_add_item_tail(struct container *container, struct *item) { CONTAINER_ITEM_LIST(container, DLIST_ADD_TAIL, item); } [Reply to this comment] Toward a better list iterator for the kernel Posted Mar 10, 2022 17:08 UTC (Thu) by jengelh (subscriber, #33263) [ Link] >One might argue that all of this is self-inflicted pain caused by the continued use of C in the kernel. That may be true, but better alternatives are in somewhat short supply. I disagree; alternatives are not in short supply, but they have tradeoffs that some people may be unwilling to go with. The Linux kernel linked list implementation has two properties: - the linked list metadata (prev/next) pointers is not separated from the struct => layering violation - an object can be part of multiple linked list => who's the owner responsible for cleanup? (- the iterator interface of list_head is just a consequence of the two) And these are approached with - inode is, in terms of responsibility, strictly a child of the list and/or its internal metadata - only one owner is allowed so there is no doubt who has to clean up Using C++ as a concrete example now, one would arrive at: 1. Use std::list>. Safe from leaks, safe from dangling pointers, but it needs refcounting to do the job, and some people may not like the performance characteristics of that refcounting. 2. Use std::list for the main list, and std::list for the secondaries. Cleanup is still guaranteed, but you traded refcounting for the potential danger of dangling pointers. (3. Keep on allocating your inodes and lists manually like before... as C++ is almost source-compatible with C.) [Reply to this comment] Toward a better list iterator for the kernel Posted Mar 10, 2022 17:52 UTC (Thu) by EnigmaticSeraph (subscriber, # 50582) [Link] The kernel generally tolerates no performance hits, esp. with such a widely-used structure. And while there exists e.g.: https://www.boost.org/doc/libs/1_78_0/doc/html/intrusive/... , which can be configured to be as would've been at the C level, but generic. However, kernel devs tend to be wary of C++ in the kernel, for not-unfounded reasons. [Reply to this comment] Toward a better list iterator for the kernel Posted Mar 10, 2022 21:55 UTC (Thu) by bartoc (subscriber, #124262) [ Link] Note that what the list iterator does is pretty far from how C++ iterators work. This style of iterator is "I write the loop, and I'll insert your body with the right item all set up", where C++ is "you write the loop, and ask me for each item" There's nothing really preventing you from doing the C++ approach in C, although you need to be careful about inlining, you just write an "iterator init from structure" "iterator next" and "is iterator done" function and call them in the appropriate places in a for loop. However I think the more intrusive way of doing things is quite a lot clearer when reading the iterator implementation, esp for more complicated iterators. [Reply to this comment] Toward a better list iterator for the kernel Posted Mar 10, 2022 22:13 UTC (Thu) by ppisa (subscriber, #67307) [ Link] I have need for poor STL relative in C year ago and uLUt library was a result. Because it has been intended even for use in kernel at the times of RTLinux I have decide to be compatible/reuse kernel list but provide type-safe interfaces for each relation https://sourceforge.net/p/ulan/ulut/ci/master/tree/ulut/u... This way you never use container_of directly and risk to use incorrect offset between list node location and start of containing structure is suppressed. For AVL, hashes and sorted arrays I have used other structures than kernel. The library provides even iterators which works over all of these basic types. But macros which expands to the static inline functions are really huge. But reasonable compilers generate code better for these variants with CUSTom prefix than when generic functions with void pointers are used. The library is base in many project, our RS485 multimaster project http://ulan.sourceforge.net/ , CAN and CANopen http:// ortcan.sourceforge.net/ more instruments using inhouse our SmallToolKit graphic library https://gitlab.com/pikron/sw-base/suitk/ -/wikis/home which has been used even is safety sensitive medical devices etc. [Reply to this comment] Toward a better list iterator for the kernel Posted Mar 10, 2022 22:50 UTC (Thu) by roc (subscriber, #30627) [Link ] > the "use of uninitialized variable" warning is disabled in the kernel due to excessive false positives. Wow, this is terrible. I had no idea. Given there are hardening patches to force zeroing of local variables, why not just initialize them in the source as necessary to eliminate the existing warnings and turn that warning on? [Reply to this comment] Copyright (c) 2022, Eklektix, Inc. Comments and public postings are copyrighted by their creators. Linux is a registered trademark of Linus Torvalds