https://lwn.net/SubscriberLink/885941/01fdc39df2ecc25f/

Moving the kernel to modern C

By Jonathan Corbet
February 24, 2022

Despite its generally fast-moving nature, the kernel project relies on a number of old tools. While critics like to focus on the community's extensive use of email, a possibly more significant anachronism is the use of the 1989 version of the C language standard for kernel code -- a standard that was codified before the kernel project even began over 30 years ago. It is looking like that longstanding practice could be coming to an end as soon as the 5.18 kernel, which can be expected in May of this year.

Linked-list concerns

The discussion started with this patch series from Jakob Koschel, who is trying to prevent speculative-execution vulnerabilities tied to the kernel's linked-list primitives. The kernel makes extensive use of doubly-linked lists defined by struct list_head:

    struct list_head {
        struct list_head *next, *prev;
    };

This structure is normally embedded into some other structure; in this way, linked lists can be made with any structure type of interest.
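As a rough illustration of that embedding, here is a user-space sketch; it is not the kernel's code. The names struct item, list_init(), and item_sum() are invented for this example, list_add_tail() is a simplified stand-in for the kernel function of the same name, and container_of() is re-implemented with offsetof() rather than taken from the kernel headers. It shows how a pointer to the embedded list_head can be turned back into a pointer to the structure that contains it.

```c
#include <stddef.h>

/* Same shape as the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

/* User-space stand-in for the kernel's container_of(): step back from
 * a member pointer to the start of the enclosing structure. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical payload type; the embedded 'link' makes it listable. */
struct item {
    int value;
    struct list_head link;
};

/* An empty list is a head whose next and prev point back at itself. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert 'node' just before 'head', that is, at the tail of the list. */
static void list_add_tail(struct list_head *node, struct list_head *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Walk the raw list_head chain, recover each struct item from its
 * embedded link, and add up the values. */
static int item_sum(struct list_head *head)
{
    int sum = 0;
    struct list_head *pos;

    for (pos = head->next; pos != head; pos = pos->next)
        sum += container_of(pos, struct item, link)->value;
    return sum;
}
```

Because the list code only ever touches the embedded list_head, the same helpers would work for any structure that contains one; only the container_of() call needs to know the outer type.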
Along with the type, the kernel provides a vast array of functions and macros that can be used to traverse and manipulate linked lists. One of those is list_for_each_entry(), which is a macro masquerading as a sort of control structure. To see how this macro is used, imagine that the kernel included a structure like this:

    struct foo {
        int fooness;
        struct list_head list;
    };

The list member can be used to create a doubly-linked list of foo structures; a separate list_head structure is usually declared as the beginning of such a list; assume we have one called foo_list. Traversing this list is possible with code like:

    struct foo *iterator;

    list_for_each_entry(iterator, &foo_list, list) {
        do_something_with(iterator);
    }
    /* Should not use iterator here */

The list parameter tells the macro what the name of the list_head structure is within the foo structure. This loop will be executed once for each element in the list, with iterator pointing to that element.

Koschel included a patch fixing a bug in the USB subsystem where the iterator passed to this macro was used after the exit from the macro, which is a dangerous thing to do. Depending on what happens within the list, the contents of that iterator could be something surprising, even in the absence of speculative execution. Koschel fixed the problem by reworking the code in question to stop using the iterator after the loop.

The plot twists

Linus Torvalds didn't much like the patch and didn't see how it related to speculative-execution vulnerabilities. After Koschel explained the situation further, though, Torvalds agreed that "this is just a regular bug, plain and simple" and said it should be fixed independently of the larger series.
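To make that failure mode concrete, here is a user-space sketch of the pattern; it is not the USB code that Koschel fixed. The macro is a simplified rendering of list_for_each_entry() that relies on the GNU __typeof__ extension, and find_foo_leaky() and find_foo() are hypothetical names. The first function hands back the iterator itself; the second shows one common way to avoid doing so, by recording a match in a separate variable.

```c
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* User-space stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified rendering of the kernel macro; needs GNU __typeof__
 * (available in GCC and Clang). */
#define list_for_each_entry(pos, head, member)                             \
    for (pos = container_of((head)->next, __typeof__(*(pos)), member);     \
         &(pos)->member != (head);                                         \
         pos = container_of((pos)->member.next, __typeof__(*(pos)), member))

struct foo {
    int fooness;
    struct list_head list;
};

/* Risky pattern: the iterator is used after the loop. If nothing
 * matches, the loop runs to completion and 'iterator' has been computed
 * from the list head, which is not embedded in any struct foo. */
static struct foo *find_foo_leaky(struct list_head *head, int fooness)
{
    struct foo *iterator;

    list_for_each_entry(iterator, head, list) {
        if (iterator->fooness == fooness)
            break;
    }
    return iterator;    /* not a real struct foo when nothing matched */
}

/* Safer pattern: only a pointer assigned inside the loop body escapes,
 * so the caller gets either a real element or NULL. */
static struct foo *find_foo(struct list_head *head, int fooness)
{
    struct foo *iterator, *found = NULL;

    list_for_each_entry(iterator, head, list) {
        if (iterator->fooness == fooness) {
            found = iterator;
            break;
        }
    }
    return found;
}
```

A caller of find_foo_leaky() that searches for a value not in the list receives a non-NULL pointer whose list member is the list head itself; reading fooness through it would access whatever memory happens to precede the head.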
But then he wandered into the real source of the problem: that the iterator passed to the list-traversal macros must be declared in a scope outside of the loop itself:

    The whole reason this kind of non-speculative bug can happen is that we historically didn't have C99-style "declare variables in loops". So list_for_each_entry() - and all the other ones - fundamentally always leaks the last HEAD entry out of the loop, simply because we couldn't declare the iterator variable in the loop itself.

If it were possible to write a list-traversal macro that could declare its own iterator, then that iterator would not be visible outside of the loop and this kind of problem would not arise. But, since the kernel is stuck on the C89 standard, declaring variables within the loop is not possible.

Torvalds said that perhaps the time had come to look to moving to the C99 standard -- it is still over 20 years old, but is at least recent enough to allow block-level variable declarations. As he noted, this move hasn't been done in the past "because we had some odd problem with some ancient gcc versions that broke documented initializers". But, in the meantime, the kernel has moved its minimum GCC requirement to version 5.1, so perhaps those bugs are no longer relevant.

Arnd Bergmann, who tends to keep a close eye on cross-architecture compiler issues, agreed that it should be possible for the kernel to move forward. Indeed, he suggested that it would be possible to go as far as the C11 standard (from 2011) while the change was being made, though he wasn't sure that C11 would bring anything new that would be useful to the kernel. It might even be possible to move to C17 or even the yet-unfinished C2x version of the language. That, however, has a downside in that it "would break gcc-5/6/7 support", and the kernel still supports those versions currently.
Raising the minimum GCC version to 8.x would likely be more of a jump than the user community would be willing to accept at this point.

Moving to C11 would not require changing the minimum GCC version, though, and thus might be more readily doable. Torvalds was in favor of that idea: "I really would love to finally move forward on this, considering that it's been brewing for many many years". After Bergmann confirmed that it should be possible to do so, Torvalds declared: "Ok, somebody please remind me, and let's just try this early in the 5.18 merge window".

The 5.18 merge window is less than one month away, so this is a change that could happen in the near future. It is worth keeping in mind, though, that a lot of things can happen between the merge window and the 5.18 release. Moving to a new version of the language standard could reveal any number of surprises in obscure places in the kernel; it would not take many of those to cause the change to be reverted for now. But, if all goes well, the shift to C11 will happen in the next kernel release. Converting all of the users of list_for_each_entry() and variants (of which there are well over 15,000 in the kernel) to a new version that doesn't expose the internal iterator seems likely to take a little longer, though.

-----------------------------------------

Moving the kernel to modern C
Posted Feb 24, 2022 15:12 UTC (Thu) by ballombe (subscriber, #9523)

Note that, if for some reason you need to stay with c89, you can always add a block around the for() statement to hold the loop variable.
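The two approaches can be put side by side in a user-space sketch: the block-around-the-loop workaround from the comment above, and the C99-style loop declaration discussed in the article. The macro name list_for_each_entry_scoped() and the functions sum_with_block() and sum_with_scoped_iterator() are hypothetical, and passing the element type as an explicit macro argument is just one possible design, not something the kernel is known to have adopted.

```c
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* User-space stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Classic form: the caller declares the iterator (GNU __typeof__). */
#define list_for_each_entry(pos, head, member)                             \
    for (pos = container_of((head)->next, __typeof__(*(pos)), member);     \
         &(pos)->member != (head);                                         \
         pos = container_of((pos)->member.next, __typeof__(*(pos)), member))

/* Hypothetical C99 form: the macro declares the iterator in the for
 * statement, so the name does not exist once the loop has ended. */
#define list_for_each_entry_scoped(type, pos, head, member)                \
    for (type *pos = container_of((head)->next, type, member);             \
         &pos->member != (head);                                           \
         pos = container_of(pos->member.next, type, member))

struct foo {
    int fooness;
    struct list_head list;
};

/* C89-compatible workaround: an extra block confines the iterator. */
static int sum_with_block(struct list_head *head)
{
    int sum = 0;

    {
        struct foo *iterator;

        list_for_each_entry(iterator, head, list)
            sum += iterator->fooness;
    }
    /* 'iterator' is out of scope here; using it would not compile. */
    return sum;
}

/* C99 and later: no extra block is needed at the call site. */
static int sum_with_scoped_iterator(struct list_head *head)
{
    int sum = 0;

    list_for_each_entry_scoped(struct foo, iterator, head, list)
        sum += iterator->fooness;
    /* 'iterator' is out of scope here as well. */
    return sum;
}
```

Both functions compute the same result. The difference is where the scoping lives: the block has to be written out by every caller, while the C99 form keeps it inside the macro.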

Posted Feb 24, 2022 15:51 UTC (Thu) by smurf (subscriber, #17840)

You'd need to do that to each caller, which is a *lot* of code churn.

Posted Feb 24, 2022 16:22 UTC (Thu) by pbonzini (supporter, #60935)

Is "documented initializers" Linus's typo for "designated initializers"?

Posted Feb 24, 2022 17:36 UTC (Thu) by iabervon (subscriber, #722)

Instead of leaking a not-necessarily-valid pointer, couldn't the macro set it to NULL at the end? Actually, I'm surprised there isn't a standard trick for doing an assignment that will be an error unless the compiler eliminates it as dead code.

Posted Feb 24, 2022 17:59 UTC (Thu) by Paf (subscriber, #91811)

Cost. Pretty significant cost in some cases for something that shouldn't even be necessary.

Posted Feb 24, 2022 20:09 UTC (Thu) by nybble41 (subscriber, #55106)

It should be cost-free in any case where the iteration variable isn't accessed after the loop, since the compiler would eliminate the dead store. The code change is also fairly trivial: just edit the condition from "&pos->member != (head)" to "(&pos->member != (head)) || ((pos = NULL))".

Unfortunately this alone doesn't handle loops which exit early due to "break" or "goto". The "goto" case is unavoidable, but the "break" case can be dealt with by wrapping the macro in a second, trivial loop as shown in this example[0]. Note that the generated code (for gcc 5.1 with -O2) is *identical* between the version with the extra loop (traverse1) and the original version which does not set the iterator to NULL after the loop (traverse2).
The initialization of the iterator to the flag state (-1), the condition for the outer loop, and the store of NULL to the iterator after the loop are all successfully eliminated.

[0] https://godbolt.org/z/4obYManzc

Posted Feb 24, 2022 20:48 UTC (Thu) by iabervon (subscriber, #722)

It might work to have:

    extern unsigned long list_iterator_live_after_loop;

and "|| ((pos = (void *) list_iterator_live_after_loop), 0)"

I didn't try changing the kernel macro that way, but my little test code doesn't link if the iterator is used after the loop, but does link and work if it's not used. As I recall, the kernel is already using that sort of trick to use compiler optimization to remove an error message only if the compiler can disprove it.

Posted Feb 24, 2022 20:26 UTC (Thu) by iabervon (subscriber, #722)

Oh, I meant to imply that the compiler would eliminate all of those writes except for ones that expose bugs, but then I got side-tracked by wondering if you could make the kernel not even link unless the compiler eliminated the write. Anyway, it wouldn't affect the generated code unless the compiler can't tell the code is correct.

Posted Feb 24, 2022 22:12 UTC (Thu) by Paf (subscriber, #91811)

Very good point. God I'd sure love to get to a newer C standard though...

Posted Feb 24, 2022 21:11 UTC (Thu) by abatters (supporter, #6932)

It would break code that does this:

    list_for_each_entry(iterator, &foo_list, list) {
        if (do_something_with(iterator)) {
            break;
        }
    }
    if (list_entry_is_head(iterator, &foo_list, list)) {
        // iteration finished
    } else {
        do_something_else_with(iterator);
    }

All this "compare to head" nonsense is why I prefer regular NULL-terminated linked lists to the kernel's circular linked lists.
Insert/delete may take more instructions but iteration is much easier.

Posted Feb 24, 2022 21:55 UTC (Thu) by nybble41 (subscriber, #55106)

Yes, and you also have macros like list_for_each_entry_continue() which depend on the value being left in the iterator. All of these would also break if the macro was changed to declare the iterator inside the `for` statement, C99-style.

One way to work around the problem in your example would be to move the condition inside the loop, like this:

    list_for_each_entry(iterator, &foo_list, list) {
        // ...
        if (do_something_with(iterator)) {
            do_something_else_with(iterator);
            break;
        }
        // ...
        if (iterator->list.next == &foo_list) {
            // this is the last entry; iteration finished
        }
    }

The compiler should be smart enough to avoid checking the end condition twice in each iteration. Of course this becomes much less convenient if there is more than one break statement.

Posted Feb 24, 2022 17:46 UTC (Thu) by flussence (subscriber, #85566)

My thought is that it's silly to require compatibility with standards old enough that even software implementing their newer versions is falling out of long-term support. There are other cases where arguably there's a risk of causing a flag day, but moving off of C89 isn't one of them.

Posted Feb 24, 2022 18:47 UTC (Thu) by adobriyan (subscriber, #30858)

Yay! Don't forget this one too:

    # warn about C99 declaration after statement
    KBUILD_CFLAGS += -Wdeclaration-after-statement

Posted Feb 24, 2022 19:37 UTC (Thu) by zuzzurro (subscriber, #61118)

If the plan is to make this move at the beginning of the next cycle, shouldn't the -next kernel adopt it right now?

Posted Feb 24, 2022 20:30 UTC (Thu) by marcH (subscriber, #57642)

Yes please, finally! Combined declarations and initializations like every other programming language. More 'const' and fewer "this variable 'may' be used uninitialized" guessing/silliness. No more reverse Christmas trees.

In even more advanced languages 'const' is the default but let's not get carried away; too much maths that could scare hardware engineers emotionally attached to their registers.

Copyright (c) 2022, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds