[HN Gopher] Moving the Linux Kernel to Modern C
       ___________________________________________________________________
        
       Moving the Linux Kernel to Modern C
        
       Author : chmaynard
       Score  : 237 points
       Date   : 2022-02-24 20:10 UTC (2 hours ago)
        
 (HTM) web link (lwn.net)
 (TXT) w3m dump (lwn.net)
        
       | charcircuit wrote:
       | >Raising the minimum GCC version to 8.x would likely be more of a
       | jump than the user community would be willing to accept at this
       | point.
       | 
       | If you are using a 0-day-old kernel, why would you still be using
       | GCC 5.x? 5.1 is almost 7 years old now. 8.1 is almost 4. Can't
       | people just use apt / yum / rpm / whatever to upgrade to the
       | latest gcc? Is that really too much to ask of the community?
        
         | lazide wrote:
         | For an unstable branch maybe, but if people need to actually
         | use this for real work they will want it compiling on something
         | as well understood and baked as possible - while not being
         | completely obsolete.
        
           | charcircuit wrote:
           | People doing real work are using a kernel compiled with a
           | more up to date GCC. From what I can tell online even RHEL is
           | up to date enough by compiling its kernel with GCC 8.x.
        
             | jahlove wrote:
             | RHEL7 is still supported [0] by Red Hat through 2026, and
             | is on GCC 4.8.5 [1]
             | 
             | [0] https://access.redhat.com/product-life-cycles/?product=Red%2...
             | 
             | [1] https://distrowatch.com/table.php?distribution=redhat
        
               | charcircuit wrote:
               | Correct me if I'm wrong but doesn't RHEL stick to a
               | single kernel version and then backport patches?
        
               | yjftsjthsd-h wrote:
               | Officially yes, although we can argue about how much you
               | can "backport" and still count. But even so - if new code
               | requires a newer C version and thus newer compiler, it's
               | harder to backport.
        
         | mwcremer wrote:
         | It's not just a question of what the new version fixes, there
         | is also the question of what it breaks. Sometimes new versions
         | trigger latent bugs no one knew about that luckily happened to
         | work fine on the old tools. Finding and fixing those can be
         | difficult and time-consuming.
        
           | charcircuit wrote:
           | This argument would have merit if everyone still used 5.x to
           | compile Linux, but that's simply not the truth. Most people
           | are using a Linux compiled by a somewhat up to date version
           | of GCC.
        
         | syncsynchalt wrote:
         | You'd be surprised how much downstream grief a bump like this
         | can cause in an ecosystem as broad as the Linux kernel. Dealing
         | with the results can be unexpectedly overwhelming.
         | 
         | It makes sense to play it safe and do this in smaller
         | incremental jumps.
        
         | pm215 wrote:
         | I get the impression that a big part of it is just a general
         | choice to be very conservative about forcing newer minimum
         | versions. So to raise the version bar you have to make the
         | positive case for why it's worthwhile -- merely "gcc 5 is
         | ancient" doesn't suffice. The only reason it moved up from 4.9
         | is actual data-loss-provoking codegen bugs in 4.9 (see
         | discussion in this lkml thread:
         | https://lore.kernel.org/lkml/CAHk-=wjqGRXUp6KOdx-eHYEotGvY=a...
         | )...
         | 
         | I do think they could move up a bit further, but it's useful to
         | be able to just build kernels with the distro compiler and
         | repology thinks that for instance Debian stretch (still an LTS
         | supported version) is only gcc 6.3.
        
         | bombcar wrote:
         | The Linux community is _way way way_ larger than just the sum
         | of distro user bases.
         | 
         | Even if all distros have long ago moved to GCC 8+, you still
         | have other chips and systems running, and who knows what buried
         | somewhere depends on GCC 5.
        
           | charcircuit wrote:
           | You can compile Linux and a user space program with different
           | versions of GCC.
        
             | Koshkin wrote:
             | As well as with Tiny C (fast).
        
           | spicybright wrote:
           | You'll have to upgrade the compiler eventually though, right?
           | I don't know what that process would look like, but generally
           | the longer you wait the more pain you'll face later.
        
       | synergy20 wrote:
       | Hmm, I submitted the same article 10 hours ago and it didn't get
       | picked up. Where is the algorithm?
        
         | chmaynard wrote:
         | Welcome to HN. This happens all the time. I added 'Linux' to
         | the title, which attracted attention.
        
         | tomcam wrote:
         | This has happened to me a number of times! I think time of day
         | has a heavier influence than we might think. Submitting is only
         | part of the algorithm. Both vote count and velocity of votes
         | are really important too.
        
       | progbits wrote:
       | From [1]:
       | 
       | > You are not "introducing" a new macro for this, you are
       | modifying the existing one such that all users of it now have the
       | select_nospec() call in it.
       | 
       | > Is that intentional? This is going to hit a _lot_ of existing
       | entries that probably do not need it at all.
       | 
       | > Why not just create list_for_each_entry_nospec()?
       | 
       | Let's ignore whether this patch is needed or works for now - I
       | don't feel competent to comment on that. But this is such a bad
       | suggestion. Instead of fixing the default to be safe, and
       | possibly having a _unsafe_but_fast() variant for the places where
       | it makes sense they want to keep the broken version and require
       | user to explicitly opt in into safety.
       | 
       | Same as the infamous PHP mysql_real_escape_string (don't make the
       | mistake of using mysql_escape_string!) [2][3] or whole host of C
       | stdlib footguns like strcpy/strncpy [4].
       | 
       | The default, easy, obvious option should be safe. The unsafe but
       | faster option should be hard to use by accident and obviously
       | marked as such from the name.
       | 
       | [1] https://lwn.net/ml/linux-kernel/Yg6iCS0XZB6EtMP7@kroah.com/
       | [2] https://www.php.net/manual/en/function.mysql-escape-string.p...
       | [3] https://www.php.net/manual/en/function.mysql-real-escape-str...
       | [4] https://en.cppreference.com/w/cpp/header/cstring
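
The "safe by default, unsafe by explicit name" convention argued for above can be sketched in a few lines of C. This is only an illustration of the naming idea, not the kernel patch under discussion; `table`, `table_get`, and `table_get_unchecked` are hypothetical names invented for the example.

```c
#include <stddef.h>

#define TABLE_SIZE 4
static const int table[TABLE_SIZE] = { 10, 20, 30, 40 };

/* The short, obvious name is the safe one: an out-of-range index
 * returns the caller's fallback instead of reading past the array. */
int table_get(size_t idx, int fallback)
{
    return idx < TABLE_SIZE ? table[idx] : fallback;
}

/* The fast path is opt-in and says so in its name.
 * The caller must guarantee idx < TABLE_SIZE. */
int table_get_unchecked(size_t idx)
{
    return table[idx];
}
```

A caller who reaches for `table_get` by habit gets the bounds check; skipping it requires typing `_unchecked` on purpose.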
        
         | quadcore wrote:
         | > Cost.
        
         | masklinn wrote:
         | > Same as the infamous PHP mysql_real_escape_string (don't make
         | the mistake of using mysql_escape_string!) [2][3]
         | 
         | TBF that is really a straight bridge to the mysql C API, which
         | is why it's also in mysqli.
         | 
         | MySQL actually has a third one called
         | mysql_real_escape_string_quote, because if the sql mode
         | NO_BACKSLASH_ESCAPES is set the escaping function needs to know
         | what context it's used in, and thus what quote to double up,
         | and mysql_real_escape_string will fail.
        
         | toast0 wrote:
         | > Same as the infamous PHP mysql_real_escape_string (don't make
         | the mistake of using mysql_escape_string!) [2][3] or whole host
         | of C stdlib footguns like strcpy/strncpy
         | 
         | In all of those, the enhanced function takes more parameters.
         | Switching the existing name to require a new parameter will
         | break existing code, and people will not want to update. Making
         | the new parameter optional is possible, but IMHO is messier
         | than a new function that has required parameters and then
         | deprecating the old function (and eventually removing it).
        
         | klodolph wrote:
         | This is just how you work with large code bases. Updating
         | 15,000 call sites is far from trivial. These changes happen in
         | multiple phases, and in each phase, you still want to have a
         | working kernel.
         | 
         | The unsafe version can be deprecated and eventually removed,
         | but that is down the road.
        
           | xiphias2 wrote:
           | I'm not sure how it's done with the Linux kernel (and git
           | definitely makes it hard), but at Google (using the Perforce
           | model) there are tools for mass renaming that try to
           | guarantee that there are no regressions. Of course in Linux
           | the amount of C macros makes it much harder to work on the
           | syntax trees, but comparison after the preprocessor step
           | should be possible.
        
             | klodolph wrote:
             | IIRC Google works the same way. The automated refactoring
             | tools (Rosie?) may make it go faster, but you generally
             | don't fix up 15,000 call sites in a single change... you
             | break it up into smaller batches spread across the tree.
        
               | acidbaseextract wrote:
               | It's been a long time since I was there, but I thought
               | there were plenty of 15,000 call site refactorings done
               | in a single final CL. Not that the Linux kernel should do
               | the same!
        
               | skybrian wrote:
               | When I was there (several years ago) it was rare to do
               | that across many projects. You don't want to have to roll
               | back everyone due to a problem that affects one project.
               | Also, there's a risk that changes that are happening in
               | the meantime might mean the patch doesn't apply.
        
       | eatbitseveryday wrote:
       | > if all goes well, the shift to C11 will happen in the next
       | kernel release
        
         | stjohnswarts wrote:
         | I'll be pinning that for a couple releases unless there's a
         | major security bug O_O
        
           | yjftsjthsd-h wrote:
           | Just stick to a LTS branch; no reason to give up bugfixes.
        
       | wiineeth wrote:
       | Why modern C when you can move to Rust
        
         | yjftsjthsd-h wrote:
         | Because we want a kernel that works today, and make
         | improvements that can ship this year, not spend a decade
         | rewriting working software.
        
       | [deleted]
        
       | the__alchemist wrote:
       | I hope this isn't too much of a derail. I'm not proficient in C,
       | and am translating a C module to Rust. My biggest complaint is
       | there's so much DRY! Functions, variables etc are all prefixed
       | with the module names, presumably due to lack of namespaces. The
       | `.c` vs `.h` dichotomy is also DRY. The subset language that uses
       | `#` is also a bit of a mess.
        
       | quelsolaar wrote:
       | In many ways you can't avoid more modern C, because more modern
       | standards make things that were ambiguous clearer, and then
       | compilers follow that even if you select an older standard.
       | 
       | I think C99 introduces some bad things that should be kept out of
       | any code base. Variable length arrays being the biggest one.
       | Generics is another one. Being able to declare variables anywhere
       | doesn't make the code better but it makes styles diverge and
       | that's a problem.
       | 
       | If Linux wants to adopt C99 they should define what parts of c99
       | should be adopted in their style guide.
        
         | nybble41 wrote:
         | > Being able declare variables anywhere doesn't make the code
         | better...
         | 
         | It does, though. The ability to declare variables "anywhere"
         | allows a reduction in the portion of the code where a
         | variable exists uninitialized, or initialized with a dummy
         | value which should never be used, or holding an obsolete value
         | which is not intended to be used. It also allows more variables
         | to be declared `const` (which requires them to be initialized
         | in the declaration, which must occur _after_ the initial value
         | is computed) which helps to detect accidental assignment and
         | also makes the code easier to review.
         | 
         | It's worth noting that you can more-or-less declare variables
         | "anywhere" in C89 too--you just have to wrap the variable's
         | scope in a compound statement. From that point of view, C99
         | doesn't change where variables can be declared. It just lets
         | you remove some syntactic clutter.
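
A minimal sketch of the point above, assuming a C99-or-later compiler for the first function. `average_c99` and `average_c89` are hypothetical helpers written for this illustration, not kernel code. The first declares each variable at first use, which lets `avg` be `const`; the second gets the same scoping in C89 by wrapping the late declaration in a compound statement.

```c
#include <stddef.h>

/* C99 style: variables appear where their values become known. */
int average_c99(const int *samples, size_t n)
{
    if (n == 0)
        return 0;

    long sum = 0;                       /* declared after the early return */
    for (size_t i = 0; i < n; i++)      /* i is scoped to the loop */
        sum += samples[i];

    /* const works because the initial value already exists here. */
    const int avg = (int)(sum / (long)n);
    return avg;
}

/* C89 style: declarations first, plus an extra block for the late one. */
int average_c89(const int *samples, size_t n)
{
    long sum = 0;
    size_t i;

    if (n == 0)
        return 0;
    for (i = 0; i < n; i++)
        sum += samples[i];
    {
        const int avg = (int)(sum / (long)n);
        return avg;
    }
}
```

Both functions compute the same result; the difference is only how long `sum`, `i`, and `avg` are visible and whether an extra pair of braces is needed.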
        
         | jstimpfle wrote:
         | I for one have read a lot of code with C89 declare-first style
         | that was subjectively a lot less clear. This applies in
         | particular to assign-once variables that depend on other
         | calculations.
         | 
         | stuff like this: [0]. I mean, what's the point of declaring
         | variables like _cp_ or _err_ at the beginning when they're
         | assigned first only halfway down? What is the point of having
         | _path_ around for the whole function, when it is a super
         | temporary assign-once variable with a scope of 3 lines? I'm
         | having a really hard time coming up with a justification for
         | this, other than "this is the way we've always done it and we
         | like it" (you can find actual rants in defense of this style
         | from about 10 years ago).
         | 
         | For-loop declarations are another instance of this that make
         | the code more fluent IMO. And they help reducing the scope of
         | variables to a strictly smaller block. I would find it hard to
         | argue that this does not remove some bugs or improve
         | readability.
         | 
         | The only thing that declare-first has going for it is that the
         | variables that are in use can be seen at once, a little bit
         | like in a struct declaration. Looking at how optimizers butcher
         | those variables depending on liveness makes this feature seem
         | less valuable, though.
         | 
         | [0]
         | https://github.com/torvalds/linux/blob/2729cfdcfa1cc49bef5a9...
        
       | LAC-Tech wrote:
       | They're discussing the move to C99. Not sure that counts as
       | modern C at this point.
        
         | clhodapp wrote:
         | Given that C99 is much closer in vintage to their current C89
         | than it is to today's date and also just straight up really
         | old, it does seem inappropriate to distinguish that as "modern
         | C".
         | 
         | That said they do seem to be discussing both C99 and C11 so it
         | does seem like going with something that _would_ qualify as
         | modern (especially in light of Linux's need to be
         | conservative) is actually on the table.
        
           | dahfizz wrote:
           | You also have to look at the contents of the revisions. C99
           | was a massive update for C, and C11 was tiny in comparison
           | (and C17 was basically nothing at all).
           | 
           | Writing C89, without variable declaration in a for loop, and
           | without // comments, etc, feels ancient. C99 on the other
           | hand is what defines how C looks today.
           | 
           | This is also why the attitude is: if they jump to c99, they
           | may as well jump to c11 because they are basically the same.
        
         | flyingfences wrote:
         | They're discussing a move to C11, which counts in my book.
        
           | tombert wrote:
           | I haven't touched C in any serious capacity in quite awhile;
           | how often does C get new revisions, because isn't C11 still
           | more than a decade old?
           | 
           | I understand why the linux kernel folks aren't moving to the
           | bleeding edge (Linux is old, you have to do these ports
           | incrementally), but I'm curious why the pace of language is
           | so slow. I guess it's because C has more or less stabilized
           | and thus further revisions are less necessary?
        
             | LukeShu wrote:
             | > how often does C get new revisions
             | 
             | 1989, 1995, 1999, 2011, 2017.
             | 
             | The 1995 one was an "amendment", not a full revision (and
             | is mostly additions to libc, which Linux doesn't get to
             | use; the only language change is the addition of digraphs).
             | 
             | C17 was a "bugfix" revision; compilers will have applied
             | these fixes to their C11 implementations, so in effect the
             | only difference between C11 and C17 is the value of
             | __STDC_VERSION__. So for most intents and purposes C11 is
             | still the most recent revision.
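
For reference, the `__STDC_VERSION__` values behind that list are 199409L (the 1995 amendment), 199901L (C99), 201112L (C11), and 201710L (C17); C89/C90 does not define the macro at all. A small sketch of how code can check which revision it is being compiled as follows; `c_standard_name` is a hypothetical helper name.

```c
/* Map the compiler's advertised language revision to a short label.
 * The result depends on the -std= option the file is built with. */
const char *c_standard_name(void)
{
#if !defined(__STDC_VERSION__)
    return "C89/C90";
#elif __STDC_VERSION__ >= 201710L
    return "C17 or later";
#elif __STDC_VERSION__ >= 201112L
    return "C11";
#elif __STDC_VERSION__ >= 199901L
    return "C99";
#elif __STDC_VERSION__ >= 199409L
    return "C94 (Amendment 1)";
#else
    return "unknown";
#endif
}
```

GNU dialects such as `-std=gnu89` or `-std=gnu11` report the version of the standard they are based on, so this check says nothing about which extensions are enabled.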
        
               | tombert wrote:
               | Fair enough! As I said, I suppose C doesn't need to
               | change that much; people use C _because_ they know what
               | they're getting, and for that to be the case, it needs
               | to be stable.
        
             | dahfizz wrote:
             | > but I'm curious why the pace of language is so slow. I
             | guess it's because C has more or less stabilized and thus
             | further revisions are less necessary?
             | 
             | That is a big part of it. C is an old, stable, complete
             | language.
             | 
             | There is also the existence of gnu extensions, which bridge
             | the gap between language revisions. For example, anonymous
             | unions were added in C11 but they have been a gnu extension
             | since forever.
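
A short sketch of the anonymous-union feature mentioned above, assuming C11 or a compiler with the GNU extension. `struct value` and `value_as_double` are hypothetical names for the example.

```c
/* A tagged value. The union has no member name, so its fields are
 * accessed directly as v.i and v.d rather than v.u.i and v.u.d. */
struct value {
    enum { VAL_INT, VAL_DOUBLE } tag;
    union {
        int    i;
        double d;
    };
};

/* Read the active member according to the tag. */
double value_as_double(struct value v)
{
    return v.tag == VAL_INT ? (double)v.i : v.d;
}
```

Without the feature, the union needs a name and every access carries that extra member.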
        
             | syncsynchalt wrote:
             | > I'm curious why the pace of language is so slow
             | 
             | This is a strength of the language.
             | 
             | If you've ever done CI/CD for a large project in a fluid
             | development ecosystem (e.g. nodeJS), you can understand why
             | it might be refreshing to develop your operating system in
             | a language where standards are measured by the decade.
        
           | LAC-Tech wrote:
           | yeah fair enough should've read the rest of the article
           | before my snide comment
        
       | kazinator wrote:
       | C99 has one nice feature compared to C89 that is little talked
       | about: you can initialize local aggregates using expressions that
       | can't be calculated at load-time.
       | 
       |     void fun(int x)
       |     {
       |       struct foo f = { x };     // not allowed in C89
       |       int bar[3] = { 0, x };    // ditto
       |     }
       | 
       | 
       | Chances are the kernel does this because, I think, it's also a
       | GNU89 extension.
        
       | kwijibob wrote:
       | Does C standard naming have a Y2K problem? :)
        
         | teddyh wrote:
         | Not until 2089.
        
           | qiskit wrote:
           | The C standards aren't published at a regular interval of 10
           | years. They are published as needed or as agreed upon. So it
           | is possible there may be a C standard published in 2089, but
           | highly unlikely.
           | 
           | It's similar to but not quite the same thing as the Y2K
           | problem.
        
         | syncsynchalt wrote:
         | Given C's ubiquity and (likely) longevity it'll have a Y2k89
         | problem: standards released in 2089 and beyond will have to
         | change their naming schema and/or avoid being standardized in
         | particular years. ;)
        
       | thetic wrote:
       | Though C89 does not support declaration of a scoped loop variable
       | like C99 does:
       | 
       |     for(int i = 0; i < 10; i++) {
       |         // ...
       |     }
       | 
       | C89 does support declaration of variables at the top of braced
       | blocks whose scope is limited to that block:
       | 
       |     {
       |         int i;
       |         for(i = 0; i < 10; i++) {
       |             // ...
       |         }
       |     }
        
         | spc476 wrote:
         | Then you get dinged by static analysis tools like SonarQube
         | because you introduced a useless scope or increased the
         | "cognitive complexity" to the code and have to explain it to a
         | team leader who might not understand why you did that.
        
           | jrockway wrote:
           | I guess don't use a C99 linter on a C89 codebase?
        
       | dale_glass wrote:
       | One thing I've wondered for some time is why don't the Linux
       | developers modify the compiler to suit their needs better.
       | 
       | At that project scale, wouldn't it start making sense to start
       | solving problems like "If it were possible to write a list-
       | traversal macro that could declare its own iterator [...]" by
       | adding the functionality you want to GCC?
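
For what it's worth, the quoted wish does not need a compiler change once C99 loop declarations are allowed. Below is a minimal sketch under that assumption. `struct node`, `for_each_node`, and `sum_nodes` are hypothetical and much simpler than the kernel's `list_head` machinery and `list_for_each_entry()`.

```c
#include <stddef.h>

/* Hypothetical singly linked list node. */
struct node {
    int value;
    struct node *next;
};

/* The macro declares its own iterator in the for-loop header, so the
 * iterator's scope ends with the loop (requires C99 or later). */
#define for_each_node(it, head) \
    for (struct node *it = (head); it != NULL; it = it->next)

int sum_nodes(struct node *head)
{
    int sum = 0;

    for_each_node(n, head)
        sum += n->value;

    /* 'n' is not visible here, so it cannot be used after the loop. */
    return sum;
}
```

With a C89-style macro the caller has to declare the iterator beforehand, and it stays in scope after the loop has finished.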
        
         | Jtsummers wrote:
         | That would make the code (and the compiler) non-C89 compliant.
         | At which point, why not move to a different standard rather
         | than bork both the kernel and the compiler if it can meet the
         | needs better?
        
           | electroly wrote:
           | The Linux kernel requires GNU extensions; it's already not
           | ANSI C89.
        
             | [deleted]
        
             | Jtsummers wrote:
             | Alright, making it _more_ non-compliant, then. Still
             | creates problems for everyone and also necessitates a
             | larger jump in the GCC version to support (unless you're
             | going to back port the new extensions to every version that
             | the kernel currently works with). Currently you can compile
             | the Linux kernel with GCC 5.1. If you changed the compiler
             | to support this one feature (or any other) you'd either
             | need to back port it several versions (potentially back to
             | 5) or abandon all of them. Which is probably a no-go with a
             | project that's overall as conservative as the Linux kernel
             | project.
        
         | initplus wrote:
         | Maybe that's helpful today, but by remaining (relatively)
         | standards compliant the kernel codebase has a more long-term
         | resiliency. The more tightly wedded to a single compiler
         | implementation the greater the long term risk for the project.
         | 
         | If some shenanigans were to happen upstream in GCC it would not
         | be the end of Linux.
        
           | colejohnson66 wrote:
           | But they're _not_ standards compliant. They rely on many GCC
           | extensions that make compiling on something else not
           | possible.
        
             | tinalumfoil wrote:
             | Other compilers support GCC extensions.
             | 
             | > The Linux kernel has always traditionally been compiled
             | with GNU toolchains such as GCC and binutils. Ongoing work
             | has allowed for Clang and LLVM utilities to be used as
             | viable substitutes. Distributions such as Android,
             | ChromeOS, and OpenMandriva use Clang built kernels. LLVM is
             | a collection of toolchain components implemented in terms
             | of C++ objects. Clang is a front-end to LLVM that supports
             | C and the GNU C extensions required by the kernel, and is
             | pronounced "klang," not "see-lang."
             | 
             | https://www.kernel.org/doc/html/latest/kbuild/llvm.html
        
         | jancsika wrote:
         | Current situation-- do nothing but yell at the compiler devs.
         | Benefit: sometimes they listen. Cost: you cannot always (or
         | maybe even often) get the compiler to behave as you think it
         | should because you don't control it.
         | 
         | Deathtrap situation-- maintain an operating system _and_ a fork
         | of a compiler. Benefit: you can get more control over the
         | compiler. Cost: you _still_ cannot always get the compiler to
         | behave as you think due to time constraints. Death cost: your
         | first cost is multiplied by the fact that you're now
         | maintaining a goddamned compiler.
         | 
         | I rankly speculate Linux is an extant project at its scale
         | _because_ it has refused to fight on two fronts like this. (And
         | yelling across a border isn't the same as crossing it.)
         | 
         | Edit: clarifications
        
         | kllrnohj wrote:
         | GCC already has that functionality, it's called C++. Like that
         | macro is just a crappy version of std::for_each.
         | 
         | It doesn't seem useful to make a "C+" instead of switching to a
         | language that just has the feature set the kernel needs. Like
         | Rust, which is gaining some support within the Linux kernel
         | already.
        
           | stjohnswarts wrote:
           | *for drivers primarily
        
             | Gigachad wrote:
             | That's basically the trial period. Nothing depends on
             | drivers, so they are pretty safe to try new things with, and
             | the hardware they are used for is limited, so you can know
             | that the hardware the driver is used for is supported by the
             | Rust compiler. If we go a few years and it's all good and
             | the Rust compiler supports everything Linux targets, I can
             | imagine more core parts becoming Rust.
        
           | dale_glass wrote:
           | C++ was discussed as having too many undesirable
           | characteristics.
           | 
           | But that's exactly why I think a customized language for the
           | kernel is an interesting idea. You could get pretty much
           | exactly what you want for the kernel. Add features you need,
           | remove any undesirable behavior or features.
           | 
           | For most programs that would be too much complexity, but the
           | kernel has very particular needs. And I think a similar in
           | spirit approach has worked very well with Qt.
        
             | Koshkin wrote:
             | The desirable characteristics of C++ outweigh, by far, its
             | "undesirable" ones (whatever those may be, and which by the
             | way you can for the most part safely ignore, if you wish).
        
               | chaxor wrote:
               | I don't know anything about C++ really, but I have been
               | interested in learning it several times. The biggest
               | deterrent for me has been the huge number of languages
               | that you have to know to learn c++ it seems. Every new
               | version comes with so many different features and
               | seemingly (by looking for example repos on git) different
               | paradigms that it essentially feels like many different
               | languages.
               | 
               | This is probably one of the biggest reasons I may turn to
               | Rust over C++ - simply less feature creep due to less
               | time being around.
               | 
               | Can you comment on how wrong I am about my feelings this
               | way?
        
               | Koshkin wrote:
               | Yes, I understand, I would probably feel the same if I
               | wanted to learn C++ all at once and for its own sake.
               | Luckily, I have never had to do such a thing. Indeed, I
               | may never learn C++ in its entirety, and I may even miss
               | some important pieces, but I am OK with that. I am very
               | comfortable using C++, and I still continue picking up
               | pieces of wisdom here and there - as I go. My advice,
               | learn by example, start with a small project and go from
               | there.
        
             | jcelerier wrote:
             | > C++ was discussed as having too many undesirable
             | characteristics.
             | 
             | C++ allowed the Serenity OS people to produce an entire OS
             | with a GUI stack able to play Diablo and their own web
             | browser in something like two years. It's depressing to
             | think where we could be today in terms of OS if the Linux
             | and GNU people weren't as insistent on their hate of C++
        
               | tayo42 wrote:
               | Why is c++ necessary? Shouldn't any programing language
               | be able to do the same things. Maybe with different
               | amounts of code
        
         | stjohnswarts wrote:
         | because currently gcc maintenance is almost "free" (in
         | comparison to maintaining gcc, which is as big a project as linux)
         | and they just have to work with the powers that be over at gcc.
         | That's much better than forking it and having yet one more HUGE
         | project to maintain independently. The advantages of sticking
         | to the current policy FAR outweigh the disadvantages.
        
         | [deleted]
        
         | na85 wrote:
         | I'm not a compiler expert here but when clang was new and one
         | of the BSD distros was making a lot of hay about moving to it,
         | I remember one of the arguments in favor being that the GCC
         | codebase is a mess and not easily extended.
        
       | pjmlp wrote:
       | The reference to moving into C99 is kind of interesting, given
       | the security work paid by Google to remove all uses of VLAs, I
       | thought they were already using it anyway.
        
         | dahfizz wrote:
         | VLAs exist as a gnu extension even for -std=c89, unless
         | -pedantic is also specified.
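
To make the VLA discussion concrete, here is a hedged sketch of the kind of rewrite that removing a VLA involves. `MAX_NAME`, `copy_with_vla`, and `copy_with_fixed` are hypothetical names for illustration, not code from the kernel.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NAME 64   /* hypothetical fixed upper bound */

/* VLA version: the stack frame grows with the runtime value of len. */
size_t copy_with_vla(const char *src, size_t len)
{
    char buf[len + 1];

    memcpy(buf, src, len);
    buf[len] = '\0';
    return strlen(buf);
}

/* Fixed-size version: stack usage is known at compile time, and
 * oversized input is rejected before anything is copied. */
size_t copy_with_fixed(const char *src, size_t len)
{
    char buf[MAX_NAME + 1];

    if (len > MAX_NAME)
        return 0;
    memcpy(buf, src, len);
    buf[len] = '\0';
    return strlen(buf);
}
```

The fixed-size version trades flexibility for a bounded stack frame, which matters when the stack is small and `len` may come from an untrusted source.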
        
       | userbinator wrote:
       | I'm one of those people who think OS kernels should stay as
       | portable and simple as possible (i.e. C89 or some other easily-
       | bootstrappable language, to avoid Ken Thompson attacks), so this
       | isn't great news to see "the ladder being pulled up another
       | rung". Then again, Linux has already become immensely complex.
        
         | tomcam wrote:
         | > so this isn't great news to see "the ladder being pulled up
         | another rung"
         | 
         | Man I so vibe with that. But the new Cs do have a ton of new
         | features and, more to the point, I trust Linus to make this
         | kind of decision more than just about anyone.
        
         | thestoicattack wrote:
         | The article does mention at least one advantage, so it's not as
         | if the ladder is being pulled up for no reason.
        
         | pcwalton wrote:
         | The advantages of using a newer language greatly outweigh the
         | disadvantages of theoretical "reflections on trusting trust"
         | attacks. By mandating C89, you're condemning thousands of
         | kernel developers to use a language that's over 30 years old
         | because of a theoretical attack that has never happened and
         | seems practically implausible. Does anyone really think there's
         | a backdoor in both GCC _and_ Clang (remember, Linux can be
         | compiled on either)?
        
           | gmadsen wrote:
           | I am ignorant on nearly all compiler related issues, but
           | there was an article on here not too long ago that was
           | arguing that nearly all os development required old C because
           | choices of the committee would break use cases required under
           | the guise of undefined behavior.
           | 
           | Are the benefits of "modern C" worth compile times 2-3x
           | longer?
        
             | aw1621107 wrote:
             | > but there was an article on here not too long ago that
             | was arguing that nearly all os development required old C
             | because choices of the committee would break use cases
             | required under the guise of undefined behavior.
             | 
             | I would guess that you're referring to either "How ISO C
             | became unusable for operating systems development" ([0]) or
             | "How One Word Broke C" ([1]).
             | 
             | [0]: https://arxiv.org/abs/2201.07845 , most recent HN
             | discussion at https://news.ycombinator.com/item?id=30022022
             | 
             | [1]: https://web.archive.org/web/20210307213745/https://new
             | s.quel... , HN discussion at
             | https://news.ycombinator.com/item?id=22589657
        
             | pcwalton wrote:
             | Times like these I wish I were allowed to say exactly how
             | much money big companies save by using the newest versions
             | of GCC and Clang. The economic value of modern compiler
             | optimizations is staggering.
        
               | tomcam wrote:
               | I hereby give you permission to say exactly how much
               | money big companies save by using the newest versions of
               | GCC and Clang.
        
               | bch wrote:
               | > wish I were allowed to say exactly how much money big
               | companies save[...]
               | 
               | Are you able to give hints that would guide us in thought
               | exercises?
        
               | bluGill wrote:
               | Facebook has hinted that they employ smart C++ people
               | because some core optimizations can save on the order of
               | several hundred thousand dollars per year (possibly in
               | the millions). Most of that is because they don't have to
               | buy as much power to cool the server rooms; some of it is
               | buying fewer servers, since the ones they have can do
               | more. Facebook doesn't actually say what they save, but
               | they have said they measure it, and we know how much it
               | costs to employ an engineer full time working only on
               | optimization, and we know who some of those engineers
               | are.
               | 
               | You have to be very large to notice it, but the likes of
               | Facebook, Amazon, and Google have massive warehouses
               | almost entirely filled with computers. It doesn't take
               | much to see how their power bill can add up.
        
               | pertymcpert wrote:
               | One avenue is to think about it in terms of datacenter
               | compute. If modern compiler optimizations improve
               | performance by 10%, aggregate the perf improvements
               | across the entire world's compute and you get a huge
               | saving.
        
               | steveklabnik wrote:
               | The parent works at Facebook/Meta leading their Rust
               | team, focusing on compiler and ecosystem improvements. So
               | that in and of itself is a sort of hint into what that
               | info could be like, even if it's not actual details of
               | the order of magnitude or anything.
        
             | bonzini wrote:
             | Why would a different standard cause longer compile times?
        
             | mananaysiempre wrote:
             | Most of the really annoying UB is technically in C89
             | though, it's just that compilers didn't really treat it
             | with such contempt for the first two decades or so. I can't
             | even recall any new kinds of (non-library) UB in C99 or C11
             | (though there have to be some).
             | 
             | So "old C" in such a case would need to mean "an old C
             | _implementation_" (or possibly a new one, but simple or
             | configured to behave like an old one), something like GCC
             | 2.8 maybe, and nobody's using that on desktop. So the
             | language standard version should be mostly immaterial, and
             | it's not like the C89-to-C11 difference is anything like
             | the yawning C++98-to-C++20 chasm. (This is a carefully
             | phrased statement: C99 had complex numbers, which are
             | annoying, and variable-length arrays, which are a
             | significant change, but C11 demoted both to optional
             | features.)
        
             | neysofu wrote:
             | > Are the benefits of "modern C" worth compile times 2-3x
             | longer?
             | 
             | I mean, you get more security by default and security is
             | pretty darn important for kernels...
        
         | pm215 wrote:
         | Linux has always relied heavily on GCC extensions, though -- it
         | makes no attempt to actually be C89-compliant portable code.
         | What it actually has is a minimum supported gcc version, which
         | in turn governs whether particular features can be used. In
         | this case the minimum gcc version has for other reasons finally
         | got big enough that C99 and C11 support is definitely present
         | -- the ladder was already this high.
         | 
         | (You can also build with clang, but only because clang
         | deliberately aims to support most gcc extensions.)
        
         | Dylan16807 wrote:
         | Declaring a variable in the middle of a function takes
         | basically no effort to support. Especially compared to all the
         | ridiculous things the kernel gets up to.
         | 
         | The ladder's being moved a millimeter.
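
Editor's sketch (not from the thread): the language change under discussion is small enough to show side by side. Both functions below are hypothetical and compute the same sum. The first keeps every declaration at the top of the block, as C89 requires; the second declares variables after a statement and inside the for statement, which needs C99 or later.

```c
/* C89 style: all declarations come before the first statement. */
int sum_c89(const int *v, int n)
{
    int i;
    int total = 0;

    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* C99 style: declarations may follow statements and appear in the loop header. */
int sum_c99(const int *v, int n)
{
    if (n <= 0)
        return 0;

    int total = 0;                 /* declared after a statement */
    for (int i = 0; i < n; i++)    /* i is scoped to the loop only */
        total += v[i];
    return total;
}
```

In the C99 version the loop counter does not exist after the loop ends, so it cannot be used by accident afterwards.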
        
         | syncsynchalt wrote:
         | IMO the C standards are conservative enough that even a
         | standard level "only" a decade old is still fine.
         | 
         | With that said, we still have options even when moving to a
         | newer standard. Many new language features can be machine-
         | translated to C89 if needed (similar to how we have protoize /
         | unprotoize, though not always as seamless). If we need to keep
         | a bridge to C89 the kernel authors could hold to a subset of
         | new language features that are amenable to machine translation.
        
         | jcranmer wrote:
         | Linux is not written in C89. It is written in gnu C89, which is
         | a mixture of C89, C99, and a panoply of often poorly-defined
         | gcc-specific features. The number of compilers that can
         | successfully compile Linux is one; not even clang is able to
         | fully do so yet, I believe.
         | 
         | Actually, as a compiler writer, I'd go a little bit further and
         | point out that Linux itself isn't even written to the gnu C89
         | very well; it's often written to a "C is portable assembly"
         | view of the language, which results in nastygrams and
         | invective being hurled at compiler writers if they compile the
         | C specification correctly and not according to the "proper"
         | assembly the code author thought they were getting.
         | 
         | One of the benefits of more modern language revisions is that
         | they actually tighten the wording on a lot of the more
         | ambiguous parts of the specification--C11 in particular adds a
         | much more comprehensive memory model that's very _shrug_ in the
         | older revisions of C.
        
           | nikanj wrote:
           | My understanding is that Linux has often faced issues where
           | "compiling the specification correctly" means "Ha-ha gotcha,
           | we can actually throw half your code out of the window
           | because you misread the deep aliasing rules on page 8432 of
           | the spec"
           | 
           | Their hesitancy with newer standards is understandable, when
           | viewed against that backdrop
        
             | jcranmer wrote:
             | One of these days, I will get around to writing my post as
             | to why that take on undefined behavior is completely wrong.
             | 
             | More to the point, though, the only changes to undefined
             | behavior in the C specification in newer versions (compared
             | to C89) are either clarifying things that _were already
             | undefined behavior_ (e.g., INT_MIN % -1) or actually
             | _making some undefined behavior well-defined_ (e.g.,
             | allowing type punning via unions).
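
Editor's sketch (not from the thread): the two examples named above, written out in C. The function names are hypothetical, and float_bits assumes float is a 32-bit IEEE 754 type. float_bits reads a union member other than the one last written, the type punning the comment says newer revisions allow. safe_mod avoids evaluating INT_MIN % -1, which the comment lists as undefined behavior.

```c
#include <limits.h>
#include <stdint.h>

/* Returns the bit pattern of f by writing one union member and reading another. */
uint32_t float_bits(float f)
{
    union {
        float    f;
        uint32_t u;
    } pun;

    pun.f = f;
    return pun.u;
}

/* Stores a % b in *out and returns 0, or returns -1 when b is zero.
 * The b == -1 case is answered directly (the remainder is always 0),
 * so INT_MIN % -1 is never evaluated. */
int safe_mod(int a, int b, int *out)
{
    if (b == 0)
        return -1;
    if (b == -1) {
        *out = 0;
        return 0;
    }
    *out = a % b;
    return 0;
}
```

On a platform matching the stated assumption, float_bits(1.0f) yields 0x3f800000.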
        
           | electroly wrote:
           | Clang can do it. Google ships clang-built Linux kernels in
           | Android and ChromeOS.
        
         | mwint wrote:
         | What's a Ken Thompson attack?
        
           | nl wrote:
           | _Ken Thompson's "cc hack" - Presented in the journal,
           | Communications of the ACM, Vol. 27, No. 8, August 1984, in a
           | paper entitled "Reflections on Trusting Trust", Ken Thompson,
           | co-author of UNIX, recounted a story of how he created a
           | version of the C compiler that, when presented with the
           | source code for the "login" program, would automatically
           | compile in a backdoor to allow him entry to the system._
           | 
           | https://www.win.tue.nl/~aeb/linux/hh/thompson/trust.html
           | 
           | The paper's a pretty entertaining read:
           | 
           | > First we compile the modified source with the normal C
           | compiler to produce a bugged binary. We install this binary
           | as the official C. We can now remove the bugs from the source
           | of the compiler and the new binary will reinsert the bugs
           | whenever it is compiled. Of course, the login command will
           | remain bugged with no trace in source anywhere.
        
           | ksec wrote:
           | Reflections on Trusting Trust - Ken Thompson
           | 
           | https://wiki.c2.com/?TheKenThompsonHack
        
           | Jtsummers wrote:
           | https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref.
           | ..
           | 
           | Ken Thompson, _Reflections on Trusting Trust_.
           | 
           | EDIT: I don't think I've seen 5 nearly simultaneous replies
           | sharing the same link before.
        
             | ksec wrote:
             | >EDIT: I don't think I've seen 5 nearly simultaneous
             | replies sharing the same link before.
             | 
             | LOL I was searching for a non-PDF link and delayed the
             | reply. It would have been 6 simultaneous answers.
        
           | bombcar wrote:
           | https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref.
           | ..
           | 
           | How do you prove your stack is secure?
        
           | mindcrime wrote:
           | I believe the person you are replying to is using that as an
           | allusion to the issues discussed in the famous "Reflections
           | on Trusting Trust" paper[1] by Ken Thompson.
           | 
           | [1]: https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984
           | _Ref...
        
           | febstar wrote:
           | Most likely referring to "Reflections on Trusting Trust" [0];
           | i.e. when the compiler is itself the attack vector.
           | 
           | [0]: https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984
           | _Ref...
        
           | esarc wrote:
           | https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref.
           | ..
        
         | pinephoneguy wrote:
         | Yes, but Linux won't even build with gcc 4, much less weird
         | compilers like tcc or MSVC. That ship has unfortunately
         | already sailed.
        
           | tiahura wrote:
           | Sounds like GNU/Linux is probably warranted.
        
             | yjftsjthsd-h wrote:
             | Kind of, but you can also use clang. On the other other
             | hand, that's clang with GNU extensions, so /shrug
        
         | neysofu wrote:
         | Ken Thompson's hack is purely theoretical, especially in the
         | modern computing landscape where the diversification of
         | software supply chains would make such a hack much 1. less
         | effective and 2. easier to detect. If we're talking about
         | kernel security, there are other lower hanging exploits, many
         | of which are enabled by outdated language design decisions and
         | unsafe memory models.
         | 
         | As compiler technologies evolve, trusting compilers becomes not
         | just a necessary evil but a better course of action than relying
         | on humans tiptoeing around security risks masquerading as
         | language idiosyncrasies.
        
           | yjftsjthsd-h wrote:
           | > Ken Thompson's hack is purely theoretical
           | 
           | Erm. I was always under the impression that he had actually
           | done it, and that his presentation was a historical anecdote.
           | Is that not the case?
        
             | neysofu wrote:
             | Yes, he did, but that just shows that it's possible to
             | inject self-replicating code in a compiler. It doesn't
             | actually tell us how resistant such code is to compiler
             | source changes, audits, binary analysis, etc. which are all
             | required for a successful exploit.
        
             | steveklabnik wrote:
             | On a technical level, the hack has been demonstrated to
             | work, but actually hacking someone via the technique has
             | not, in my understanding.
        
       | baby wrote:
       | I find it scary that most modern infrastructure eventually relies
       | on C, and C from decades ago at that.
        
         | lazide wrote:
         | You shouldn't - it isn't perfect, but it is well understood. So
         | far we don't have anything else that works better and whose edge
         | cases and weirdness are understood by enough people that someone
         | could build something like Linux on it.
         | 
         | It is unwise to build a bridge on something untested with
         | unknown failure modes, and it is equally unwise to do a rewrite
         | or create new core infrastructure in a language without knowing
         | it as well as a civil engineer knows concrete.
        
         | Koshkin wrote:
         | I hear you. And a huge codebase at that. Unfortunately, the
         | choice is limited; D or Ada would probably be better
         | alternatives today.
        
         | freedomben wrote:
         | The amount of fear I see from people around C is so surprising
         | to me. Not saying you fit into this, but so far my anecdata
         | suggests that it's largely people who don't know (or know very
         | little) C. Especially among CS grads whose main exposure to C
         | was in the stack smashing exercise in a security class, all
         | they know about C is the part that was intentionally made
         | vulnerable so it would be easy to exploit. Unless you're being
         | wildly negligent and reckless with your programming, C is
         | really not that scary.
        
           | viraptor wrote:
           | And yet, the most significant part of the C code issues we
           | find is memory corruption. Which is either significantly
           | harder to cause or impossible-by-design in many alternatives.
           | Unless you can realistically say "people working daily, for
           | years, on huge C projects write those bugs, but they're
           | reckless and I'm better than them" - yes, C should be scary
           | to you these days.
        
           | enneff wrote:
           | What's scary is the huge number of bugs in the Linux kernel
           | that wouldn't exist if it were written in a safer language.
           | 
           | https://syzkaller.appspot.com/upstream
        
           | jimbob45 wrote:
           | https://stackoverflow.com/questions/50724726/why-didnt-
           | gcc-o...
           | 
           | We can't even agree on safe string functions for C, half a
           | century later. You shouldn't have security bugs baked into
           | the standard library and you shouldn't have to do a mountain
           | of research to know which functions are safe and in which
           | cases.
           | 
           | However, for most things non-string, non-pointer, and non-
           | array, I agree with you.
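
Editor's sketch (not from the thread): one common way to get a bounded, always-terminated string copy out of standard C, using snprintf. The helper name copy_bounded is hypothetical, and this is only one of several competing conventions, which is rather the point of the comment above.

```c
#include <stdio.h>
#include <string.h>

/* Copies src into dst, never writing more than dstsz bytes.
 * dst is always NUL-terminated when dstsz > 0.
 * Returns 0 if the whole string fit, -1 if it was truncated or dstsz is 0. */
int copy_bounded(char *dst, size_t dstsz, const char *src)
{
    int n;

    if (dstsz == 0)
        return -1;
    n = snprintf(dst, dstsz, "%s", src);
    return (n < 0 || (size_t)n >= dstsz) ? -1 : 0;
}
```

Unlike strncpy, this never leaves dst without a terminating NUL, and the return value lets the caller notice truncation instead of silently continuing with a shortened string.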
        
           | baby wrote:
           | My fear comes from the fact that I was a security consultant,
           | and reviewed a lot of C applications containing nasty bugs.
        
             | yjftsjthsd-h wrote:
             | What percentage of non-C applications did you review? I
             | suspect C has enough footguns to be an issue, but its
             | popularity, especially in low-level software (kernels,
             | codecs, firmware) ensures that it'll show up in security
             | issues regardless of how bad the language itself is.
        
             | kwertyoowiyop wrote:
             | Definitely. Though Linux code has probably been tested more
             | thoroughly, and run through more static analysis, than any
             | other C code base. That does help me sleep a little better
             | at night.
        
             | freedomben wrote:
             | Interesting, I had the same job in mid to late 00s,
             | although I wasn't a consultant so my sample was the
             | company's codebases (of which there were a lot because we
             | built a lot of embedded systems on top of vxworks that did
             | a lot of network communications, sometimes in very niche
             | protocols), not necessarily the codebases of companies that
             | are worried enough that they hire a consultant. That was
             | right around the time when compilers and security tools
             | were becoming available that could flag nearly every
             | possible problem. At that point false positives became a
             | big challenge.
             | 
             | What years were you a consultant reviewing C applications?
        
           | nikanj wrote:
           | This reminds me of
           | https://www.usenix.org/system/files/1311_05-08_mickens.pdf
        
         | ylk wrote:
         | This isn't the Linux kernel but I'd say it's fair to assume
         | that the same likely applies to it:
         | 
         | > Most of our memory bugs occur in new or recently modified
         | code, with about 50% being less than a year old.
         | 
         | > [...] we've found that old code is not where we most urgently
         | need improvement.
         | 
         | https://security.googleblog.com/2021/04/rust-in-android-plat...
        
           | tombert wrote:
           | That makes some intuitive sense, right? The fact that it got
           | "old" in the first place indicates that it's not being
           | touched a lot, and if it isn't being touched a lot that means
           | that bugs haven't been found, meaning that the bugs that are
           | in there are especially sneaky edge cases, or there simply
           | aren't any large bugs to begin with.
        
         | tombert wrote:
         | I haven't really touched non-GC'd languages in quite a while,
         | but I feel like modern C isn't _that_ unsafe, at least from the
         | bits I've played with; it can even have a garbage collector
         | if you want it [1] (which I usually do).
         | 
         | It's worth giving it another try if you haven't in a while, if
         | for no other reason than to understand what's going on behind the
         | scenes of your abstractions in Java/C#/JavaScript/etc.
         | 
         | [1] https://en.wikipedia.org/wiki/Boehm_garbage_collector
        
           | Koshkin wrote:
           | > _isn't that unsafe_
           | 
           | It is as unsafe as you let it be, consciously or by mistake.
        
           | jahlove wrote:
           | What language is more unsafe than C? C++? ASM?
        
         | CyberRabbi wrote:
         | Wouldn't it be the normal course of events that infrastructure
         | would be based on decades old established technology...?
        
       ___________________________________________________________________
       (page generated 2022-02-24 23:00 UTC)