[HN Gopher] Moving the Linux Kernel to Modern C
___________________________________________________________________
Moving the Linux Kernel to Modern C
Author : chmaynard
Score : 237 points
Date : 2022-02-24 20:10 UTC (2 hours ago)
(HTM) web link (lwn.net)
(TXT) w3m dump (lwn.net)
| charcircuit wrote:
| >Raising the minimum GCC version to 8.x would likely be more of a
| jump than the user community would be willing to accept at this
| point.
|
| If you are using a 0-day-old kernel, why would you still be using
| GCC 5.x? 5.1 is almost 7 years old now; 8.1 is almost 4. Can't
| people just use apt / yum / rpm / whatever to upgrade to the
| latest GCC? Is that really too much to ask of the community?
| lazide wrote:
| For an unstable branch maybe, but if people need to actually
| use this for real work they will want it compiling on something
| as well understood and baked as possible - while not being
| completely obsolete.
| charcircuit wrote:
| People doing real work are using a kernel compiled with a
| more up to date GCC. From what I can tell online even RHEL is
| up to date enough by compiling its kernel with GCC 8.x.
| jahlove wrote:
| RHEL7 is still supported [0] by Red Hat through 2026, and
| is on GCC 4.8.5 [1]
|
| [0] https://access.redhat.com/product-life-cycles/?product=Red%2...
|
| [1] https://distrowatch.com/table.php?distribution=redhat
| charcircuit wrote:
| Correct me if I'm wrong but doesn't RHEL stick to a
| single kernel version and then backport patches?
| yjftsjthsd-h wrote:
| Officially yes, although we can argue about how much you
| can "backport" and still count. But even so - if new code
| requires a newer C version and thus newer compiler, it's
| harder to backport.
| mwcremer wrote:
| It's not just a question of what the new version fixes, there
| is also the question of what it breaks. Sometimes new versions
| trigger latent bugs no one knew about that luckily happened to
| work fine on the old tools. Finding and fixing those can be
| difficult and time-consuming.
| charcircuit wrote:
| This argument would have merit if everyone still used 5.x to
| compile Linux, but that's simply not the truth. Most people
| are using a Linux compiled by a somewhat up to date version
| of GCC.
| syncsynchalt wrote:
| You'd be surprised how much downstream grief a bump like this
| can cause in an ecosystem as broad as the Linux kernel. Dealing
| with the results can be unexpectedly overwhelming.
|
| It makes sense to play it safe and do this in smaller
| incremental jumps.
| pm215 wrote:
| I get the impression that a big part of it is just a general
| choice to be very conservative about forcing newer minimum
| versions. So to raise the version bar you have to make the
| positive case for why it's worthwhile -- merely "gcc 5 is
| ancient" doesn't suffice. The only reason it moved up from 4.9
| is actual data-loss-provoking codegen bugs in 4.9 (see
| discussion in this lkml thread:
| https://lore.kernel.org/lkml/CAHk-=wjqGRXUp6KOdx-eHYEotGvY=a...
| )...
|
| I do think they could move up a bit further, but it's useful to
| be able to just build kernels with the distro compiler and
| repology thinks that for instance Debian stretch (still an LTS
| supported version) is only gcc 6.3.
| bombcar wrote:
| The Linux community is _way way way_ larger than just the sum
| of distro user bases.
|
| Even if all distros have long ago moved to GCC 8+, you still
| have other chips and systems running, and who knows what buried
| somewhere depends on GCC 5.
| charcircuit wrote:
| You can compile Linux and a user space program with different
| versions of GCC.
| Koshkin wrote:
| As well as with Tiny C (fast).
| spicybright wrote:
| You'll have to upgrade the compiler eventually though, right?
| I don't know what that process would look like, but generally
| the longer you wait the more pain you'll face later.
| synergy20 wrote:
| Hmm, I submitted the same article 10 hours ago and it didn't get
| picked up. Where is the algorithm?
| chmaynard wrote:
| Welcome to HN. This happens all the time. I added 'Linux' to
| the title, which attracted attention.
| tomcam wrote:
| This has happened to me a number of times! I think time of day
| has a heavier influence than we might think. Submitting is only
| part of the algorithm. Both vote count and velocity of votes
| are really important too.
| progbits wrote:
| From [1]:
|
| > You are not "introducing" a new macro for this, you are
| modifying the existing one such that all users of it now have the
| select_nospec() call in it.
|
| > Is that intentional? This is going to hit a _lot_ of existing
| entries that probably do not need it at all.
|
| > Why not just create list_for_each_entry_nospec()?
|
| Let's ignore whether this patch is needed or works for now - I
| don't feel competent to comment on that. But this is such a bad
| suggestion. Instead of fixing the default to be safe, and
| possibly having an _unsafe_but_fast() variant for the places where
| it makes sense, they want to keep the broken version and require
| users to explicitly opt in to safety.
|
| Same as the infamous PHP mysql_real_escape_string (don't make the
| mistake of using mysql_escape_string!) [2][3] or a whole host of C
| stdlib footguns like strcpy/strncpy [4].
|
| The default, easy, obvious option should be safe. The unsafe but
| faster option should be hard to use by accident and obviously
| marked as such from the name.
|
| [1] https://lwn.net/ml/linux-kernel/Yg6iCS0XZB6EtMP7@kroah.com/
| [2] https://www.php.net/manual/en/function.mysql-escape-string.p...
| [3] https://www.php.net/manual/en/function.mysql-real-escape-str...
| [4] https://en.cppreference.com/w/cpp/header/cstring
| quadcore wrote:
| > Cost.
| masklinn wrote:
| > Same as the infamous PHP mysql_real_escape_string (don't make
| the mistake of using mysql_escape_string!) [2][3]
|
| TBF that is really a straight bridge to the mysql C API, which
| is why it's also in mysqli.
|
| MySQL actually has a third one called
| mysql_real_escape_string_quote, because if the sql mode
| NO_BACKSLASH_ESCAPES is set the escaping function needs to know
| what context it's used in, and thus what quote to double up,
| and mysql_real_escape_string will fail.
| toast0 wrote:
| > Same as the infamous PHP mysql_real_escape_string (don't make
| the mistake of using mysql_escape_string!) [2][3] or whole host
| of C stdlib footguns like strcpy/strncpy
|
| In all of those, the enhanced function takes more parameters.
| Switching the existing name to require a new parameter will
| break existing code, and people will not want to update. Making
| the new parameter optional is possible, but IMHO is messier
| than a new function that has required parameters and then
| deprecating the old function (and eventually removing it).
| klodolph wrote:
| This is just how you work with large code bases. Updating
| 15,000 call sites is far from trivial. These changes happen in
| multiple phases, and in each phase, you still want to have a
| working kernel.
|
| The unsafe version can be deprecated and eventually removed,
| but that is down the road.
| xiphias2 wrote:
| I'm not sure how it's done with the Linux Kernel (and git
| definitely makes it hard), but at Google (using the Perforce
| model) there are tools for mass renaming that try to
| guarantee that there are no regressions. Of course in Linux
| the number of C macros makes it much harder to work on the
| syntax trees, but comparison after the preprocessor step
| should be possible.
| klodolph wrote:
| IIRC Google works the same way. The automated refactoring
| tools (Rosie?) may make it go faster, but you generally
| don't fix up 15,000 call sites in a single change... you
| break it up into smaller batches spread across the tree.
| acidbaseextract wrote:
| It's been a long time since I was there, but I thought
| there were plenty of 15,000 call site refactorings done
| in a single final CL. Not that the Linux kernel should do
| the same!
| skybrian wrote:
| When I was there (several years ago) it was rare to do
| that across many projects. You don't want to have to roll
| back everyone due to a problem that affects one project.
| Also, there's a risk that changes that are happening in
| the meantime might mean the patch doesn't apply.
| eatbitseveryday wrote:
| > if all goes well, the shift to C11 will happen in the next
| kernel release
| stjohnswarts wrote:
| I'll be pinning that for a couple releases unless there's a
| major security bug O_O
| yjftsjthsd-h wrote:
| Just stick to a LTS branch; no reason to give up bugfixes.
| wiineeth wrote:
| Why modern C when you can move to Rust?
| yjftsjthsd-h wrote:
| Because we want a kernel that works today, and make
| improvements that can ship this year, not spend a decade
| rewriting working software.
| [deleted]
| the__alchemist wrote:
| I hope this isn't too much of a derail. I'm not proficient in C,
| and am translating a C module to Rust. My biggest complaint is
| how much repetition there is (the opposite of DRY)! Functions,
| variables, etc. are all prefixed with the module names,
| presumably due to the lack of namespaces. The `.c` vs `.h`
| dichotomy also violates DRY. The preprocessor sublanguage (the
| lines starting with `#`) is also a bit of a mess.
| quelsolaar wrote:
| In many ways you can't avoid more modern C, because more modern
| standards make things that were ambiguous clearer, and compilers
| then follow that even if you select an older standard.
|
| I think C99 introduces some bad things that should be kept out of
| any code base, variable-length arrays being the biggest one.
| Generics are another. Being able to declare variables anywhere
| doesn't make the code better, but it makes styles diverge, and
| that's a problem.
|
| If Linux wants to adopt C99, they should define in their style
| guide which parts of C99 should be adopted.
| nybble41 wrote:
| > Being able declare variables anywhere doesn't make the code
| better...
|
| It does, though. The ability to declare variables "anywhere"
| allows a reduction in the portion of the code where a
| variable exists uninitialized, or initialized with a dummy
| value which should never be used, or holding an obsolete value
| which is not intended to be used. It also allows more variables
| to be declared `const` (which requires them to be initialized
| in the declaration, which must occur _after_ the initial value
| is computed) which helps to detect accidental assignment and
| also makes the code easier to review.
|
| It's worth noting that you can more-or-less declare variables
| "anywhere" in C89 too--you just have to wrap the variable's
| scope in a compound statement. From that point of view, C99
| doesn't change where variables can be declared. It just lets
| you remove some syntactic clutter.
| jstimpfle wrote:
| I for one have read a lot of code with C89 declare-first style
| that was subjectively a lot less clear. This applies in
| particular to assign-once variables that depend on other
| calculations.
|
| Stuff like this: [0]. I mean, what's the point of declaring
| variables like _cp_ or _err_ at the beginning when they're
| first assigned only halfway down? What is the point of having
| _path_ around for the whole function, when it is a super
| temporary assign-once variable with a scope of 3 lines? I'm
| having a really hard time coming up with a justification for
| this, other than "this is the way we've always done it and we
| like it" (you can find actual rants in defense of this style
| from about 10 years ago).
|
| For-loop declarations are another instance of this that make
| the code more fluent IMO. And they help reduce the scope of
| variables to a strictly smaller block. I would find it hard to
| argue that this does not remove some bugs or improve
| readability.
|
| The only thing that declare-first has going for it is that the
| variables that are in use can be seen at once, a little bit
| like in a struct declaration. Looking at how optimizers butcher
| those variables depending on liveness makes this feature seem
| less valuable, though.
|
| [0]
| https://github.com/torvalds/linux/blob/2729cfdcfa1cc49bef5a9...
| LAC-Tech wrote:
| They're discussing the move to C99. Not sure that counts as
| modern C at this point.
| clhodapp wrote:
| Given that C99 is much closer in vintage to their current C89
| than it is to today's date, and is also just straight up really
| old, it does seem inappropriate to label it "modern C".
|
| That said, they do seem to be discussing both C99 and C11, so it
| does seem like going with something that _would_ qualify as
| modern (especially in light of Linux's need to be
| conservative) is actually on the table.
| dahfizz wrote:
| You also have to look at the contents of the revisions. C99
| was a massive update for C, and C11 was tiny in comparison
| (and C17 was basically nothing at all).
|
| Writing C89, without variable declaration in a for loop, and
| without // comments, etc, feels ancient. C99 on the other
| hand is what defines how C looks today.
|
| This is also why the attitude is: if they jump to C99, they
| may as well jump to C11 because they are basically the same.
| flyingfences wrote:
| They're discussing a move to C11, which counts in my book.
| tombert wrote:
| I haven't touched C in any serious capacity in quite a while;
| how often does C get new revisions? Isn't C11 already more
| than a decade old?
|
| I understand why the Linux kernel folks aren't moving to the
| bleeding edge (Linux is old, you have to do these ports
| incrementally), but I'm curious why the pace of language is
| so slow. I guess it's because C has more or less stabilized
| and thus further revisions are less necessary?
| LukeShu wrote:
| > how often does C get new revisions
|
| 1989, 1995, 1999, 2011, 2017.
|
| The 1995 one was an "amendment", not a full revision (and
| is mostly additions to libc, which Linux doesn't get to
| use; the only language change is the addition of digraphs).
|
| C17 was a "bugfix" revision; compilers will have applied
| these fixes to their C11 implementations, so in effect the
| only difference between C11 and C17 is the value of
| __STDC_VERSION__. So for most intents and purposes C11 is
| still the most recent revision.
| tombert wrote:
| Fair enough! As I said, I suppose C doesn't need to
| change that much; people use C _because_ they know what
| they're getting, and for that to be the case, it needs
| to be stable.
| dahfizz wrote:
| > but I'm curious why the pace of language is so slow. I
| guess it's because C has more or less stabilized and thus
| further revisions are less necessary?
|
| That is a big part of it. C is an old, stable, complete
| language.
|
| There are also the GNU extensions, which bridge the gap
| between language revisions. For example, anonymous unions
| were added in C11, but they have been a GNU extension
| since forever.
| syncsynchalt wrote:
| > I'm curious why the pace of language is so slow
|
| This is a strength of the language.
|
| If you've ever done CI/CD for a large project in a fluid
| development ecosystem (e.g. Node.js), you can understand why
| it might be refreshing to develop your operating system in
| a language where standards are measured by the decade.
| LAC-Tech wrote:
| yeah fair enough should've read the rest of the article
| before my snide comment
| kazinator wrote:
| C99 has one nice feature compared to C89 that is little talked
| about: you can initialize local aggregates using expressions that
| can't be calculated at load-time:
|
|     void fun(int x)
|     {
|         struct foo f = { x };    // not allowed in C89
|         int bar[3] = { 0, x };   // ditto
|     }
|
| Chances are the kernel does this because, I think, it's also a
| GNU89 extension.
| kwijibob wrote:
| Does C standard naming have a Y2K problem? :)
| teddyh wrote:
| Not until 2089.
| qiskit wrote:
| The C standards aren't published at a regular interval of 10
| years. They are published as needed or as agreed upon. So it
| is possible there may be a C standard published in 2089, but
| highly unlikely.
|
| It's similar to but not quite the same thing as the Y2K
| problem.
| syncsynchalt wrote:
| Given C's ubiquity and (likely) longevity it'll have a Y2k89
| problem: standards released in 2089 and beyond will have to
| change their naming scheme and/or avoid being standardized in
| particular years. ;)
| thetic wrote:
| Though C89 does not support declaration of a scoped loop variable
| like C99 does:
|
|     for (int i = 0; i < 10; i++) {
|         // ...
|     }
|
| C89 does support declaration of variables at the top of braced
| blocks whose scope is limited to that block:
|
|     {
|         int i;
|         for (i = 0; i < 10; i++) {
|             /* ... */
|         }
|     }
| spc476 wrote:
| Then you get dinged by static analysis tools like SonarQube
| because you introduced a useless scope or increased the
| "cognitive complexity" to the code and have to explain it to a
| team leader who might not understand why you did that.
| jrockway wrote:
| I guess don't use a C99 linter on a C89 codebase?
| dale_glass wrote:
| One thing I've wondered for some time is why don't the Linux
| developers modify the compiler to suit their needs better.
|
| At that project scale, wouldn't it start making sense to start
| solving problems like "If it were possible to write a list-
| traversal macro that could declare its own iterator [...]" by
| adding the functionality you want to GCC?
| Jtsummers wrote:
| That would make the code (and the compiler) non-C89 compliant.
| At which point, why not move to a different standard rather
| than bork both the kernel and the compiler if it can meet the
| needs better?
| electroly wrote:
| The Linux kernel requires GNU extensions; it's already not
| ANSI C89.
| [deleted]
| Jtsummers wrote:
| Alright, making it _more_ non-compliant, then. That still
| creates problems for everyone and also necessitates a
| larger jump in the minimum GCC version (unless you're
| going to backport the new extensions to every version that
| the kernel currently works with). Currently you can compile
| the Linux kernel with GCC 5.1. If you changed the compiler
| to support this one feature (or any other), you'd either
| need to backport it several versions (potentially back to
| 5) or abandon all of them, which is probably a no-go for a
| project that's overall as conservative as the Linux kernel
| project.
| initplus wrote:
| Maybe that's helpful today, but by remaining (relatively)
| standards compliant the kernel codebase has a more long-term
| resiliency. The more tightly wedded to a single compiler
| implementation the greater the long term risk for the project.
|
| If some shenanigans were to happen upstream in GCC it would not
| be the end of Linux.
| colejohnson66 wrote:
| But they're _not_ standards compliant. They rely on many GCC
| extensions that make compiling on something else not
| possible.
| tinalumfoil wrote:
| Other compilers support GCC extensions.
|
| > The Linux kernel has always traditionally been compiled
| with GNU toolchains such as GCC and binutils. Ongoing work
| has allowed for Clang and LLVM utilities to be used as
| viable substitutes. Distributions such as Android,
| ChromeOS, and OpenMandriva use Clang built kernels. LLVM is
| a collection of toolchain components implemented in terms
| of C++ objects. Clang is a front-end to LLVM that supports
| C and the GNU C extensions required by the kernel, and is
| pronounced "klang," not "see-lang."
|
| https://www.kernel.org/doc/html/latest/kbuild/llvm.html
| jancsika wrote:
| Current situation-- do nothing but yell at the compiler devs.
| Benefit: sometimes they listen. Cost: you cannot always (or
| maybe even often) get the compiler to behave as you think it
| should because you don't control it.
|
| Deathtrap situation-- maintain an operating system _and_ a fork
| of a compiler. Benefit: you can get more control over the
| compiler. Cost: you _still_ cannot always get the compiler to
| behave as you think due to time constraints. Death cost: your
| first cost is multiplied by the fact that you're now
| maintaining a goddamned compiler.
|
| I rankly speculate Linux is an extant project at its scale
| _because_ it has refused to fight on two fronts like this. (And
| yelling across a border isn't the same as crossing it.)
|
| Edit: clarifications
| kllrnohj wrote:
| GCC already has that functionality, it's called C++. Like that
| macro is just a crappy version of std::for_each.
|
| It doesn't seem useful to make a "C+" instead of switching to a
| language that just has the feature set the kernel needs. Like
| Rust, which is gaining some support within the Linux kernel
| already.
| stjohnswarts wrote:
| *for drivers primarily
| Gigachad wrote:
| That's basically the trial period. Nothing depends on
| drivers, so they are pretty safe to try new things with, and
| the hardware they are used for is limited, so you can know
| that the hardware the driver is used for is supported by the
| Rust compiler. If we go a few years and it's all good and
| the Rust compiler supports everything Linux targets, I can
| imagine more core parts becoming Rust.
| dale_glass wrote:
| C++ was discussed as having too many undesirable
| characteristics.
|
| But that's exactly why I think a customized language for the
| kernel is an interesting idea. You could get pretty much
| exactly what you want for the kernel. Add features you need,
| remove any undesirable behavior or features.
|
| For most programs that would be too much complexity, but the
| kernel has very particular needs. And I think a similar in
| spirit approach has worked very well with Qt.
| Koshkin wrote:
| The desirable characteristics of C++ outweigh, by far, its
| "undesirable" ones (whatever those may be, and which by the
| way you can for the most part safely ignore, if you wish).
| chaxor wrote:
| I don't know anything about C++ really, but I have been
| interested in learning it several times. The biggest
| deterrent for me has been the huge number of languages
| you seem to have to know to learn C++. Every new
| version comes with so many different features and
| seemingly (judging by example repos on git) different
| paradigms that it essentially feels like many different
| languages.
|
| This is probably one of the biggest reasons I may turn to
| Rust over C++: simply less feature creep, since it hasn't
| been around as long.
|
| Can you comment on how wrong I am about my feelings this
| way?
| Koshkin wrote:
| Yes, I understand, I would probably feel the same if I
| wanted to learn C++ all at once and for its own sake.
| Luckily, I have never had to do such a thing. Indeed, I
| may never learn C++ in its entirety, and I may even miss
| some important pieces, but I am OK with that. I am very
| comfortable using C++, and I still continue picking up
| pieces of wisdom here and there - as I go. My advice,
| learn by example, start with a small project and go from
| there.
| jcelerier wrote:
| > C++ was discussed as having too many undesirable
| characteristics.
|
| C++ allowed the Serenity OS people to produce an entire OS
| with a GUI stack able to play Diablo and their own web
| browser in something like two years. It's depressing to
| think where we could be today in terms of OS if the Linux
| and GNU people weren't as insistent on their hate of C++
| tayo42 wrote:
| Why is C++ necessary? Shouldn't any programming language
| be able to do the same things? Maybe with different
| amounts of code.
| stjohnswarts wrote:
| Because currently GCC maintenance is almost "free" for them
| (compared to maintaining their own GCC, which is as big a
| project as Linux), and they just have to work with the powers
| that be over at GCC. That's much better than forking it and
| having yet one more HUGE project to maintain independently.
| The advantages of sticking to the current policy FAR outweigh
| the disadvantages.
| [deleted]
| na85 wrote:
| I'm not a compiler expert here but when clang was new and one
| of the BSD distros was making a lot of hay about moving to it,
| I remember one of the arguments in favor being that the GCC
| codebase is a mess and not easily extended.
| pjmlp wrote:
| The reference to moving to C99 is kind of interesting: given
| the security work paid for by Google to remove all uses of VLAs,
| I thought they were already using it anyway.
| dahfizz wrote:
| VLAs exist as a gnu extension even for -std=c89, unless
| -pedantic is also specified.
| userbinator wrote:
| I'm one of those people who think OS kernels should stay as
| portable and simple as possible (i.e. C89 or some other easily-
| bootstrappable language, to avoid Ken Thompson attacks), so this
| isn't great news to see "the ladder being pulled up another
| rung". Then again, Linux has already become immensely complex.
| tomcam wrote:
| > so this isn't great news to see "the ladder being pulled up
| another rung"
|
| Man I so vibe with that. But the new Cs do have a ton of new
| features and, more to the point, I trust Linus to make this
| kind of decision more than just about anyone.
| thestoicattack wrote:
| The article does mention at least one advantage, so it's not as
| if the ladder is being pulled up for no reason.
| pcwalton wrote:
| The advantages of using a newer language greatly outweigh the
| disadvantages of theoretical "reflections on trusting trust"
| attacks. By mandating C89, you're condemning thousands of
| kernel developers to use a language that's over 30 years old
| because of a theoretical attack that has never happened and
| seems practically implausible. Does anyone really think there's
| a backdoor in both GCC _and_ Clang (remember, Linux can be
| compiled on either)?
| gmadsen wrote:
| I am ignorant on nearly all compiler-related issues, but
| there was an article on here not too long ago that was
| arguing that nearly all os development required old C because
| choices of the committee would break use cases required under
| the guise of undefined behavior.
|
| Are the benefits of "modern C" worth compile times 2-3x
| longer?
| aw1621107 wrote:
| > but there was an article on here not too long ago that
| was arguing that nearly all os development required old C
| because choices of the committee would break use cases
| required under the guise of undefined behavior.
|
| I would guess that you're referring to either "How ISO C
| became unusable for operating systems development" ([0]) or
| "How One Word Broke C" ([1]).
|
| [0]: https://arxiv.org/abs/2201.07845 , most recent HN
| discussion at https://news.ycombinator.com/item?id=30022022
|
| [1]: https://web.archive.org/web/20210307213745/https://news.quel... ,
| HN discussion at
| https://news.ycombinator.com/item?id=22589657
| pcwalton wrote:
| Times like these I wish I were allowed to say exactly how
| much money big companies save by using the newest versions
| of GCC and Clang. The economic value of modern compiler
| optimizations is staggering.
| tomcam wrote:
| I hereby give you permission to say exactly how much
| money big companies save by using the newest versions of
| GCC and Clang.
| bch wrote:
| > wish I were allowed to say exactly how much money big
| companies save[...]
|
| Are you able to give hints that would guide us in thought
| exercises?
| bluGill wrote:
| Facebook has hinted that they employ smart C++ people
| because some core optimizations can save on the order of
| several hundred thousand dollars per year (possibly in
| the millions). Most of that comes from not having to buy
| as much power to cool the server rooms; some of it is from
| buying fewer servers, as what they have can do more.
| Facebook doesn't actually say what they save, but
| they have said they measure it, and we know how much it
| costs to employ an engineer full time working only on
| optimization, and we know who some of those engineers
| are.
|
| You have to be very large to notice it, but the likes of
| Facebook, amazon, and Google have massive warehouses
| almost entirely filled with computers. It doesn't take
| much to see how their power bill can add up.
| pertymcpert wrote:
| One avenue is to think about it in terms of datacenter
| compute. If modern compiler optimizations improve
| performance by 10%, aggregate the perf improvements
| across the entire world's compute and you get a huge
| saving.
| steveklabnik wrote:
| The parent works at Facebook/Meta leading their Rust
| team, focusing on compiler and ecosystem improvements. So
| that in and of itself is a sort of hint into what that
| info could be like, even if it's not actual details of
| the order of magnitude or anything.
| bonzini wrote:
| Why would a different standard cause longer compile times?
| mananaysiempre wrote:
| Most of the really annoying UB is technically in C89
| though, it's just that compilers haven't really treated it
| with such contempt for the first two decades or so. I can't
| even recall any new kinds of (non-library) UB in C99 or C11
| (though there have to be some).
|
| So "old C" in such a case would need to mean "an old C
| _implementation_ " (or possibly a new one, but simple or
| configured to behave like an old one), something like GCC
| 2.8 maybe, and nobody's using that on desktop. So the
| language standard version should be mostly immaterial, and
| it's not like the C89-to-C11 difference is anything like
| the yawning C++98-to-C++20 chasm. (This is a carefully
| phrased statement: C99 had complex numbers, which are
| annoying, and variable-length arrays, which are a
| significant change, but C11 demoted both to optional
| features.)
| neysofu wrote:
| > Are the benefits of "modern C" worth compile times 2-3x
| longer?
|
| I mean, you get more security by default and security is
| pretty darn important for kernels...
| pm215 wrote:
| Linux has always relied heavily on GCC extensions, though -- it
| makes no attempt to actually be C89-compliant portable code.
| What it actually has is a minimum supported gcc version, which
| in turn governs whether particular features can be used. In
| this case the minimum gcc version has for other reasons finally
| got big enough that C99 and C11 support is definitely present
| -- the ladder was already this high.
|
| (You can also build with clang, but only because clang
| deliberately aims to support most gcc extensions.)
| Dylan16807 wrote:
| Declaring a variable in the middle of a function takes
| basically no effort to support. Especially compared to all the
| ridiculous things the kernel gets up to.
|
| The ladder's being moved a millimeter.
| syncsynchalt wrote:
| IMO the C standards are conservative enough that even a
| standard level "only" a decade old is still fine.
|
| With that said, we still have options even when moving to a
| newer standard. Many new language features can be machine-
| translated to C89 if needed (similar to how we have protoize /
| unprotoize, though not always as seamless). If we need to keep
| a bridge to C89 the kernel authors could hold to a subset of
| new language features that are amenable to machine translation.
| jcranmer wrote:
| Linux is not written in C89. It is written in gnu C89, which is
| a mixture of C89, C99, and a panoply of often poorly-defined
| gcc-specific features. The number of compilers that can
| successfully compile Linux is one; not even clang is able to
| fully do so yet, I believe.
|
| Actually, as a compiler writer, I'd go a little bit further and
| point out that Linux itself isn't even written to the gnu C89
| very well; it's often written to a "C is portable assembly"
| view of the language, which results in nastygrams and
| invective being hurled at compiler writers if they compile the
| C specification correctly and not according to the "proper"
| assembly the code author thought they were getting.
|
| One of the benefits of more modern language revisions is that
| they actually tighten the wording on a lot of the more
| ambiguous parts of the specification--C11 in particular adds a
| much more comprehensive memory model that's very _shrug_ in the
| older revisions of C.
| nikanj wrote:
| My understanding is that Linux has often faced issues where
| "compiling the specification correctly" means "Ha-ha gotcha,
| we can actually throw half your code out of the window
| because you misread the deep aliasing rules on page 8432 of
| the spec"
|
| Their hesitancy with newer standards is understandable, when
| viewed against that backdrop
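|
| The aliasing rules in question (C11 6.5p7) let the compiler assume
| that an int * and a float * never point at the same object. Below
| is a minimal sketch of the classic trap and the conforming
| alternative. The function names are illustrative. The kernel opts
| out of this assumption by building with -fno-strict-aliasing.
|
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Under strict aliasing the compiler may assume *i and *f are different
 * objects and return 1 without reloading *i. Passing two pointers to the
 * same object here is undefined behavior. */
int store_and_reload(int *i, float *f)
{
    *i = 1;
    *f = 2.0f;
    return *i;
}

/* Conforming way to inspect a float's bits: copy the bytes with memcpy
 * rather than dereferencing a cast pointer such as *(uint32_t *)&x. */
uint32_t float_bits(float x)
{
    uint32_t bits;
    assert(sizeof bits == sizeof x); /* true for IEEE 754 binary32 floats */
    memcpy(&bits, &x, sizeof bits);
    return bits;
}
```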
| jcranmer wrote:
| One of these days, I will get around to writing my post as
| to why that take on undefined behavior is completely wrong.
|
| More to the point, though, the only changes to undefined
| behavior in the C specification in newer versions (compared
| to C89) are either clarifying things that _were already
| undefined behavior_ (e.g., INT_MIN % -1) or actually
| _making some undefined behavior well-defined_ (e.g.,
| allowing type punning via unions).
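|
| Here are sketches of the two examples above, with hypothetical
| helper names. C11 6.5.5p6 says that when a/b is not representable,
| both a/b and a%b are undefined. That is why INT_MIN % -1 is UB even
| though the mathematical answer is 0; on x86 the division typically
| traps. Union punning relies on a footnote added in C99 TC3 and kept
| in C11.
|
```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* INT_MIN % -1 is undefined because INT_MIN / -1 is not representable.
 * The mathematical result is 0, so handle b == -1 explicitly. */
int safe_rem(int a, int b)
{
    assert(b != 0); /* division by zero is undefined as well */
    if (b == -1)
        return 0;   /* x % -1 is 0 for every x, including INT_MIN */
    return a % b;
}

/* Type punning through a union: write one member, read another.
 * The stored bytes are reinterpreted as the other member's type. */
uint32_t float_bits_via_union(float x)
{
    union {
        float f;
        uint32_t u;
    } pun;
    pun.f = x;
    return pun.u;
}
```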
| electroly wrote:
| Clang can do it. Google ships clang-built Linux kernels in
| Android and ChromeOS.
| mwint wrote:
| What's a Ken Thompson attack?
| [deleted]
| nl wrote:
| _Ken Thompson's "cc hack" - Presented in the journal,
| Communications of the ACM, Vol. 27, No. 8, August 1984, in a
| paper entitled "Reflections on Trusting Trust", Ken Thompson,
| co-author of UNIX, recounted a story of how he created a
| version of the C compiler that, when presented with the
| source code for the "login" program, would automatically
| compile in a backdoor to allow him entry to the system._
|
| https://www.win.tue.nl/~aeb/linux/hh/thompson/trust.html
|
| The paper's a pretty entertaining read:
|
| > First we compile the modified source with the normal C
| compiler to produce a bugged binary. We install this binary
| as the official C. We can now remove the bugs from the source
| of the compiler and the new binary will reinsert the bugs
| whenever it is compiled. Of course, the login command will
| remain bugged with no trace in source anywhere.
| ksec wrote:
| Reflections on Trusting Trust - Ken Thompson
|
| https://wiki.c2.com/?TheKenThompsonHack
| Jtsummers wrote:
| https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref...
|
| Ken Thompson, _Reflections on Trusting Trust_.
|
| EDIT: I don't think I've seen 5 nearly simultaneous replies
| sharing the same link before.
| ksec wrote:
| >EDIT: I don't think I've seen 5 nearly simultaneous
| replies sharing the same link before.
|
| LOL I was searching for a non-PDF link and delayed the
| reply. It would have been 6 simultaneous answers.
| bombcar wrote:
| https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref...
|
| How do you prove your stack is secure?
| mindcrime wrote:
| I believe the person you are replying to is using that as an
| allusion to the issues discussed in the famous "Reflections
| on Trusting Trust" paper[1] by Ken Thompson.
|
| [1]: https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref...
| [deleted]
| febstar wrote:
| Most likely referring to "Reflections on Trusting Trust" [0];
| i.e. when the compiler is itself the attack vector.
|
| [0]: https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref...
| esarc wrote:
| https://www.cs.cmu.edu/~rdriley/487/papers/Thompson_1984_Ref...
| pinephoneguy wrote:
| Yes, but Linux won't even build with gcc 4, much less weird
| compilers like tcc or MSVC. That ship has unfortunately
| already sailed.
| tiahura wrote:
| Sounds like GNU/Linux is probably warranted.
| yjftsjthsd-h wrote:
| Kind of, but you can also use clang. On the other other
| hand, that's clang with GNU extensions, so /shrug
| neysofu wrote:
| Ken Thompson's hack is purely theoretical, especially in the
| modern computing landscape where the diversification of
| software supply chains would make such a hack much less
| effective and much easier to detect. If we're talking about
| kernel security, there are other, lower-hanging exploits, many
| of which are enabled by outdated language design decisions and
| unsafe memory models.
|
| As compiler technologies evolve, trusting compilers becomes not
| just a necessary evil but a better course of action than relying
| on humans tiptoeing around security risks masquerading as
| language idiosyncrasies.
| TheDesolate0 wrote:
| yjftsjthsd-h wrote:
| > Ken Thompson's hack is purely theoretical
|
| Erm. I was always under the impression that he had actually
| done it, and that his presentation was a historical anecdote.
| Is that not the case?
| neysofu wrote:
| Yes, he did, but that just shows that it's possible to
| inject self-replicating code in a compiler. It doesn't
| actually tell us how resistant such code is to compiler
| source changes, audits, binary analysis, etc. which are all
| required for a successful exploit.
| steveklabnik wrote:
| On a technical level, the hack has been demonstrated to
| work, but actually hacking someone via the technique has
| not, in my understanding.
| baby wrote:
| I find it scary that most modern infrastructure eventually relies
| on C, and C from decades ago at that.
| lazide wrote:
| You shouldn't - it isn't perfect, but it is well understood. So
| far we don't have anything else that works better, and all the
| edge cases and weirdness are understood by enough people that
| someone could build something like Linux on it.
|
| It is unwise to build a bridge on something untested with
| unknown failure modes, and it is equally unwise to do a rewrite
| or create new core infrastructure in a language without knowing
| it as well as a civil engineer knows concrete.
| Koshkin wrote:
| I hear you. And a huge codebase at that. Unfortunately, the
| choice is limited; D or Ada would probably be better
| alternatives today.
| freedomben wrote:
| The amount of fear I see from people around C is so surprising
| to me. Not saying you fit into this, but so far my anecdata
| suggests that it's largely people who don't know (or know very
| little) C. Especially among CS grads whose main exposure to C
| was in the stack smashing exercise in a security class, all
| they know about C is the part that was intentionally made
| vulnerable so it would be easy to exploit. Unless you're being
| wildly negligent and reckless with your programming, C is
| really not that scary.
| viraptor wrote:
| And yet, the largest share of the issues we find in C code is
| memory corruption, which is either significantly harder to
| cause or impossible by design in many alternatives.
| Unless you can realistically say "people working daily, for
| years, on huge C projects write those bugs, but they're
| reckless and I'm better than them" - yes, C should be scary
| to you these days.
| enneff wrote:
| What's scary is the huge number of bugs in the Linux kernel
| that wouldn't exist if it were written in a safer language.
|
| https://syzkaller.appspot.com/upstream
| jimbob45 wrote:
| https://stackoverflow.com/questions/50724726/why-didnt-gcc-o...
|
| We can't even agree on safe string functions for C, half a
| century later. You shouldn't have security bugs baked into
| the standard library and you shouldn't have to do a mountain
| of research to know which functions are safe and in which
| cases.
|
| However, for most things non-string, non-pointer, and non-
| array, I agree with you.
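|
| To illustrate the string point: strncpy does not NUL-terminate when
| the source fills the buffer. snprintf always terminates (for a
| nonzero size) and returns the length it wanted to write, so
| truncation is detectable. Below is a minimal sketch; copy_checked is
| a hypothetical helper, not a standard function. The bounds-checked
| Annex K functions such as strcpy_s are optional in C11, and glibc
| does not provide them.
|
```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Returns true if strncpy left dst NUL-terminated. When src is at least
 * n bytes long, strncpy fills all n bytes and adds no terminator. */
bool strncpy_terminates(char *dst, size_t n, const char *src)
{
    assert(dst != NULL && src != NULL && n > 0);
    strncpy(dst, src, n);
    return memchr(dst, '\0', n) != NULL;
}

/* Hypothetical helper: bounded copy that always NUL-terminates and
 * returns false if src had to be truncated to fit. */
bool copy_checked(char *dst, size_t dstsz, const char *src)
{
    assert(dst != NULL && src != NULL && dstsz > 0);
    int needed = snprintf(dst, dstsz, "%s", src);
    return needed >= 0 && (size_t)needed < dstsz;
}
```
|
| With a 4-byte buffer, strncpy of "hello" leaves no terminator.
| copy_checked stores "hel" and returns false.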
| baby wrote:
| My fear comes from the fact that I was a security consultant,
| and reviewed a lot of C applications containing nasty bugs.
| yjftsjthsd-h wrote:
| What percentage of non-C applications did you review? I
| suspect C has enough footguns to be an issue, but its
| popularity, especially in low-level software (kernels,
| codecs, firmware) ensures that it'll show up in security
| issues regardless of how bad the language itself is.
| kwertyoowiyop wrote:
| Definitely. Though Linux code has probably been tested more
| thoroughly, and run through more static analysis, than any
| other C code base. That does help me sleep a little better
| at night.
| freedomben wrote:
| Interesting, I had the same job in the mid to late 00s,
| although I wasn't a consultant so my sample was the
| company's codebases (of which there were a lot because we
| built a lot of embedded systems on top of VxWorks that did
| a lot of network communications, sometimes in very niche
| protocols), not necessarily the codebases of companies that
| are worried enough that they hire a consultant. That was
| right around the time when compilers and security tools
| were becoming available that could flag nearly every
| possible problem. At that point false positives became a
| big challenge.
|
| What years were you a consultant reviewing C applications?
| nikanj wrote:
| This reminds me of
| https://www.usenix.org/system/files/1311_05-08_mickens.pdf
| ylk wrote:
| This isn't the Linux kernel but I'd say it's fair to assume
| that the same likely applies to it:
|
| > Most of our memory bugs occur in new or recently modified
| code, with about 50% being less than a year old.
|
| > [...] we've found that old code is not where we most urgently
| need improvement.
|
| https://security.googleblog.com/2021/04/rust-in-android-plat...
| tombert wrote:
| That makes some intuitive sense, right? The fact that it got
| "old" in the first place indicates that it's not being
| touched a lot, and if it isn't being touched a lot that means
| that bugs haven't been found, meaning that the bugs that are
| in there are especially sneaky edge cases, or there simply
| aren't any large bugs to begin with.
| tombert wrote:
| I haven't really touched non-GC'd languages in quite a while,
| but I feel like modern C isn't _that_ unsafe, at least from the
| bits I've played with it; it can even have a garbage collector
| if you want it [1] (which I usually do).
|
| It's worth giving it another try if you haven't in a while, if
| for no other reason than to understand what's going on behind the
| scenes of your abstractions in Java/C#/JavaScript/etc.
|
| [1] https://en.wikipedia.org/wiki/Boehm_garbage_collector
| Koshkin wrote:
| > _isn't that unsafe_
|
| It is as unsafe as you let it be, consciously or by mistake.
| jahlove wrote:
| What language is more unsafe than C? C++? ASM?
| CyberRabbi wrote:
| Wouldn't it be the normal course of events that infrastructure
| would be based on decades old established technology...?
| TheDesolate0 wrote:
___________________________________________________________________
(page generated 2022-02-24 23:00 UTC)