[HN Gopher] The CompCert C Compiler
___________________________________________________________________
The CompCert C Compiler
Author : nequo
Score : 69 points
Date : 2024-05-26 18:15 UTC (4 hours ago)
(HTM) web link (compcert.org)
(TXT) w3m dump (compcert.org)
| lisper wrote:
| A cautionary tale for anyone relying on formal proofs of
| correctness:
|
| https://spinroot.com/spin/Doc/rax.pdf
|
| TL;DR: in the 1990s NASA flew an autonomous control system that
| was formally verified, tested out the wazoo, but which
| nonetheless failed in flight.
|
| [UPDATE] I did not mean to imply that the formal verification
| failed. It did not. The formal verification worked as advertised.
| The cause of the failure was much more subtle and nuanced.
| elteto wrote:
| You make it seem like the formal verification missed an issue.
| Not quite what happened:
|
| "The first effort consisted of analyzing part of the RA
| autonomous space craft software using the SPIN model checker.
| One of the errors found with SPIN, a missing critical section
| around a conditional wait statement, was in fact reintroduced
| in a different subsystem that was not verified in this first
| pre-flight effort. This error caused a real deadlock in the RA
| during flight in space."
|
| The model actually found an issue that was later reintroduced
| in a different part of the system that was not formally
| verified. If anything, it tells us we need _more_ formal
| verification!
| lisper wrote:
| > You make it seem like the formal verification missed an
| issue.
|
| Sorry, that was not my intent. I added a clarification.
| cwzwarich wrote:
| I'm confused. Wasn't the error actually in an unverified
| subsystem and isomorphic to an error caught by the model
| checker in a verified subsystem? Isn't this more of a
| cautionary tale for someone _not_ relying on formal
| verification?
| lisper wrote:
| > Wasn't the error actually in an unverified subsystem and
| isomorphic to an error caught by the model checker in a
| verified subsystem?
|
| No, it was quite a bit more subtle than that. The problem was
| that there was no mechanism to enforce the use of the
| formally verified API, and an application programmer put in a
| direct call to a system function that bypassed that API.
|
| Source: I was the technical lead on the RAX executive.
| extrabajs wrote:
| It sounds more like a cautionary tale against bypassing
| APIs. What part of this is related to formal verification?
| Quekid5 wrote:
| This is why friends don't let friends use unsafePerformIO
| ... or whatever the equivalent was here :)
|
| I'm still a bit confused about the point though. I feel
| like an adequate rejoinder would be to enforce formal
| methods at all the levels? I'm obviously not talking
| specifics (because I don't know them! ... and you do), but
| this seems like a failure of process or lack of enforcement
| of formal methods "all the way down" as it were. I dunno,
| color me confused...
| wucke13 wrote:
| Which almost reads as a cautionary tale about mechanisms
| like Dust's `unsafe`. Not necessarily the specifics of
| the Rust, but the overall idea of having a safe (by
| whatever means) sunset of operations and and additional
| unsafe operations, which eases code analysis
| tremendously. You can't got without unsafe in most
| embedded systems. But it's good to very explicitly mark
| in the code wherever the unknown depths of UB lurk if not
| the most attention is exercised.
| tombert wrote:
| Honestly, I don't think you're wrong, and this is from a guy
| who is getting a PhD in formal methods.
|
| Formal methods are super cool, and formal verification is even
| cool, but it can be really easy to think it's always perfect.
| It gives you such amazing guardrails most of the time that it
| can be easy to stop looking for when there aren't any.
|
| For example, model checking is cool and useful, but sort of by
| definition it is exhaustive and as a result you end up having
| to restrict scope in a lot of cases (For example, redefining
| the INTEGERS set to {-1000...1000} in TLA+). This is _almost
| always_ perfectly fine and can catch a lot of weird cases, but
| it 's easy to forget that "nearly everything" is not the same
| thing as "everything".
|
| Obviously I still think formal methods are worth it, but
| they're not a substitute for being careful.
| lisper wrote:
| I have never understood why the C community does not rally around
| what seems to me to be the obvious answer to the main problem
| with C: stop conflating pointers and arrays, or at least
| deprecate it. Make arrays a separate data type, not a synonym for
| a pointer, and require the system to track the size of arrays
| either at compile time or at run time (or both). Make a[x] be a
| bounds-checked array reference rather than a synonym for an
| unchecked dereference of a+x. It seems stupidly obvious to me
| that this is the Right Thing. Yes, it would be a non-backwards-
| compatible change, but the benefits seem to me to vastly outweigh
| the costs.
| elteto wrote:
| Because you can't do that and still call the resulting language
| C. It would be a different language incompatible will _all_
| existing C code.
|
| Pointer decay is a fundamental mechanic in C. If you use arrays
| in C, and you pass said arrays as an argument to a function,
| then you are dealing with pointer decay.
| lisper wrote:
| > Because you can't do that and still call the resulting
| language C.
|
| Says who? Non-backwards-compatible changes are made to
| language standards all the time. It's not pain-free, but
| neither is the status quo.
|
| Besides, who cares what the language is called? Change the
| name if that's what it takes, but stop conflating pointers
| and arrays. The cost of that has been literally billions of
| dollars in losses due to buffer overflows over the decades.
| usrnm wrote:
| > The cost of that has been literally billions of dollars
| in losses due to buffer overflows over the decades.
|
| How much, do you think, would rewriting all existing C code
| cost?
| junon wrote:
| Nothing, if new standards are opted in via a #pragma.
| lisper wrote:
| Why do you think that would be necessary?
| candiodari wrote:
| The whole reason C code is used is that it can be used
| for free. In other words, infinitely more than they're
| spending now. More than even the CCP is willing to spend
| to protect state secrets.
| colonwqbang wrote:
| There are many good "better C" alternatives already. Rust,
| zig, D. They address other common types of bug like
| resource leaks, overflows, use after free. If you're ok
| with rewriting code, you have terrific options.
|
| And of course there is C++, the most famous attempt to fix
| C in a somewhat compatible way by adding more features. It
| is debatable whether this effort resulted in a better
| language. C++ has all the bits needed to check array bounds
| by default but chooses not to do so...
|
| The problem is that the huge existing stock of C code is
| written in C and not rust, zig, D, etc. The same would be
| true for your proposed "better C" language and any other
| incompatible iteration of C.
|
| If you can come up with a way to add these guarantees to C
| without needing significant rewriting, I can assure you
| that most C programmers would be very interested.
| lisper wrote:
| > There are many good "better C" alternatives already.
| Rust, zig, D.
|
| All of these are very different from C. What I'm
| proposing is just one small change to the existing C
| language.
|
| > If you can come up with a way to add these guarantees
| to C without needing significant rewriting, I can assure
| you that most C programmers would be very interested.
|
| I can pretty much guarantee that they would not because
| this is easy: phase in the changes. Start by turning
| array-pointer conflation into a mandatory warning rather
| than undefined behavior or whatever it is now. Then wait
| a few years. Then turn it into an error that you can
| muffle using -C2024 or whatever.
|
| I actually don't know whether array-pointer conflation is
| required by the standard or if it's undefined behavior
| (I'm pretty sure it's one or the other). But if it's the
| latter then you don't actually have to change the
| standard to make this happen, all you need to do is write
| a compiler that does the Right Thing. AFAIK no such
| compiler exists. But there is just no excuse for this:
| % gcc -v Apple clang version 14.0.0
| (clang-1400.0.29.202) ... % more test.c
| int main () { int x[10]; return
| *(x+20); } % gcc -Wall test.c %
| rfoo wrote:
| There are more than one billion lines of C code which
| hasn't been updated in the last decade and is still in
| use.
|
| Who is going to update these code once the "do the right
| thing"-compiler become available?
|
| Oh, and the worst part: some of them may already be bug-
| free due to 15 years (if not more) of people trying to
| make money by selling exploits to surveillance vendors or
| who knows. But there's certainly high-impact bugs left.
| Now what, refactor the code to use the fancy eliminate-
| spatial-memory-corruption C variant and introduce a few
| UAF by the way?
| lisper wrote:
| The authors and maintainers of that code. Any error or
| warning flagged by this change is something that really
| ought to be changed anyway because any unchecked array or
| pointer dereference is a potential security risk.
| rfoo wrote:
| Have you read the source code of Xpdf (the thing being
| exploited in the famous NSO Apple iMessage 0-click blah
| blah)?
|
| I did (because I wrote an exploit for the bug after the
| Google blogpost, for curiosity), the code looks
| disgusting. The author (one poor guy) does not have the
| code in an online VCS and instead dump a source tarball
| every few months (or years). The upstream vulnerable code
| was fixed months after news outbreak.
|
| My conclusion is if Apple had a choice it won't end up in
| iOS at all. Clearly, Apple already paid a lot of
| maintaining cost in this case (fixing bugs before
| upstream did), but what you asked for is a whole new
| level.
| LukeShu wrote:
| There is a well-maintained in-VCS less-disgusting version
| of Xpdf. It's called Poppler, and Apple chooses to not
| use it.
| VogonPoetry wrote:
| I think you conflating bugs. Apple doesn't use Xpdf as
| the basis of the PDF Framework. The NSO bug exploited a
| bug in the JBIG2 file format. The same implementation of
| this code was included in both Xpdf and the Apple PDF
| code. That is why Xpdf needed to also fix the same code.
| retrac wrote:
| In assembly, arrays and structs reduce to base address,
| plus offset times scaling factor. C provides a thin
| veneer over that. Scaling derived from size of type.
| Offset is the array index. The basic programming model of
| C is to view memory as an array.
| lisper wrote:
| Yes, I understand that. But the topic at hand is a
| compiler whose "intended use is the compilation of life-
| critical and mission-critical software written in C". The
| idea of using C in life-critical and mission-critical
| software is risible as long as the language definition
| requires it to have gaping security holes, especially
| when the single biggest contributor to this problem is
| fairly easy to fix, at least from a technological point
| of view if not a political and sociological one.
| nequo wrote:
| My understanding is that the primary purpose of CompCert
| is to make formally verified code that is extracted into
| C also get compiled by a compiler that is formally
| verified to preserve the intended semantics.
|
| So CompCert seems to me to aim to help mission-critical
| software to _move away_ from C, and possibly into Coq
| /Isabelle/etc., except for the purposes of compilation to
| machine code.
| lisper wrote:
| That is a noble goal, but I don't see how it can possibly
| achieve the intended result as long as the C standard is
| a fundamentally borken as it is.
|
| I tried to download CompCert so I could try it out, but
| they only have a source distribution and to build it you
| need Coq and OCaml and a few other things because _of
| course_ CompCert is not written in C. No one in their
| right mind would write mission-critical software in C.
| pyrolistical wrote:
| You should take a closer look at zig. While superficially
| the syntax is very different, what zig really is, c but
| more specific
| lisper wrote:
| No, I get that. I was referring to the syntax. Syntax
| matters.
|
| Zig is hands-down a _better_ language than C, and (I 'll
| take your word that) it fills the same niche as C, but it
| is still a _different_ language with its own idioms and
| lore and conventions. It is not C-with-tweaks. It cannot
| be compiled by an extant C compiler. Code written under
| my proposal would be legal C code under the current
| standard (but not the other way around).
|
| [EDIT] Actually, that turns out not to be true. You'd
| need to change the behavior of SIZEOF or provide some
| other way of getting the size of dynamically allocated
| arrays at run time, since this information would now be
| maintained by the compiler.
| admax88qqq wrote:
| > huge existing stock of C code
|
| I feel like the amount of effort that has been spent so
| far to make C safe and fix bugs due to C not being safe
| is greater than the effort that would have been required
| to rewrite all existing C code into memory safe
| languages.
|
| But I think secretly C programmers don't want memory
| safety. Dealing with pointers and remembering to malloc
| and free are part of what makes them feel more
| skilled/elite than those other programmers who have
| garbage collection and bounds checking.
| rfoo wrote:
| The problem here is people seldomly get paid rewriting
| existing C code into memory safe lamguages. While once in
| a while someone would be annoyed enough and pay for a
| fixing-C effort for a little.
|
| Do you have suggestions on how to fix the incentive?
| admax88qqq wrote:
| I don't think that's the root problem. I think C
| programmers don't believe C is a problem. New software is
| started every day on C. There's no excuse for that and no
| financial incentive to do it.
|
| If the engineers actually admitted that C is not a safe
| languages for shipping software, then we could at the
| very least freeze the existing code and write everything
| _new_ in a same language. But we don't. Engineers still
| go starting brand new greenfield projects in C which is
| just insane.
| mhh__ wrote:
| I don't know how much C code actually uses it tbh.
|
| Most things just take a pointer.
|
| Wouldn't be easy by any means but it could be done at the
| scale of the Linux kernel if anyone cared enough.
| Xeamek wrote:
| _C programmers_ NO! You can 't do that, that would break
| compatibility!
|
| _also C programmers_ Wow, You are trying to compile this old
| program on new toolchain?
|
| Here is a list of 10k errors due to changed defaults flags for
| compilation, dependencies breaking by going from 9.4.1 to 9.4.2
| and also, your code contains platform-specific extensions
| anyway.
|
| But C is portable guys! It really is!
| jeroenhd wrote:
| But that's the thing, code bases are full of non-standard C,
| but they never break because the C standard changes, only
| because their weird hacks are falling out of favour. Fixing a
| design flaw in C will break all existing code, instead of the
| code of the dozens of companies relying on weird hacks they
| wrote ten years ago.
|
| Once Zig becomes stable, I think it may have a chance of
| slowly fixing existing C code bases, by its virtue of being
| able to co-exist with existing code bases. On the other hand,
| a lot of C programmers don't want to change because they
| don't see a problem, so the presence of those will probably
| ruin any chance Zig has of improving the situation.
| samatman wrote:
| Other than the pessimistic finish, my prediction is that
| this is exactly what will happen.
|
| The change will be a generational one. C will never
| entirely disappear, but over time, C code will be old code.
| People will occasionally write new C for what amounts to
| aesthetic reasons, and there will be a robust niche
| (doubtless well-paid) of maintainers of legacy C projects.
| All the kernels currently written in C will still exist,
| and still be composed mostly or entirely of C.
|
| But new work in embedded, drivers, implementation of
| programming languages, netork stacks, will be in Zig
| instead. With some Rust as well, but I figure we've seen
| enough of that arc to tentatively conclude that what Rust
| is displacing is mostly C++, not C.
|
| But when I say generational, I mean it. You're quite right
| that there is a legion of C programmers who like it and
| intend to stick with it. But they'll retire one day.
| pdw wrote:
| That makes sense when you realize that what C folks care
| about more than anything else is _ABI compatibility_. Changes
| to language or toolchain are acceptable as long as they don't
| change the ABI.
|
| E.g. consider the hemming and hawing about a 64-bit time_t.
| That's a tiny change in comparison, and one that's obviously
| unavoidable and with a strict deadline. And yet...
| kukkamario wrote:
| 64-bit time has already been resolved in newer 32-bit Linux
| versions. Issue isn't that changing ABI couldn't be done.
| It is that no one wants to update OS and custom software of
| a 15-year-old embedded system that still works. Archeology
| to find correct instructions to build a working OS image is
| challenge itself and then there is a need to adapt them to
| more modern tools. Been there, done that, and it wasn't
| fun...
| zzo38computer wrote:
| > E.g. consider the hemming and hawing about a 64-bit
| time_t.
|
| I had received a compiler warning about this when trying to
| compile a program on Raspberry Pi (it was a old version; I
| did this a few days before they implemented 64-bit time_t
| on Raspberry Pi). Fortunately I was able to add a macro to
| allow it to work on computers without 64-bit time_t. Other
| than that, the program I compiled worked perfectly,
| although I wrote the program on PC and did not specifically
| try to make it work with Raspberry Pi. So, a C code is
| portable, although sometimes a few changes are required to
| make the program to work.
|
| But I think that 64-bit time_t is a good idea.
| LegionMammal978 wrote:
| In fact, per the standard, a+x is already an array reference:
| you aren't allowed to go outside the bounds of the array
| immediately containing the pointee, on pain of UB.
|
| (Arrays do exist in the object model, and you can take pointers
| to them of type T (*)[N]; you just can't copy them around by
| value, and the name of an array decays to its first element
| pointer.)
|
| Compilers just typically don't track array bounds at runtime
| because of (a) performance and (b) big ABI incompatibilities.
| There's nothing in the language itself that stops them.
| lisper wrote:
| > on pain of UB
|
| But this is exactly the problem. UB can be anything,
| including nothing. This should be _at the very least_ an
| optional warning. But here is what happens with a fairly
| current C compiler: % gcc -v Apple
| clang version 14.0.0 (clang-1400.0.29.202) ...
| % more test.c int main () { int x[10];
| return *(x+20); } % gcc -Wall test.c
| %
| fweimer wrote:
| These kinds of warnings need optimizations enabled.
| $ gcc -O2 -Wall -c t.c t.c: In function 'main':
| t.c:3:14: warning: array subscript 20 is outside array
| bounds of 'int[10]' [-Warray-bounds] 3 |
| return *(x+20); | ^~~~~~~
| t.c:2:11: note: at offset 80 into object 'x' of size 40
| 2 | int x[10]; | ^
|
| Not sure how to get Clang to warn. It clearly recognizes
| the undefined behavior.
| dzaima wrote:
| Clang appears to have -Warray-bounds-pointer-arithmetic
| for this, though not enabled on -Wall nor -Wextra. (fwiw,
| clang has -Weverything to turn on literally every
| warning, including conflicting ones, for finding what
| flags there are)
| lisper wrote:
| Yeah, Clang warns on x[20] but not *(x+20) even with
| -Wall and -O2. It's kinda weird.
| dzaima wrote:
| For the people who actually care about using C, there can be no
| unified concept of "array". Any single "the system" will just
| plain and simply be unusable in a good majority of situations.
|
| Some people will want to store the size in a type smaller than
| size_t (and potentially place it not adjacent to the data
| pointer for better packing in a struct; or perhaps even bit-
| pack it). Some will want to place the size relative to the data
| instead of the pointer (esp. flexible array members). Then
| you'll have questions on pointers in the middle of an array,
| indexable by positive and negative indices. Never mind the
| pretty significantly increased register shuffling of having to
| pass the size across functions.
|
| For projects that are fine with doing things in the single way
| forced upon you and don't care about how things are done as
| long as it's safe, C is already rather clearly not the language
| to choose.
|
| C++ as-is can already pretty trivially allow adding bounds-
| checked array types, and compilers can even be configured to
| warn on all pointer indexing (https://godbolt.org/z/W8sqGW5sh),
| achieving the entirety of your proposal while not locking
| people into a single data structure. (granted, some may not
| want to expand to C++ "just" for one or two features (me
| included), but here allowing customizable array data structures
| is basically the only sane option, and C would have to take a
| rather significant amount from C++ to allow for such itself)
| lisper wrote:
| > Some people will want to store the size in a type smaller
| than size_t
|
| So? My point here is that this should be the compiler's
| responsibility, not the programmer's. Why could not a
| compiler bitpack the same way -- or even better than -- a
| programmer could?
| dzaima wrote:
| The compiler cannot change the length field's size if a
| reference to the array (or a struct where it's contained)
| is ever passed to an unknown function, as that function has
| to be able to read the length from memory based just on the
| spelled-out type.
|
| Not a problem when passing the array by value (i.e. two
| registers of the data pointer and length), but then any
| packing automatically does not apply anyway.
| lisper wrote:
| I don't understand this. Why does it matter if you're
| calling an unknown function? Why would an unknown
| function be unable to get the length? All you would need
| to do is to change the behavior of SIZEOF to make it
| aware of dynamically allocated arrays.
| dzaima wrote:
| I mean in the case of an array type that tracks its
| length at runtime. Take: typedef struct
| { uint32_t arr1_len, arr2_len; int*
| arr1; int* arr2; } Foo;
|
| That's, on a 64-bit system, a 24-byte structure. Were it
| written as a struct of two array fields, the compiler
| couldn't choose a layout as efficient while maintaining
| being able to get a pointer to each field. Never mind
| that the compiler would likely not be omniscient enough
| to be able to tell that the structure is never used with
| arrays exceeding 2^32 elements.
|
| Perhaps you mean to keep using regular pointers for non-
| constant-size heap-stored things, but at that point your
| breaking change would have a very minor to non-existent
| impact.
| actionfromafar wrote:
| Or add another sizeof keyword.
| zzo38computer wrote:
| Arrays are a separate data type than pointers, although in many
| contexts you can use an array where a pointer is expected and
| it will work, and this feature is useful.
|
| Bounds-checking can sometimes be useful, and can perhaps have a
| switch to control its working.
|
| Some instruction sets (such as Flex) have tagged memory. In
| Flex, a pointer contains the address of a block (and a pointer
| can also be designated is read-only, disallowing writing
| through the pointer). There is also a "reference" consisting of
| a pointer and a displacement; I suppose this "reference" can be
| used to implement C pointers. If you use this, then the
| computer will automatically do bounds-checking and will result
| in an error if you make an access that is out of bounds.
|
| Tagged memory also allows to easily find the pointers in memory
| without needing to know the types of variables, which can be
| helpful for some uses, e.g. to detect improper use of
| realloc/free. Furthermore, null pointer can be zero (without
| the tag bits), which is automatically an error when used as a
| pointer because it is not a pointer.
| marmakoide wrote:
| That would break most C code handling hardware directly, like
| on MCUs
| lisper wrote:
| Why?
| lieks wrote:
| C99 did that. void f(int len, char s[len]);
| void g(float m[static 16]);
|
| The first one is for variable-length arrays, the second one for
| fixed-length. TCC even implements array bounds-checking.
|
| But because a lot of people were stuck using C89 for several
| decades (due to old compilers), and the syntax isn't that
| great, nobody even knows they exist.
|
| Personally, I think Dennis Ritchie's[a] proposed syntax[b] was
| much better: void f(char s[?]);
|
| [a] The creator of C.
|
| [b] https://www.bell-labs.com/usr/dmr/www/vararray.html
| pornel wrote:
| This article is 15 years old now, and nothing has changed:
|
| https://digitalmars.com/articles/C-biggest-mistake.html
|
| (and of course the problem wasn't new 15 years ago either.)
|
| It wasn't fixed then. It won't be fixed now.
|
| C is valued for not changing. C is valued for backwards
| compatibility with the most obscure platforms with unmaintained
| compilers.
|
| The C userbase is self-selected to like C exactly the way it
| is.
| dang wrote:
| Related:
|
| _CompCert - Formally-verified C compiler_ -
| https://news.ycombinator.com/item?id=27648735 - June 2021 (123
| comments)
|
| _CompCert C a formally verified optimizing compiler for a large
| subset of C99_ - https://news.ycombinator.com/item?id=27644356 -
| June 2021 (1 comment)
|
| _CompCert - A formally verified C compiler_ -
| https://news.ycombinator.com/item?id=18968125 - Jan 2019 (57
| comments)
|
| _Closing the Gap - The Formally Verified Optimizing Compiler
| CompCert [pdf]_ - https://news.ycombinator.com/item?id=13046449 -
| Nov 2016 (10 comments)
|
| _CompCert: A formally verified optimizing C compiler_ -
| https://news.ycombinator.com/item?id=9130934 - March 2015 (62
| comments)
|
| _CompCert - Compilers you can formally trust_ -
| https://news.ycombinator.com/item?id=2619650 - June 2011 (28
| comments)
| tombert wrote:
| When I was writing my proposal to get into a PhD program, I had
| to do a crash course in formally verified applications. The focus
| of the program is actually in Isabelle, but Coq is similar enough
| (in a hand-wavey kind of way) to where it was relevant to what I
| was writing about, and I stumbled across a few formally verified
| things with Coq.
|
| I became slightly obsessed with CompCert, but it felt like a
| "real" program that was utilizing proper formal verification
| techniques. It seemed so cool to me that there can be (to some
| extent), and "objectively correct" version of a C compiler. I
| still think it's very cool; I wish people would geek out about
| this stuff as much as I would sometimes.
___________________________________________________________________
(page generated 2024-05-26 23:00 UTC)