[HN Gopher] Improvements to static analysis in GCC 14
___________________________________________________________________
Improvements to static analysis in GCC 14
Author : dmalcolm
Score : 216 points
Date : 2024-04-03 13:54 UTC (9 hours ago)
(HTM) web link (developers.redhat.com)
(TXT) w3m dump (developers.redhat.com)
| noam_k wrote:
| Very cool stuff!
|
| I haven't done much C development lately, so I'm curious how
| often `strcpy` and `strcat` are used. Last I checked they're
| almost as big no-nos as using goto. (Yes, I know goto is often
| preferred in kernel dev...) Can anyone share on how helpful the
| c-string analyses are to them?
| sirwhinesalot wrote:
| There's nothing wrong with simple usages of goto.
|
| The strxcpy family on the other hand is complete garbage and
| should never be used for any reason. I'm horrified that they're
| used in the kernel at all. All of those functions (and every
| failed attempt at "fixing" them) should have been nuked from
| orbit.
| laweijfmvo wrote:
| What's wrong with `strncpy`?
| sirwhinesalot wrote:
| It doesn't guarantee that the output is null terminated.
| Big source of exploits.
| i80and wrote:
| strncpy won't always write a trailing nul byte, causing out
| of bounds reads elsewhere. It's a nasty little fellow. See
| the warning at https://linux.die.net/man/3/strncpy
|
| strlcpy() is better and what most people think strncpy()
| is, but still results in truncated strings if not used
| carefully which can also lead to big problems.
| sirwhinesalot wrote:
| Speaking of strlcpy, Linus has some colorful opinions on
| it:
|
| > Note that we have so few 'strlcpy()' calls that we
| really should remove that horrid horrid interface. It's a
| buggy piece of sh*t. 'strlcpy()' is fundamentally unsafe
| BY DESIGN if you don't trust the source string - which is
| one of the alleged reasons to use it. --Linus
|
| Maybe strscpy is finally the one true fixed design to fix
| them all. Personally I think the whole exercise is one of
| unbeliavable stupidity when the real solution is obvious:
| using proper string buffer types with length and capacity
| for any sort of string manipulation.
| jjav wrote:
| > the real solution is obvious
|
| If it were obvious it would have been done already.
| Witness the many variants that try to make it better but
| don't.
|
| > using proper string buffer types with length and
| capacity
|
| Which you then can't pass to any other library. String
| management is very easy to solve within the boundaries of
| your own code. But you'll need to interact with existing
| code as well.
| sirwhinesalot wrote:
| > If it were obvious it would have been done already.
| Witness the many variants that try to make it better but
| don't.
|
| Every other language with mutable strings, including C++,
| does it like that. It is obvious. The reason it is not
| done in C is not ignorance, it is laziness.
|
| > Which you then can't pass to any other library. String
| management is very easy to solve within the boundaries of
| your own code. But you'll need to interact with existing
| code as well.
|
| Ignoring the also obvious solution of just keeping a null
| terminator around (see: C++ std::string), you should only
| worry about it at the boundary with the other library.
|
| Same as converting from utf-8 to utf-16 to talk to the
| Windows API for example.
| jandrese wrote:
| The problem with strlcpy is the return value. You can be
| burned badly if you are using it to for example pull out
| a fixed chunk of string from a 10TB memory mapped file,
| especially if you're pulling out all of the 32 byte
| chunks from that huge file and you just wanted a function
| to stick the trailing 0 on the string and handle short
| reads gracefully.
|
| It's even worse if you are using it because you don't
| fully trust the input string to be null terminated. Maybe
| you have reasons to be believe that it will be at least
| as long as you need, but can't trust that it is a real
| string. As a function that was theoretically written as
| "fix" for strncpy it is worse in some fundamental ways.
| At least strncpy is easy enough to make safe by always
| over-allocating your buffer by 1 byte and stuffing a 0 in
| the last byte.
| Borg3 wrote:
| #define strncpyz(d,s,l) *(strncpy(d,s,l)+(l))=0
|
| Of course this one is unsafe for macro expansion. But
| well, its C :)
| spacechild1 wrote:
| As others have already pointed out it, it doesn't guarantee
| that the result is null-terminated. But that's not the only
| problem! In addition, it always pads the remaining space
| with zeros: char buf[1000];
| strncpy(buf, "foo", sizeof(buf));
|
| This writes 3 characters and 9997 zeros. It's probably not
| what you want 99% of the time.
| jlokier wrote:
| `strncpy` is commonly misunderstood. It's name misleads
| people into thinking it's a safely-truncating version of
| `strcpy`. It's not.
|
| I've seen a lot of code where people changed from `strcpy`
| to `strncpy` because they thought that was safety and
| security best practice. Even sometimes creating a new
| security vulnerability which wasn't there with `strcpy`.
|
| `strncpy` does two unexpected things which lead to safety,
| security and performance issues, especially in large
| codebases where the destination buffers are passed to other
| code:
|
| * `strncpy` does NOT zero-terminate the copied string if it
| limits the length.
|
| Whatever is given the copied string in future is vulnerable
| to a buffer-read-overrun and junk characters appended to
| the string, unless the reader has specific knowledge of the
| buffer length and is strict about NOT treating it as a
| null-terminated string. That's unusual C, so it's rarely
| done correctly. It also doesn't show up in testing or
| normal use, if `strnlen` is "for safety" and nobody enters
| data that large.
|
| * `strncpy` writes the entire destination buffer with zeros
| after the copied string.
|
| Usually this isn't a safety and security problem, but it
| can be terrible for performace if large buffers are being
| used to ensure there's room for all likely input data.
|
| I've seen these issues in large, commercial C code, with
| unfortunate effects:
|
| The code had a security fault because under some
| circumstances, a password check would read characters after
| the end of a buffer due to lack of a zero-terminator, that
| authors over the years assumed would always be there.
|
| A password change function could set the new password to
| something different than the user entered, so they couldn't
| login after.
|
| The code was assumed to be "fast" because it was C, and
| avoided "slow" memory allocation and a string API when
| processing strings. It used preallocated char arrays all
| over the place to hold temporary strings and `strncpy` to
| "safely" copy. They were wrong: It would have run faster
| with a clean string API that did allocations (for multiple
| reasons, not just `strncpy`).
|
| Those char arrays had the slight inconvenience of causing
| oddly mismatched string length limits in text fields all
| over the place. But it was worth it for performance, they
| thought. To avoid that being a real problem, buffers tended
| to be sized to be "larger" than any likely value, so buffer
| sizes like 256 or 1000, 10000 or other arbitrary lengths
| plucked at random depending on developer mood at the time,
| and mismatched between countless different places in the
| large codebase. `strncpy` was used to write to them.
|
| Using `malloc`, or better a proper string object API, would
| have run much faster in real use, at the same time as being
| safer and cleaner code.
|
| Even worse, sometimes strings would be appended in pieces,
| each time using `strncpy` with the remaining length of the
| destination buffer. That filled the destination with zeros
| repeatedly, for every few characters appended. Sometimes
| causing user-interactions that would take milliseconds if
| coded properly, to take minutes.
|
| Ironically, even a slow scripting language like Python
| using ordinary string type would have probably run faster
| than the C application. (Also Python dictionaries would
| have been faster than the buggy C hash tables in that
| application which took O(n) lookup time, and SQLite
| database tables would have been faster, smaller and simpler
| than the slow and large C "optimised" data structures they
| used to store data).
| lelanthran wrote:
| It's not possible to use it safely unless you know that the
| source string fits in the destination buffer. Every strncpy
| must be followed by `dst[sizeof dst - 1] = 0`, and even if
| you do that you still have no idea if you truncated the
| source string, so you have to put in a further check.
| strncpy (dst, src, (sizeof dst) - 1); dst[(sizeof
| dst) - 1] = 0; int truncated = strlen (dst) -
| strlen (src);
|
| Without the extra two lines after every strncpy, you're
| probably going to have a a hard to discover transient bug.
| actionfromafar wrote:
| if you really want to use standard C string functions,
| use instead: int ret = snprintf(dst,
| sizeof dst, "%s", src); if (ret >= n || ret < 0)
| { /* failed */ }
|
| or as a function: bool ya_strcpy(const
| char* s, char* d, size_t n) { int cp
| = snprintf(d, n, "%s", s); bool ok = cp >= 0
| && cp < n; ok ? *s = *s : 0;
| return ok; }
| rdtsc wrote:
| > There's nothing wrong with simple usages of goto
|
| Indeed a like a few gotos here and there for doing cleanup
| toward the end of the function.
| sirwhinesalot wrote:
| Or to break out of nested loops. The problem is with
| unstructured goto spaghetti making the code impossible to
| follow without essentially running it in your head (or a
| debugger).
|
| Goto + Switch (or the GCC computed goto extension) is also
| a wonderful way to implement state machines.
| i80and wrote:
| Some usage of goto is still idiomatic in C if used in ways
| logically equivalent to structured programming constructs C
| lacks. It requires some care, but I mean, it's C.
|
| (I'm not however fond at all of longjmp)
| saagarjha wrote:
| gotos are fine if used judiciously. strcpy and strcat are
| "fine" in that they work when you know your code is correct and
| you have big problems if you don't. But this describes most of
| C, unfortunately.
| dmit wrote:
| > gotos are fine if used judiciously
|
| Is there a language feature that is not? :)
| randomdata wrote:
| _> Last I checked they 're almost as big no-nos as using goto._
|
| Huh? Why is goto a no-no? It is there for good reason. I think
| we all agree with Dijkstra that, in his words, unbridled gotos
| are harmful, but C's goto is most definitely bridled. I doubt
| any language created in the last 50+ years has unbridled gotos.
| That's an ancient programming technique that went out of
| fashion long ago (in large part because of Dijkstra).
| umanwizard wrote:
| goto used in certain idiomatic ways (e.g. to jump to cleanup
| code after an error, or to go to a `retry:` label, or to
| continue or break out of a multiply nested loop) is fine.
| What's annoying is bypassing control flow with random goto
| spaghetti.
| bluGill wrote:
| Languages other than C give you options for flow control so
| that you don't need goto for that. It is a spectrum, if you
| only use goto to jump to the end of a small function on error
| it is okay, though I prefer something better in my language.
| I've seen 30,000 line functions with gotos used for flow
| control (loops and if branches) - something you can do in C
| if you are really that stupid and I think we will all agree
| is bad. This 30,000+ line function with gotos as flow control
| was a lot more common in Dijkstra's day.
| lelanthran wrote:
| > Languages other than C give you options for flow control
| so that you don't need goto for that.
|
| The idiom `if (error) goto cleanup` is about the only thing
| I see goto used for. What flow control replaces that
| _other_ than exceptions?
| randomdata wrote:
| _> What flow control replaces that other than
| exceptions?_
|
| defer has gained in popularity for that situation.
| sirwhinesalot wrote:
| Jumping out of nested loops. Implementing higher level
| constructs like yield or defer. State machines. Compiler
| output that uses C as a "cross-platform" assembly
| language.
|
| All of them are better served with more specialized
| language constructs but as a widely applicable hammer
| goto is pretty nice.
|
| I don't expect C to have good error handling or
| generators any time soon but with goto I can deal with
| it.
| nickpsecurity wrote:
| Compiling HLL constructs in some of those scenarios
| ultimately produces a jump statement. So, it makes sense
| that a higher-level version of a jump would be helpful in
| the same situations.
| randomdata wrote:
| We all agree that you shouldn't write bad code. Not using
| goto, not using any language construct.
|
| But when unbridled gotos were the only tool in the toolbox,
| bad code was an inevitability in a codebase of any
| meaningful size. Not even the best programmer was immune.
| This is what the "Go to statement considered harmful" paper
| was about.
|
| It was written in 1968. We listened. We created languages
| that addressed the concerns raised and moved forward. It is
| no longer relevant. Why does it keep getting repeated in a
| misappropriated way?
| jjav wrote:
| > 30,000 line functions with gotos
|
| The problem there is the 30K line function, not the goto!
| jandrewrogers wrote:
| The use of goto is unambiguously correct and elegant in some
| contexts. Unwavering avoidance of goto can lead to
| unnecessarily ugly, convoluted code that is difficult to
| maintain. It usually isn't common but it has valid uses.
|
| While use of functions like `strcpy` are less advisable, there
| are contexts in which they are guaranteed to be correct unless
| other strong (e.g. language-level) invariants are broken, in
| which case you have much bigger problems. In these somewhat
| infrequent cases, there is a valid argument that notionally
| safer alternatives may be slightly less efficient for no
| benefit.
| sirwhinesalot wrote:
| strcpy and friends don't really have any benefits beyond just
| being there. The "safer" versions are still unsafe in many
| cases, while being less performant and more annoying to use.
|
| Writing a strbuffer type and associated functions isn't
| particularly hard and the resulting interface is nicer to
| use, safer, and more efficient.
| xedrac wrote:
| > The use of goto is unambiguously correct and elegant in
| some contexts.
|
| For C, absolutely. For C++, it's likely a footgun.
| jandrewrogers wrote:
| It has fewer use cases in C++ but it still has use cases
| where the alternatives are worse.
| lelanthran wrote:
| > Last I checked they're almost as big no-nos as using goto.
|
| I don't think so. Gotos are fine, strcat and strcpy without a
| malloc with the correct size in the same scope is a code smell.
| saagarjha wrote:
| Very nice. I'm glad to see these all have detailed reports
| explaining what's wrong!
| 1udfx9cf8azi0 wrote:
| if (nbytes < sizeof(*hwrpb)) return -1;
| if (copy_to_user(buffer, hwrpb, nbytes) != 0) return
| -2;
|
| The fix that was done was: if (nbytes >
| sizeof(*hwrpb))
|
| But I think the correct fix is: if
| (copy_to_user(buffer, hwrpb, sizeof(*hwrpb)) != 0)
|
| It never makes sense to copy out of the hwrpb pointer any size
| other than sizeof(*hwrpb).
| pwagland wrote:
| Right, but the size of the buffer is given, it doesn't make
| sense to stomp over end of the callers buffer either, so you
| can't use pass in something longer than `nbytes` either.
| 1udfx9cf8azi0 wrote:
| That's what the original check is for: if
| (nbytes < sizeof(*hwrpb))
|
| If the buffer isn't large enough to hold *hwrpb, then it
| already fails. The original check was good, only needed to
| change the amount of bytes copied to sizeof(*hwrpb).
| tom_ wrote:
| The original less-than check was deemed incorrect, and was
| replaced entirely. For good or for ill, it seems the author
| deems it valid to pass in a value smaller than sizeof
| *hwrpb, and that many bytes will be dutifully copied. This
| might form part of some barebones API versioning mechanism.
| 1udfx9cf8azi0 wrote:
| > The original less-than check was deemed incorrect
|
| It was only deemed incorrect because of an information
| leak. Not because it's a valid use-case for user space to
| copy smaller portions of *hwrpb into user space. https://
| github.com/torvalds/linux/commit/21c5977a836e399fc71...
| sltkr wrote:
| No, because if nbytes > sizeof(*hwrpb), your version causes
| the kernel to only write part of the buffer, and then when
| the app accesses fields at the end of the struct, it would
| read uninitialized data, which is very bad.
|
| Recall that the API is intended to be used like this:
| struct hwrbp buf; getsysinfo(GSI_GET_HWRPB, &buf,
| sizeof(buf), /* .. */);
|
| At first glance, it might seem unnecessary to pass the
| buffer size at all, because in theory the user and kernel
| should agree on what the sizeof(struct hwrbp) is. But the
| reason it is passed is because there are various reasons
| why the separately compiled kernel and user binaries might
| disagree (e.g., incorrect compiler flags, wrong header file
| being used, struct has changed between different versions,
| etc.), and it's useful to detect that. So you can make an
| argument that the most conservative check is:
| if (nbytes != sizeof(\*hwrpb)) return -1;
|
| After all, if the user and kernel disagree on the correct
| size of the struct, then _something_ is wrong! But allowing
| nbytes < sizeof(*hwrpb) has the benefit that the kernel
| developers can add fields at the end of the struct without
| breaking backward compatibility with older applications.
|
| I would agree with you if the kernel had some other
| mechanism to pass the size of the buffer that was actually
| filled to the client (like e.g. the read() syscall does)
| but the getsysinfo() API doesn't return that data, so the
| kernel must either fill the buffer entirely or return
| failure.*
| 1udfx9cf8azi0 wrote:
| > No, because if nbytes > sizeof(*hwrpb), your version
| causes the kernel to only write part of the buffer, and
| then when the app accesses fields at the end of the
| struct, it would read uninitialized data, which is very
| bad.
|
| > I would agree with you if the kernel had some other
| mechanism to pass the size of the buffer that was
| actually filled to the client (like e.g. the read()
| syscall does) but the getsysinfo() API doesn't return
| that data, so the kernel must either fill the buffer
| entirely or return failure.
|
| As you mention, this struct is versioned. Userspace can
| tell how much of the struct was filled by checking the
| size field (hwrpb->size).
|
| > But allowing nbytes < sizeof(*hwrpb) has the benefit
| that the kernel developers can add fields at the end of
| the struct without breaking backward compatibility with
| older applications.
|
| That's a related but separate issue. Backward
| compatibility can be handled by switching on nbytes or by
| copying fewer bytes with a carefully designed struct.
| It's not clear that backward compatibility was the
| original intention of this code, the original intention
| more seems to be sanitizing tainted input. This struct
| has not changed in at least 16 years.
| perihelions wrote:
| 36 more comments in this other thread:
|
| https://news.ycombinator.com/item?id=39918278 ( _" GCC 14 Boasts
| Nice ASCII Art for Visualizing Buffer Overflows (phoronix.com)"_,
| 2 hours ago)
| Davidbrcz wrote:
| I wish there was a better output format for the analysis, because
| this is hell for screen readers.
| dmalcolm wrote:
| FWIW I implemented SARIF output in GCC 13 which is viewable by
| e.g. VS Code (via a plugin) - though the ASCII art isn't.
|
| You can see an example of the output here:
| https://godbolt.org/z/aan6Kfxds (that's the first example from
| the article, with -fdiagnostics-format=sarif-stderr added to
| the command-line options)
|
| I experimented with SVG output for the diagrams, but didn't get
| this in good enough shape for GCC 14.
| mgaunard wrote:
| -Wstringop-overflow is the first warning I disable because of all
| the false positives.
|
| I doubt the analyze variant would fare any better.
| quincepie wrote:
| To me fanalyzer is one of GCC killer features over clang. It
| makes programming C much easier by explaining errors. The error
| messages also began to feel similar to Rust in terms of being
| developer friendly.
| mr_00ff00 wrote:
| I know Rust (esp on HN) is very hyped for its memory safety and
| nice abstractions, but I really wonder how much Rust owes its
| popularity to its error messages.
|
| I would say the #1 reason I stop learning a technology is
| because of frustrating or unclear errors.
|
| EDIT: Getting a bit of topic, but I meant more because I love C
| and would love it more with rust level error messages.
| dist1ll wrote:
| That's what makes me wary of modifying my NixOS config. A
| single typo and you get an error dump comparable to C++03
| templates.
| Quekid5 wrote:
| ... but you do get an error. That's a lot better what you
| typically get with C or C++. Assuming it's valid systax, of
| course.
|
| This is a veering off topic, but I do agree that Nix-the-
| language has a lot of issues.
|
| (You might suggest Guix, but I don't want to faff about
| with non-supported repositories for table stakes like
| firmware and such. Maybe Nickel will eventually provide a
| more pleasant and principled way to define Nix
| configurations?)
| nh2 wrote:
| My favourite Nix error message is
| infinite recursion encountered, at undefined position
| pxc wrote:
| That's definitely the most painful part of iterating on Nix
| code for me, even in simple configs. You eventually develop
| an intuition for common problems and rely more on that than
| on deciphering the stack traces, but that's really not
| ideal.
| hardwaregeek wrote:
| Yeah Rust is popular because it's a practical language with a
| nice type system, decent escape hatches, and good tooling.
| The borrow checker attracts some, but it could have easily
| been done in a way with terrible usability.
| Quekid5 wrote:
| The hard problem with C is that it's hard to tell if what the
| programmer wrote is an error. Hence warnings... which can be
| very hit or miss, or absurd overkill in some cases.
|
| (Signed overflow being a prime example where you really
| either just need to _define_ what happens or accept that your
| compiler is basically never going to warn you about a
| possible signed overflow -- which is UB. The compromise here
| by Rust is to allow one to pick between some implementation
| defined behaviors. That seems pretty sensible.)
| uecker wrote:
| For signed overflow I use -fsanitize=signed-integer-
| overflow .
| Quekid5 wrote:
| Good. I wonder how many people do and also if their
| compilers support it. (One would hope so, of course. I
| assume clang and GCC do.)
|
| ... but the question is really what you ship to
| production.
|
| Btw, possible signed overflow was just an example of
| things _people do not want warnings for_. OOB is far more
| dangerous, obviously... and the cost for sanitizer in
| _that_ case is HUGE... and it doesn 't actually catch all
| cases AFAIUI.
| giovannibonetti wrote:
| Arguably Rust got good error messages by learning from Elm:
| https://elm-lang.org/news/compiler-errors-for-humans
| estebank wrote:
| Elm is acknowledged as being the initial inspiration for
| focusing on diagnostics early on, but Rust got good error
| messages through elbow grease and focused attention over a
| long period of time.
|
| People getting used to good errors and demanding more, is
| part of the virtuous circle that keeps them high quality.
|
| Making good looking diagnostics requires UX work, but
| making _good diagnostics_ requires a flexible compiler
| architecture and a lot of effort, nothing more, nothing
| less.
| jonathankoren wrote:
| > I would say the #1 reason I stop learning a technology is
| because of frustrating or unclear errors.
|
| Overly verbose error messages that obscure more than
| illuminate are chief complaint against C++.
|
| Honestly, they can just sap all the energy out of a project.
| cogman10 wrote:
| "You violated a template rule. Here's a novel on everything
| that's broken as a result"
|
| It's why the Constraint system was important for C++.
| someplaceguy wrote:
| Clang has a similar tool, the Clang Static Analyzer:
| https://clang-analyzer.llvm.org/
| chc4 wrote:
| I have had the exact opposite experience: clang constantly
| gives me much better error messages than GCC, implementations
| of some warnings or errors catch more cases, and clang-tidy is
| able to do much better static analysis.
___________________________________________________________________
(page generated 2024-04-03 23:00 UTC)