[HN Gopher] Love C, hate C: Web framework memory problems
___________________________________________________________________
Love C, hate C: Web framework memory problems
Author : OneLessThing
Score : 149 points
Date : 2025-10-10 03:39 UTC (1 day ago)
(HTM) web link (alew.is)
(TXT) w3m dump (alew.is)
| jacquesm wrote:
| There are many, many more such issues with that code. The person
| that posted it is new to C and had an AI help them to write the
| code. That's a recipe for disaster, it means the OP does not
| actually understand what they wrote. It looks nice but it is full
| of footguns and even though it is a useful learning exercise it
| also is a great example of why it is better to run battle-tested
| frameworks than to inexpertly roll your own.
|
| As a learning exercise it is useful, but it should never see
| production use. What is interesting is that the apparent
| cleanliness of the code (it reads very well) is obscuring the
| fact that the quality is actually quite low.
|
| If anything I think the conclusion should be that AI+novice does
| not create anything that is usable without expert review and
| that that probably adds up to a net negative other than that the
| novice will (hopefully) learn something. It would be great if
| someone could put in the time to do a full review of the code; I
| have just read through it casually and already picked up a couple
| of problems, and I'm pretty sure that a thorough job would turn up
| many more.
| drnick1 wrote:
| > What is interesting is that the apparent cleanliness of the
| code (it reads very well) is obscuring the fact that the
| quality is actually quite low.
|
| I think this is a general feature and one of the greatest
| advantages of C. It's simple, and it reads well. Modern C++ and
| Rust are just horrible to look at.
| messe wrote:
| I slightly unironically believe that one of the biggest
| hindrances to rust's growth is that it adopted the :: syntax
| from C++ rather than just using a single . for namespacing.
| jacquesm wrote:
| I believe that the fanatics in the rust community were the
| biggest factor. They turned me off what eventually became a
| decent language. There are some language particulars that
| were strange choices, but I get that if you want to start
| over you will try to get it all right this time around. But
| where the Go authors tried to make the step easy and kept
| their ego out of it, it feels as if the rust people aimed
| at creating a new temple rather than to just make a new
| tool. This created a massive chicken-and-egg problem
| that did not help adoption at all. Oh, and toolchain speed.
| For non-trivial projects for the longest time the rust
| toolchain was terribly slow.
|
| I don't remember any other language's proponents actively
| attacking the users of other programming languages.
| imtringued wrote:
| Software vulnerabilities are an implicit form of
| harassment.
| messe wrote:
| I'm hoping that's meant to satirise the rust community,
| because it's horseshit like this that makes a sizeable
| subset of rust evangelists unbearable.
| 01HNNWZ0MV43FF wrote:
| > I don't remember any other language's proponents
| actively attacking the users of other programming
| language.
|
| I just saw someone on Hacker News saying that Rust was a
| bad language because of its users
| jacquesm wrote:
| Yawn. Really, if you have nothing to say don't do it
| here.
| LexiMax wrote:
| Gotcha hypocrisy might be a really cheap thing to point
| out, but they're not wrong.
|
| I have noticed my fair share of Rust Derangement Syndrome
| in C++ spaces that seems completely outsized from the
| series of microaggressions that they eventually point out
| when asked "Why?"
| dgfitz wrote:
| It's interesting, over the past 15 years I've had
| occasion to work with other c/c++ devs on various
| contracts, probably 50-ish distinct companies.
| Not once has rust even come up in casual conversation.
| lelanthran wrote:
| > I believe that the fanatics in the rust community were
| the biggest factor.
|
| I second this; for a few years it was impossible to have
| any sort of discussion in various programming forums when
| the topic was C: the conversation would get quickly
| derailed with accusations of "dinosaur", etc.
|
| Things have gone quiet recently (the last three years,
| though) and there have been far fewer derailments.
| LexiMax wrote:
| As an outsider, I don't really see the Rust community
| doing anything recently that they weren't already doing
| from the start.
|
| What seems to have changed in recent years is the buy-in
| from corporations that seemingly see value in its
| promises of safety. This seems to be paired with a
| general pulling back of corporate support from the C++
| world as well as a general recession of fresh faces, a
| change that at least from the sidelines seems to be
| mostly down to a series of standards committee own-goals.
| LexiMax wrote:
| Being a C++ developer and trafficking mostly in C++
| spaces, there is a phenomenon I've noticed that I've
| taken to calling Rust Derangement Syndrome. It's where C
| and C++ developers basically make Rust the butt of every
| joke, and make fun of it in a way that is completely
| outsized relative to how much they interact with Rust
| developers in the wild.
|
| It's very strange to witness. Annoying advocacy of
| languages is nothing new. C++ was at one point one of
| those languages, then it was Java, then Python, then
| Node.js. I feel like if anything, Rust was a victim of a
| period of increased polarization on social media, which
| blew what might have been previously seen as simple
| microaggressions completely out of proportion.
| hu3 wrote:
| I don't think Rust will ever be as big as C++ because
| there were fewer options back then.
|
| These days Go/Zig/Nim/C#/Java/Python/JS and other
| languages are fast enough for most use cases.
|
| And Rust's learning curve doesn't help either. C++ was
| basically C with OOP on steroids. Rust is very different.
|
| I say that because I wouldn't group Rust opposition with
| any of those languages you cited. It's different for
| mostly different reasons and magnitudes.
| cyphar wrote:
| > But where the Go authors tried to make the step easy
| and kept their ego out of it
|
| That is very different to my memories of the past decade+
| of working on Go.
|
| Almost every single language decision they eventually
| caved on that I can think of (internal packages,
| vendoring, error wrapping, versioning, generics) was
| preceded by months if not years of arguing that it wasn't
| necessary, often followed by an implementation attempt
| that seemed to be ever so slightly off, just out of spite.
|
| Let's not forget that the original Go 1.0 expected
| every project's main branch to maintain backward
| compatibility forever or else downstreams would break,
| and without hacks (which eventually became vendoring) you
| could not build anything without an internet connection.
|
| To be clear, I use Go (and C... and Rust) and I do like
| it on the whole (despite and for its flaws) but I don't
| think the Go _authors_ are that different to the Rust
| _authors_. There are (unfortunately) more fanatics in the
| Rust community but I think there's also a degree to
| which some people see anything Rust-related as being an
| attack on other projects regardless of whether the Rust
| authors intended it to be that way.
| jacquesm wrote:
| Fair enough.
| citbl wrote:
| The safer the C code, the more horrible it starts looking
| though... e.g. my_func(char msg[static 1])
| uecker wrote:
| Compared to other languages, this is still nice.
| jacquesm wrote:
| It is - like everything else - nice because you, me and
| lots of others are used to it. But I remember starting
| out with C and thinking 'holy crap, this is ugly'. After
| 40+ years looking at a particular language it no longer
| looks ugly simply because of familiarity. But to a
| newcomer C would still look quite strange and
| intimidating.
|
| And this goes for almost all programming languages. Each
| and every one of them has warts and issues with syntax
| and expressiveness. That holds true even for the most
| advanced languages in the field - Haskell, Erlang, Lisp -
| and more so for languages that were originally designed
| for 'readability'. Programming is by its very nature more
| akin to solving a puzzle than to describing something.
| The puzzle is how to get the machine to do something,
| to do it correctly, to do it safely and to do it
| efficiently, and all of those while satisfying the
| constraint of how much time you are prepared (or allowed)
| to spend on it. Picking the 'right' language will always
| be a compromise on some of these, there is no programming
| language that is perfect (or even just 'the best' or
| 'suitable') for all tasks, and there is no programming
| language that is better than all the others for every
| subset of tasks unless that subset is very small.
| uecker wrote:
| I agree that the first reaction usually is only about
| what one is used to. I have seen this many times. Still,
| of course, not all syntax is equally good.
|
| For example, the problem with Vec<Vec<T>> for a 2D array
| is not that one is not used to it, but that the syntax is
| just badly designed. Not that C would not have
| problematic syntax, but I still think it is fairly good
| in comparison.
| jacquesm wrote:
| C has one massive advantage over many other languages: it
| is just a slight level above assembler and it is just
| about as minimal as a language can be. It doesn't force
| you into an eco-system, plays nice with lots of other
| tools and languages and gets out of the way. 'modern'
| languages, such as Java, Rust, Python, Javascript (Node)
| and so on all require you to buy in to the whole menu,
| they're not 'just a language' (even if some of them
| started out like that).
| uecker wrote:
| Not forcing you into an eco-system is what makes C
| special, unique and powerful, and this aspect is not well
| understood by most critics. Stephen Kell wrote a great
| essay about it.
| moefh wrote:
| I don't understand why people think this is safer, it's the
| complete opposite.
|
| With that `char msg[static 1]` you're telling the compiler
| that `msg` can't possibly be NULL, which means it will
| optimize away any NULL check you put in the function. But
| it will still happily call it with a pointer that could be
| NULL, with no warnings whatsoever.
|
| The end result is that with an "unsafe" `char *msg`, you
| can at least handle the case of `msg` being NULL. With the
| "safe" `char msg[static 1]` there's nothing you can do --
| if you receive NULL, you're screwed, no way of guarding
| against it.
|
| For a demonstration, see [1]. Both gcc and clang are passed
| `-Wall -Wextra`. Note that the NULL check is removed in the
| "safe" version (check the assembly). See also the gcc
| warning about the "useless" NULL check ("'nonnull' argument
| 'p' compared to NULL"), and worse, the lack of warnings in
| clang. And finally, note that neither gcc nor clang warns
| about the call to the "safe" function with a pointer that
| could be NULL.
|
| [1] https://godbolt.org/z/qz6cYPY73
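|
| A rough sketch of the shape being described (the function
| names here are invented; the godbolt link above has the real
| comparison):
|
|     #include <string.h>
|
|     /* "safe": the compiler may assume msg is non-NULL, so
|        the NULL check below can be optimized away */
|     size_t safe_len(char msg[static 1]) {
|         if (msg == NULL) return 0;
|         return strlen(msg);
|     }
|
|     /* plain pointer: the NULL check survives */
|     size_t plain_len(const char *msg) {
|         if (msg == NULL) return 0;
|         return strlen(msg);
|     }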
| lelanthran wrote:
| > I don't understand why people think this is safer, it's
| the complete opposite.
|
| Yup, and I don't even need to check your godbolt link -
| I've had this happen to me once. It's the implicit
| casting that makes it a problem. You cannot even typedef
| it away as a new type (the casting still happens).
|
| The real solution is to create and use opaque types. In
| this case, wrapping the `char[1]` in a struct would
| almost certainly generate compilation errors if any
| caller passed the wrong thing in the `char[1]` field.
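|
| A minimal sketch of that idea (the names are invented): wrap
| the array in a struct so the implicit pointer conversion can
| no longer happen silently.
|
|     typedef struct { char buf[1]; } msg1;
|
|     void my_func(msg1 *msg);
|
|     /* my_func(some_char_ptr) is now a compile-time error
|        instead of a silent conversion */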
| pjmlp wrote:
| Meanwhile, in Modula-2 from 1978, that would be
| PROCEDURE my_func(msg: ARRAY OF CHAR);
|
| Now you can use LOW() and HIGH() to get the lower and upper
| bounds, and access is naturally bounds checked unless you
| disable that, locally or globally.
| jacquesm wrote:
| This should not be downvoted, it is both factually
| correct _and_ a perfect illustration of these problems
| having already been solved, and ages ago at that.
|
| It is as if just pointing this out already antagonizes
| people.
| pjmlp wrote:
| A certain group of people likes to pretend that before C
| there were no other systems programming languages, other
| than BCPL.
|
| They ignore what has happened since 1958 (JOVIAL being a
| first attempt), and thus all of C's failings are excused
| because it was discovering the world.
| OneLessThing wrote:
| I agree that it reads really well which is why I was also
| surprised the quality is not high when I looked deeper. The
| author claims to have only used AI for the json code, so your
| conclusion may be off; it could just be a novice doing novice
| things.
|
| I suppose I was just surprised to find this code promoted in my
| feed when it's not up to snuff. And I'm not hating, I do in
| fact love the project idea.
| lifthrasiir wrote:
| Yeah, I recently wrote a moderate amount of C code [1] entirely
| with Gemini and while it was much better than what I initially
| expected, I needed to steer it constantly to avoid
| inefficient or less safe code. It needed extensive fuzzing
| to get the
| minimal amount of confidence, which caught at least two serious
| problems---seriously, it's much better than most C programmers,
| but still.
|
| [1] https://github.com/lifthrasiir/wah/blob/main/wah.h
| jacquesm wrote:
| I've been doing this the better part of a lifetime and I
| still need to be careful so don't feel bad about it. Just
| like rust has an 'unsafe' keyword I realize _all_ of my code
| is potentially unsafe. Guarding against UB, use-after-free,
| array overruns and so on is a lot of extra work and you only
| need to slip up once to have a bug, and if you're unlucky
| something exploitable. You get better at this over the years.
| But if I know something needs to be bullet proof the C
| compiler would not be my first tool of choice.
|
| One good defense is to reduce your scope continuously. The
| smaller you make your scope the smaller the chances of
| something escaping your attention. Stay away from globals and
| global data structures. Make it impossible to inspect the
| contents of a box without going through a well defined
| interface. Use assertions liberally. Avoid fault propagation,
| abort immediately when something is out of the expected
| range.
| uecker wrote:
| A strategy that helps me is to not use open-coded pointer
| arithmetic or string manipulation but to encapsulate those
| behind safe bounds-checked interfaces. Then essentially
| only lifetime issues remain, and for those I usually do
| have a simple policy and clearly document any exception. I
| also use signed integers and the sanitizer in trapping
| mode, which turns any such issue I may have missed into a
| run-time trap.
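|
| A small sketch of what such an interface could look like (the
| type and function names are made up for illustration):
|
|     #include <assert.h>
|     #include <stddef.h>
|
|     typedef struct {
|         char     *data;
|         ptrdiff_t len;   /* signed length, as discussed */
|     } span;
|
|     static char span_get(span s, ptrdiff_t i) {
|         assert(0 <= i && i < s.len);  /* trap, don't overrun */
|         return s.data[i];
|     }
|
|     static span span_sub(span s, ptrdiff_t off, ptrdiff_t n) {
|         assert(0 <= off && 0 <= n && n <= s.len - off);
|         return (span){ s.data + off, n };
|     }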
| OneLessThing wrote:
| This is why I love C. You can build these guard rails at
| exactly the right level for you. You can build them all
| the way up to CPython and do garbage collection and
| constant bounds checking. Or keep them at just raw
| pointer math. And everywhere in between. I like your
| approach. The downside being that there are probably
| 100,000+ bespoke implementations of similar guard rails
| whereas python users, for example, all get them for free.
| jacquesm wrote:
| It definitely is a lot of freedom.
|
| But the lack of a good string library is by itself
| responsible for a very large number of production issues,
| as is the lack of foresight regarding de-referencing
| pointers that are no longer valid. Lack of guardrails
| seems to translate into 'do what you want', not necessarily
| 'build guard rails at the right level for you', most
| projects simply don't bother with guardrails at all.
|
| Rust tries to address a lot of these issues, but it does
| so by tossing out a lot of the good stuff as well and
| introducing a whole pile of new issues and concepts that
| I'm not sure are an improvement over what was there
| before. This creates a take-it-or-leave-it situation, and
| a barrier to entry. I would have loved to see that guard
| rails concept extended to the tooling in the form of
| compile time flags resulting in compile time flagging of
| risky practices (there is some of this now, but I still
| think it is too little) and runtime errors
| for clear violations.
|
| The temptation to 'start over' is always there, I think C
| with all of its warts and shortcomings is not the best
| language for a new programmer to start with if they want
| to do low level work. At the same time, I would - still,
| maybe that will change - hesitate to advocate for rust,
| it is a massive learning curve compared to the kind of
| appeal that C has for a novice. I'd probably recommend Go
| or Java over both C and rust if you're into imperative
| code and want to do low level work. For functional
| programming I'd recommend Erlang (if only because of the
| very long term view of the people that build it) or
| Clojure, though the latter seems to be in decline.
| uecker wrote:
| I think the C standard should provide some good
| libraries, e.g. a string library. But in any case the
| problem with 100000+ bespoke implementations in C is not
| fixed by designing new programming languages and also
| adding them to the mix. Entropy is a bitch.
| lelanthran wrote:
| > A strategy that helps me [...]
|
| In another comment recently I opined that C projects,
| initiated in 2025, are likely to be much more secure than
| the same project written in Python/PHP (etc).
|
| This is because the only people _choosing_ C in 2025 are
| those who have been using it already for decades, have
| internalised the handful of footguns via actual
| experience and have a set of strategies for minimising
| those footguns, all shaped with decades of experience
| working around that tiny handful of footguns.[1]
|
| Sadly, _this_ project has rendered my opinion wrong -
| it's a project initiated in 2025, in C, that was obviously
| done by an LLM, and thus is filled with footguns and
| over-engineering.
|
| ============
|
| [1] I also have a set of strategies for dealing with the
| footguns; I would guess if we sat down together and
| compared notes our strategies would have more in common
| than they would differ.
| uecker wrote:
| If you want something fool-proof where a statistical code
| generator will not generate issues, then C is certainly
| not a good choice. But also for other languages this will
| cause issues. I think for vibe-coding a network server
| you might want something sand-boxed with all security
| boundaries outside, in which case it does not really
| matter anymore.
| OneLessThing wrote:
| This is exactly my problem with LLM C code, lack of
| confidence. On the other hand, when my projects get big
| enough to the point where I cannot keep the code base
| generally loaded into my brain's cache they eventually get to
| the point where my confidence comes from extensive testing
| regardless. So maybe it's not such a bad approach.
|
| I do think that LLM C code, if made in concert with great
| testing tooling, has great promise.
| jacquesm wrote:
| That generalizes to anything LLM related.
| lelanthran wrote:
| > It needed an extensive fuzzing to get the minimal amount of
| confidence, which caught at least two serious problems---
| seriously, it's much better than most C programmers, but
| still.
|
| How are you doing your fuzzing? You need either valgrind (or
| compiler sanitiser flags) in the loop for a decent level of
| confidence.
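|
| For reference, the usual shape of that loop with clang's
| libFuzzer plus ASan/UBSan; parse_request() is just a
| placeholder for whatever entry point is under test:
|
|     /* build (roughly):
|        clang -g -O1 -fsanitize=fuzzer,address,undefined \
|              harness.c server_code.c */
|     #include <stddef.h>
|     #include <stdint.h>
|
|     void parse_request(const uint8_t *data, size_t size);
|
|     int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
|         /* feed untrusted bytes into the code under test; the
|            sanitizers turn silent memory errors into crashes */
|         parse_request(data, size);
|         return 0;
|     }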
| lifthrasiir wrote:
| The "minimal" amount of confidence, not a decent level of
| confidence. You are completely right that I need much more
| to establish anything higher than that.
| citbl wrote:
| The irony is also that AI could have been used to audit the
| code and find these issues. All the author had to do was
| ask.
| nurettin wrote:
| > should never see production use.
|
| I have an issue with strongly worded opinions like this. I wrote
| plenty of crappy delphi code while learning the language that
| saw production use and made a living from it.
|
| Sure, it wasn't the best experience for users, it took years to
| iron out all the bugs and there was plenty of frustration
| during the support phase (mostly null pointer exceptions and db
| locks in gui).
|
| But nobody would be better off now if that code never saw
| production use. A lot of business was built around it.
| zdragnar wrote:
| Buggy code that just crashes or produces incorrect results
| is a whole different category. In C a bug can compromise a
| server and your users. See the OpenSSL Heartbleed
| vulnerability as a prime example.
|
| Once upon a time, you could put up a relatively vulnerable
| server, and unless you got a ton of traffic, there weren't
| too many things that would attack it. Nowadays, pretty much
| anything Internet facing will get a constant stream of
| probes. Putting up a server requires a stricter mindset than
| it used to.
| jacquesm wrote:
| There are minimum standards for deployment to the open web. I
| think - and you're of course entirely free to have a
| different opinion - that those are not met with this code.
| nurettin wrote:
| Yes, I have lots of opinions!
|
| I guess the question in the spotlight is: At what point would
| your custom server's buffer overflow when reading a header
| matter and would that bug even exist at that point?
|
| Could a determined hacker get to your server without even
| knowing what weird software you cooked up and how to
| exploit your binary?
|
| We have a lot of success stories born from bad code. I mean
| look at Micro$oft.
|
| Look at all the big players like discord leaking user
| credentials. Why would you still call out the little fish?
|
| Maybe I should create a form for all these ahah.
| frumplestlatz wrote:
| > Could a determined hacker get to your server without
| even knowing what weird software you cooked up and how to
| exploit your binary?
|
| Yes.
| nurettin wrote:
| Yes but how? After the overflow they still have to know
| the address of the next call site and the server would be
| in a UB state.
| jacquesm wrote:
| The code is on github. Figure out a way to get a shell
| through that code and you're hosed if someone recognizes
| it in active use.
| nurettin wrote:
| I mean the hacker won't know what software is running on
| the server, unless the server announces itself which can
| be traced to the repo, but then, why?? Who cares about
| this guy's vps? This whole thread makes no sense to me
| and I seem to be the only one questioning it.
| lelanthran wrote:
| I can't completely blame the language here: anyone "coding" in a
| language new to them using an LLM is going to have real problems.
| OneLessThing wrote:
| It's funny the author says this was 90% written without AI, and
| that AI was mostly used for the json code. I think they're just
| new to C.
|
| Trust me I love C. Probably over 90% of my lifetime code has
| been written in C. But python newbies don't get their web
| frameworks stack smashed. That's kind of nice.
| lelanthran wrote:
| > But python newbies don't get their web frameworks stack
| smashed. That's kind of nice.
|
| Hah! True :-)
|
| The thing is, smashed stacks are _difficult_ to exploit
| deterministically or automatically. Even heartbleed, as
| widespread as it was, was not a guaranteed RCE.
|
| OTOH, an exploit in a language like Python is almost
| certainly going to be easier to exploit deterministically.
| Log4j, for example, was a _guaranteed_ exploit and the skill
| level required was basically _"Create a Java object"_.
|
| This is because of the ease with which even very junior
| programmers can create something that appears to run and work
| and not crash.
| alfiedotwtf wrote:
| > The thing is, smashed stacks are difficult to exploit
| deterministically or automatically. Even heartbleed, as
| widespread as it was, was not a guaranteed RCE.
|
| That's like driving without a seatbelt - it's not safe, but
| it would only matter on that very rare chance you have a
| crash. I would rather just wear a seatbelt!
| uyzstvqs wrote:
| It's a double-edged sword. LLMs are probably the best way to
| learn programming languages right now. But if you vibecode in a
| programming language that you don't understand, it's going to
| be a disaster sooner or later.
|
| This is also the reason why AI will not replace any actual jobs
| with merit.
| AdieuToLogic wrote:
| > LLMs are probably the best way to learn programming
| languages right now.
|
| Books still exist, be they in print or electronic form.
| zweifuss wrote:
| I would claim that:
|
| (interactive labs + quizzes) > Learning from books
|
| Good online documentation > 5yr old tome on bookshelf
|
| chat/search with ai > CTRL+F in a PDF manual
| skydhash wrote:
| Interactive labs can do a great job of teaching skills,
| but they fall short of teaching understanding. And at
| some point, it's faster to read a book to learn, because
| there's a reduced need for practice.
|
| Hypertext is better than printed book format, but if
| you're just starting with something you need a guide that
| provides a coherent overview. Also most online
| documentation is just bad.
|
| Why ctrl+f? You can still have a table of contents and an
| index with pdf. And the pdf format supports links. And I'd
| prefer filtering/querying over generation because the
| latter is always tainted by my prompt. If I type `man
| unknown_function`, I will get an error, not a generated
| manual page.
| estimator7292 wrote:
| Examples are the best documentation, and we now have a
| machine to produce infinite examples tailored specifically
| to any situation.
| nxobject wrote:
| Depending on the quality of the examples, of course.
| messe wrote:
| > Another interesting choice in this project is to make lengths
| signed:
|
| There are good reasons for this choice in C (and C++) due to
| broken integer promotion and casting rules.
|
| See: "Subscripts and sizes should be signed" (Bjarne Stroustrup)
| https://open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1428r0...
|
| As a nice bonus, it means that ubsan traps on overflow (unsigned
| overflows just wrap).
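|
| One classic example of the kind of trap being referred to
| (not taken from the paper; n is assumed to be the element
| count):
|
|     /* unsigned index: i >= 0 is always true, so this
|        "backwards" loop wraps around instead of stopping */
|     for (size_t i = n - 1; i >= 0; --i) { /* ... */ }
|
|     /* signed index: terminates as intended, and if an
|        overflow does slip in, ubsan can trap on it */
|     for (ptrdiff_t i = (ptrdiff_t)n - 1; i >= 0; --i) { /* ... */ }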
| uecker wrote:
| I do not agree that the integer promotion or casting (?) rules
| are broken in C. That some people make mistakes because they do
| not know them is a different problem.
|
| The reason you should make length signed is that you can use
| the sanitizer to find or mitigate overflow as you correctly
| observe, while unsigned wraparound leads to bugs which are
| basically impossible to find. But this has nothing to do with
| integer promotion, and wraparound can also create bugs in -
| say - Rust.
| OneLessThing wrote:
| It's interesting to hear these takes. I've never had problems
| catching unsigned wrap bugs with plain old memory sanitizers,
| though I must admit to not having a lot of experience with
| ubsan in particular. Maybe I should use it more.
| jacquesm wrote:
| I've had some fun reviewing some very old code I wrote
| (1980's) to see what it looked like to me after such a long
| time of gaining experience. It's not unlike what the OP did
| here, it reads cleanly but I can see many issues that
| escaped my attention at the time. I always compared C with
| a very fast car: you can take some corners on two wheels
| but if you make a habit of that you're going to end up in a
| wall somewhere. That opinion has not changed.
| uecker wrote:
| I think the correct comparison is a sharp knife. It is
| extremely useful and while there is a risk it is fully
| acceptable. The idea that we should all use plastic
| knives because there are often accidents with knives is
| wrong, and so is the idea that we should abandon C
| because of memory safety. I have followed computer
| security issues for several decades, and while I think we
| should
| have memory safety IMHO the push and arguments are
| completely overblown - and they are especially not worth
| the complexity and issues of Rust. I never was personally
| impacted by a security exploit caused by a memory safety
| issue or know anybody in my personal vicinity who was. I
| know many cases where people were affected by _other_ kinds of
| security issues. So I think those are what we should
| focus on first. And having timely security updates is a
| hell of a lot more important than memory safety, so I am not
| happy that Rust now makes this harder.
| jacquesm wrote:
| That's an interesting point you are making there. The
| most common exploits are of the human variety. Even so it
| is probably a good idea to minimize the chances of all
| kinds of exploits. One other problem - pet peeve of mine
| - is that instead of giving people _just_ security
| updates manufacturers will happily include a whole bunch
| of new and 'exciting' stuff in their updates that in
| turn will (1) introduce new security issues and (2)
| inevitably try to extract more money from the updaters.
| This is extremely counterproductive.
| simonask wrote:
| I'm sorry, but there is an incredible amount of hard data
| on this, including the number of CVEs directly
| attributable to memory safety bugs. This is publicly
| available information, and we as an industry should take
| it seriously.
|
| I don't mean to be disrespectful, but this cavalier
| attitude towards it reads like vaccine skepticism to me.
| It is not serious.
|
| Programming can be inconsequential, but it can also be
| national security. I know which engineers I would trust
| with the latter, and they aren't the kind who believe
| that discipline is "enough".
| jacquesm wrote:
| So what do you propose to do?
| simonask wrote:
| I propose that we start taking the appropriate amount of
| professional responsibility.
|
| That includes being honest about the actual costs of
| software when you don't YOLO the details. Zero UB is
| table stakes now - it didn't use to be, but we don't live
| in that world anymore.
|
| It's totally fine to use C or whatever language for it,
| but you are absolutely kidding yourself if you think the
| cost is less than at least an order of magnitude higher
| than the equivalent code written in Rust, C#, or any
| other language that helps you avoid these bugs. Rust even
| lets you get there at zero performance cost, so we're
| down to petty squabbles about syntax or culture - not
| serious.
| jacquesm wrote:
| > I propose that we start taking the appropriate amount
| of professional responsibility.
|
| I agree. For me that means: software _engineering_ should
| start taking the same attitude to writing software that
| structural engineers bring to the table when they talk
| about bridges, buildings and other structures that will
| have people 's lives depending on them. I'm not sure how
| we're going to make rings out of bits but we need to
| realize - continuously - that the price of failure is
| often paid in blood, or in the best case with financial
| loss and usually not by us. And in turn we should be
| enabled to impose that same ethic on management, because
| more often than not that's the root cause of the problem.
|
| > That includes being honest about the actual costs of
| software when you don't YOLO the details.
|
| Does that include development cost?
|
| Maintenance costs?
|
| Or just secondary costs?
|
| Why the focus on costs?
|
| > Zero UB is table stakes now - it didn't use to be, but
| we don't live in that world anymore.
|
| This is because 'Rust and C# exist'? Or is it because
| Java, Erlang, Visual Basic, Lisp etc exist?
|
| > It's totally fine to use C or whatever language for it,
| but you are absolutely kidding yourself if you think the
| cost is less than at least an order of magnitude higher
| than the equivalent code written in Rust, C#, or any
| other language that helps you avoid these bugs.
|
| We were talking about responsibility first, and that goes
| well beyond just measuring 'cost'. The mistake in
| bringing cost into it is that cost is a business concept
| that is used to justify picking a particular technology
| over another. And just like security is an expense that
| doesn't show anything on the balance sheet if it works
| besides that it cost money the same goes for picking a
| programming language eco-system.
|
| So I think focusing on cost is a mistake. That just
| allows the bookkeeper to make the call and that call will
| often be the wrong one.
|
| > Rust even lets you get there at zero performance cost,
| so we're down to petty squabbles about syntax or culture
| - not serious.
|
| The debate goes a lot further than that. You have
| millions of people that are writing software every day
| that are not familiar with Rust. To get them to pick a
| managed language over what they are used to is going to
| take a lot of convincing.
|
| It starts off with ethics, and I don't think it should
| start off with picking a favorite language. You educate,
| show by example and you deliver at or below the same cost
| that those other eco-systems do and then you slowly eat
| the world because your projects are delivered on-time,
| with provably lower real world defects and hopefully at a
| lower cost.
|
| And then I really couldn't care what language was picked,
| in the rust world that translates into 'anything but C'
| because that is perceived to be the enemy somehow, which
| is strange because there are many alternatives to rust
| that are perfectly suitable, have much higher mind share
| already.
|
| C is - even today - at 10x the popularity that rust is,
| it will take a massive amount of resources to switch
| those people over, and likely it will take more than one
| generation. In the meantime all of the C code in the
| world will have to be maintained, which means there is
| massive job security for people learning C. For people
| learning rust to the exclusion of learning C that
| situation is far worse. This needs to be solved.
|
| These are not 'petty squabbles' about syntax or culture.
| They are the harsh reality of the software development
| world at large, which has seen massive projects deployed
| at scale developed with those really bad languages full
| of undefined behavior (well, that's at least one thing
| that Assembly Language has going for it, as long as the
| CPU does what it says in the book undefined behavior
| doesn't exist). People are going to point at that and say
| 'good enough'. And they see all those memory overflows,
| CVEs etc as a given, and they realize that in spite of
| all of those the main vector for security issues is
| people, and configuration mistakes not so much the
| software itself.
|
| This is not ideal, obviously, but C, like any bad habit,
| is very hard to dislodge if your main argument is 'you
| should drop this tool because mine is better'. Then you
| need to _show_ that your tool is better, so much better
| that it negates the cost to switch. And that 's a very
| tall order, for any programming language, much more so
| for one that is struggling for adoption in the first
| place.
| simonask wrote:
| Cost is a useful metric because it reflects a number of
| relevant things: Time to develop, effort to maintain -
| yes, but also people turnover, required expertise levels,
| satisfaction, and so on. Whether or not you like it, you
| have to care about cost if you want to make rational
| decisions. I'm not talking about assigning a
| Euro/Dollar/Yuan value to each hour spent on a project,
| but you need a rough idea about the size of the time and
| energy investment you are making when starting a project.
|
| > This is because 'Rust and C# exist'? Or is it because
| Java, Erlang, Visual Basic, Lisp etc exist?
|
| Things have changed for three important reasons: (1)
| C/C++ compilers have evolved, and UB is significantly
| more catastrophic than it was in the 90s and early 00s.
| (2) As societies digitize, the stakes are higher than
| ever - leaking personal data has huge legal and moral
| consequences, and system outages can have business-
| killing financial consequences. (3) There are actual,
| viable alternatives - GC is no longer a requirement for
| memory safety.
|
| > To get them to pick a managed language over what they
| are used to is going to take a lot of convincing.
|
| Perhaps you didn't mean to say so, but Rust is not a
| managed language (that's a .NET term referring to C#, F#,
| etc.).
|
| Me and other Rust users are obviously trying to convince
| even more people to use the language, and that's because
| we are having a great time over here. It's a very
| pleasant language with a pleasant community and a high
| level of technical expertise, and it allows me to get
| significantly closer to living up to my own ideals. I'm
| not making a moral argument here, trying to say that you
| or anyone is a bad person for not using Rust, but I am
| making a moral argument saying that _denying_ the huge
| cost and risk associated with developing software in C
| and C++ is bullshit.
|
| > And then I really couldn't care what language was
| picked, in the rust world that translates into 'anything
| but C' because that is perceived to be the enemy somehow,
| which is strange because there are many alternatives to
| rust that are perfectly suitable, have much higher mind
| share already.
|
| The point here is that, until Rust came along, you had
| the choice between wildly risky (but fast) C and C++
| code, or completely safe (but slow) garbage collected
| languages with heavy runtimes and significant deployment
| challenges.
|
| C is certainly not "the enemy" - I never said that, and I
| wouldn't. But that old world is gone. The excuse of
| picking risky, problem-riddled languages that we _know_
| are associated with extreme costs for reasons of
| performance no longer has any technical merit. There can
| be other reasons, but this isn't it.
|
| > C is - even today - at 10x the popularity that rust is,
| it will take a massive amount of resources to switch
| those people over [...]
|
| It's insane to me that anyone would limit themselves to a
| single language. Every competent programmer I know knows
| at least a handful. Why are we worried about this? I'm a
| decent C programmer, and a very good C++ programmer -
| better at both because I'm also fairly good at Rust.
|
| > And they see all those memory overflows, CVEs etc as a
| given, and they realize that in spite of all of those the
| main vector for security issues is people, and
| configuration mistakes not so much the software itself.
|
| "Pobody's nerfect." I'm sorry, I really dislike this
| attitude. We can't let the fact that security is hard, or
| that perfection is unattainable, be an excuse to deliver
| more crap.
|
| > This is not ideal, obviously, but C, like any bad
| habit, is very hard to dislodge if your main argument is
| 'you should drop this tool because mine is better'
|
| Again, that's not my argument. My argument is that you
| should be honest about what the actual costs are, or
| alternatively what the actual quality is.
| jacquesm wrote:
| > Cost is a useful metric because it reflects a number of
| relevant things: Time to develop, effort to maintain -
| yes, but also people turnover, required expertise levels,
| satisfaction, and so on. Whether or not you like it, you
| have to care about cost if you want to make rational
| decisions. I'm not talking about assigning a
| Euro/Dollar/Yuan value to each hour spent on a project,
| but you need a rough idea about the size of the time and
| energy investment you are making when starting a project.
|
| You are missing the cost to switch and that's a massive
| one and the one that I think most parties are using to
| decide whether or not to stick with what they know or to
| try something that is new to them. If you have a team of
| 50 embedded C++ developers and a deadline 'let's use
| rust' is a gamble very few managers will make.
|
| > Things have changed for three important reasons: (1)
| C/C++ compilers have evolved, and UB is significantly
| more catastrophic than it was in the 90s and early 00s.
|
| That depends on what industry you are looking at. For
| instance, in aviation the cost of undefined behavior,
| crashing software or wrong calculations was always that
| high. The difference is that in that industry (and a
| handful of others) there is enough budget to do it right
| resulting in far fewer in production issues than what we
| have come to accept in the 'always online, auto-update'
| world. That whole attitude is as much or more to blame
| for this than any particular language.
|
| > (2) As societies digitize, the stakes are higher than
| even - leaking personal data has huge legal and moral
| consequences, and system outages can have business-
| killing financial consequences.
|
| Show me the names of the businesses that have died
| because of data leaks or UB. See, the problem is that for
| those businesses it usually is just a speedbump. They
| don't care and no matter what the size of the breach the
| consequences are usually minor.
|
| The employee sticking a USB drive found on the street
| into their laptop causing a cryptolocker incident is a
| much more concrete problem.
|
| > (3) There are actual, viable alternatives - GC is no
| longer a requirement for memory safety.
|
| GC is a convenience, and if you're going to switch
| languages you might as well pick one that is more
| convenient. Java for instance is suitable now for 90% or
| so of the use cases where C or C++ would be your only
| option 15 years ago.
|
| > Perhaps you didn't mean to say so, but Rust is not a
| managed language (that's a .NET term referring to C#, F#,
| etc.).
|
| I know, but Java, Lisp and so on _are_ managed languages,
| and they offer both safety _and_ convenience. Rust only
| offers safety, other than that it is only marginally more
| convenient than C and some would argue less so.
|
| > Me and other Rust users are obviously trying to
| convince even more people to use the language, and that's
| because we are having a great time over here.
|
| Show, don't tell.
|
| > It's a very pleasant language with a pleasant community
| and a high level of technical expertise, and it allows me
| to get significantly closer to living up to my own
| ideals.
|
| Yes, but those are _your_ ideals, which don't
| necessarily overlap with mine. I don't particularly care
| about one programming language or another, I've learned
| enough of them by now to know that _all_ of them have
| their limitations, their warts, their good bits and their
| bad bits. I also know that the size of the eco-system is
| a large factor in whether or not I'll be able to get
| through the day in a productive way.
|
| > I'm not making a moral argument here, trying to say
| that you or anyone is a bad person for not using Rust,
| but I am making a moral argument saying that denying the
| huge cost and risk associated with developing software in
| C and C++ is bullshit.
|
| See, your use of the word 'bullshit' triggers me in a way
| that you probably do not intend, but it is exactly that
| attitude that turns me off the language that you would
| like me to switch to. I don't particularly see that huge
| cost and risk as applied to myself because I'm not
| currently writing code that is going to be part of some
| network service. If I see an embedded shop doing their
| work in Rust then I'm happy because I can ignore at least
| one small aspect of the source of bugs in such software.
| But there are plenty remaining and Rust - no matter what
| you think - is not a silver bullet for all of the things
| that can go wrong with low level software. There are
| other, better alternatives for most of those
| applications, I'd be more inclined to use Java or Erlang
| if those are available, and Go if they are not. The speed
| at which I can develop software is a massive factor in
| that whole 'cost' evaluation for me.
|
| > The point here is that, until Rust came along, you had
| the choice between wildly risky (but fast) C and C++
| code, or completely safe (but slow) garbage collected
| languages with heavy runtimes and significant deployment
| challenges.
|
| That just isn't true. There are more languages besides
| Rust that allow for low level and fast work. Go for
| instance is an excellent contender. And for long running
| processes Java is excellent, it is approaching C levels
| of throughput and excels at networked services.
|
| > C is certainly not "the enemy" - I never said that, and
| I wouldn't. But that old world is gone.
|
| Sorry, but this is not a realistic stance. That old world
| is not gone, and it is likely here to stay for many more
| decades. There is so much inertia here in terms of
| invested capital that you can't just make declarations
| like these and expect to be taken seriously.
|
| > The excuse of picking risky, problem-riddled languages
| that we know are associated with extreme costs for
| reasons of performance no longer has any technical merit.
| There can be other reasons, but this isn't it.
|
| Do you realize that this is just your opinion and not a
| statement of fact?
|
| > It's insane to me that anyone would limit themselves to
| a single language.
|
| 'Insane' is another very loaded word. Is this really the
| kind of language you want to be using while advocating
| for Rust? There are many programmers that learn one
| eco-system well enough to carve out a career for themselves,
| and I'm not going to be the one to judge them for that.
| I'm not one of them, but I can see how it happens and I
| would definitely not label everybody that's not a
| polyglot as not entirely right in the head.
|
| > Every competent programmer I know knows at least a
| handful.
|
| I know some _very_ competent programmers that only know
| one. But they know that one better than I know any of the
| ones that I'm familiar with. For instance, I know a guy
| that decided early on that if nobody wants to work on
| COBOL projects that that is exactly what he's going to
| do: become a world class expert in COBOL to help maintain
| all that old stuff. At a price. He's making very good
| money with that, far more than he'd have ever made by
| going with something more popular. I know plenty of Java
| only programmers and a couple that have decided that
| python is all they need. That's _their right_ and it
| isn't up to me to look down on them or call them incompetent
| because they can do something that I apparently can't:
| focus, and get really good at one thing.
|
| > Why are we worried about this? I'm a decent C
| programmer, and a very good C++ programmer - better at
| both because I'm also fairly good at Rust.
|
| I would not label myself as 'very good' in any language,
| I always hope to get better and in spite of doing this
| for 4+ decades I have never felt that I was 'good
| enough'.
|
| > "P[sic]obody's nerfect." I'm sorry, I really dislike
| this attitude.
|
| Again, why the antagonism? We have many different classes
| of issues, and depending on the context some of them may
| not be a problem at all. I've built stuff in _JavaScript_
| because it was the most suitable for the job. But I stay
| the hell away from node and anything associated with it
| because I don 't consider myself qualified to audit all
| of the code that could be pulled in through a dependency.
| And that's a good chunk of this: just know your
| limitations, and realize that not just 'nobody's perfect'
| but also that _you yourself_ are not perfect and more
| than likely to mess up when you go into territory that is
| unfamiliar to you.
|
| > We can't let the fact that security is hard, or that
| perfection is unattainable, be an excuse to deliver more
| crap.
|
| Ok. So now you are labeling what other people produce as
| 'crap'. This isn't helping.
|
| > Again, that's not my argument. My argument is that you
| should be honest about what the actual costs, or
| alternatively the actual quality.
|
| So I'm not honest. If you are wondering what I meant when
| I wrote earlier that it is the attitude of some of the
| Rust advocates that turns me off then here in this thread
| you have a very nice example of that. All of this
| pontification and emotionally laden language serves
| nobody, least of all Rust.
|
| If you want to win people over try the following:
|
| - refrain from insulting your target audience
|
| - respect the fact that your opinions are just that
|
| - understand that there may be factors outside of your
| view that are part of the decision making process
|
| - understand that you may not have a complete
| understanding of the problem domain or the restrictions
| involved (is a variation on the previous one)
|
| - try to not use emotional language to make your point
|
| - showing beats telling any day of the week
| simonask wrote:
| I don't have time to respond to all of this, but let me
| just say that you seem to be under the impression that it
| is somehow my responsibility to "win you over" and
| convince you to use Rust. I have stated very clearly that
| that's not my point. My point is that we should all stop
| lying about the actual cost of delivering reliable
| software written in C or C++, and in particular that we
| as an industry _need_ to stop downplaying the
| consequences of things like UB.
|
| Are _you personally_ doing any of those things? I don 't
| know, and I don't think I have accused you of that.
|
| I'm not here to woo you by sweet-talking you into using
| a different programming language. All this "show don't
| tell" - what are you talking about? Do you need real-
| world examples of successful Rust projects? There's a
| myriad of impressive ones, but you are fully capable of
| googling that.
|
| I'm not a representative of Rust the language (how could
| I be), and I reserve the right to call out moral
| corruption as I see it. I frankly do not need any "well-
| meaning" advice about how best to advocate for Rust -
| that's not my job.
| jacquesm wrote:
| > I'm not a representative of Rust the language (how
| could I be), and I reserve the right to call out moral
| corruption as I see it. I frankly do not need any "well-
| meaning" advice about how best to advocate for Rust -
| that's not my job.
|
| Whether you realize it or not, you are an advocate and
| you are doing a very, very poor job of it.
| pjmlp wrote:
| > The point here is that, until Rust came along, you had
| the choice between wildly risky (but fast) C and C++
| code, or completely safe (but slow) garbage collected
| languages with heavy runtimes and significant deployment
| challenges.
|
| Not really, I have been mostly coding in managed
| languages for the last couple of decades, and this has
| not really been true for quite some time.
|
| Yes if we go down language benchmark games, they won't
| win every little micro benchmark, however for like 99% of
| commercial use cases, what they deliver is fast enough
| for project requirements in execution time, and hardware
| resources.
|
| Now where they fail is in human perception and urban
| myths about where they are suitable to be adopted.
|
| Languages like Rust overcome this, with their type system
| approach to resource management, the naysayers have run
| out of excuses.
| simonask wrote:
| I think you are pointing out that garbage collected
| languages can be very fast, right? I agree about that,
| but it does fundamentally come with some very big
| caveats.
|
| There's a huge number of use cases that are perfectly
| served by GC languages, even where performance matters,
| but there's also a huge number that benefit from the
| extra boost and significantly lower memory usage of a
| compiled language.
| pjmlp wrote:
| There are plenty of compiled languages with GC, value
| types and low level programming capabilities, including
| playing with pointers C style.
|
| D, C#, Nim, Swift, Go for mainstream examples.
|
| If we dive into less successful attempts from the past,
|
| Cedar, Modula-2+, Modula-3, Oberon, Oberon-2, Active
| Oberon, Component Pascal, Oberon-07, Spec#, System C#
| among plenty others that are probably listed on ACM
| SIGPLAN list of papers.
|
| As for some commercial examples,
|
| https://www.withsecure.com/en/solutions/innovative-
| security-...
|
| https://dlang.org/blog/2018/12/04/interview-liran-zvibel-
| of-...
|
| https://www.wildernesslabs.co/
|
| https://www.astrobe.com/boards.htm
| pjmlp wrote:
| Thankfully the new cybersecurity laws will help here:
| when companies map production costs to languages, the
| needle will keep moving away from those that tank
| security budgets.
| jacquesm wrote:
| I was actually hoping for far more strict enforcement but
| so far they're taking it relatively easy.
| pjmlp wrote:
| Indeed, however better slowly than nothing at all.
| goalieca wrote:
| CVEs are important but there's also a lot of theatre
| there. How many are known exploitable? Most aren't if you
| follow threat intel. Most of the Internet infrastructure
| is running c/c++ and is very safe.
| simonask wrote:
| It's fine to have a sober view of the severity, but we
| can hopefully agree in general that writing any program
| in C or C++ that faces the internet requires _extreme_
| caution.
| goalieca wrote:
| I think anything that faces the internet needs extreme
| caution. I've done enough pentesting myself to see that
| mistakes abound and most of them are logic problems.
| pjmlp wrote:
| Except any good chef or butcher knows that they should be
| wearing protective gloves when using sharp knives.
|
| > Cut-resistant gloves are an essential piece of safety
| equipment in any kitchen.
|
| https://www.restaurantware.com/blogs/smallwares/how-to-
| choos...
|
| Where are C's gloves?
| uecker wrote:
| GCC's sanitizer does not catch unsigned wraparound. But the
| bigger problem is that a lot of code is written where it
| assumes that unsigned wraps around and this is ok. So if
| you were to use a sanitizer you would get a lot of false
| positives.
| For signed overflow, one can always consider this a bug in
| portable C.
|
| Of course, if you consistently treat unsigned wraparound as
| a bug in your code, you can also use a sanitizer to screen
| for it. But in general I find it more practical to use
| signed integers for everything except for modular
| arithmetic where I use unsigned (and where wraparound is
| then expected and not a bug)
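|
| A minimal illustration of the difference, assuming gcc or
| clang with -fsanitize=signed-integer-overflow:
|
|     #include <limits.h>
|
|     int main(void) {
|         unsigned u = UINT_MAX;
|         int      s = INT_MAX;
|
|         u += 1;  /* defined: silently wraps to 0 */
|         s += 1;  /* undefined: the sanitizer reports or traps */
|         return (int)u + s;
|     }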
| messe wrote:
| I meant implicit casting, but I guess that really falls under
| promotion in most cases where it's relevant here (I'm on a
| train from Aarhus to Copenhagen right now to catch a flight,
| and I've slept considerably less than usual, so apologies if
| I'm making some slight mistakes).
|
| The issues really arise when you mix signed/unsigned
| arithmetic and end up promoting everything to signed
| unexpectedly. That's usually "okay", as long as you're not
| doing arithmetic on anything smaller than an int.
|
| As an aside, if you like C enough to have opinions on
| promotion rules then you might enjoy the programming language
| Zig. It's around the same level as C, but with much nicer
| ergonomics, and overflow traps by default in
| Debug/ReleaseSafe optimization modes. If you want explicit
| two's complement overflow it has +%, *% and -% variants of
| the usual arithmetic operations, as well as saturating +|,
| *|, -| variants that clamp to [minInt(T), maxInt(T)].
|
| EDIT to the aside: it's also true if you hate C enough to
| have opinions on promotion rules.
| jacquesm wrote:
| Yes, this is one of the more subtle pitfalls of C. What
| helps is that in most contexts the value of 2 billion is
| large enough that a wraparound would be noticed almost
| immediately. But when it isn't then it can lead to very
| subtle errors that can propagate for a long time before
| anything noticeably goes off the rails.
| uecker wrote:
| I prefer C to Zig. IMHO all the successor languages throw
| out the baby with the bathwater and add unnecessary
| complexity. Zig is much better than Rust, but, still, I
| would never use it for a serious project.
|
| The "promoting unexpectedly" is something I do not think
| happens if you know C well. At least, I can't remember ever
| having a bug because of this. In most cases the promotion
| prevents you from having a bug, because you do not get
| unexpected overflow or wraparound because your type is too
| small.
|
| Mixing signed and unsigned is problematic, but I see issues
| mostly in code from people who think they need to use
| unsigned when they shouldn't because they heard signed
| integers are dangerous. Recently I saw somebody "upgrading"
| a C code basis to C++ and also changing all loop variables
| to size_t. This caused a bug which he blamed on working on
| the "legacy C code" he is working on, although the original
| code was just fine. In general, there are compiler warnings
| that should catch issues with sign for conversions.
| lelanthran wrote:
| > Recently I saw somebody "upgrading" a C code basis to
| C++ and also changing all loop variables to size_t. This
| caused a bug which he blamed on working on the "legacy C
| code" he is working on, although the original code was
| just fine.
|
| I had the same experience about 10 years back when a
| colleague "upgraded" code from using size_t to `int`; on
| that platform (ATMEGA or XMEGA, not too sure now) `int`
| was too small, overflowed and bad stuff happened in the
| field.
|
| The only takeaway is "don't needlessly change the size
| and sign of existing integer variables".
| uecker wrote:
| I don't think this is the only takeaway. My point is that
| you can reliably identify signed integer overflow using
| sanitizers and you can also reliably mitigate related
| attacks by trapping for signed integer overflow (it still
| may be a DoS, but you can stop more serious harm). Neither
| works with unsigned types, except in a tightly
| controlled project where you treat unsigned wraparound as
| a bug, but this fails the moment you introduce other
| idiomatic C code that does not follow this.
| Sukera wrote:
| Could you expand on how these wraparound bugs happen in Rust?
| As far as I know, integer overflow panics (i.e. aborts) your
| code when compiled in debug mode, which I think is often used
| for testing.
| 01HNNWZ0MV43FF wrote:
| > That some people make mistakes because they do not know
| them is a different problem.
|
| We can argue til we're blue in the face that people should
| just not make any mistakes, but history is against us -
| People will always make mistakes.
|
| That's why surgeons are supposed to follow checklists and
| count their sponges in and out
| bringbart wrote:
| >while unsigned wraparound leads to bugs which are basically
| impossible to find.
|
| What?
|
| unsigned sizes are way easier to check, you just need one
| invariant:
|
| if(x < capacity) // good to go
|
| Always works, regardless of how x is calculated, and you never
| have to worry about undefined behavior when computing x. And
| the same invariant is used for forward and backward loops -
| some people bring up i >= 0 as a problem with unsigned, but
| that's because you should use i < n for backward loops as
| well - The One True Invariant.
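|
| A minimal sketch of that single invariant in both directions
| (buf, n and use() are placeholders):
|
|     #include <stddef.h>
|
|     void use(int x);    /* placeholder for whatever the loop does */
|
|     void walk(const int *buf, size_t n) {
|         /* forward */
|         for (size_t i = 0; i < n; i++)
|             use(buf[i]);
|         /* backward, same i < n test: after buf[0] the decrement
|            wraps to SIZE_MAX, which fails i < n and ends the loop;
|            if n == 0, n - 1 wraps too and the loop never runs */
|         for (size_t i = n - 1; i < n; i--)
|             use(buf[i]);
|     }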
| user____name wrote:
| I just put assertions to check the ranges of all sizes and
| indices upon function entry, which doubles as documentation,
| and I mostly don't have to worry about signedness as a result.
| kstenerud wrote:
| Yup, unsigned math is just nasty.
|
| Actually, unchecked math on an integer is going to be bad
| regardless of whether it's signed or unsigned. The difference
| is that with signed integers, your sanity check is simple and
| always the same and requires no thought for edge cases:
| `if(index < 0 || index > max)`. Plus ubsan, as mentioned above.
|
| My policy is: Always use signed, unless you have a specific
| reason to use unsigned (such as memory addresses).
| bringbart wrote:
| unsigned is easier: 'if(index >= max)' and has fewer edge
| cases because you don't need to worry about undefined
| behavior when computing index.
| lelanthran wrote:
| > The difference is that with signed integers, your sanity
| check is simple and always the same and requires no thought
| for edge cases: `if(index < 0 || index > max)`
|
| Wait, what? How is that easier than `if (index > max)`?
| kstenerud wrote:
| Because if max is a calculated value, it could silently
| wrap around and leave index to cause a buffer overflow.
|
| Or if index is counting down, a calculated index could
| silently wrap around and cause the same issue.
|
| And if both are calculated and wrap around, you'll have fun
| debugging spooky action at a distance!
|
| If both are signed, that won't happen. You probably do have
| a bug if max or index is calculated to a negative value,
| but it's likely not an exploitable one.
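|
| A sketch of the calculated-max case, with made-up names: if
| hdr_len exceeds total_len, the unsigned subtraction wraps to a
| huge value and almost any index passes the check.
|
|     #include <stddef.h>
|
|     size_t body_max(size_t total_len, size_t hdr_len) {
|         return total_len - hdr_len;  /* wraps if hdr_len > total_len */
|     }
|
|     int index_ok(size_t index, size_t max) {
|         return index < max;          /* now true for nearly any index */
|     }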
| 1718627440 wrote:
| I have no clue what cases you have in mind, can you give
| some examples? Surely when you have index as unsigned the
| maximum would be represented unsigned as well?
| accelbred wrote:
| If using C23, _BitInt allows for integer types without
| promotion.
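|
| A small sketch (needs a C23 compiler, e.g. recent gcc or
| clang): with unsigned char the operands would be promoted to
| int and the sum would be 300, but unsigned _BitInt(8) is not
| promoted, so the arithmetic stays 8-bit and wraps to 44.
|
|     #include <stdio.h>
|
|     int main(void) {
|         unsigned _BitInt(8) a = 200uwb;
|         unsigned _BitInt(8) b = 100uwb;
|         unsigned _BitInt(8) sum = a + b;   /* 300 mod 256 == 44 */
|         printf("%u\n", (unsigned) sum);
|         return 0;
|     }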
| bluetomcat wrote:
| Good C code will try to avoid allocations as much as possible in
| the first place. You absolutely don't need to copy strings around
| when handling a request. You can read data from the socket into
| a fixed-size buffer, do all the processing in place, and then
| process the next chunk in place too. You get predictable
| performance and the thing will work like precise clockwork.
| Reading the entire thing just to copy the body of the request to
| another location makes no sense. Most of the "nice" javaesque
| XXXParser, XXXBuilder, XXXManager abstractions seen in "easier"
| languages make little sense in C. They obfuscate what really
| needs to happen in memory to solve a problem efficiently.
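|
| A minimal sketch of that shape (handle_chunk() is a made-up
| callback that would do the in-place parsing):
|
|     #include <unistd.h>
|
|     enum { BUF_SIZE = 8192 };
|
|     void handle_chunk(char *data, ssize_t len);  /* hypothetical */
|
|     void serve(int fd) {
|         char buf[BUF_SIZE];  /* one fixed buffer, no heap allocation */
|         ssize_t n;
|         while ((n = read(fd, buf, sizeof buf)) > 0)
|             handle_chunk(buf, n);   /* parse/process in place */
|     }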
| 01HNNWZ0MV43FF wrote:
| Can you do parsing of JSON and XML without allocating?
| bluetomcat wrote:
| Yes, you can do it with minimal allocations - provided that
| the source buffer is read-only, or is mutable but not used
| directly by the caller afterwards. If the buffer is mutable, any
| un-escaping can be done in-place because the un-escaped
| string will always be shorter. All the substrings you want
| are already in the source buffer. You just need a growable
| array of pointer/length pairs to know where tokens start.
| gritzko wrote:
| Yep, no problem. In-place parsing only requires a stack, whose
| depth is the maximum JSON nesting allowed. I have a C dialect
| exactly like that.
| veqq wrote:
| Of course. You can do it in a single pass/just parse the
| token stream. There are various implementations like:
| https://zserge.com/jsmn/
| andrepd wrote:
| It requires manual allocation of an array of tokens. So it
| needs a backing "stack vector" of sorts.
|
| And what about escapes?
| Ygg2 wrote:
| Theoretically yes. Practically there is character escaping.
|
| That kills any non-allocation dreams. The moment you have "Hi
| \uxxxx isn't the UTF nice?" you will probably have to
| allocate. If source is read-only you have to allocate. If
| source is mutable you have to waste CPU to rewrite the
| string.
| lelanthran wrote:
| > Moment you have "Hi \uxxxx isn't the UTF nice?" you will
| probably have to allocate.
|
| Depends on what you are doing with it. If you aren't
| displaying it (and typically you are not in a server
| application), you don't _need_ to unescape it.
| mpyne wrote:
| And this is indeed something that the C++ Glaze library
| supports, to allow for parsing into a string_view
| pointing into the original input buffer.
| deaddodo wrote:
| I'm confused why this would be a problem. UTF-8 and UTF-16
| (the only two common Unicode encodings) are a maximum of 4
| bytes wide (and, most commonly, 2 in English text). The
| ASCII escape you gave is 6 bytes wide. I don't know of many
| ASCII escape representations that take fewer bytes than
| their native Unicode encoding.
|
| Same goes for other characters such as \n, \0, \t, \r, etc.
| All are half the size in their native byte representation.
| topspin wrote:
| > Practically there is character escaping
|
| The voice of experience appears. Upvoted.
|
| It is conceivable to deal with escaping in-place, and thus
| remain zero-alloc. It's hideous to think about, but I'll
| bet someone has done it. Dreams are powerful things.
| _3u10 wrote:
| It's just two pointers: the current place to write and the
| current place to read. Escapes are always more characters
| than they represent, so there's no danger of overwriting the
| read pointer. If you support compression this can become
| somewhat of an issue, but you simply support a max block
| size, which is usually defined by the compression algorithm
| anyway.
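|
| A minimal sketch of the two-pointer idea (only single-character
| escapes like \" and \\, no \uXXXX handling):
|
|     #include <stddef.h>
|
|     size_t unescape_in_place(char *s, size_t len) {
|         size_t r = 0, w = 0;         /* read and write positions */
|         while (r < len) {
|             if (s[r] == '\\' && r + 1 < len)
|                 r++;                 /* skip the backslash */
|             s[w++] = s[r++];         /* copy the character */
|         }
|         return w;                    /* new, never longer, length */
|     }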
| Ygg2 wrote:
| If you have a place to write, then it's not zero
| allocation. You did an allocation.
|
| And usually if you want maximum performance, buffered
| read is the way to go, which means you need a write slab
| allocation.
| lelanthran wrote:
| > If you have a place to write, then it's not zero
| allocation. You did an allocation.
|
| Where did that allocation happen? You can write into the
| buffer you're reading from, because the replacement data
| is shorter than the original data.
| lelanthran wrote:
| > Can you do parsing of JSON and XML without allocating?
|
| If the source JSON/XML is in a writeable buffer, with some
| helper functions you can do it. I've done it for a few small-
| memory systems.
| zzo38computer wrote:
| It depends what you intend to do with the parsed data, and
| where the input comes from and where the output will be going
| to. There are situations where allocations can be reduced or
| avoided, but not in all of them. (In some cases, you do
| not need full parsing, e.g. to split an array, you can check
| if it is a string or not and the nesting level, and then find
| the commas outside of any arrays other than the first one, to
| be split.) (If the input is in memory, then you can also
| consider if you can modify that memory for parsing, which is
| sometimes suitable but sometimes not.)
|
| However, for many applications, it will be better to use a
| binary format (or in some cases, a different text format)
| rather than JSON or XML.
|
| (For the PostScript binary format, there is no escaping, and
| the structure does not need to be parsed and converted ahead
| of time; items in an array are consecutive and fixed size,
| and data it references (strings and other arrays) is given by
| an offset, so you can avoid most of the parsing. However,
| note that key/value lists in the PostScript binary format are
| nonstandard (even though PostScript does have that type, it
| does not have a standard representation in the binary object
| format), and that PostScript has a better string type than
| JavaScript but a worse numeric type than JavaScript.)
| megous wrote:
| Yes, you can first validate the buffer, to know it contains
| valid JSON, and then you can work with pointers to the
| beginnings of individual syntactic parts of the JSON, and
| have functions that determine what type the current element
| is, or move to the next element, etc. Even string work
| (comparisons with
| other escaped or unescaped strings, etc.) can be done on
| escaped strings directly without unescaping them to a buffer
| first.
|
| Ergonomically, it's pretty much the same as parsing the JSON
| into some AST first, and then working on the AST. And it can
| be much faster than dumb parsers that use malloc for
| individual AST elements.
|
| You can even do JSON path queries on top of this, without
| allocations.
|
| Eg. https://xff.cz/git/megatools/tree/lib/sjson.c
| acidx wrote:
| Yes! The JSON library I wrote for the Zephyr RTOS does this.
| Say, for instance, you have the following struct:
|
|     struct SomeStruct {
|         char *some_string;
|         int   some_number;
|     };
|
| You would need to declare a descriptor, linking each field to
| how it's spelled in the JSON (e.g. the some_string member
| could be "some-string" in the JSON), the byte offset from the
| beginning of the struct where the field is (using the
| offsetof() macro), and the type.
|
| The parser is then able to go through the JSON, and
| initialize the struct directly, as if you had reflection in
| the language. It'll validate the types as well. All this
| without having to allocate a node type, perform copies, or
| things like that.
|
| This approach has its limitations, but it's pretty efficient
| -- and safe!
|
| Someone wrote a nice blog post about (and even a video) it a
| while back: https://blog.golioth.io/how-to-parse-json-data-
| in-zephyr/
|
| The opposite is true, too -- you can use the same descriptor
| to serialize a struct back to JSON.
|
| I've been maintaining it outside Zephyr for a while, although
| with different constraints (I'm not using it for an embedded
| system where memory is golden): https://github.com/lpereira/l
| wan/blob/master/src/samples/tec...
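|
| Roughly, the descriptor idea looks like this (a hypothetical
| sketch, not the actual Zephyr API):
|
|     #include <stddef.h>
|
|     enum field_type { FIELD_STRING, FIELD_INT };
|
|     struct field_descr {
|         const char     *json_key;  /* spelling in the JSON document */
|         size_t          offset;    /* offsetof() into the struct */
|         enum field_type type;
|     };
|
|     struct some_struct {
|         char *some_string;
|         int   some_number;
|     };
|
|     static const struct field_descr descrs[] = {
|         { "some-string", offsetof(struct some_struct, some_string),
|           FIELD_STRING },
|         { "some-number", offsetof(struct some_struct, some_number),
|           FIELD_INT },
|     };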
| lock1 wrote:
| Why does "good" C have to be zero alloc? Why should "nice"
| javaesque make little sense in C? Why do you implicitly assume
| performance is "efficient problem solving"?
|
| Not sure why many people seem fixated on the idea that using a
| programming language must follow a particular approach. You can
| do minimal alloc Java, you can simulate OOP-like in C, etc.
|
| Unconventional, but why do we need to restrict certain
| optimizations (space/time perf, "readability", conciseness,
| etc) to only a particular language?
| bluetomcat wrote:
| Because in C, every allocation incurs a responsibility to
| track its lifetime and to know who will eventually free it.
| Copying and moving buffers is also prone to overflows, off-
| by-one errors, etc. The generic memory allocator is a smart
| but unpredictable, complex beast that lives in your address
| space: it can mess with your CPU cache, introduce undesired
| memory fragmentation, etc.
|
| In Java, you don't care because the GC cleans up after you and
| you don't usually care about millisecond-grade performance.
| jstimpfle wrote:
| No. Look up Arenas. In general group allocations to avoid
| making a mess.
| rictic wrote:
| If you send a task off to a work queue in another thread,
| and then do some local processing on it, you can't
| usually use a single Arena, unless the work queue itself
| is short lived.
| jenadine wrote:
| I don't see how arenas solve the problems.
| jstimpfle wrote:
| You group things from the same context together, so you
| can free everything in a single call.
| estimator7292 wrote:
| No. Arenas are not a general case solution. Look it up
| lelanthran wrote:
| > Why does "good" C have to be zero alloc?
|
| GP didn't say "zero-alloc", but "minimal alloc"
|
| > Why should "nice" javaesque make little sense in C?
|
| There's little to no indirection in idiomatic C compared with
| idiomatic Java.
|
| Of course, in both languages you can write unidiomatically,
| but that is a great way to ensure that bugs get in and never
| get out.
| lock1 wrote:
| > Of course, in both languages you can write
| unidiomatically, but that is a great way to ensure that
| bugs get in and never get out.
|
| Why does "unidiomatic" have to imply "buggy" code? You're
| basically saying an unidiomatic approach is doomed to
| introduce bugs and will never reduce them.
|
| It sounds weird. If I write Python code with minimal side
| effects like in Haskell, wouldn't it at least reduce the
| possibility of side-effect bugs even though it wasn't
| "Pythonic"?
|
| AFAIK, nothing in the language standard mentions anything
| about "idiomatic" or "this is the only correct way to use
| X". The definition of "idiomatic X" is not as clear-cut and
| well-defined as you might think.
|
| I agree there's a risk with an unidiomatic approach.
| Irresponsibly applying "cool new things" is a good way to
| destroy "readability" while gaining almost nothing.
|
| Anyway, my point is that there's no single definition of
| "good" that covers everything, and "idiomatic" is just
| whatever convention a particular community is used to.
|
| There's nothing wrong with applying an "unidiomatic"
| mindset like awareness of stack/heap alloc, CPU cache
| lines, SIMD, static/dynamic dispatch, etc in languages like
| Java, Python, or whatever.
|
| There's nothing wrong either with borrowing ideas like
| (Haskell) functor, hierarchical namespaces, visibility
| modifiers, borrow checking, dynamic dispatch, etc in C.
|
| Whether it's "good" or not is left as an exercise for the
| reader.
| lelanthran wrote:
| > Why does "unidiomatic" have to imply "buggy" code?
|
| Because when you stray from idioms you're going off down
| unfamiliar paths. All languages have better support for
| specific idioms. Trying to pound a square peg into a
| round hole _can_ work, but is unlikely to work well.
|
| > You're basically saying an unidiomatic approach is
| doomed to introduce bugs and will never reduce them.
|
| Well, yes. Who's going to reduce them? Where are you
| planning to find people who are used to code written in
| an unusual manner?
|
| By definition alone, code is written for humans to read.
| If you're writing it in a way that's difficult for humans
| to read, then _of course_ the bug level can only go up
| and not down.
|
| > It sounds weird. If I write Python code with minimal
| side effects like in Haskell, wouldn't it at least reduce
| the possibility of side-effect bugs even though it wasn't
| "Pythonic"?
|
| "Pythonic" does not mean the same thing as "Idiomatic
| code in Python".
| codr7 wrote:
| In C, direct memory control is the top feature, which means
| you can assume anyone who uses your code is going to want to
| control memory throughout the process. This means not
| allocating from wherever and returning blobs of memory, which
| in turn means designing different APIs - part of the reason
| why learning C well takes so long.
|
| I started writing sort of a style guide to C a while ago,
| which attempts to transfer ideas like this one more by
| example:
|
| https://github.com/codr7/hacktical-c
| jabits wrote:
| Thanks for sharing this work.
| nxobject wrote:
| Echoing my sibling comment - thanks for sharing this.
| cogman10 wrote:
| > Why should "nice" javaesque make little sense in C?
|
| Very importantly, because Java is tracking the memory.
|
| In java, you could create an item, send it into a queue to be
| processed concurrently, but then also deal with that item
| where you created it. That creates a huge problem in C
| because the question becomes "who frees that item"?
|
| In java, you don't care. The freeing is done automatically
| when nobody references the item.
|
| In C, it's a big headache. The concurrent consumer can't free
| the memory because the producer might not be done with it.
| And the producer can't free the memory because the consumer
| might not have run yet. In idiomatic Java, you just have to
| make sure your queue is safe to use concurrently. The right
| thing to do in C would be to restructure things to ensure the
| item isn't used after it's handed off to the queue, or to
| send a copy of the item into the queue so the question of
| "who frees this" is straightforward. You can do both
| approaches in Java, but why would you? If the item is
| immutable there's no harm in simply sharing the reference
| with 100 things and moving forward.
|
| In C++ and Rust, you'd likely wrap that item in some sort of
| atomic reference counted structure.
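|
| A sketch of that last option in plain C11, with an atomic
| reference count so whichever side finishes last frees the
| item:
|
|     #include <stdatomic.h>
|     #include <stdlib.h>
|
|     struct item {
|         atomic_int refs;
|         /* payload ... */
|     };
|
|     struct item *item_new(void) {
|         struct item *it = calloc(1, sizeof *it);
|         if (it)
|             atomic_init(&it->refs, 1);
|         return it;
|     }
|
|     void item_retain(struct item *it) {
|         atomic_fetch_add(&it->refs, 1);
|     }
|
|     void item_release(struct item *it) {
|         if (atomic_fetch_sub(&it->refs, 1) == 1)  /* last owner */
|             free(it);
|     }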
| estimator7292 wrote:
| Good C has minimal allocations because you, the human, are
| the memory allocator. It's up to your own meat brain to
| correctly track memory allocation and deallocation. Over the
| last half-century, C programmers have converged on some best
| practices to manage this more effectively. We statically
| allocate, kick allocations up the call chain as far as
| possible. Anything to get that bit of tracked state out of
| your head.
|
| But we use different approaches for different languages
| because those languages _are designed for that approach_. You
| _can_ do OOP in C and you _can_ do manual memory management
| in C#. Most people don't because it's unnecessarily
| difficult to use languages in a way they aren't designed for.
| Plus when you re-invent a wheel like "classes" you _will_
| inevitably introduce a bug you wouldn't have if you'd used a
| language with proper support for that construct. You _can_
| use a hammer to pull out a screw, but you'd do a much better
| job if you used a screwdriver instead.
|
| Programming languages are not all created equal and are
| _absolutely not_ interchangeable. A language is much, much
| more than the text and grammar. The entire reason we have
| different languages is because we needed different ways to
| express certain classes of problems and constructs that go
| way beyond textual representation.
|
| For example, in a strictly typed OOP language like C#,
| classes are hideously complex under the hood. Miles and miles
| of code to handle vtables, inheritance, polymorphism,
| virtual, abstract functions and fields. To implement this in
| C would require effort _far_ beyond what any single
| programmer can produce in a reasonable time. Similarly, I'm
| sure one _could_ force JavaScript to use a very strict typing
| and generics system like C#, but again the effort would be
| enormous and guaranteed to have many bugs.
|
| We use different languages in different ways because they're
| different and work differently. You're asking why everyone
| twists their screwdrivers into screws instead of using the
| back end to pound a nail. Different tools, different uses.
| lelanthran wrote:
| > Good C code will try to avoid allocations as much as possible
| in the first place.
|
| I've upvoted you, but I'm not so sure I agree though.
|
| Sure, each allocation imposes a new obligation to track that
| allocation, but on the flip side, passing around already-
| allocated blocks imposes a new burden for each call to ensure
| that the callees have the correct permissions (modify it,
| reallocate it, free it, etc).
|
| If you're doing any sort of concurrency this can be hard to
| track - sometimes it's easier to simply allocate a new block
| and _give_ it to the callee, and then the caller can forget all
| about it (callee then has the obligation to free it).
| 1718627440 wrote:
| To reduce the amount of allocation, instead of:
|
|     struct parsed_data *parsed_data = parse (...);
|     struct process_data *process_data = process (..., parsed_data);
|     struct foo_data *foo_data = do_foo (..., process_data);
|
| you can do:
|
|     parse (...) {
|         ...
|         process (...);
|         ...
|     }
|
|     process (...) {
|         ...
|         do_foo (...);
|         ...
|     }
|
| It sounds like violating separation of concerns at first, but
| it has the benefit that you can easily do processing and
| parsing in parallel, and all the data can become read-only.
| Also I was impressed when I looked at a call graph of this,
| since this essentially becomes the documentation of the whole
| program.
| ambicapter wrote:
| How testable is this, though?
| 1718627440 wrote:
| It might be a problem when you can't afford side-effects
| that you later throw away, but I haven't experienced that
| yet. The functions still have return codes, so you can still
| test whether correct input results in no error path being
| taken and whether incorrect input results in an error path
| being triggered.
| throwawaymaths wrote:
| is there _any_ system where the basics of http (everything up
| to framework handoff of structured data) are done outside of
| a single concurrency unit?
| obviouslynotme wrote:
| The most important pattern to learn in C is to allocate a
| giant arena upfront and reuse it over and over in a loop.
| Ideally, there is only one allocation and deallocation in the
| entire program. As with all things multi-threaded, this
| becomes trickier. Luckily, web servers are embarrassingly
| parallel, so you can just have an arena for each worker
| thread. Unluckily, web servers do a large amount of string
| processing, so you have to be careful in how you build them
| to prevent the memory requirements from exploding. As always,
| tradeoffs can and will be made depending on what you are
| actually doing.
|
| Short-run programs are even easier. You just never deallocate
| and then exit(0).
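|
| A minimal bump-allocator sketch of that per-worker arena idea:
| one malloc up front, arena_alloc() hands out slices, and
| arena_reset() reclaims everything between requests (error
| handling kept minimal):
|
|     #include <stdlib.h>
|
|     struct arena {
|         char  *base;
|         size_t cap;
|         size_t used;
|     };
|
|     static int arena_init(struct arena *a, size_t cap) {
|         a->base = malloc(cap);
|         a->cap  = cap;
|         a->used = 0;
|         return a->base != NULL;
|     }
|
|     static void *arena_alloc(struct arena *a, size_t n) {
|         n = (n + 15) & ~(size_t)15;     /* keep results aligned */
|         if (a->cap - a->used < n)
|             return NULL;                /* arena exhausted */
|         void *p = a->base + a->used;
|         a->used += n;
|         return p;
|     }
|
|     /* "free" everything from the last request in O(1) */
|     static void arena_reset(struct arena *a)   { a->used = 0; }
|     static void arena_destroy(struct arena *a) { free(a->base); }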
| adrianN wrote:
| Arenas are a nice tool, but they don't work for all use
| cases. In the limit you're reimplementing malloc on top of
| your big chunk of memory.
| galangalalgol wrote:
| Most games have to do this for performance reasons at
| some point and there are plenty of variants to choose
| from. Rust has libraries for some of them, but in C rolling
| it yourself is the idiom. One I used in C++ that worked well
| as a retrofit was to overload new to grab the smallest chunk
| that would fit the allocation from banks
| of them. Profiling under load let the sizes of the banks
| be tuned for efficiency. Nothing had to know it wasn't a
| real heap allocation, but it was way faster and with zero
| possibility of memory fragmentation.
| lifthrasiir wrote:
| Most pre-2010 games had to. As a former gamedev from after
| that period I can confidently say that it is a relic of the
| past in most cases now. (Not that I don't care, but I don't
| have to be _that_ strict about allocations.)
| card_zero wrote:
| Because why?
| lifthrasiir wrote:
| Probably because hardware became powerful enough that you
| can make a performant game without thinking much about
| allocations.
| user____name wrote:
| Virtual memory gets rid of a lot of fragmentation issues.
| galangalalgol wrote:
| Yeah. Fragmentation was a niche concern of that embedded
| use case. It had an MMU, it just wasn't used by the RTOS. I
| am surprised that allocations aren't a major hitter
| anymore. I still have to minimize/eliminate them in Linux
| signal processing code to stay realtime.
| juped wrote:
| The normal practical version of this advice that isn't a
| "guy who just read about arenas" post is that you
| generally kick allocations outward; the caller allocates.
| lelanthran wrote:
| They don't work for all use-cases, but they most
| certainly work for this use-case (HTTP server).
| bheadmaster wrote:
| > Ideally, there is only one allocation and deallocation in
| the entire program.
|
| Doesn't this technically happen with most of the modern
| allocators? They do a lot of work to avoid having to
| request new memory from the kernel as much as possible.
| Asmod4n wrote:
| Last time I checked, the glibc allocator doesn't ask the
| OS that often for new heap memory.
|
| Like, every ~thousand malloc calls invoked (s)brk and
| that was it.
| lelanthran wrote:
| I agree, which is why I wrote an arena allocator library I
| use (somewhere on github, probably public and free).
| card_zero wrote:
| > there is only one allocation and deallocation in the
| entire program.
|
| > Short-run programs are even easier. You just never
| deallocate and then exit(0).
|
| What's special about "short-run"? If you deallocate only
| once, presumably just before you exit, then why do it at
| all?
| free_bip wrote:
| Just because there's only one deallocation doesn't mean
| it's run only once. It would likely be run once every
| time the thread it belongs to is deallocated, like when
| it's finished processing a request.
| fulafel wrote:
| This shared memory and pointer shuffling is of course fraught
| with the need for correct logic to avoid memory safety bugs. Good
| C code doesn't get you pwned, I'd argue.
| jenadine wrote:
| > Good C code doesn't get you pwned, I'd argue.
|
| This is not a serious argument because you don't really
| define good C code and how easy or practical it is to do. The
| sentence works for every language. "Good <whatever language>
| code doesn't get you pwned"
|
| But the question is whether "average" or "normal" C code gets
| you pwned, and the answer is yes, as the article shows.
| fulafel wrote:
| The comment I was responding to suggested Good C Code
| employs optimizations that, I opined, are more error-prone
| wrt memory safety - so I was not attempting to define it,
| but challenging the offered characterisation.
| riedel wrote:
| A long time ago I was involved in building compilers. It was
| common that we solved this problem with obstacks, which are
| basically stacked heaps. I wonder whether one could build more
| things like this, where freeing is a bit more best-effort but
| you have some checkpoints. (I guess one would rather need tree-
| like stacks.) You just have to disallow pointers going the
| wrong way. Allocation remains ugly in C and I think explicit
| data structures are definitely a better way of handling it.
| self_awareness wrote:
| That mythical "Good C Code", which is known only to some people
| who I never met.
| pjmlp wrote:
| These abstractions were already common in enterprise C code
| decades before Java came to be, thanks to stuff like Yourdon
| Structured Method.
|
| Using fixed-size buffers doesn't fix out-of-bounds errors, or
| the stack corruption caused by such bugs.
|
| Naturally we all know good C programmers never make them. /s
| jqpabc123 wrote:
| Reads like an indictment of vibe coding.
|
| LLMs are fundamentally probabilistic --- not deterministic.
|
| This basically means that anything produced this way is highly
| suspect. And this framework is an example.
| erichocean wrote:
| Give Fil-C a try, the speed hit is pretty minimal and you get
| full memory safety.
|
| https://fil-c.org/
| Karrot_Kream wrote:
| Wow this is really cool, I'd never seen this before. Thanks!
| adhamsalama wrote:
| Why isn't this used more?
| dang wrote:
| Recent and related:
|
| _Show HN: I built a web framework in C_ -
| https://news.ycombinator.com/item?id=45526890 - Oct 2025 (208
| comments)
| yipikaya wrote:
| As an aside, it's amusing that it took 25 years for C coders to
| embrace the C99 named struct designator feature:
|     HttpParser parser = {
|         .isValid = true,
|         .requestBuffer = strdup(request),
|         .requestLength = strlen(request),
|         .position = 0,
|     };
|
| All the kids are doing it now!
| 1718627440 wrote:
| This is nice for constant data, but strdup can return NULL
| here, which is again never checked.
|
| > it took 25 years for C coders to embrace the C99 named struct
| designator feature
|
| Not sure if this is actually true, but this is kind of the point
| of C: 20-year-old code or compilers are supposed to work just
| fine, so you just wait some time for things to settle. For fast
| and shiny, there is JavaScript.
| davemp wrote:
| I'm still regularly getting on projects and moving C89 variable
| declarations from the start of functions to where they're
| initialized, but I guess it's not the kids doing it.
| mkfs wrote:
| > C89 variable declarations from the start of functions
|
| Technically it's the start of a block.
| davemp wrote:
| Technically but I don't think folks ever really bothered.
| 1718627440 wrote:
| I only declare variables at the beginning of a block, not
| because I would need C89 compatibility, but because I find it
| clearer to establish the working set of variables upfront. This
| doesn't restrict me in any way, because I just start a new
| block when I feel the need. I also try to keep the scope of
| a variable as small as possible.
| rurban wrote:
| It's only Microsoft's fault for not having implemented it in
| MSVC for decades. They stayed at C89 forever.
| flykespice wrote:
| I never understood why Microsoft lagged so far behind on
| adopting newer C standards. Did their compiler infrastructure
| make it difficult to adopt newer standards? Or did they simply
| not care?
| rurban wrote:
| They focused on C++ only. Management, not their devs.
| varjag wrote:
| Some of the most famous C codebases (e.g. the Linux kernel)
| have been using them for some time.
| ge96 wrote:
| Long as you allocate me, it's alright
| acidx wrote:
| One thing to note, too, is that `atoi()` should be avoided as
| much as possible. On error (parse error, overflow, etc), it has
| an unspecified return value (!), although most libcs will return
| 0, which can be just as bad in some scenarios.
|
| Also not mentioned is that atoi() can return a negative number
| -- which is then passed to malloc(), that takes a size_t, which
| is unsigned... which will make it become a very large number if a
| negative number is passed as its argument.
|
| It's better to use strtol(), but even that is a bit tricky to
| use: it doesn't touch errno when there's no error, yet you need
| to check errno to know whether things like overflow happened, so
| you have to set errno to 0 before calling the function. The man
| page explains how to use it properly.
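|
| A sketch of the strtol() dance described in the man page
| (parse_int is a made-up helper name):
|
|     #include <errno.h>
|     #include <limits.h>
|     #include <stdbool.h>
|     #include <stdlib.h>
|
|     static bool parse_int(const char *s, int *out) {
|         char *end;
|         errno = 0;
|         long v = strtol(s, &end, 10);
|         if (end == s || *end != '\0')
|             return false;          /* no digits, or trailing junk */
|         if (errno == ERANGE || v < INT_MIN || v > INT_MAX)
|             return false;          /* out of range for an int */
|         *out = (int)v;
|         return true;
|     }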
|
| I think it would be a very interesting exercise for that web
| framework's author to run its HTTP request parser through a
| fuzz tester; clang comes with one that's quite good and easy to
| use (https://llvm.org/docs/LibFuzzer.html), especially if used
| alongside address sanitizer or the undefined behavior sanitizer.
| Errors like the one I mentioned will most likely be found by a
| fuzzer really quickly. :)
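|
| For reference, a libFuzzer harness is tiny (parse_http_request
| is a made-up entry point; build with clang
| -fsanitize=fuzzer,address):
|
|     #include <stddef.h>
|     #include <stdint.h>
|     #include <stdlib.h>
|     #include <string.h>
|
|     int parse_http_request(const char *buf, size_t len);  /* hypothetical */
|
|     int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
|         char *buf = malloc(size + 1);
|         if (!buf)
|             return 0;
|         memcpy(buf, data, size);
|         buf[size] = '\0';    /* NUL-terminate for C string APIs */
|         parse_http_request(buf, size);
|         free(buf);
|         return 0;
|     }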
| MathMonkeyMan wrote:
| Unspecified, really? cppreference's [C documentation][1] says
| that it returns zero. The [OpenGroup][2] documentation doesn't
| specify a return value when the conversion can't be performed.
| This recent [draft][3] of the ISO standard for C says that if
| the value cannot be represented (does that mean over/underflow,
| bad parse, both, neither?), then it's undefined behavior.
|
| So three references give three different answers.
|
| You could always use sscanf instead, which tells you how many
| values were scanned (e.g. zero or one).
|
| [1]: https://en.cppreference.com/w/c/string/byte/atoi.html
|
| [2]:
| https://pubs.opengroup.org/onlinepubs/9799919799/functions/a...
|
| [3]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2310.pdf
| acidx wrote:
| The Linux man page (https://man7.org/linux/man-
| pages/man3/atoi.3.html#VERSIONS) says that POSIX.1 leaves it
| unspecified. As you found out, it's really something that
| should be avoided as much as possible, because pretty much
| everywhere disagrees how it should behave, especially if you
| value portability.
|
| sscanf() is not a good replacement either! It's better to use
| strtol() instead. Either do what Lwan does
| (https://github.com/lpereira/lwan/blob/master/src/lib/lwan-
| co...), or look (https://cvsweb.openbsd.org/src/lib/libc/stdl
| ib/strtonum.c?re...) at how OpenBSD implemented strtonum(3).
|
| For instance, if you try to parse a number that's preceded by
| a lot of spaces, sscanf() will take a long time going through
| it. I've been hit by that when fuzzing Lwan.
|
| Even cURL is avoiding sscanf():
| https://daniel.haxx.se/blog/2025/04/07/writing-c-for-curl/
| MathMonkeyMan wrote:
| If your use case can have C++, then [std::from_chars][1] is
| ideal. Here's gcc's [implementation][2]; a lot of it seems
| to be handling different bases.
|
| [1]:
| https://en.cppreference.com/w/cpp/utility/from_chars.html
|
| [2]: https://github.com/gcc-
| mirror/gcc/blob/461fa63908b5bb1a44f12...
| AdieuToLogic wrote:
| While the classic "Parse, don't validate"[0] paper uses Haskell
| instead of C as its illustrative programming language, the
| approach detailed is very much applicable in these scenarios.
|
| 0 - https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-
| va...
| lelanthran wrote:
| > While the classic "Parse, don't validate"[0] paper uses
| Haskell instead of C as its illustrative programming language,
| the approach detailed is very much applicable in these
| scenarios.
|
| Good thing someone (i.e. _me_ ) took the time to demonstrate
| PdV in C: https://www.lelanthran.com/chap13/content.html
| nxobject wrote:
| I appreciate that link - now I see the parallels between
| "consolidate allocation in C to the extent that the rest of
| your code doesn't have to worry", and "consolidate validation
| in C" to the extent that...".
| pizlonator wrote:
| Just compile it with Fil-C
| kazinator wrote:
| I definitely don't love C that does atoi on a Content-Length
| value that came from the network and passes that to malloc.
|
| Even before we get to how a malicious value would interact with
| malloc, there is this:
|
| > The functions atof, atoi, atol, and atoll are not required to
| affect the value of the integer expression errno on an error. If
| the value of the result cannot be represented, the behavior is
| undefined. [ISO C N3220 draft]
|
| That includes not only out-of-range values but also garbage that
| cannot be converted to a number at all. atoi("foo") can behave in any
| manner whatsoever and return anything.
|
| Those functions are okay to use on something that has been
| validated in a way that it cannot cause a problem. If you know
| you have a nonempty sequence of nothing but digits, possibly with
| a minus sign, and the number of digits is small enough that the
| value will fit into an int, you are okay.
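|
| A sketch of that kind of pre-validation (nine digits always fit
| in a 32-bit int):
|
|     #include <ctype.h>
|     #include <stdbool.h>
|     #include <string.h>
|
|     static bool is_safe_int_literal(const char *s) {
|         size_t i = (s[0] == '-') ? 1 : 0;
|         size_t digits = strlen(s) - i;
|         if (digits == 0 || digits > 9)
|             return false;
|         for (; s[i]; i++)
|             if (!isdigit((unsigned char)s[i]))
|                 return false;
|         return true;
|     }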
|
| > A malicious user can pass Content-Length of 4294967295
|
| But why would they, when it's fewer keystrokes to use -1, which
| will become 4294967295 with a 32-bit malloc, while scaling to
| 18446744073709551615 on 64-bit?
___________________________________________________________________
(page generated 2025-10-11 23:02 UTC)