hngopher.com

       [HN Gopher] Some __nonstring__ Turbulence
       ___________________________________________________________________
        
       Some __nonstring__ Turbulence
        
       Author : jwilk
       Score  : 121 points
       Date   : 2025-04-25 06:46 UTC (16 hours ago)
        
 (HTM) web link (lwn.net)
 (TXT) w3m dump (lwn.net)
        
       | jey wrote:
       | That the annotation applies to variables and not types is surely
       | an oversight or mistake right? Seems like it could have been
       | easier to initially implement that way but it just doesn't seem
       | to fit with how C type system works. (Yes it will make
       | declarations uglier to do it on types but that ship has sailed
       | long ago; see cdecl.org)
        
         | leni536 wrote:
         | I either don't understand how the annotation would work on
         | types, or what would be gained by it. What type would be
         | annotated? A typedef to char[]?
         | 
         | edit: Unless what they actually mean is annotating struct
         | _members_ , that would actually make sense.
        
           | _nalply wrote:
           | I do understand.
           | 
           | I imagine that it could work a little bit like unsigned: a
           | modifier to integer types that tells that an integer's MSB is
           | not to be used as a sign bit.
           | 
           | __nonstring__ tells that the last byte of a byte sequence
           | doesn't need to be NUL.
           | 
           | I would find it sensible to allow putting the attribute on a
           | type, but whatever.
        
             | leni536 wrote:
             | But that doesn't make any difference in the way you have to
             | address existing `char arr[4] = "abcd"` declarations.
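
For reference, a minimal sketch of the declaration being discussed, with the annotation attached to the variable rather than to a type. It assumes GCC's `__attribute__((nonstring))` spelling; the `NONSTRING` macro, `magic`, and `has_magic` are hypothetical names made up for this illustration, and whether a given compiler version warns about the initializer is not something the sketch guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper macro: expands to the attribute only where the
   compiler reports support for it, and to nothing elsewhere. */
#if defined(__has_attribute)
#  if __has_attribute(nonstring)
#    define NONSTRING __attribute__((nonstring))
#  endif
#endif
#ifndef NONSTRING
#  define NONSTRING
#endif

/* Exactly four bytes: the literal's trailing NUL does not fit and is
   dropped. The attribute is attached to the variable, not to its type. */
static const char magic[4] NONSTRING = "abcd";

/* Compare a 4-byte tag against magic without assuming NUL termination. */
int has_magic(const char *tag)
{
    assert(tag != NULL);
    return memcmp(tag, magic, sizeof magic) == 0;
}
```

Because the comparison is bounded by `sizeof magic`, nothing in this sketch ever treats `magic` as a C string.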
        
               | _nalply wrote:
               | True.
               | 
               | This would be only useful in typedefs. An API could
               | declare some byte arrays not strings. But again,
               | whatever.
        
         | rurban wrote:
         | And how would you type a string vs byte array then? C doesn't
         | even have proper string support yet, ie unicode strings. Most
         | wchar functions don't care at all about unicode rules. Zero-
         | terminated byte buffers are certainly not strings, just
         | garbage.
         | 
         | C will never get proper string support, so you'll never be able
         | to separate them from zero-terminated byte buffers vs byte-
         | buffers in the type system.
         | 
         | So annotating vars is perfectly fine.
         | 
         | The problem was that the PM and Release manager was completely
         | unaware of the state of the next branch, of its upcoming
         | problems and fixes, and just hacked around in his usual cowboy
         | manner. Entirely unprofessional. A release manager should have
         | been aware of Kees' gcc15 fixes.
         | 
         | But they have no tooling support, no oversight, just endless
         | blurbs on their main mailinglist. No CI for a release
         | candidate? Reminds us of typical cowboys in other places.
        
           | iforgotpassword wrote:
           | I think the idea is simply to
           | 
           |     typedef __nonstring__ char* bytes;
           | 
           | And then use that type instead of annotating every single
           | variable declaration.
        
             | OskarS wrote:
             | But you would still need to change it everywhere, right?
             | Like, instead of changing the annotation everywhere you
             | have to change the type everywhere. Doesn't seem like a
             | huge difference to me.
        
               | Animux wrote:
               | There is a difference if the type is used inside a
               | structure.
        
               | OskarS wrote:
               | Fair, good point.
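
To make the structure case concrete, here is a minimal sketch of annotating a struct member, assuming GCC's `__attribute__((nonstring))` is accepted on a member declaration. The `NONSTRING` macro, `struct record`, and both helper functions are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper macro: the attribute where supported, else nothing. */
#if defined(__has_attribute)
#  if __has_attribute(nonstring)
#    define NONSTRING __attribute__((nonstring))
#  endif
#endif
#ifndef NONSTRING
#  define NONSTRING
#endif

struct record {
    char name[8] NONSTRING;  /* fixed-width field, may lack a NUL */
    unsigned id;
};

/* strncpy pads short names with NULs and leaves no terminator when the
   source is 8 bytes or longer, which is acceptable for this field. */
void record_set_name(struct record *r, const char *src)
{
    assert(r != NULL && src != NULL);
    strncpy(r->name, src, sizeof r->name);
}

/* Length of the name, never reading past the end of the field. */
size_t record_name_len(const struct record *r)
{
    assert(r != NULL);
    const char *end = memchr(r->name, '\0', sizeof r->name);
    return end ? (size_t)(end - r->name) : sizeof r->name;
}
```

With the annotation on the member, every `struct record` carries it, so nothing has to be repeated at each variable declaration.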
        
           | timewizard wrote:
           | > No CI for a release candidate?
           | 
           | If the CI system didn't get the Fedora upgrade then it would
           | not have caught it. Aside from that the kernel has a highly
           | configurable build process so getting good coverage is
           | equally complex.
           | 
           | Plus, this is a release candidate, which is noted as being
           | explicitly targeted at developers and enthusiasts. I'm not
           | sure the strength of Kees' objections is well matched to the
           | size of the actual problem.
        
             | badmintonbaseba wrote:
             | But Linus broke the kernel for gcc<15, a CI would have
             | surely caught it.
             | 
             | And Linus is usually much more critical in what gets into
             | master when it comes to other people's contribution, let
             | alone into an RC.
        
         | dataflow wrote:
         | > That the annotation applies to variables and not types is
         | surely an oversight or mistake right?
         | 
         | I don't think so. It doesn't make sense on the type. Otherwise,
         | what should happen here?
         | 
         |     char s[1];
         |     char (__nonstring ns)[1];  // (I guess this would be the syntax?)
         |     s[0] = '1';
         |     ns[0] = '\0';
         |     char* p1 = s;  // Should this be legal?
         |     char* p2 = ns;  // Should this be legal?
         |     char* __nonstring p3 = s;  // Should this be legal?
         |     char* __nonstring p4 = ns;  // Should this be legal?
         |     foo(s, ns, p1, p2, p3, p4);  // Which ones can foo() assume to be NUL-terminated?
         |                                  // Which ones can foo() assume to NOT be NUL-terminated??
         | 
         | By putting it in the type you're not just affecting the
         | initialization, you're establishing an invariant throughout the
         | lifetime of the object... which you cannot enforce in any
         | desirable way here. That would be equivalent to laying a
         | minefield throughout your code.
        
           | dwattttt wrote:
           | Do you mean s & ns to be swapped? ns starts with a NUL
           | terminator and s does not.
        
             | dataflow wrote:
             | No actually, that was the point. I was asking, what do you
             | think should happen if you store a NUL when you're claiming
             | you're not. Or if you don't store a NUL, when you claim
             | it's there.
        
               | dwattttt wrote:
               | Well, as a human compiler, I said "Hey, you've non-NUL
               | terminated a NUL terminated string". If that was what you
               | intended you should use the type annotation for that, so
               | I think that case worked as intended.
               | 
               | EDIT: > what do you think should happen if you store a
               | NUL when you're claiming you're not
               | 
               | I don't believe nonstring implies it doesn't end with a
               | NUL, just that it isn't required to.
        
               | dataflow wrote:
               | But char[] _already_ isn't required to be NUL-terminated
               | to begin with. char a[1] = {'a'} is perfectly fine, as is
               | a[0] = '1'. If all you want to do is to document the fact
               | that a type can do exactly what it already can...
               | changing the type to something new doesn't make sense.
               | 
               | Note that "works as intended" isn't the sole criterion
               | for "does it make sense" or "should we do this." You can
               | kill a fly with a cannon too, and it achieves the
               | intended outcome, but that doesn't mean you should.
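
A small sketch of the point that a plain `char` array carries no termination guarantee either way: whether a NUL is present is a property of the contents, not of the type. The function name `has_terminator` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if a NUL byte occurs within the first cap bytes of buf,
   0 otherwise. Never reads past cap bytes. */
int has_terminator(const char *buf, size_t cap)
{
    assert(buf != NULL);
    return memchr(buf, '\0', cap) != NULL;
}
```

For `char a[1] = {'a'}` the check reports no terminator, while for a one-byte array holding `'\0'` it reports one, even though both arrays have the same type.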
        
             | toast0 wrote:
             | Is ns NUL terminated, or is it an array of chars that
             | happens to end with NUL?
        
           | _nalply wrote:
           | Perhaps unsigned could help here with understanding.
           | 
           | unsigned means, don't use an integer's MSB as a sign bit.
           | __nonstring means, the byte array might not be terminated
           | with a NUL byte.
           | 
           | So what happens if you use integers instead of byte arrays? I
           | mean cast away unsigned or add unsigned. Of course these two
           | areas are different, but one could try to design such
           | features that they behave in similar ways where it makes
           | sense.
           | 
           | I am unsure but it seems, if you cast to a different type you
           | lose the conditions of the previous type. And "should this be
           | legal", you can cast away a lot of things and it's legal.
           | That's C.
           | 
           | But whatever because it's not implemented. This all is
           | hypothetical. I understand that GCC took the easier way.
           | Type strictness is not C's forte.
        
             | dataflow wrote:
             | > Perhaps unsigned could help here with understanding.
             | 
             | No, they're _very_ different situations.
             | 
             | > unsigned means, don't use an integer's MSB as a sign bit.
             | 
             | First: unsigned is a keyword. This fact is not
             | insignificant.
             | 
             | But anyway, even assuming they were both keywords or both
             | attributes: "don't use an MSB as a sign bit" makes sense,
             | because the MSB otherwise _is_ used as a sign bit.
             | 
             | > __nonstring means, the byte array might not be terminated
             | with a NUL byte.
             | 
             | The byte array _already_ doesn't have to contain a NUL
             | character to begin with. It just so happens that you
             | usually initialize it somewhere with an initializer that
             | does, but it's already perfectly legal to strip that NUL
             | away later, or to initialize it in a manner that doesn't
             | include a NUL character (say, char a[1] = {'a'}). It
             | doesn't really make sense to change the type to say "we now
             | have a new type with the cool invariant that is...
             | identical to the original type's."
             | 
             | > I understand that GCC took the easier way. Type
             | strictness is not C's forte.
             | 
             | People would want whatever they do to make sense in C++
             | too, FWIW. So if they introduce a type incompatibility,
             | they would want it to avoid breaking the world in other
             | languages that enforce them, even if C doesn't.
        
       | AceJohnny2 wrote:
       | Fedora stupidly uses beta compiler in new release, Torvalds
       | blindly upgrades, makes breaking, unreviewed changes in kernel,
       | then flames the maintainer who was working on cleanly updating
       | the kernel for the not-yet-released compiler?
       | 
       | I admire Kees Cook's patience.
        
         | eb0la wrote:
         | IMHO Cook is following good development practices.
         | 
         | You need to know what you support. If you are going to change,
         | it must be planned somehow.
         | 
         | I find Torvalds reckless for changing his development
         | environment before a release. If he really needs that computer to
         | release the kernel, it must be a stable one. Even better: it
         | should be a VM (hosted somewhere) or part of a CI-CD pipeline.
        
           | evgpbfhnr wrote:
           | He releases rc every single week (ok, except before rc1
           | there's two weeks for merge window), there's no "off" time to
           | upgrade anywhere.
           | 
           | Not that I approve the untested changes, I'd have used a
           | different gcc temporarily (container or whatever), but, yeah,
           | well...
        
             | jaapz wrote:
             | I find it surprising that Linus bases his development and
             | release tools on whatever's in the repositories at
             | that time. Surely it is best practice to pin to a
             | specified, fixed version and upgrade as necessary, so
             | everyone is working with the same tools?
             | 
             | This is common best practice in many environments...
             | 
             | Linus surely knows this, but here he's just being hard
             | headed.
        
               | IsTom wrote:
               | People downloading and compiling the kernel will not be
               | using a fixed version of GCC.
        
               | charcircuit wrote:
               | Why not specify one?
        
               | mort96 wrote:
               | What would that help? People use the compilers in their
               | distros, regardless of what's documented as a supported
               | version in some readme.
        
               | GTP wrote:
               | Because then, if something that is expected to compile
               | doesn't compile correctly, you know that you should check
               | your compiler version. It is the exact same reason why
               | you don't just specify which library your project depends
               | on but also the libraries' version.
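
As a rough illustration of "you know that you should check your compiler version", the sketch below classifies the compiler's major version against a documented window, using GCC's predefined `__GNUC__` macro. The window bounds (8 and 14) and every name here are hypothetical and chosen only for the example; they do not reflect the kernel's actual requirements.

```c
#include <assert.h>

/* Hypothetical version window, chosen only for illustration. */
#define MIN_GCC_MAJOR 8
#define MAX_TESTED_GCC_MAJOR 14

enum toolchain_status {
    TOOLCHAIN_UNKNOWN,
    TOOLCHAIN_TOO_OLD,
    TOOLCHAIN_TESTED,
    TOOLCHAIN_NEWER_THAN_TESTED
};

/* Classify a GCC major version against the documented window. */
enum toolchain_status classify_gcc_major(int major)
{
    assert(major > 0);  /* GCC major versions are positive */
    if (major < MIN_GCC_MAJOR)
        return TOOLCHAIN_TOO_OLD;
    if (major > MAX_TESTED_GCC_MAJOR)
        return TOOLCHAIN_NEWER_THAN_TESTED;
    return TOOLCHAIN_TESTED;
}

/* Report on the compiler building this file. Clang also defines
   __GNUC__, so it is excluded and reported as unknown. */
enum toolchain_status current_toolchain(void)
{
#if defined(__GNUC__) && !defined(__clang__)
    return classify_gcc_major(__GNUC__);
#else
    return TOOLCHAIN_UNKNOWN;
#endif
}
```

A build could print a hint when `current_toolchain()` returns `TOOLCHAIN_NEWER_THAN_TESTED` instead of failing outright, which keeps newer compilers usable while still pointing at the likely cause of new diagnostics.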
        
               | leenify wrote:
               | That can work, but it can also bring quite a few issues.
               | Mozilla effectively does this; their build process
               | downloads the build toolchain, including a specific clang
               | version, during bootstrap, i.e., setting up the build
               | environment.
               | 
               | This is super nice in theory, but it gets murky if you
               | veer off the "I'm building current mainline Firefox"
               | path. For example, I'm a maintainer of a Firefox fork
               | that often lags a few versions behind. It has substantial
               | changes, and we are only two guys doing the major work,
               | so keeping up with current changes is not feasible.
               | However, this is a research/security testing-focused
               | project, so this is generally okay.
               | 
               | However, coming back to the build issue, apparently, it's
               | costly to host all those buildchain archives. So they get
               | frequently deleted from the remote repository, which
               | leads to the build only working on machines that
               | downloaded the toolchain earlier (i.e., not on GitHub Actions
               | runners, for example).
               | 
               | Given that there are many more downstream users of
               | effectively a ton of kernel versions, this quickly gets
               | fairly expensive and takes up a ton of effort unless you
               | pin it to some old version and rarely change it.
               | 
               | So, as someone wanting to mess around with open source
               | projects, their supporting more than 1 specific compiler
               | version is actually quite nice.
        
               | charcircuit wrote:
               | Conceptually it's no different than any other build
               | dependency. It is not expensive to host many versions. $1
               | is enough to store over 1000 compiler versions which
               | would be overkill for the needs of the kernel.
        
               | dooglius wrote:
               | People are usually going to go through `make`, I don't
               | see a reason that couldn't be instrumented to (by
               | default) acquire an upstream GCC vs whatever forked
               | garbage ends up in $PATH
        
               | bombcar wrote:
               | This would result in many more disasters as system GCC
               | and kernel GCC would quickly be out of sync causing all
               | sorts of "unexpected fun".
        
               | dooglius wrote:
               | Why would it go wrong, the ABI is stable and independent
               | of compiler? You would hit issues with C++ but not C. I
               | have certainly built kernels using different versions of
               | GCC than what /lib stuff is compiled with, without issue.
        
               | ndesaulniers wrote:
               | You'd think that, but in effect kconfig/kbuild has many
               | cases where they say "if the compiler supports flag X,
               | use it" where X implies an ABI break. Per task stack
               | protectors comes to mind.
        
               | dooglius wrote:
               | Ah that's interesting, thanks
        
           | hyperpape wrote:
           | I'm completely unsure whether to respond "it was stable, he
           | was running a release version of Fedora" or "there's no such
           | thing as stable under Linux".
           | 
           | The insanity is that the Kernel, Fedora and GCC are so badly
           | coordinated that the beta of the compiler breaks the Kernel
           | build (this is not a beta, this is a pre-alpha in a
           | reasonable universe...is the Kernel a critical user of GCC?
           | Apparently not), and a major distro packages that beta
           | version of the compiler.
           | 
           | To borrow a phrase from Reddit: "everybody sucks here" (even
           | Cook, who looks the best of everyone here, seems either
           | oblivious or defeated about how clownshoes it is that
           | released versions of major linux distros can't build the
           | Kernel. The solution of "don't update to release versions" is
           | crap).
           | 
           | (Writing this from a Linux machine, which I will continue
           | using, but also sort of despise).
        
           | ploxiln wrote:
           | The real problem here was "-Werror", dogmatically fixing
           | warnings, and using the position of privilege to push in
           | last-minute commits without review.
           | 
           | Compilers will be updated, they will have new warnings, this
           | has happened numerous times and will happen in the future.
           | The linux kernel has always supported a wide range of
           | compiler versions, from the very latest to 5+ years old.
           | 
           | I've ranted about "-Werror" in the past, but to try to keep
           | it concise: it breaks builds that would and should otherwise
           | work. It breaks older code with newer compiler and different-
           | platform compiler. This is bad because then you can't, say,
           | use the exact code specified/intended without modifications,
           | or you can't test and compare different versions or different
           | toolchains, etc. A good developer will absolutely not
           | tolerate a deluge of warnings all the time, they will decide
           | to fix the warnings to get a clean build, over a reasonable
           | time with well-considered changes, rather than be forced to
           | fix them immediately with brash disruptive code changes. And
           | this is a perfect example why. New compiler fine, new
           | warnings fine. Warnings are a useful feature, distinct from
           | errors. "-Werror" is the real error.
        
             | mort96 wrote:
             | With or without -Werror, you need your builds to be clean
             | with the project's chosen compilers.
             | 
             | Linux decided, on a whim, that a pre-release of GCC 15
             | ought to suddenly be a compiler that the Linux project
             | officially uses, and threw in some last-minute commits
             | straight to main, which is insane. But even without
             | -Werror, when the project decides to upgrade compiler
             | versions, warnings must be silenced, either through
             | disabling new warnings or through changing the source code.
             | Warnings have value, and they only have value if they're
             | not routinely ignored.
             | 
             | For the record, I agree that -Werror sucks. It's nice in
             | CI, but it's terrible to have it enabled by default, as it
             | means that your contributors will have their build broken
             | just because they used a different compiler version than
             | the ones which the project has decided to officially adopt.
             | But I don't think it's the problem here. The problem here
             | is Linus's sudden decision to upgrade to a pre-release
             | version of GCC which has new warnings and commit "fixes"
             | straight to main.
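
The "disabling new warnings" option can be scoped to a single declaration instead of the whole build. The sketch below assumes the GCC 15 warning is named `-Wunterminated-string-initialization` and guards the pragma by compiler version so that older compilers never see an option they do not know. `sig` and `has_riff_tag` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Silence the warning for this one declaration only, and only on
   compilers assumed to know the option (GCC 15 or newer). */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 15
#  pragma GCC diagnostic push
#  pragma GCC diagnostic ignored "-Wunterminated-string-initialization"
#endif
static const char sig[4] = "RIFF";  /* four bytes, no room for a NUL */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 15
#  pragma GCC diagnostic pop
#endif

/* Returns 1 if the first four bytes of hdr match sig, 0 otherwise. */
int has_riff_tag(const unsigned char *hdr, size_t n)
{
    assert(hdr != NULL);
    return n >= sizeof sig && memcmp(hdr, sig, sizeof sig) == 0;
}
```

Compared with annotating the variable, this changes no declarations, but it also leaves no record in the code that the array is deliberately unterminated.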
        
             | llm_nerd wrote:
             | This is my take-away as well. Many projects let warnings
             | fester until they hit a volume where critical warnings are
             | missed amidst all the noise. That isn't ideal, but seems to
             | be the norm in many spaces (for instance the nodejs world
             | where it's just pages and pages of warnings and
             | deprecations and critical vulnerabilities and...).
             | 
             | But pushing breaking changes just to suppress some new
             | warning should not be the alternative. Working to minimize
             | warnings in a pragmatic way seems more tenable.
        
             | ndesaulniers wrote:
             | Sadly, I lost that battle with Torvalds. You can see me
             | make some of those points on LKML.
        
               | ploxiln wrote:
               | I see, thanks. ( Found it here:
               | https://lkml.org/lkml/2021/9/7/716 )
        
         | josefx wrote:
         | > makes breaking, unreviewed changes in kernel,
         | 
         | And reverted them as soon as the issue became apparent.
         | 
         | > then flames the maintainer who was working on cleanly
         | updating the kernel for the not-yet-released compiler?
         | 
         | Talking about changes that he had not pushed by the time Linus
         | published the release candidate.
         | 
         | Also the "not yet released" seems to be a red herring, as the
         | article notes having beta versions of compilers in new releases
         | is a tradition for some distros, so that should not be
         | unexpected. It makes some sense since distros tend to stick to
         | a compiler for each release, so shipping a soon to be out of
         | maintenance compiler from day one will only cause other issues
         | down the road.
        
           | Kwpolska wrote:
           | Fedora releases are supported for about 13 months after
           | release. They could live with an older version of GCC for a
           | year.
        
             | Denvercoder9 wrote:
             | > They could live with an older version of GCC for a year.
             | 
             | That's just not what Fedora is, though. Being on the
             | bleeding edge is foundational to Fedora, even if it's
             | sometimes inconvenient. If you want battle-tested and
             | stable, don't run Fedora, but use Debian or something.
        
               | Kwpolska wrote:
               | Bleeding-edge is fine, but shipping a beta C compiler
               | seems a bridge too far. Even Arch does not ship GCC 15
               | yet.
        
         | rwmj wrote:
         | The GCC 15 transition has been very disruptive, but Fedora is
         | known for being on the bleeding edge ("first" is in the "four
         | foundations" [1]). Be glad because eventually everyone will get
         | GCC 15, and we've worked out most of the problems for you
         | already.
         | 
         | [1] https://docs.fedoraproject.org/en-US/project/
        
           | genewitch wrote:
           | Do you work in marketing
        
           | stefan_ wrote:
           | GCC 15.1 was released today. Your Fedora release was two
           | weeks earlier, now using a nonexistent version of 15.0.1,
           | ironically now including bugs you reported and that were
           | fixed for 15.1. That just seems like poor decision making.
        
             | rwmj wrote:
             | You're belittling the large amount of work done across
             | thousands of packages to get them ready for GCC 15, which
             | did involve backporting fixes to GCC 15 itself. All those
             | fixes went into GCC upstream. GCC 15.1 was released two
             | hours ago as of writing this message, even before the US
             | wakes up, yet I'm sure there will be a build of it in
             | Fedora later today.
        
               | blueflow wrote:
               | Creating the fake release for gcc was by no means
               | necessary for that.
        
               | rwmj wrote:
               | GCC 15.1 building:
               | https://koji.fedoraproject.org/koji/buildinfo?buildID=270512...
        
           | ahoka wrote:
           | This is just GCC 2.96 again, they will never learn.
        
             | bonzini wrote:
             | GCC 2.96 lasted a year or more and even after GCC 3.0 was
             | released it wasn't able to compile a working kernel. This
             | lasted two weeks and the issue is just a new warning; it's
             | just bad timing across the release cycles of two projects.
        
           | mackal wrote:
           | Gentoo also has a tracker [1] for GCC 15 issues that they've
           | been working on as well. (Note: GCC 15 is masked in Gentoo so
           | you have to go out of your way to install it)
           | 
           | [1] https://bugs.gentoo.org/932474
        
         | JoshTriplett wrote:
         | Exactly. As quoted in the article:
         | 
         | > you didn't coordinate with anyone. You didn't search lore for
         | the warning strings, you didn't even check -next where you've
         | now created merge conflicts. You put insufficiently tested
         | patches into the tree at the last minute and cut an rc release
         | that broke for everyone using GCC <15. You mercilessly flame
         | maintainers for much much less.
         | 
         | Hypocrisy is an even worse trait than flaming people.
        
           | genewitch wrote:
           | On the one hand, sure, fine. He has raked people for less.
           | However this is just an RC. Further, how long has Linus been
           | doing this?
           | 
           | I remember Maddox on xmission having a page explaining that
           | while he may make a grammatical error from time to time, he
           | has published literally hundreds of thousands of words, and
           | the average email he receives contains 10% errors.
           | 
           | However, Linus is well-known for being abrasive, abusive,
           | call it what you want. If you can't take it, don't foist it,
           | Linus. Even if you've earned the right, IMO.
        
             | 7bit wrote:
             | Nobody earns the right to be an asshole. That is nothing
             | that can be earned.
        
               | wizzwizz4 wrote:
               | I'd say if you're doing truly-heroic _solo_ efforts, then
               | you can earn that. (But I can only think of fictional
               | examples.) For team efforts like the Linux kernel, sure,
               | no amount of individual contribution to that project
               | grants you the right to belittle the other contributors.
        
               | nick__m wrote:
               | Fabrice Bellard has earned that right, but somehow I don't
               | think he is!
        
               | wizzwizz4 wrote:
               | Fabrice Bellard's work is _impressive_ , but I wouldn't
               | call it _heroic_. I was thinking more like, the grumpy-
               | guts who ensures the local homeless shelter is adequately
               | stocked with food, clean bedding, and toiletries, day-in
               | and day-out, even in the depths of winter. You're
               | allowed to be vaguely misanthropic in your interpersonal
               | relationships if you're doing something like that, at
               | least in my book.
               | 
               | Again, the only non-fictional people I know who qualify,
               | are actually really nice to people.
        
               | sanderjd wrote:
               | Still nope.
        
               | kelnos wrote:
               | This idea that if you've done great things, then you've
               | earned the right to treat people poorly, needs to go
               | away. It's toxic and gross, and we should expect and
               | demand better of our heroes (and ourselves).
        
               | mannykannot wrote:
               | Indeed. On the other hand, the right to _show_ that you
               | are an asshole is available to anyone, and it has become
               | quite popular!
        
           | deeThrow94 wrote:
           | > Hypocrisy is an even worse trait than flaming people.
           | 
           | Eh I mean everyone's a hypocrite if you dig deep enough--
           | we're all a big nest of contradictions internally.
           | Recognition of this and accountability is paramount though.
           | He could have simply owned his mistake and swallowed his
           | pride and this wouldn't have been such an issue.
        
       | jaapz wrote:
       | Torvalds is known for being flamey towards kernel maintainers,
       | but most of the time that is for good reason. Here however, he
       | should just admit he made a mistake instead of doubling down.
       | Admitting your own mistakes is a mark of a great maintainer as
       | well.
        
         | bonzini wrote:
         | Yeah, admitting he's wrong is certainly not his strong suit. He
         | will do so years down the road but not in the heat of the
         | argument.
        
         | bastawhiz wrote:
         | Torvalds is exactly the sort of person I'd leave a company to
         | be as far away from as possible. He's brilliant but absolutely
         | insufferable whether it's deserved or not. Anyone with admin
         | access who leans into a pissing contest after breaking the
         | build because they upgraded their operating system just before
         | a release rather than taking the hour to fix the mess they made
         | is going to make you hate your job. God bless the well-mannered
         | kernel maintainers who grind past it.
        
           | ndesaulniers wrote:
           | I don't miss working on the kernel, tbf. Constant arguments.
        
       | Ukv wrote:
       | From the comments:
       | 
       | > C "strings" work the way they do because C is a low level
       | language, where you want to be able to do low-level things when
       | necessary. It's a feature, not a deficiency.
       | 
       | Are NUL-terminated strings really considered preferable, even for
       | low-level work? I always just considered them an unfortunate
       | design choice C was stuck with.
       | 
       | Many O(1) operations/checks become O(n) because you have to
       | linearly traverse the entire string (or keep a second pointer) to
       | know where it ends/how long it is; you can't take a substring
       | within another string without reallocating and copying that part
       | over with a new NUL appended at the end; you can't store data
       | that may contain a NUL (which text _shouldn't_, in theory, but
       | then you need a separate approach for binary data); and plenty of
       | security issues arise from missing or extra NULs.
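
A minimal sketch of the substring point above, assuming a hypothetical length-carrying `struct str_view` (the type and the function names are illustrative, not from any standard library). Taking a slice of a view is constant-time pointer arithmetic, while the NUL-terminated version has to copy so it can append a terminator:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying view; not a standard C type. */
struct str_view {
    const char *data;
    size_t len;
};

/* O(1): the length is stored, so no scan for a terminator is needed. */
size_t view_len(struct str_view v)
{
    return v.len;
}

/* O(1) substring: points into the parent buffer, no allocation or copy. */
struct str_view view_sub(struct str_view v, size_t start, size_t n)
{
    assert(start <= v.len && n <= v.len - start);
    struct str_view out = { v.data + start, n };
    return out;
}

/* NUL-terminated equivalent: copies n bytes so a '\0' can be appended.
 * The caller must provide at least n + 1 bytes in dst. */
void cstr_sub(char *dst, const char *src, size_t start, size_t n)
{
    memcpy(dst, src + start, n);
    dst[n] = '\0';
}
```

The view can also hold embedded NUL bytes, since nothing in it treats zero as special.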
        
         | formerly_proven wrote:
         | C's design is probably the most post-hoc rationalized thing in
         | the world directly after Abrahamic scripture.
         | 
         | "Of course the null-terminated strings of C are more low-level
         | than the length-prefixed strings of Pascal, because the elders
         | of C wisely designed them to be so." Alternatively, something
         | is low-level because it works like C because C semantics have
         | simply become the universal definition of what is thought of as
         | low-level, regardless of machine mismatch.
         | 
         | Likewise, maybe it's not such a good idea that UNIXv6 or other
         | educational unix-likes are used in operating system classes in
         | universities. It's well-applicable, sure, but that's not the
         | point of that education. Maybe we should use a Japanese or
         | German clone of some IBM mainframe system instead, so that
         | people actually get exposed to different ideas, instead of
         | slightly simpler and less sophisticated versions of the ideas
         | they are already familiar with. Too much unix-inbreeding in CS
         | education isn't good.
        
           | pjmlp wrote:
           | Especially when C advocates tend to ignore the history of
           | systems programming languages predating the language by a
           | decade, because the authors decided it was cooler to do their
           | own thing. Notice a similar pattern in other languages?
           | 
           | > Although we entertained occasional thoughts about
           | implementing one of the major languages of the time like
           | Fortran, PL/I, or Algol 68, such a project seemed hopelessly
           | large for our resources: much simpler and smaller tools were
           | called for. All these languages influenced our work, but it
           | was more fun to do things on our own.
           | 
           | -- https://www.nokia.com/bell-labs/about/dennis-m-ritchie/chist...
           | 
           | And using Pascal as a counterexample gets tiresome: not only
           | was it not designed for systems programming, most of its
           | dialects did fix those issues, including its revised report
           | (ISO Extended Pascal). By 1978 Niklaus Wirth had created
           | Modula-2, based on Mesa (Xerox PARC's replacement for their
           | use of BCPL), neither of which ever had problems with string
           | lengths.
        
             | formerly_proven wrote:
             | Well it's just the common name for that particular string
             | representation, even though it certainly existed before
             | Pascal - just like C did not invent null-terminated
             | strings, either.
        
               | pjmlp wrote:
               | The name has nothing to do with the insecure way it was
               | implemented in C.
        
           | xnorswap wrote:
           | I agree there's a teaching problem happening somewhere. I'm
           | not sure I blame CS-education since I'd wager that most
           | developers don't have a formal CS background.
           | 
           | However, I too regularly come across people who believe some
           | or all of the following:
           | 
           | - "Everything is ultimately just C"
           | 
           | - "All other languages just compile to C, so you should use
           | it to be fast"
           | 
           | - "C is faster because it's closer to bare metal"
           | 
           | - "C is fast because it doesn't need to be interpreted unlike
           | all other languages"
           | 
           | The special elevated position of C as some kind of "ground
           | truth" of computers is bizarre. It leads to all kinds of
           | false optimizations by practitioners of other languages, out
           | of some kind of misplaced confidence in the speed of C
           | relative to all other languages.
           | 
           | The idea that C is "naturally faster" due to being some kind
           | of representation of a computer that no other language could
           | achieve is a hard myth to shake.
        
         | IshKebab wrote:
         | Yeah it was clearly an old design mistake. There's never a
         | situation now where null-terminated strings make more sense
         | than length-prefixed. I'm dubious they were ever better.
        
           | nofriend wrote:
           | Null terminated strings make writing parsers really clean.
           | The null byte becomes another character for you to check
           | against in your parser code, so you don't need a separate
           | check for the length (usually checks are inclusive: is it
           | character 'a'? If not, defer to caller. So checking for null
           | byte can happen in a single location, whereas checking for
           | length would need to happen in every function). And it means
           | you have lots of *ptr++ spread around your code, rather than
           | having to pass around a struct and modify it, or call methods
           | on it.
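
A minimal sketch of the parsing style described above, assuming a hypothetical helper `parse_uint` (the name and signature are illustrative). The terminating NUL simply fails the digit test like any other non-digit, so the loop needs no separate length check:

```c
#include <assert.h>
#include <stddef.h>

/* Parse a run of decimal digits starting at p and store the value in *out.
 * Returns a pointer to the first unconsumed character. The '\0' terminator
 * is not a digit, so it ends the loop without an explicit bounds check.
 * Overflow handling is omitted to keep the sketch short. */
const char *parse_uint(const char *p, unsigned *out)
{
    assert(p != NULL && out != NULL);
    unsigned v = 0;
    while (*p >= '0' && *p <= '9')
        v = v * 10 + (unsigned)(*p++ - '0');
    *out = v;
    return p;
}
```

A length-based variant would carry an end pointer or a remaining count into every such function and test it before each dereference.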
        
             | IshKebab wrote:
             | It's really not hard to check the length. Checking null
             | bytes also adds an awkward memory data dependency that can
             | make SIMD more awkward. It also makes strlen O(n) which is
             | kinda shit - for example it led to that famous GTA5
             | accidental O(n^2) bug.
             | 
             | For situations where a null terminator really is better
             | it's easy to add them to a length-prefixed string, whereas
             | the reverse is not true.
             | 
             | They clearly got this wrong.
        
         | GuB-42 wrote:
         | Zero-termination is not lower level than having a separate
         | size, or an end pointer.
         | 
         | What is low level is deciding on a memory representation and
         | working with it directly. A high level language will just have
         | a "string" object, its internal representation is hidden to the
         | programmer and could potentially be changed between versions.
         | 
         | In C, "string" has a precise meaning: it is a pointer to a
         | statically allocated array of bytes with the characters 's',
         | 't', 'r', 'i', 'n', 'g' followed by a zero. That is the low
         | level part: C programmers manipulate the memory directly and
         | need such guarantees. Had it been defined as the number of
         | characters in 4 bytes followed by the characters at 2 bytes
         | each in native endian, it would be just as low level. Defining
         | it as "it is a character string, use the standard library and
         | don't look too closely", as is the case in Java, is high
         | level.
         | 
         | The "feature" is that the memory representation of strings is
         | well defined. The choice of zero-termination has some pros and
         | cons.
         | 
         | Note that in many cases, you can use size+data instead, using
         | mem* functions instead of the str* ones. You can also use
         | "%.*s" in printf(); not ideal, but workable.
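
A minimal sketch of the size+data approach mentioned above, using only standard library calls (`snprintf` with a `%.*s` precision, and `memcmp`). The helper names `format_bytes` and `bytes_equal` are hypothetical:

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Format exactly len bytes of data into out, wrapped in brackets.
 * data does not need a terminator: the precision caps how much is read. */
int format_bytes(char *out, size_t out_size, const char *data, size_t len)
{
    /* The precision argument of %.*s must be an int. */
    assert(len <= INT_MAX);
    return snprintf(out, out_size, "[%.*s]", (int)len, data);
}

/* Compare two size+data buffers with memcmp instead of strcmp. */
int bytes_equal(const char *a, size_t a_len, const char *b, size_t b_len)
{
    return a_len == b_len && memcmp(a, b, a_len) == 0;
}
```

The cast to `int` is the awkward part: the length usually lives in a `size_t`, so every call site has to narrow it.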
        
         | wat10000 wrote:
         | They have one advantage, which is saving 3 bytes of memory
         | (depending on what you decide your max supported string length
         | should be) per string. It's hard to imagine an environment
         | where that's a worthwhile tradeoff, even in the most
         | constrained embedded systems (where you can probably get away
         | with a 16-bit length field and thus only save one byte), but
         | they're not completely without merit.
        
       | Philpax wrote:
       | Linus was a hypocritical asshole here, but more to the point, why
       | are they using strings for this anyway? No byte arrays / literals
       | in their C dialect?
        
         | xnorswap wrote:
         | My layman understanding is that it's the other way around, C
         | doesn't have a string type.
         | 
         | Since C doesn't have a string type, "quoted strings" are
         | actually char[] but with '\0' as an extra last character.
         | 
         | People have therefore made warnings happen when defining a
         | char[] which silently truncates the '\0', because that's a
         | common source of bugs.
         | 
         | They've then had to develop a way of easily disabling that
         | warning from being generated, because it's also common enough
         | to want to avoid the warning.
         | 
         | All of this seems insane coming from a modern language.
         | 
         | But look at the complete disaster that was the Python 2 -> 3
         | migration, a large motivator for which was "fixing" strings
         | from a non-unicode to unicode compatible type. A decade or more
         | of lost productivity as people struggled to migrate.
         | 
         | There's no way to actually fix C. Just keep recommending that
         | people don't use it.
        
           | Philpax wrote:
           | Yep, agreed on all counts; I'm an advocate for Rust for
           | Linux for these reasons, among others.
           | 
           | My thinking was that the Linux kernel already uses a custom
           | dialect of C with specific features that benefit their
           | workflow; I'm surprised that one of those features wasn't a
           | char[] charset = b"abcdefghijklmnopqrstuvwxyz";
           | 
           | that would allow for intent to be signalled to the compiler.
        
       | Incipient wrote:
       | This is definitely unexpected for me - I'd have thought something
       | like an RC for a kernel would have to be 'approved' for release
       | only after passing all tests, which should include building with
       | all official compilers (and all official architectures, etc).
       | 
       | Unless either the older GCC or the beta GCC isn't "official"? In
       | which case that's not necessarily expected to be picked up in an
       | RC?
        
         | bonzini wrote:
         | Release candidates are just time-based for many projects. For
         | Linux, in addition, the rhythm of stabilization can be
         | different for various subsystems.
        
         | Philpax wrote:
         | My understanding is that Linux has no form of CI [0], so they
         | don't actually have an automated way to check for compilation
         | across all platforms and compilers.
         | 
         | [0]: https://lwn.net/Articles/1018802/
        
       | bjourne wrote:
       | Err... we teach C neophytes that you should never write values to
       | variables that are larger than what the variables can hold. Don't
       | write an int to a short, don't write a short to a char, and don't
       | initialize an array storing four bytes with five bytes. Am I
       | missing something here? char foo[4] = "ABCD" is always incorrect,
       | no ifs and buts. If you want "readable" bytes, use character
       | literals. You should never discount the null terminator.
        
         | SonOfLilit wrote:
         | Yes, you're missing something.
         | 
         | Sometimes I'll need an array of 4 ints, so I'll define one:
         | int a[4] = {1,2,3,4};
         | 
         | other times I'll want 4 bytes. So sure, I can write:
         | char a[4] = {'A','B','C','D'};
         | 
         | However, (I hope) I'll get the exact same compiler warning as
         | the more readable:
         | char a[4] = "ABCD";
         | 
         | that does the exact same thing. So I'll need the __nonstring__
         | anyway. And then why not use the more readable syntax, since
         | I'm telling the compiler and reader explicitly that I don't
         | want a null terminator?
         | 
         | The core issue is C's habit of using the exact same language
         | construct for different purposes, here char[] for both
         | uint8_array and null_terminated_str.
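
A minimal sketch of the annotated form, assuming a compiler that supports GCC's `nonstring` variable attribute. The `NONSTRING` macro, the `tag` array, and `tag_matches` are hypothetical names; the macro expands to nothing where the attribute is unavailable:

```c
#include <assert.h>
#include <string.h>

/* Use the attribute when the compiler reports it, otherwise expand to nothing. */
#if defined(__has_attribute)
#  if __has_attribute(nonstring)
#    define NONSTRING __attribute__((nonstring))
#  endif
#endif
#ifndef NONSTRING
#  define NONSTRING
#endif

/* Four bytes, no terminator: the literal's '\0' does not fit and is dropped.
 * This is valid C but not valid C++. */
const char tag[4] NONSTRING = "ABCD";

/* Compare by explicit size; strlen(tag) would read past the array. */
int tag_matches(const char *bytes)
{
    assert(bytes != NULL);
    return memcmp(tag, bytes, sizeof tag) == 0;
}
```

The attribute documents intent to both the reader and the compiler, and every later use of the array still has to go through size-based functions such as `memcmp`.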
        
           | bjourne wrote:
           | Your char a[4] is not more readable and sooner or later
           | you'll get screwed by strlen(a) or some such. It's quite
           | telling that the construct is not legal in C++.
        
             | SonOfLilit wrote:
             | Why would I run strlen() on the second one but not on the
             | first one? Presumably I know that I defined an array of
             | chars and not a cstring? Or if I forget, couldn't I forget
             | in the first case too? Once I defined it, they're both just
             | char[4]s.
        
           | nofriend wrote:
           | > However, (I hope) I'll get the exact same compiler warning
           | as the more readable:
           | 
           | The latter is a null terminated string, the former is not.
           | Compiler warnings are principally a set of heuristics for bad
           | code. Heuristically the first example is more likely to be
           | intentional than the latter.
        
       | mastax wrote:
       | It's still shocking to me that there's no official kernel CI.
        
       ___________________________________________________________________
       (page generated 2025-04-25 23:01 UTC)