[HN Gopher] Some __nonstring__ Turbulence
___________________________________________________________________
Some __nonstring__ Turbulence
Author : jwilk
Score : 121 points
Date : 2025-04-25 06:46 UTC (16 hours ago)
(HTM) web link (lwn.net)
(TXT) w3m dump (lwn.net)
| jey wrote:
| That the annotation applies to variables and not types is surely
| an oversight or mistake right? Seems like it could have been
| easier to initially implement that way but it just doesn't seem
| to fit with how C type system works. (Yes it will make
| declarations uglier to do it on types but that ship has sailed
| long ago; see cdecl.org)
| leni536 wrote:
| I either don't understand how the annotation would work on
| types, or what would be gained by it. What type would be
| annotated? A typedef to char[]?
|
| edit: Unless what they actually mean is annotating struct
| _members_ , that would actually make sense.
| _nalply wrote:
| I do understand.
|
| I imagine that it could work a little bit like unsigned: a
| modifier to integer types that tells that an integer's MSB is
| not to be used as a sign bit.
|
| __nonstring__ tells that the last byte of a byte sequence
| doesn't need to be NUL.
|
| I would find it sensible allowing putting the attribute to a
| type, but whatever.
| leni536 wrote:
| But that doesn't make any difference in the way you have to
| address existing `char arr[4] = "abcd"` declarations.
| _nalply wrote:
| True.
|
| This would be only useful in typedefs. An API could
| declare some byte arrays not strings. But again,
| whatever.
| rurban wrote:
| And how would you type a string vs byte array then? C doesn't
| even have proper string support yet, ie unicode strings. Most
| wchar functions don't care at all about unicode rules. Zero-
| terminated byte buffers are certainly not strings, just
| garbage.
|
| C will never get proper string support, so you'll never be able
| to separate them from zero-terminated byte buffers or plain byte
| buffers in the type system.
|
| So annotating vars is perfectly fine.
|
| The problem was that the PM and Release manager was completely
| unaware of the state of the next branch, of its upcoming
| problems and fixes, and just hacked around in his usual cowboy
| manner. Entirely unprofessional. A release manager should have
| been aware of Kees' gcc15 fixes.
|
| But they have no tooling support, no oversight, just endless
| blurbs on their main mailinglist. No CI for a release
| candidate? Reminds us of typical cowboys in other places.
| iforgotpassword wrote:
| I think the idea is simply to
|     typedef __nonstring__ char* bytes;
|
| And then use that type instead of annotating every single
| variable declaration.
| OskarS wrote:
| But you would still need to change it everywhere, right?
| Like, instead of changing the annotation everywhere you
| have to change the type everywhere. Doesn't seem like a
| huge difference to me.
| Animux wrote:
| There is a difference if the type is used inside a
| structure.
| OskarS wrote:
| Fair, good point.
| timewizard wrote:
| > No CI for a release candidate?
|
| If the CI system didn't get the Fedora upgrade then it would
| not have caught it. Aside from that the kernel has a highly
| configurable build process so getting good coverage is
| equally complex.
|
| Plus, this is a release candidate, which is noted as being
| explicitly targeted at developers and enthusiasts. I'm not
| sure the strength of Kees' objections is well matched to the
| size of the actual problem.
| badmintonbaseba wrote:
| But Linus broke the kernel for gcc<15, a CI would have
| surely caught it.
|
| And Linus is usually much more critical in what gets into
| master when it comes to other people's contribution, let
| alone into an RC.
| dataflow wrote:
| > That the annotation applies to variables and not types is
| surely an oversight or mistake right?
|
| I don't think so. It doesn't make sense on the type. Otherwise,
| what should happen here?
|
|     char s[1];
|     char (__nonstring ns)[1];  // (I guess this would be the syntax?)
|     s[0] = '1';
|     ns[0] = '\0';
|     char* p1 = s;   // Should this be legal?
|     char* p2 = ns;  // Should this be legal?
|     char* __nonstring p3 = s;   // Should this be legal?
|     char* __nonstring p4 = ns;  // Should this be legal?
|     foo(s, ns, p1, p2, p3, p4);  // Which ones can foo() assume to be NUL-terminated?
|                                  // Which ones can foo() assume to NOT be NUL-terminated??
|
| By putting it in the type you're not just affecting the
| initialization, you're establishing an invariant throughout the
| lifetime of the object... which you cannot enforce in any
| desirable way here. That would be equivalent to laying a
| minefield throughout your code.
| dwattttt wrote:
| Do you mean s & ns to be swapped? ns starts with a NUL
| terminator and s does not.
| dataflow wrote:
| No actually, that was the point. I was asking, what do you
| think should happen if you store a NUL when you're claiming
| you're not. Or if you don't store a NUL, when you claim
| it's there.
| dwattttt wrote:
| Well, as a human compiler, I said "Hey, you've non-NUL
| terminated a NUL terminated string". If that was what you
| intended you should use the type annotation for that, so
| I think that case worked as intended.
|
| EDIT: > what do you think should happen if you store a
| NUL when you're claiming you're not
|
| I don't believe nonstring implies it doesn't end with a
| NUL, just that it isn't required to.
| dataflow wrote:
| But char[] _already_ isn't required to be NUL-terminated
| to begin with. char a[1] = {'a'} is perfectly fine, as is
| a[0] = '1'. If all you want to do is to document the fact
| that a type can do exactly what it already can...
| changing the type to something new doesn't make sense.
|
| Note that "works as intended" isn't the sole criterion
| for "does it make sense" or "should we do this." You can
| kill a fly with a cannon too, and it achieves the
| intended outcome, but that doesn't mean you should.
| toast0 wrote:
| Is ns NUL terminated, or is it an array of chars that
| happens to end with NUL?
| _nalply wrote:
| Perhaps unsigned could help here with understanding.
|
| unsigned means, don't use an integer's MSB as a sign bit.
| __nonstring means, the byte array might not be terminated
| with a NUL byte.
|
| So what happens if you use integers instead of byte arrays? I
| mean cast away unsigned or add unsigned. Of course these two
| areas are different, but one could try to design such
| features that they behave in similar ways where it makes
| sense.
|
| I am unsure but it seems, if you cast to a different type you
| lose the conditions of the previous type. And "should this be
| legal", you can cast away a lot of things and it's legal.
| That's C.
|
| But whatever because it's not implemented. This all is
| hypothetical. I understand GCC that they took the easier way.
| Type strictness is not C's forte.
| dataflow wrote:
| > Perhaps unsigned could help here with understanding.
|
| No, they're _very_ different situations.
|
| > unsigned means, don't use an integer's MSB as a sign bit.
|
| First: unsigned is a keyword. This fact is not
| insignificant.
|
| But anyway, even assuming they were both keywords or both
| attributes: "don't use an MSB as a sign bit" makes sense,
| because the MSB otherwise _is_ used as a sign bit.
|
| > __nonstring means, the byte array might not be terminated
| with a NUL byte.
|
| The byte array _already_ doesn't have to contain a NUL
| character to begin with. It just so happens that you
| usually initialize it somewhere with an initializer that
| does, but it's already perfectly legal to strip that NUL
| away later, or to initialize it in a manner that doesn't
| include a NUL character (say, char a[1] = {'a'}). It
| doesn't really make sense to change the type to say "we now
| have a new type with the cool invariant that is...
| identical to the original type's."
|
| > I understand GCC that they took the easier way. Type
| strictness is not C's forte.
|
| People would want whatever they do to make sense in C++
| too, FWIW. So if they introduce a type incompatibility,
| they would want it to avoid breaking the world in other
| languages that enforce them, even if C doesn't.
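A minimal sketch of the point above: a plain `char` array is already allowed to hold no NUL at all, so code that handles such arrays has to carry the capacity itself. `bounded_len` is a made-up helper name for illustration (it behaves much like POSIX `strnlen`), not something taken from the kernel or GCC.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the text in buf[0..cap), stopping at the
   first NUL if there is one, or at cap if the array holds no NUL at all. */
static size_t bounded_len(const char *buf, size_t cap)
{
    assert(buf != NULL);
    const char *nul = memchr(buf, '\0', cap);
    return nul ? (size_t)(nul - buf) : cap;
}
```

Called as `bounded_len(a, sizeof a)` on `char a[1] = {'a'};`, it never reads past the array, which is the guarantee `strlen(a)` cannot give.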
| AceJohnny2 wrote:
| Fedora stupidly uses beta compiler in new release, Torvalds
| blindly upgrades, makes breaking, unreviewed changes in kernel,
| then flames the maintainer who was working on cleanly updating
| the kernel for the not-yet-released compiler?
|
| I admire Kees Cook's patience.
| eb0la wrote:
| IMHO Cook is following good development practices.
|
| You need to know what you support. If you are going to change,
| it must be planned somehow.
|
| I find Torvalds reckless for changing his development
| environment before release. If he really needs that computer to
| release the kernel, it must be a stable one. Even better: it
| should be a VM (hosted somewhere) or part of a CI-CD pipeline.
| evgpbfhnr wrote:
| He releases rc every single week (ok, except before rc1
| there's two weeks for merge window), there's no "off" time to
| upgrade anywhere.
|
| Not that I approve the untested changes, I'd have used a
| different gcc temporarily (container or whatever), but, yeah,
| well...
| jaapz wrote:
| I find it surprising that Linus bases his development and
| release tools on whatever's in the repositories at
| that time. Surely it is best practice to pin to a
| specified, fixed version and upgrade as necessary, so
| everyone is working with the same tools?
|
| This is common best practice in many environments...
|
| Linus surely knows this, but here he's just being hard
| headed.
| IsTom wrote:
| People downloading and compiling the kernel will not be
| using a fixed version of GCC.
| charcircuit wrote:
| Why not specify one?
| mort96 wrote:
| What would that help? People use the compilers in their
| distros, regardless of what's documented as a supported
| version in some readme.
| GTP wrote:
| Because then, if something that is expected to compile
| doesn't compile correctly, you know that you should check
| your compiler version. It is the exact same reason why
| you don't just specify which library your project depends
| on but also the libraries' version.
| leenify wrote:
| That can work, but it can also bring quite a few issues.
| Mozilla effectively does this; their build process
| downloads the build toolchain, including a specific clang
| version, during bootstrap, i.e., setting up the build
| environment.
|
| This is super nice in theory, but it gets murky if you
| veer off the "I'm building current mainline Firefox"
| path. For example, I'm a maintainer of a Firefox fork
| that often lags a few versions behind. It has substantial
| changes, and we are only two guys doing the major work,
| so keeping up with current changes is not feasible.
| However, this is a research/security testing-focused
| project, so this is generally okay.
|
| However, coming back to the build issue, apparently, it's
| costly to host all those buildchain archives. So they get
| frequently deleted from the remote repository, which
| leads to the build only working on machines that
| downloaded the toolchain earlier (i.e., not Github action
| runner, for example).
|
| Given that there are many more downstream users of
| effectively a ton of kernel versions, this quickly gets
| fairly expensive and takes up a ton of effort unless you
| pin it to some old version and rarely change it.
|
| So, as someone wanting to mess around with open source
| projects, their supporting more than 1 specific compiler
| version is actually quite nice.
| charcircuit wrote:
| Conceptually it's no different than any other build
| dependency. It is not expensive to host many versions. $1
| is enough to store over 1000 compiler versions which
| would be overkill for the needs of the kernel.
| dooglius wrote:
| People are usually going to go through `make`, I don't
| see a reason that couldn't be instrumented to (by
| default) acquire an upstream GCC vs whatever forked
| garbage ends up in $PATH
| bombcar wrote:
| This would result in many more disasters as system GCC
| and kernel GCC would quickly be out of sync causing all
| sorts of "unexpected fun".
| dooglius wrote:
| Why would it go wrong, the ABI is stable and independent
| of compiler? You would hit issues with C++ but not C. I
| have certainly built kernels using different versions of
| GCC than what /lib stuff is compiled with, without issue.
| ndesaulniers wrote:
| You'd think that, but in effect kconfig/kbuild has many
| cases where they say "if the compiler supports flag X,
| use it" where X implies an ABI break. Per task stack
| protectors comes to mind.
| dooglius wrote:
| Ah that's interesting, thanks
| hyperpape wrote:
| I'm completely unsure whether to respond "it was stable, he
| was running a release version of Fedora" or "there's no such
| thing as stable under Linux".
|
| The insanity is that the Kernel, Fedora and GCC are so badly
| coordinated that the beta of the compiler breaks the Kernel
| build (this is not a beta, this is a pre-alpha in a
| reasonable universe...is the Kernel a critical user of GCC?
| Apparently not), and a major distro packages that beta
| version of the compiler.
|
| To borrow a phrase from Reddit: "everybody sucks here" (even
| Cook, who looks the best of everyone here, seems either
| oblivious or defeated about how clownshoes it is that
| released versions of major linux distros can't build the
| Kernel. The solution of "don't update to release versions" is
| crap).
|
| (Writing this from a Linux machine, which I will continue
| using, but also sort of despise).
| ploxiln wrote:
| The real problem here was "-Werror", dogmatically fixing
| warnings, and using the position of privilege to push in
| last-minute commits without review.
|
| Compilers will be updated, they will have new warnings, this
| has happened numerous times and will happen in the future.
| The linux kernel has always supported a wide range of
| compiler versions, from the very latest to 5+ years old.
|
| I've ranted about "-Werror" in the past, but to try to keep
| it concise: it breaks builds that would and should otherwise
| work. It breaks older code with newer compiler and different-
| platform compiler. This is bad because then you can't, say,
| use the exact code specified/intended without modifications,
| or you can't test and compare different versions or different
| toolchains, etc. A good developer will absolutely not
| tolerate a deluge of warnings all the time, they will decide
| to fix the warnings to get a clean build, over a reasonable
| time with well-considered changes, rather than be forced to
| fix them immediately with brash disruptive code changes. And
| this is a perfect example why. New compiler fine, new
| warnings fine. Warnings are a useful feature, distinct from
| errors. "-Werror" is the real error.
| mort96 wrote:
| With or without -Werror, you need your builds to be clean
| with the project's chosen compilers.
|
| Linux decided, on a whim, that a pre-release of GCC 15
| ought to suddenly be a compiler that the Linux project
| officially uses, and threw in some last-minute commits
| straight to main, which is insane. But even without
| -Werror, when the project decides to upgrade compiler
| versions, warnings must be silenced, either through
| disabling new warnings or through changing the source code.
| Warnings have value, and they only have value if they're
| not routinely ignored.
|
| For the record, I agree that -Werror sucks. It's nice in
| CI, but it's terrible to have it enabled by default, as it
| means that your contributors will have their build broken
| just because they used a different compiler version than
| the ones which the project has decided to officially adopt.
| But I don't think it's the problem here. The problem here
| is Linus's sudden decision to upgrade to a pre-release
| version of GCC which has new warnings and commit "fixes"
| straight to main.
| llm_nerd wrote:
| This is my take-away as well. Many projects let warnings
| fester until they hit a volume where critical warnings are
| missed amidst all the noise. That isn't ideal, but seems to
| be the norm in many spaces (for instance the nodejs world
| where it's just pages and pages of warnings and
| deprecations and critical vulnerabilities and...).
|
| But pushing breaking changes just to suppress some new
| warning should not be the alternative. Working to minimize
| warnings in a pragmatic way seems more tenable.
| ndesaulniers wrote:
| Sadly, I lost that battle with Torvalds. You can see me
| make some of those points on LKML.
| ploxiln wrote:
| I see, thanks. (Found it here:
| https://lkml.org/lkml/2021/9/7/716)
| josefx wrote:
| > makes breaking, unreviewed changes in kernel,
|
| And reverted them as soon as the issue became apparent.
|
| > then flames the maintainer who was working on cleanly
| updating the kernel for the not-yet-released compiler?
|
| Talking about changes that he had not pushed by the time Linus
| published the release candidate.
|
| Also the "not yet released" seems to be a red herring, as the
| article notes having beta versions of compilers in new releases
| is a tradition for some distros, so that should not be
| unexpected. It makes some sense since distros tend to stick to
| a compiler for each release, so shipping a soon to be out of
| maintenance compiler from day one will only cause other issues
| down the road.
| Kwpolska wrote:
| Fedora releases are supported for about 13 months after
| release. They could live with an older version of GCC for a
| year.
| Denvercoder9 wrote:
| > They could live with an older version of GCC for a year.
|
| That's just not what Fedora is, though. Being on the
| bleeding edge is foundational to Fedora, even if it's
| sometimes inconvenient. If you want battle-tested and
| stable, don't run Fedora, but use Debian or something.
| Kwpolska wrote:
| Bleeding-edge is fine, but shipping a beta C compiler
| seems a bridge too far. Even Arch does not ship GCC 15
| yet.
| rwmj wrote:
| The GCC 15 transition has been very disruptive, but Fedora is
| known for being on the bleeding edge ("first" is in the "four
| foundations" [1]). Be glad because eventually everyone will get
| GCC 15, and we've worked out most of the problems for you
| already.
|
| [1] https://docs.fedoraproject.org/en-US/project/
| genewitch wrote:
| Do you work in marketing?
| stefan_ wrote:
| GCC 15.1 was released today. Your Fedora release was two
| weeks earlier, now using a nonexistent version of 15.0.1,
| ironically now including bugs you reported and that were
| fixed for 15.1. That just seems like poor decision making.
| rwmj wrote:
| You're belittling the large amount of work done across
| thousands of packages to get them ready for GCC 15, which
| did involve backporting fixes to GCC 15 itself. All those
| fixes went into GCC upstream. GCC 15.1 was released two
| hours ago as of writing this message, even before the US
| wakes up, yet I'm sure there will be a build of it in
| Fedora later today.
| blueflow wrote:
| Creating the fake release for gcc was by no means
| necessary for that.
| rwmj wrote:
| GCC 15.1 building:
| https://koji.fedoraproject.org/koji/buildinfo?buildID=270512...
| ahoka wrote:
| This is just GCC 2.96 again, they will never learn.
| bonzini wrote:
| GCC 2.96 lasted a year or more and even after GCC 3.0 was
| released it wasn't able to compile a working kernel. This
| lasted two weeks and the issue is just a new warning; it's
| just bad timing across the release cycles of two projects.
| mackal wrote:
| Gentoo also has a tracker [1] for GCC 15 issues that they've
| been working on as well. (Note: GCC 15 is masked in Gentoo so
| you have to go out of your way to install it)
|
| [1] https://bugs.gentoo.org/932474
| JoshTriplett wrote:
| Exactly. As quoted in the article:
|
| > you didn't coordinate with anyone. You didn't search lore for
| the warning strings, you didn't even check -next where you've
| now created merge conflicts. You put insufficiently tested
| patches into the tree at the last minute and cut an rc release
| that broke for everyone using GCC <15. You mercilessly flame
| maintainers for much much less.
|
| Hypocrisy is an even worse trait than flaming people.
| genewitch wrote:
| On the one hand, sure, fine. He has raked people for less.
| However this is just an RC. Further, how long has Linus been
| doing this?
|
| I remember Maddox on xmission having a page explaining that
| while he may make a grammatical error from time to time, he
| has published literally hundreds of thousands of words, and
| the average email he receives contains 10% errors.
|
| However, Linus is well-known for being abrasive, abusive,
| call it what you want. If you can't take it, don't foist it,
| Linus. Even if you've earned the right, IMO.
| 7bit wrote:
| Nobody earns the right to be an asshole. That is nothing
| that can be earned.
| wizzwizz4 wrote:
| I'd say if you're doing truly-heroic _solo_ efforts, then
| you can earn that. (But I can only think of fictional
| examples.) For team efforts like the Linux kernel, sure,
| no amount of individual contribution to that project
| grants you the right to belittle the other contributors.
| nick__m wrote:
| Fabrice Bellard has earned that right but somehow I don't
| think he is!
| wizzwizz4 wrote:
| Fabrice Bellard's work is _impressive_ , but I wouldn't
| call it _heroic_. I was thinking more like, the grumpy-
| guts who ensures the local homeless shelter is adequately
| stocked with food, clean bedding, and toiletries, day-in
| and day-out, even in the depths of winter. You're
| allowed to be vaguely misanthropic in your interpersonal
| relationships if you're doing something like that, at
| least in my book.
|
| Again, the only non-fictional people I know who qualify,
| are actually really nice to people.
| sanderjd wrote:
| Still nope.
| kelnos wrote:
| This idea that if you've done great things, then you've
| earned the right to treat people poorly, needs to go
| away. It's toxic and gross, and we should expect and
| demand better of our heroes (and ourselves).
| mannykannot wrote:
| Indeed. On the other hand, the right to _show_ that you
| are an asshole is available to anyone, and it has become
| quite popular!
| deeThrow94 wrote:
| > Hypocrisy is an even worse trait than flaming people.
|
| Eh I mean everyone's a hypocrite if you dig deep enough--
| we're all a big nest of contradictions internally.
| Recognition of this and accountability is paramount though.
| He could have simply owned his mistake and swallowed his
| pride and this wouldn't have been such an issue.
| jaapz wrote:
| Torvalds is known for being flamey towards kernel maintainers,
| but most of the time that is for good reason. Here however, he
| should just admit he made a mistake instead of doubling down.
| Admitting your own mistakes is a mark of a great maintainer as
| well.
| bonzini wrote:
| Yeah, admitting he's wrong is certainly not his strong suit. He
| will do so years down the road but not in the heat of the
| argument.
| bastawhiz wrote:
| Torvalds is exactly the sort of person I'd leave a company to
| be as far away from as possible. He's brilliant but absolutely
| insufferable whether it's deserved or not. Anyone with admin
| access who leans into a pissing contest after breaking the
| build because they upgraded their operating system just before
| a release rather than taking the hour to fix the mess they made
| is going to make you hate your job. God bless the well-mannered
| kernel maintainers who grind past it.
| ndesaulniers wrote:
| I don't miss working on the kernel, tbf. Constant arguments.
| Ukv wrote:
| From the comments:
|
| > C "strings" work the way they do because C is a low level
| language, where you want to be able to do low-level things when
| necessary. It's a feature, not a deficiency.
|
| Are NUL-terminated strings really considered preferable, even for
| low-level work? I always just considered them an unfortunate
| design choice C was stuck with.
|
| Many O(1) operations/checks become O(n) because you have to
| linearly traverse the entire string (or keep a second pointer) to
| know where it ends/how long it is; you can't take a substring
| within another string without reallocating and copying that part
| over with a new NUL appended at the end; you can't store data
| that may contain a NUL (which text _shouldn't_, in theory, but
| then you need a separate approach for binary data); and plenty of
| security issues arise from missing or extra NULs.
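To make the cost argument above concrete, here is a hedged sketch of the usual alternative: a pointer-plus-length view. The names `str_view`, `sv_from_cstr`, `sv_sub`, and `sv_eq` are invented for this illustration and are not from any particular library. Length lookup and substring become O(1), and embedded NUL bytes are just data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical pointer-plus-length view of some bytes; it owns nothing. */
struct str_view {
    const char *data;
    size_t      len;
};

/* Wrap a NUL-terminated string once; this is the only O(n) step. */
static struct str_view sv_from_cstr(const char *s)
{
    assert(s != NULL);
    struct str_view v = { s, strlen(s) };
    return v;
}

/* O(1) substring: no allocation, no copy, no NUL written anywhere. */
static struct str_view sv_sub(struct str_view v, size_t start, size_t count)
{
    assert(start <= v.len && count <= v.len - start);
    struct str_view out = { v.data + start, count };
    return out;
}

/* Equality over explicit lengths, so embedded NUL bytes are just data. */
static int sv_eq(struct str_view a, struct str_view b)
{
    return a.len == b.len && memcmp(a.data, b.data, a.len) == 0;
}
```

The trade-off is that a view does not own its bytes, so the caller has to keep the underlying buffer alive for as long as the view is in use.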
| formerly_proven wrote:
| C's design is probably the most post-hoc rationalized thing in
| the world directly after Abrahamic scripture.
|
| "Of course the null-terminated strings of C are more low-level
| than the length-prefixed strings of Pascal, because the elders
| of C wisely designed them to be so." Alternatively, something
| is low-level because it works like C because C semantics have
| simply become the universal definition of what is thought of as
| low-level, regardless of machine mismatch.
|
| Likewise, maybe it's not such a good idea that UNIXv6 or other
| educational unix-likes are used in operating system classes in
| universities. It's well-applicable, sure, but that's not the
| point of that education. Maybe we should use a Japanese or
| German clone of some IBM mainframe system instead, so that
| people actually get exposed to different ideas, instead of
| slightly simpler and less sophisticated versions of the ideas
| they are already familiar with. Too much unix-inbreeding in CS
| education isn't good.
| pjmlp wrote:
| Especially when C advocates tend to ignore the history of
| systems programming languages predating the language by a
| decade, because the authors decided it was cooler to do their
| own thing, notice a similar pattern to other languages?
|
| > Although we entertained occasional thoughts about
| implementing one of the major languages of the time like
| Fortran, PL/I, or Algol 68, such a project seemed hopelessly
| large for our resources: much simpler and smaller tools were
| called for. All these languages influenced our work, but it
| was more fun to do things on our own.
|
| -- https://www.nokia.com/bell-labs/about/dennis-m-ritchie/chist...
|
| And using Pascal as a counterexample gets tiresome: not only
| was it not designed for systems programming, but most of its
| dialects fixed those issues, including its revised report
| (ISO Extended Pascal). By 1978 Niklaus Wirth had created
| Modula-2, based on Mesa (Xerox PARC's replacement for their use
| of BCPL), neither of which ever had a problem with string
| lengths.
| formerly_proven wrote:
| Well it's just the common name for that particular string
| representation, even though it certainly existed before
| Pascal - just like C did not invent null-terminated
| strings, either.
| pjmlp wrote:
| The name has nothing to do with the insecure way it was
| implemented in C.
| xnorswap wrote:
| I agree there's a teaching problem happening somewhere. I'm
| not sure I blame CS-education since I'd wager that most
| developers don't have a formal CS background.
|
| I too regularly however come across people who believe some
| or all of the following:
|
| - "Everything is ultimately just C"
|
| - "All other languages just compile to C, so you should use
| it to be fast"
|
| - "C is faster because it's closer to bare metal"
|
| - "C is fast because it doesn't need to be interpreted unlike
| all other languages"
|
| The special elevated position of C, being some kind of
| "ground truth" of computers is bizarre. It leads to all kinds
| of false-optimizations in practitioners in other languages
| out of some kind of misplaced confidence in the speed of C
| relative to all other languages.
|
| The idea that C is "naturally faster" due to being some kind
| of representation of a computer that no other language could
| achieve is a hard myth to shake.
| IshKebab wrote:
| Yeah it was clearly an old design mistake. There's never a
| situation now where null-terminated strings make more sense
| than length-prefixed. I'm dubious they were ever better.
| nofriend wrote:
| Null terminated strings make writing parsers really clean.
| The null byte becomes another character for you to check
| against in your parser code, so you don't need a separate
| check for the length (usually checks are inclusive: is it
| character 'a'? if not, defer to caller. so checking for null
| byte can happen in a single location, whereas checking for
| length would need to happen in every function). And it means
| you have lots of *ptr++ spread around your code, rather than
| having to pass around a struct and modify it, or call methods
| on it.
| IshKebab wrote:
| It's really not hard to check the length. Checking null
| bytes also adds an awkward memory data dependency that can
| make SIMD more awkward. It also makes strlen O(n) which is
| kinda shit - for example it led to that famous GTA5
| accidental O(n^2) bug.
|
| For situations where a null terminator really is better
| it's easy to add them to a length-prefixed string, whereas
| the reverse is not true.
|
| They clearly got this wrong.
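The O(n) `strlen` point can be illustrated with a generic sketch. This is not the code from the bug mentioned above, only the general shape of an accidentally quadratic loop, and the function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Accidentally quadratic: strlen() walks the whole string on every
   iteration, so the loop does O(n) work n times. (An optimizer may hoist
   the call in simple cases, but that cannot be relied upon.) */
static size_t count_commas_slow(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (size_t i = 0; i < strlen(s); i++)
        if (s[i] == ',')
            n++;
    return n;
}

/* Linear: the length is computed once and carried alongside the pointer. */
static size_t count_commas_fast(const char *s, size_t len)
{
    size_t n = 0;
    assert(s != NULL);
    for (size_t i = 0; i < len; i++)
        if (s[i] == ',')
            n++;
    return n;
}
```

Both functions return the same count; the difference only shows up as running time on large inputs.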
| GuB-42 wrote:
| Zero-termination is not lower level than having a separate
| size, or an end pointer.
|
| What is low level is deciding on a memory representation and
| working with it directly. A high level language will just have
| a "string" object, its internal representation is hidden to the
| programmer and could potentially be changed between versions.
|
| In C, "string" has a precise meaning, it is a pointer to a
| statically allocated array of bytes with the characters 's',
| 't', 'r', 'i', 'n', 'g' followed by a zero. That is the low
| level part, C programmers manipulate the memory directly and
| need such guarantees. Had it been defined as the number of
| characters in 4 bytes followed by each character of 2 bytes
| each in native endian, it would be just as low level. Defining
| it as "it is a character string, use the standard library and
| don't look too closely", as is the case in Java, is high
| level.
|
| The "feature" is that the memory representation of strings is
| well defined. The choice of zero-termination has some pros and
| cons.
|
| Note that in many cases, you can use size+data instead, using
| mem* functions instead of the str* ones. And though it is not
| ideal, you can also use "%.*s" in printf(). Not ideal, but
| workable.
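A brief sketch of the size+data approach mentioned above, using `memcmp()` and the `"%.*s"` conversion, both of which are standard C. The helper names `format_field` and `field_equals` are invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format `len` bytes of `data` (which need not be NUL-terminated) into `out`.
   "%.*s" takes the precision as an int argument and reads at most that many
   bytes, so the printf family never runs past the end of the data. */
static int format_field(char *out, size_t out_size, const char *data, size_t len)
{
    assert(out != NULL && data != NULL);
    return snprintf(out, out_size, "[%.*s]", (int)len, data);
}

/* Length-based equality using memcmp() instead of strcmp(). */
static int field_equals(const char *a, size_t a_len, const char *b, size_t b_len)
{
    return a_len == b_len && memcmp(a, b, a_len) == 0;
}
```

The cast to `int` is the "not ideal" part: the precision argument is an `int`, so lengths above `INT_MAX` need separate handling.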
| wat10000 wrote:
| They have one advantage, which is saving 3 bytes of memory
| (depending on what you decide your max supported string length
| should be) per string. It's hard to imagine an environment
| where that's a worthwhile tradeoff, even in the most
| constrained embedded systems (where you can probably get away
| with a 16-bit length field and thus only save one byte), but
| they're not completely without merit.
| Philpax wrote:
| Linus was a hypocritical asshole here, but more to the point, why
| are they using strings for this anyway? No byte arrays / literals
| in their C dialect?
| xnorswap wrote:
| My layman understanding is that it's the other way around, C
| doesn't have a string type.
|
| Since C doesn't have a string type, "quoted strings" are
| actually char[] but with '\0' as an extra last character.
|
| People have therefore made warnings happen when defining a
| char[] which silently truncates the '\0', because that's a
| common source of bugs.
|
| They've then had to develop a way of easily disabling that
| warning from being generated, because it's also common enough
| to want to avoid the warning.
|
| All of this seems insane coming from a modern language.
|
| But look at the complete disaster that was the Python 2 -> 3
| migration, a large motivator for which was "fixing" strings
| from a non-unicode to unicode compatible type. A decade or more
| of lost productivity as people struggled to migrate.
|
| There's no way to actually fix C. Just keep recommending that
| people don't use it.
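For readers who have not seen the pattern, here is a minimal sketch of the situation described above. It assumes GCC's documented `nonstring` variable attribute and, as I understand it, the GCC 15 warning `-Wunterminated-string-initialization`. The `NONSTRING` macro, `hex_digits`, and `hex_digit` are names made up for this example; the `__has_attribute` guard is there so the code still compiles where the attribute is unknown.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper macro: expands to the nonstring attribute where the
   compiler reports support for it, and to nothing elsewhere. */
#if defined(__has_attribute)
#  if __has_attribute(nonstring)
#    define NONSTRING __attribute__((nonstring))
#  endif
#endif
#ifndef NONSTRING
#  define NONSTRING
#endif

/* Exactly 16 digits in a 16-byte array: C drops the literal's trailing NUL,
   which is legal C but is what the new warning points at. The attribute
   documents that the missing NUL is intentional. */
static const char hex_digits[16] NONSTRING = "0123456789abcdef";

/* Look up one hex digit by value; indexing never needs a terminator. */
static char hex_digit(unsigned v)
{
    assert(v < sizeof hex_digits);
    return hex_digits[v];
}
```

The attribute sits on the variable, not on the type, which is exactly the design choice debated earlier in this thread.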
| Philpax wrote:
| Yep, agreed on all counts; I'm an advocate for Rust for
| Linux for these reasons, among others.
|
| My thinking was that the Linux kernel already uses a custom
| dialect of C with specific features that benefit their
| workflow; I'm surprised that one of those features wasn't a
|     char[] charset = b"abcdefghijklmnopqrstuvwxyz";
|
| that would allow for intent to be signalled to the compiler.
| Incipient wrote:
| This is definitely unexpected for me - I'd have thought something
| like an RC for a kernel would have to be 'approved' for release
| only after passing all tests, which should include building with
| all official compilers (and all official architectures, etc).
|
| Unless either the older GCC or the beta GCC isn't "official"? In
| which case that's not necessarily expected to be picked up in an
| RC?
| bonzini wrote:
| Release candidates are just time-based for many projects. For
| Linux, in addition, the rhythm of stabilization can be
| different for various subsystems.
| Philpax wrote:
| My understanding is that Linux has no form of CI [0], so they
| don't actually have an automated way to check for compilation
| across all platforms and compilers.
|
| [0]: https://lwn.net/Articles/1018802/
| bjourne wrote:
| Err... we teach C neophytes that you should never write values to
| variables that are larger than what the variables can hold. Don't
| write an int to a short, don't write a short to a char, and don't
| initialize five bytes to an array storing four bytes. Am I
| missing something here? char foo[4] = "ABCD" is always incorrect,
| no ifs and buts. If you want "readable" bytes, use character
| literals. You should never discount the null terminator.
| SonOfLilit wrote:
| Yes, you're missing something.
|
| Sometimes I'll need an array of 4 ints, so I'll define one:
|     int a[4] = {1,2,3,4};
|
| other times I'll want 4 bytes. So sure, I can write:
|     char a[4] = {'A','B','C','D'};
|
| However, (I hope) I'll get the exact same compiler warning as
| the more readable:
|     char a[4] = "ABCD";
|
| that does the exact same. So I'll need the __nonstring__
| anyway. And then why not use the more readable syntax, since
| I'm telling the compiler and reader explicitly that I don't
| want a null terminator?
|
| The core issue is C's habit of using the exact same language
| construct for different purposes, here char[] for both
| uint8_array and null_terminated_str.
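A short sketch of the claim above that the two spellings produce the same object in C. The literal itself occupies five bytes, but only the first four fit the array, and the trailing NUL is dropped. The names are illustrative only.

```c
#include <assert.h>
#include <string.h>

/* Two spellings of the same 4-byte object in C. (C++ rejects the second.) */
static const char from_chars[4]   = {'A', 'B', 'C', 'D'};
static const char from_literal[4] = "ABCD";   /* trailing NUL is dropped */

/* Returns nonzero if both arrays hold identical bytes and neither has a NUL. */
static int same_bytes_no_nul(void)
{
    assert(sizeof from_chars == sizeof from_literal);
    return memcmp(from_chars, from_literal, sizeof from_chars) == 0
        && memchr(from_literal, '\0', sizeof from_literal) == NULL;
}
```

Whether a compiler warns about the second form is a separate question from what it means; the resulting bytes are the same either way.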
| bjourne wrote:
| Your char a[4] is not more readable and sooner or later
| you'll get screwed by strlen(a) or some such. It's quite
| telling that the construct is not legal in C++.
| SonOfLilit wrote:
| Why would I run strlen() on the second one but not on the
| first one? Presumably I know that I defined an array of
| chars and not a cstring? Or if I forget, couldn't I forget
| in the first case too? Once I defined it, they're both just
| char[4]s.
| nofriend wrote:
| > However, (I hope) I'll get the exact same compiler warning
| as the more readable:
|
| The latter is a null terminated string, the former is not.
| Compiler warnings are principally a set of heuristics for bad
| code. Heuristically the first example is more likely to be
| intentional than the latter.
| mastax wrote:
| It's still shocking to me that there's no official kernel CI.
___________________________________________________________________
(page generated 2025-04-25 23:01 UTC)