[HN Gopher] OpenBSD now enforcing no invalid NUL characters in s...
       ___________________________________________________________________
        
       OpenBSD now enforcing no invalid NUL characters in shell scripts
        
       Author : CTOSian
       Score  : 152 points
       Date   : 2024-09-24 13:06 UTC (9 hours ago)
        
 (HTM) web link (www.undeadly.org)
 (TXT) w3m dump (www.undeadly.org)
        
       | nubinetwork wrote:
       | So I can't bury a tarball inside a shell script anymore?
        
         | josephcsible wrote:
         | You still can; it just needs to go at the end:
         | 
         | > It remains possible to put arbitrary bytes _AFTER_ the parts
         | of the shell script that get parsed & executed (like some
         | Solaris patch files do).
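         | 
         | A minimal sketch of that layout, in Python rather than in the
         | shell's own C code. It assumes the parsed part ends at a
         | literal "exit 0" line; the marker and the helper names
         | (split_script, nul_in_parsed_part) are hypothetical. The real
         | shell rejects a NUL while it reads and parses, it does not
         | search for a marker like this.

```python
# Sketch only: models "script text first, arbitrary bytes after".
# The marker and both helper names are assumptions for illustration.
MARKER = b"\nexit 0\n"


def split_script(data: bytes, marker: bytes = MARKER) -> tuple:
    """Return (parsed_part, payload), splitting after the first marker."""
    idx = data.find(marker)
    if idx == -1:
        # No marker found: treat the whole file as script text.
        return data, b""
    end = idx + len(marker)
    return data[:end], data[end:]


def nul_in_parsed_part(data: bytes, marker: bytes = MARKER) -> bool:
    """True if a NUL byte appears before the payload begins."""
    script, _payload = split_script(data, marker)
    return b"\x00" in script


# Payload bytes (with NULs) placed after the part that gets executed.
good = b"#!/bin/sh\necho unpacking\nexit 0\n" + b"\x1f\x8b\x00\x00payload"
# A NUL in the middle of the script text itself.
bad = b"#!/bin/sh\necho 'a\x00b'\nexit 0\n" + b"\x1f\x8b\x00\x00payload"

print(nul_in_parsed_part(good))  # NULs only in the trailing payload
print(nul_in_parsed_part(bad))   # NUL inside the parsed script text
```

         | Under these assumptions only the second file would be
         | affected by the change.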
        
         | volkadav wrote:
         | Looks like you might be able to at the end of the file, reading
         | the commit message, just not willy-nilly in the middle. :)
        
       | lupusreal wrote:
       | Does this break those self-extracting script/tar files? I forget
       | how those are done, I haven't seen one in many years.
        
         | zx2c4 wrote:
         | From the article: "It remains possible to put arbitrary bytes
         | _AFTER_ the parts of the shell script that get parsed &
         | executed (like some Solaris patch files do). "
        
         | sneela wrote:
         | Are you talking about Shar?
         | 
         | https://en.wikipedia.org/wiki/Shar_(file_format)
        
           | ape4 wrote:
           | That was a neat idea back in the day but should be disallowed
           | now. Running downloaded executables considered harmful.
        
             | Joker_vD wrote:
             | Not in the "Installation: just run `docker run
             | kekw/our-shiny-ai-chatbot` in your shell" world we're living
             | in today.
        
               | nucleardog wrote:
               | I think the better example is the all-too-common:
               | "Installation: Just run `curl -sL
               | http://goo.gl/hsjdiNgtehsn | sudo bash`"
        
             | osmsucks wrote:
             | > Running downloaded executables considered harmful
             | 
             | Most executables are downloaded. :)
        
         | 73kl4453dz wrote:
         | They were generally uuencoded or similar
        
         | jancsika wrote:
         | If you don't know anything about OpenBSD, here's a fun thing:
         | 
         | 1. Randomly choose "yes" or "no" to this question.
         | 
         | 2. Read the post and get the answer.
         | 
         | 3. Repeat until you begin to get a tingly "Spidey sense" that
         | overrides your random-choice.
         | 
         | My Spidey sense here was, "Yes, because OpenBSD would have
         | already thought about and covered that use-case." And indeed,
         | toward the end of the post, that contingency is covered and
         | documented.
         | 
         | Note: if you try this at your job and sense that the company
         | will almost always choose the worst option, you should probably
         | leave that job.
        
       | bell-cot wrote:
       | Kudos to OpenBSD!
       | 
       | Similar to the olde-tyme "-o noexec" and "-o nosuid" options for
       | `mount`, there should be easy, no-exceptions ways to blanket ban
       | other types of simply obvious red-flag activity.
        
       | sneela wrote:
       | > This was in snapshots for more than 2 months, and only spotted
       | one other program depending on the behaviour (and that test
       | program did not observe that it was therefore depending in
       | incorrect behaviour!!)
       | 
       | Fascinating. I wonder what that program is, and why it depends on
       | the NUL character.
        
       | mcculley wrote:
       | "We are in a post-Postel world" is a great way to put it. This
       | needs to be repeated by everyone working with file formats or
       | accepting untrusted input.
        
         | nabla9 wrote:
         | Agreed.
         | 
         | When every implementation in wide use has their own quirks, you
         | must support them all to make your program widely used. Every
         | special case is yet another potential bug to chase down.
         | 
         | It also allows the "Embrace, extend, and extinguish" strategy that
         | Microsoft used so successfully to assfuck the internet over a
         | decade.
        
           | pjmlp wrote:
           | I think you mean Google.
        
             | nabla9 wrote:
             | No. The Microsoft. MS invented the term. DOJ found that MS
             | used "Embrace, extend, and extinguish" in internal
             | documents.
             | 
             | Younger people don't know how absolutely ruthless and
             | harmful the Wintel monopoly was under Gates. Java did not
             | work on purpose. JavaScript did not work on purpose.
             | <!--[if IE]>
             | 
             | everywhere.
             | 
             | They attempted to kill the open web in the crib with their
             | Blackbird project. Only MSN (The Microsoft Network) for
             | normal people.
        
               | bigstrat2003 wrote:
               | Agreed that it is a Microsoft term. But in my experience,
               | it is older people who incorrectly judge Microsoft
               | ruthlessness, not younger people. I am of an age where I
               | remember well what Microsoft was like in those days, and
               | it frankly was not as bad as people make it out to be.
               | Nor was it really worse than the ruthless tech companies
               | of today.
        
               | quesera wrote:
               | I was there too, and I disagree completely. Microsoft was
               | not just ruthless, they were ubiquitous. They sabotaged
               | any perceived competitors in anticompetitive, market- and
               | industry-damaging ways.
               | 
               | You (the generic "you") can complain all you want about
               | Apple today, but you have another perfectly viable
               | option. And Apple is (almost-entirely) happy to grow
               | market share on merits without salting the earth of any
               | rivals.
               | 
               | In Microsoft's heyday, that was not true. Those of us who
               | rejected MS back then did so at a much higher cost than
               | green chat bubbles.
               | 
               | It was worth it though. And we did win, eventually.
        
               | chasil wrote:
               | Microsoft could not win, although they tried very hard.
               | 
               | Windows was never going to scale down to the portable
               | devices that we now use (because defeating Apple would
               | have been very difficult, and AOSP made it
               | insurmountable).
               | 
               | Windows was never going to scale up to the top 500
               | supercomputer list (for largely economic reasons).
               | 
               | Microsoft itself has tacitly admitted that Azure is
               | better served by Linux, and we ponder why.
               | 
               | Did the DoJ actions against Microsoft really have an
               | impact? I don't know.
        
               | specialist wrote:
               | > _...without salting the earth of any rivals._
               | 
               | Microsoft embodied the adage "It's not enough to win.
               | Everyone else must lose."
        
               | pjmlp wrote:
               | Except it is Google that morphed the Web into ChromeOS,
               | with the help of EVERYONE that ships it alongside their
               | applications, as they can't be bothered to learn
               | cross-platform frameworks.
               | 
               | Many of them are people who used to complain about
               | Micro$oft and should know better.
        
               | IshKebab wrote:
               | Anyone who was around for the IE6 era knows how much
               | worse it was than the current Chrome era. It's not even
               | close.
        
               | nabla9 wrote:
               | You clearly don't remember or know about how bad it was
               | in IE6 era and before.
        
         | stackghost wrote:
         | "Postel" is not a term that carries any significance for me,
         | and Googling that word didn't turn anything up that seemed
         | relevant.
         | 
         | Who or what is a Postel?
        
           | teraflop wrote:
           | It's a reference to "Postel's law" which is a pretty well-
           | known principle in the networking world, and in software more
           | broadly. Named after Jon Postel, who edited and published
           | many of the RFCs describing core Internet protocols.
           | 
           | https://en.wikipedia.org/wiki/Robustness_principle
        
           | Ndymium wrote:
           | It's a reference to Jon Postel who wrote the following in RFC
           | 761[0]:
           | 
           | > TCP implementations should follow a general principle of
           | robustness: be conservative in what you do, be liberal in
           | what you accept from others.
           | 
           | Postel's Law is also known as the Robustness principle. [1]
           | 
           | [0] https://datatracker.ietf.org/doc/html/rfc761#section-2.10
           | 
           | [1] https://en.wikipedia.org/wiki/Robustness_principle
        
             | arcanemachiner wrote:
             | I've always felt that this was a misguided principle, to be
             | avoided _when possible_. When designing APIs, I think about
             | this principle a lot.
             | 
             | My philosophy is more along the lines of "I will
             | begrudgingly give you enough rope to hang yourself, but I
             | won't give you enough to hang everybody else."
        
               | quesera wrote:
               | HTML parsing is the modern-ish layer-uplifted example of
               | liberal acceptance.
               | 
               | I won't argue that this hasn't been a disaster for
               | technologists, but there are many arguments that this was
               | core to the success of HTML and consequently the web.
               | 
               | Which, yes, could be considered its own separate
               | disaster, but here we are!
        
               | pas wrote:
               | It makes sense in a "customer obsessed" way. The user
               | agent tries to show content, tries to send requests and
               | receive the response on behalf of the client (customer),
               | and _ceteris paribus_ it's better for the client if the
               | system works even if there's some small error that can be
               | worked around, right?
               | 
               | but of course this leads to the tragedy of anticommons,
               | too many people have an effective "veto" (every shitty
               | middlebox, every "so easy to use" 30 line library that
               | got waaay too popular now contributes to ossification of
               | the stack).
               | 
               | what's the solution? similarly careless adoption of new
               | protocols? and hoping for the best? maybe putting an
               | emphasis on provable correctness, and if something is not
               | conformant to the relevant standard then not considering
               | it "broken" for the "if it ain't broken don't touch it"
               | principle?
        
             | IshKebab wrote:
             | Ironically it leads to less robust systems in the long
             | term.
        
             | thaumasiotes wrote:
             | > Postel's Law is also known as the Robustness principle.
             | 
             | Really? It seems like it's obviously just a description of
             | how natural language works.+ But in that case, there's an
             | enforcement mechanism (not well understood) that causes
             | everyone to be conservative in what they send.
             | 
             | We can observe, by the natural language 'analogy', that the
             | consequence of following this principle is that you never
             | have backwards compatibility. Otherwise things generally
             | work.
             | 
             | + Notably, it has nothing to do with how math works, making
             | it a strange choice for programming.
        
           | komon wrote:
           | A reference to Postel's Law: be conservative in what you
           | produce and liberal in what you accept.
           | 
           | The law references that you should strive to follow all
           | standards in your own output, but you should make a best
           | effort to accept content that may break a standard.
           | 
           | This is useful in the context of open standards and evolving
           | ecosystems since it allows peers speaking different versions
           | of a protocol to continue to communicate.
           | 
           | The assertion being made here is that the world has become
           | too fraught with exploiting this attitude for it to continue
           | being a useful rule
        
             | godshatter wrote:
             | What would have been the result of Jon Postel advocating
             | for conservative inputs, I wonder? I'm wondering whether
             | the most common protocols, had they all done this, would
             | have been bypassed by other protocols that allowed more
             | liberal inputs.
        
               | Joker_vD wrote:
               | Yep. Which is why Postel's law is, sadly, more like a law
               | of nature (see also "worse is better") than an
               | engineering principle you may or may not follow.
        
               | mrighele wrote:
               | I know it is a single example and we shouldn't extrapolate
               | much out of it, but in the case of html those who
               | accepted more liberal input (html4/5) won over those
               | that were more conservative (xhtml).
        
               | jancsika wrote:
               | Am I correct that malformed pages in xhtml would have
               | triggered the browser to output a red XML error and fail
               | to render the page at all?
        
               | Calavar wrote:
               | Yes, but only if you served the XHTML with the proper
               | MIME type of application/xhtml+xml. Nearly everyone
               | served it as text/html, which would lead to the document
               | being interpreted as this weird pseudo XHTML/HTML4 hybrid
               | dialect with all sorts of browser idiosyncrasies [1].
               | 
               | [1] https://www.hixie.ch/advocacy/xhtml
        
               | 0cf8612b2e1e wrote:
               | I would almost argue a failing of so many standards is
               | the lack of surrounding tooling. Is this implementation
               | correct? Who knows! Try it against this other version and
               | see if they kind of agree. More specifications need to
               | require test suites.
        
               | edflsafoiewq wrote:
               | Not really, since in the end HTML5 defined a precise
               | parsing algorithm that AFAIK everyone follows.
        
               | quesera wrote:
               | HTML5 was born in an era of decent HTML authoring
               | tooling. Very few people write HTML by hand nowadays.
               | This was not true of earlier versions.
               | 
               | Also note that HTML5 codified into liberal acceptance
               | some of the "lazy" manual errors that people made in the
               | early days (many of which were strictly and noisily
               | rejected in XHTML, for example).
        
               | mikaraento wrote:
               | RFC 9413 referenced in a parent mentions HTML. It points
               | out that formats meant to be human-authored may benefit
               | more from being liberally accepted.
               | 
               | I also read that XHTML made template authoring hard, as
               | the template itself might not be valid XHTML and/or
               | different template inputs might make output invalid. (I
               | sadly can't find the source of this point right now, but
               | I can't claim credit for it).
        
               | arp242 wrote:
               | HTML is rather different because it's authored by people.
               | It's typically (though not always!) a good idea to not be
               | too pedantic about accepting user input if you can. XHTML
               | (served with the correct Content-Type) will completely
               | error out if you made a typo and didn't test carefully
               | enough. Useful in dev cycle? Sure. In production? Less
               | so. "The entire page goes tits up because you used <br>
               | instead of <br />" is just not helpful (and also:
               | needlessly pedantic).
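               | 
               | A rough illustration of the difference, using Python's
               | standard-library parsers as stand-ins for a browser
               | (an assumption: xml.etree is a strict XML parser, and
               | html.parser is lenient but is not the HTML5 parsing
               | algorithm). The helper names are made up.

```python
# Sketch only: strict XML parsing vs. lenient HTML tokenizing.
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records every start tag the lenient parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def strict_parse_ok(text: str) -> bool:
    """True if the text is well-formed XML, False on any parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def lenient_tags(text: str) -> list:
    """Start tags seen by the lenient HTML parser."""
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.tags


html_style = "<p>line one<br>line two</p>"
xhtml_style = "<p>line one<br />line two</p>"

print(strict_parse_ok(html_style))   # unclosed <br> is a fatal XML error
print(strict_parse_ok(xhtml_style))  # self-closed <br /> is well-formed
print(lenient_tags(html_style))      # lenient parser still reports tags
```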
               | 
               | But that doesn't really apply to protocols like TCP.
               | Postel's "law" is best understood in the context of 1980,
               | when TCP had been around for a while but without a real
               | standard, everyone was kind of experimenting, and there
               | were tons of little incompatibilities. In this context,
               | it was reasonable and practical advice.
               | 
               | For a lot of other things though: not so much. "Fail
               | fast" is typically the better approach, which will
               | benefit everyone, _especially_ the people implementing
               | the protocols.
               | 
               | This is also why Sendmail became the de-facto standard
               | around the same time by the way: it was bug-compatible
               | with everything else. Later this become a liability
               | (sendmail.cf!), but originally it was a great feature.
        
               | miki123211 wrote:
               | Probably more convoluted protocols, because there are
               | always things that you do accept and that can be used to
               | negotiate protocol extensions.
               | 
               | Imagine a protocol where both sides have to speak JSON
               | with a rigidly-defined structure, and none of the sides
               | is allowed to ask whether the other supports any
               | extension. Such a protocol looks impossible to extend,
               | but that is not the case, you can indicate that you speak
               | a "relaxed" version of that protocol by e.g. following
               | your first left brace by a predefined, large number of
               | whitespace characters. If you see a client doing this,
               | you know they won't drop the connection if you include a
               | supported_extensions field, and you're still able to
               | speak the rigid version to strict clients.
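               | 
               | A toy sketch of that trick, assuming a padding threshold
               | of 16 spaces and made-up message fields (RELAXED_PADDING,
               | "op", "status" and "example-ext" are all hypothetical,
               | not part of any real protocol):

```python
# Sketch only: signalling "relaxed" mode through insignificant whitespace.
import json

RELAXED_PADDING = 16  # hypothetical agreed-upon number of spaces


def is_relaxed(message: str) -> bool:
    """True if the first left brace is followed by enough spaces."""
    if not message.startswith("{"):
        return False
    rest = message[1:]
    padding = len(rest) - len(rest.lstrip(" "))
    return padding >= RELAXED_PADDING


def build_reply(request: str) -> str:
    """Only mention extensions to peers that signalled relaxed mode."""
    reply = {"status": "ok"}
    if is_relaxed(request):
        reply["supported_extensions"] = ["example-ext"]
    return json.dumps(reply)


strict_request = '{"op": "ping"}'
relaxed_request = "{" + " " * RELAXED_PADDING + '"op": "ping"}'

# Both requests decode to the same JSON value; only the padding differs.
print(json.loads(strict_request) == json.loads(relaxed_request))
print(build_reply(strict_request))
print(build_reply(relaxed_request))
```

               | A strict peer sees two equivalent, valid messages, which
               | is what lets the side channel go unnoticed.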
        
               | quesera wrote:
               | This made me laugh, because it's even _more_ terrible
               | than the most ridiculous chicanery we had to vomit into
               | HTML and CSS over the years (most of which was the fault
               | of MSIE6).
        
           | CoastalCoder wrote:
           | Adding to the sibling comments, this is briefly covered in
           | Eric Raymond's _wonderful_ book, "The Art of Unix
           | Programming" [0].
           | 
           | [0] https://en.wikipedia.org/wiki/The_Art_of_Unix_Programming
        
           | ok123456 wrote:
           | The fact that googling Postel was worthless also indicates
           | we're in a post-google search world.
        
             | AStonesThrow wrote:
             | Bing had no trouble at all finding him from my device.
        
             | Brian_K_White wrote:
             | 2nd result on kagi was about him but in the form of another
             | critic.
             | 
             | https://datatracker.ietf.org/doc/draft-thomson-postel-was-
             | wr...
             | 
             | Hard disagree.
             | 
             | It's a valid argument, but I say it's merely an argument,
             | not an argument that wins or should win.
             | 
             | But also, I say that detecting out of spec or unexpected
             | input and handling it in any other way than crashing IS
             | adhering to Postel.
             | 
             | Refusing to process a request is better than munging the
             | data according to your own creative interpretation of
             | reasonable or likely, and then processing that munged data.
             | 
             | I consider that to be within Postel to return a nice error
             | (or not if that would be a security divulgence). Failing
             | Postel would be to crash or do anything unintended.
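             | 
             | To make the distinction concrete, here is a small sketch
             | contrasting the two behaviours on a NUL byte. The names
             | (InvalidInput, munge, accept_strict) and the sample input
             | are hypothetical.

```python
# Sketch only: "munge and proceed" vs. "refuse with a clear error".


class InvalidInput(ValueError):
    """Raised when input contains a byte we refuse to interpret."""


def munge(data: bytes) -> bytes:
    """Liberal handling: silently drop NULs and carry on."""
    return data.replace(b"\x00", b"")


def accept_strict(data: bytes) -> bytes:
    """Strict handling: reject the input and say where the problem is."""
    pos = data.find(b"\x00")
    if pos != -1:
        raise InvalidInput(f"NUL byte at offset {pos}")
    return data


sample = b"echo hel\x00lo"

# Munging produces a command the sender never actually wrote.
print(munge(sample))

# Strict handling fails gracefully instead of guessing.
try:
    accept_strict(sample)
except InvalidInput as err:
    print("rejected:", err)
```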
        
             | skybrian wrote:
             | Google's results for "Postel's law" and "Jon Postel" are
             | fine. "Postel" is ambiguous, a fairly common surname, so
             | websites of unrelated companies show up, and a
             | disambiguating page on wikipedia that links to Jon Postel
             | and several other people.
        
               | ok123456 wrote:
               | I thought the whole point of letting Google surveil your
               | entire life was that, if you're interested in computing
               | and networks to the point of participating on
               | news.ycombinator.com, they'd know that when you search
               | for "Postel," you probably want Postel's law on the first
               | page.
               | 
               | We're back at pre-1998 search, where we have to specify
               | more and more context just to get results that aren't
               | noise.
        
             | stackghost wrote:
             | I'm actually astounded at how quickly the quality of Google
             | search results has tanked in recent years.
        
           | runjake wrote:
           | Jon Postel was instrumental in making the Internet what it is
           | today.
           | 
           | https://en.wikipedia.org/wiki/Jon_Postel
           | 
           | The Wikipedia article is kinda unclear and doesn't provide
           | the proper context, so:
           | 
           | - Ran IANA, which assigned IP addresses for the Internet.
           | 
           | - Editor of RFCs, which are documents that defined protocols
           | in use by the Internet.
           | 
           | - He wrote a bunch of important RFCs that defined how some
           | very important protocols should work.
           | 
           | - Created or helped create SMTP, DNS, TCP/IP, ARPANET, etc.
        
         | cesarb wrote:
         | > "We are in a post-Postel world" is a great way to put it.
         | 
         | See also RFC 9413 (https://www.rfc-
         | editor.org/rfc/rfc9413.html), originally called "draft-thomson-
         | postel-was-wrong" (https://datatracker.ietf.org/doc/draft-
         | thomson-postel-was-wr...).
        
         | Brian_K_White wrote:
         | There is no such thing as a post Postel world. But handling the
         | input in any other way than crashing or ub IS perfectly Postel.
         | 
         | Deciding that nul is invalid data, and refusing to allow it,
         | and refusing to munge the data and proceed based on the munged
         | data that you essentially made up, as long as whatever you did
         | do instead was graceful and intentional, to me that is
         | perfectly Postel.
        
       | chasil wrote:
       | I was going to check the status of mksh (the Android system
       | shell), but the project page returns:
       | 
       | "Unavailable For Legal Reasons - Sorry, no detailled error
       | message available."
       | 
       | http://www.mirbsd.org/mksh.htm
       | 
       | The Android system shell is now abandoned? This is also in RHEL 9
       | BaseOS.
        
         | chaosite wrote:
         | Looks fine here, maybe they're blocking your IP range for some
         | reason?
        
         | kbolino wrote:
         | It's blocked for me too, but only on my home Internet
         | (Xfinity), not my phone (Google Fi/T-Mobile).
        
           | chasil wrote:
           | I see it on my T-Mobile device also. Strange.
        
           | torstenvl wrote:
           | Works fine for me on Xfinity Home via WiFi, Xfinity Mobile,
           | T-Mobile, and Visible by Verizon.
        
             | kbolino wrote:
             | Whatever the issue was, it seems to have been resolved
             | sometime after I last checked.
        
         | tux3 wrote:
         | Works from an EU IP, so whatever it is, it's probably not GDPR?
        
         | fragmede wrote:
         | What's your browser? The server is using an old TLS version
         | which is no longer supported, and some clients will try https
         | and fail there and not try http.
        
           | chasil wrote:
           | I'm using Edge on my corporate desktop.
           | 
           | Edge first tries TLS and comes back with: "SSL handshake
           | error '-1' sslerr='1' sslerrdesc='error:1425F102:SSL
           | routines:ssl_choose_client_version:unsupported protocol'
           | sslerrfunc='607' sslerrreason='258'"
           | 
           | Setting to http:// results in the above error, along with
           | "httpd/3.30A Server at www.mirbsd.org Port 80" - I think that
           | the target itself is blocking me.
        
         | blueflow wrote:
         | > Android system shell
         | 
         | This hurt a little.
        
         | talideon wrote:
         | Fine for me. I just got an HTTP warning and nothing else.
         | 
         | ~~I believe Android uses toybox, not mksh.~~ It does use
         | toybox, but toybox doesn't appear to include a shell.
        
       | sph wrote:
       | Is this in reference to something? Judging from the comments, NUL
       | bytes in shell scripts are such a common occurrence that everybody
       | is celebrating this change as if it were groundbreaking.
       | 
       | I mean, it's a good idea, but I wonder what I am missing here.
       | Also what do they mean by post-Postel?
        
         | semiquaver wrote:
         | Postel's Law:
         | https://datatracker.ietf.org/doc/html/rfc761#section-2.10
        
         | JimDabell wrote:
         | Postel's Law, also known as the Robustness Principle:
         | 
         | > be conservative in what you do, be liberal in what you accept
         | from others
         | 
         | It's intended as a way to maximise compatibility, and people
         | have generally followed it when designing protocols and file
         | formats. However it's led to many security vulnerabilities and
         | has caused a lot of compatibility problems itself. These days a
         | lot of people are realising that it's more harmful than
         | helpful.
        
         | BlackFly wrote:
         | Early spec of TCP had a section on the robustness principle
         | that was generally known as Postel's law
         | (https://datatracker.ietf.org/doc/html/rfc761#section-2.10). At
         | the time and until recently this was considered good design.
         | Nowadays people generally want servers to be stricter in what
         | they accept since decades of experience dealing with diverging
         | interpretations of a specification create problems for
         | interoperability.
        
           | eesmith wrote:
           | "until recently"? More than 10 years just going by HN.
           | https://news.ycombinator.com/item?id=5161214
           | 
           | I think HTML showed the problem with Postel's principle.
           | Quoting "Postel's Law is not for you" at
           | http://trevorjim.com/postels-law-is-not-for-you/ from 2011
           | 
           | > The next version of HTML, HTML5, should considerably reduce
           | the problem of browser incompatibilities. It does this, in
           | part, by rejecting Postel's Law for browser implementors.
           | Instead of allowing browsers to be liberal when dealing with
           | "flawed" markup, HTML5 requires them to parse it exactly as
           | in the HTML5 specification, and that specification is given
           | much more precisely than before, in the form of a
           | deterministic state machine, in fact. HTML5 is trying to give
           | implementors no leeway at all in this, in the name of browser
           | compatibility.
        
             | cesarb wrote:
             | > "until recently"? More than 10 years just going by HN.
             | 
             | The TCP protocol is from the 1970s (according to Wikipedia,
             | it's from 1974, which is 50 years ago). Something which
             | only happened 10 years ago is recent.
        
               | eesmith wrote:
               | The robustness principle dates to RFC 761 from January
               | 1980, not 1974, making it only 44 years ago.
               | https://www.rfc-editor.org/rfc/rfc761#section-2.10
               | 
               | The citations I gave you were ones I knew existed. I know
               | there was criticism in the early 2000s because we were
               | debating it back then, but I don't have those citations
               | handy.
               | 
               | Checking now, the Wikipedia entry points to criticism
               | in RFC 3117, from 2001, at
               | https://datatracker.ietf.org/doc/html/rfc3117 :
               | 
               | > Counter-intuitively, Postel's robustness principle ("be
               | conservative in what you send, liberal in what you
               | accept") often leads to deployment problems.
               | 
               | That's why I knew to question what 'until recently' was
               | supposed to mean.
        
       | saagarjha wrote:
       | > There appears to be one piece of software which is
       | misinterpreting guidance of this, and trying to depend upon
       | embedded NUL.
       | 
       | Curious what this is
        
         | semiquaver wrote:
         | I wonder if it's https://justine.lol/ape.html / cosmopolitan
         | libc
        
           | eesmith wrote:
           | Shouldn't be. See the "exit 1" in your link? That's the end
           | of the shell script, and as the OpenBSD link says:
           | 
           | > It remains possible to put arbitrary bytes _AFTER_ the
           | parts of the shell script that get parsed  & executed (like
           | some Solaris patch files do). But you can't put arbitrary
           | bytes in the middle,
        
             | oguz-ismail wrote:
             | It is. Binaries generated by cosmocc have NUL in the
             | middle.
        
               | comex wrote:
               | Ah, indeed. Here are the first 16 bytes of one:
               | 
               | 4d 5a 71 46 70 44 3d 27 0a 00 00 10 00 f8 00 00
               | |MZqFpD='........|
               | 
               | There are already nul bytes here, and there are a lot
               | more before the single quote gets closed at offset 0x200.
        
               | eesmith wrote:
               | And I can confirm a NUL in the 11th byte of my hello.c a.out:
               | 
               |     >>> s[:11]
               |     b"MZqFpD='\n\n\x00"
               | 
               | Looking closer, I missed the content of "BIOS BOOT
               | SECTOR".
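               | 
               | For anyone who wants to check a file the same way, here is
               | a minimal sketch in POSIX shell. first_nul_offset is a
               | made-up helper name, and the offset it prints is 0-based,
               | so 10 means the 11th byte:

```shell
# first_nul_offset FILE (hypothetical helper, not from the thread)
# Prints the 0-based offset of the first NUL byte in FILE.
# Returns non-zero if FILE contains no NUL at all.
first_nul_offset() {
    # od dumps every byte as two hex digits; tr puts one byte per line.
    od -An -v -tx1 "$1" | tr -s ' ' '\n' | awk '
        NF { if ($1 == "00" && !found) { pos = n; found = 1 } n++ }
        END { if (found) print pos + 0; else exit 1 }'
}
```

               | Usage would be something like "first_nul_offset a.out".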
        
           | chubot wrote:
           | I'm pretty sure it is, I remember reading something about
           | this
           | 
           | Yeah I found it here
           | 
           | https://news.ycombinator.com/item?id=41030960
           | 
           | 2019 bug - https://austingroupbugs.net/view.php?id=1250
           | 
           | https://justine.lol/cosmo3/
           | 
           | > This is an idea whose time has come; POSIX even changed
           | their rules about binary in shell scripts specifically to let
           | us do it.
           | 
           | FWIW I agree with this OpenBSD change, which says more
           | pointedly
           | 
           |  _All the shells are written in C, and majority of them use C
           | strings for everything, which means they cannot embed a NUL,
           | so this is not surprising. It is quite unbelievable there are
           | people trying to rewrite history on a lark, and expecting the
           | world to follow alone._
           | 
           | i.e. it's not worth it to change a bunch of old code in order
           | to allow making code more esoteric.
           | 
           | We want systems code to be more predictable, reliable, and
           | less esoteric ... not more esoteric
        
             | asveikau wrote:
             | > POSIX even changed their rules about binary in shell
             | scripts specifically to let us do it.
             | 
             | I'd seen this quote around. The fact that the standards
             | were changed to allow it never struck me as a good
             | indication that it should be relied upon. It seems rather
             | backwards of how these standards work.
             | 
             | I got flamed on HN once for saying cosmopolitan libc
             | shouldn't be used for production because it relies on weird
             | behaviors and implementation quirks that aren't really an
             | ABI.
        
               | comex wrote:
               | Looking at this further, the standards change doesn't
               | even match what Cosmopolitan is doing.
               | 
               | From the 'changed their rules' link, the 'Desired Action'
               | is to add this text: "The input file may be of any type,
               | but the initial portion of the file intended to be parsed
               | according to the shell grammar [..] shall not contain the
               | NUL character."
               | 
               | This handles things like shar archives where you have a
               | shell script at the beginning, then an exit command, then
               | binary gunk.
               | 
               | But Cosmopolitan binaries are not just shell scripts with
               | binary. They're hybrids of shell script and DOS
               | executable. And apparently this requires putting nul
               | bytes right near the beginning (see my other comment,
               | https://news.ycombinator.com/item?id=41640331), in the
               | "portion [..] intended to be parsed according to the
               | shell grammar". Which explicitly violates the new text.
               | 
               | I can understand why this hack is needed for what
               | Cosmopolitan is trying to accomplish, but it makes no
               | sense to claim POSIX blessed it.
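               | 
               | For contrast, the layout the new text does allow (script,
               | exit, then binary gunk) can be sketched roughly like this.
               | make_sfx and the __PAYLOAD__ marker are names made up for
               | illustration, not what shar or Cosmopolitan actually use:

```shell
# make_sfx PAYLOAD OUTPUT (hypothetical helper)
# Writes a script to OUTPUT that prints PAYLOAD's bytes when run.
make_sfx() {
    cat > "$2" <<'EOF'
#!/bin/sh
# Find the line after the marker and copy everything from there on.
start=$(awk '/^__PAYLOAD__$/ { print NR + 1; exit }' "$0")
tail -n +"$start" "$0"
exit 0
__PAYLOAD__
EOF
    # The generated script exits before this point, so the shell never
    # parses the appended bytes and NULs are harmless here.
    cat "$1" >> "$2"
}
```

               | Then "make_sfx data.bin sfx.sh; sh sfx.sh > copy.bin"
               | should give back the same bytes, NULs included, because
               | every NUL sits after the part the shell parses.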
        
               | chubot wrote:
               | Yeah exactly, that was my reading too! The claim in the
               | link doesn't match what the POSIX bug says
               | 
               | If it did, then that would be a sign that the POSIX
               | process is not working well
               | 
               | Because POSIX is supposed to be descriptive of what
               | exists, not prescribe new behavior
        
               | enriquto wrote:
               | > POSIX is supposed to be descriptive of what exists, not
               | prescribe new behavior
               | 
               | Well, it can be argued that ape already existed when that
               | POSIX change was written :)
               | 
               | Everything is working as intended. Nothing to see here.
               | Move along. Move along.
        
           | tiffanyh wrote:
           | Just yesterday I asked @jart, here on HN, about Cosmo &
           | OpenBSD.
           | 
           | https://news.ycombinator.com/item?id=41627889
           | 
           | APE was mentioned and some interesting tidbits in the GitHub
           | link provided in the HN comment above.
        
       | Taikonerd wrote:
       | On a similar note, I sometimes think about how newline characters
       | are allowed in filenames, and how that can break simple...
       | for each $filename in `ls`
       | 
       | loops -- because in many contexts, UNIX treats newlines as a
       | delimiter.
       | 
       | Is there any legitimate use for filenames with newlines?
        
         | Joker_vD wrote:
         | Sticky notes on the desktop :) Who needs data storage when you
         | can store it all in the metadata?
        
         | IsTom wrote:
         | You can also create files named e.g. '--help' (if you're not
         | particularly malicious) and with globbing it'll cause e.g. 'ls
         | *' to print help.
        
           | jasonjayr wrote:
           | touch -- '-f ..'
           | 
           | (If you want to lay an evil trap)
           | 
           | Remember that in most option parsing libraries, putting '--'
           | in your arguments stops option parsing, so you can safely
           | run:
           | 
           |     rm -- '-f ..'
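           | 
           | A quick way to convince yourself, as a sketch. The scratch
           | directory, the file names and the demo_dashdash name are
           | just for illustration:

```shell
# demo_dashdash (hypothetical): option-like file names vs. '--' and './'.
demo_dashdash() (
    dir=$(mktemp -d) && cd "$dir" || exit 1
    touch -- '-f ..' '-v' 'keep me'   # two names that look like options
    rm -- '-f ..'                     # after '--' it is only a file name
    rm ./-v                           # a ./ prefix also works
    ls -A                             # only 'keep me' is left
    cd / && rm -rf "$dir"
)
```

           | The ./ form is handy for tools that don't understand '--'.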
        
         | bityard wrote:
         | Well, knowing how to deal with wacky input and corner cases are
         | a requirement of learning ANY programming language. Bourne-
         | style shells are no exception.
         | 
         | Your example has illegal syntax, but the biggest issue is that
         | you should never parse the output of ls. The shell has built-in
         | globbing. This is how you would loop over all entries (files,
         | dirs, symlinks, etc) in the current directory without getting
         | tripped up by whitespace:
         | 
         |     for e in *; do echo "got: $e"; done
        
           | Taikonerd wrote:
           | > knowing how to deal with wacky input and corner cases are a
           | requirement of learning ANY programming language.
           | 
           | In general, I agree. But if there's a corner case that
           | occasionally breaks naive code but otherwise doesn't do
           | anything, then I'm going to think, "maybe we should just
           | remove that corner case."
        
             | bell-cot wrote:
             | Replace "maybe" with " _OBVIOUSLY_ ". Keeping useless-but-
             | hazardous "features" in any language is as idiotic as
             | keeping a heap of oily rags in the furniture factory
             | warehouse.
        
             | zokier wrote:
             | David Wheeler has been complaining (and suggesting fixes)
             | about this for a long time:
             | https://dwheeler.com/essays/fixing-unix-linux-
             | filenames.html
             | 
             | safename LSM https://lwn.net/Articles/686789/
        
               | Taikonerd wrote:
               | Thank you, I had wondered if there was something like
               | safename.
        
         | chuckadams wrote:
         | > Is there any legitimate use for filenames with newlines?
         | 
         | IMHO no, but they can exist, so you need to handle them without
         | blowing up. Also, even spaces are considered delimiters here,
         | which is why it's bad form to parse the output of ls.
         | 
         |     $ touch "foo bar baz"
         |     $ for f in `ls`; do echo $f; done
         |     foo
         |     bar
         |     baz
         | 
         |     # always use double quotes, though they aren't needed here
         |     $ for f in *; do echo "$f"; done
         |     foo bar baz
         | 
         | At least the OS guarantees you won't run into NUL though.
        
           | kstrauser wrote:
           | I'm not in a place where I can easily check. What happens
           | there if the file name contains a quote?
        
             | chuckadams wrote:
             | It's fine, the content of an expanded variable isn't parsed
             | further:
             | 
             |     $ touch "foo \"bar baz"; for f in *; do echo "$f"; done
             |     foo "bar baz
             | 
             |     # quotes don't affect it either
             |     $ touch "foo \"bar baz"; for f in *; do echo $f; done
             |     foo "bar baz
             | 
             | Though once you start passing args with quotes to other
             | scripts, things get ugly. Rule of thumb is to always pass
             | with "$@", and if that isn't enough to preserve quoting for
             | whatever use case, write them out to a tempfile instead, or
             | don't use a shell script for it in the first place.
        
               | kstrauser wrote:
               | What about in the case of
               | 
               |     for f in `ls`; do echo "$f"; done
               | 
               | Same behavior, for the same reason?
        
               | chuckadams wrote:
               | The quotes are preserved, but backquote expansion fills
               | the argument list using any whitespace as a delimiter.
               | 
               |     $ for f in `ls`; do echo "$f"; done
               |     foo
               |     "bar
               |     baz
               | 
               | If you absolutely must parse ls (let's assume it's some
               | other script that outputs items with spaces) and the
               | output can contain spaces, you have a few options:
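               | 
               |     $ ls | while read f; do echo "$f"; done
               |     foo "bar baz
               | 
               |     # parens keep the IFS change isolated to a subshell;
               |     # $'\n' (bash, zsh, ksh93) is a real newline, whereas
               |     # "\n" would be a backslash followed by the letter n
               |     $ (IFS=$'\n'; for f in `ls`; do echo "$f"; done)
               |     foo "bar baz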
               | 
               | But if your filenames contain newlines, you'll really
               | want to stick with the glob expansion, or output custom
               | delimiters and set IFS to that.
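               | 
               | To make the newline case concrete, a small sketch that
               | counts entries both ways (count_with_glob and count_with_ls
               | are made-up names). A file whose name contains a newline is
               | one entry to the glob, but two lines to anything reading
               | the output of an ls that prints names raw into a pipe, as
               | GNU ls does:

```shell
# Both helpers are hypothetical names for illustration.
count_with_glob() {
    n=0
    for f in "$1"/*; do
        # An unmatched glob stays literal, so skip names that don't exist.
        [ -e "$f" ] || [ -L "$f" ] || continue
        n=$((n + 1))
    done
    echo "$n"
}

count_with_ls() {
    # Counts lines of ls output, which is what a naive parser sees.
    ls "$1" | wc -l | tr -d ' '
}
```

               | With one such file plus one ordinary file in a directory,
               | the two counts disagree.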
        
               | ChoHag wrote:
               | > If you absolutely must parse ls
               | 
               | ... stop and rethink your options. You may be able to get
               | away with parsing the first columns of ls -l but even
               | then a pathologically named file could make itself look
               | like a line of ls output.
               | 
               | It's simply not possible in all cases. If you can
               | constrain your input then you may be able to make use of
               | it but in the general case, that's why xargs and find
               | grew a -0 option.
               | 
               | Or glob.
        
               | chuckadams wrote:
               | Agreed when it comes to ls, but this applies to any
               | script whose output you capture. I personally prefer
               | "while read" loops but I'm probably screwed if someone
               | smuggles in a newline.
        
               | unqueued wrote:
               | If you are iterating over a lot of files, a while read
               | loop can be a major bottleneck. As long as you use the
               | null options from find and pipe into xargs, you are
               | usually safe.
               | 
               | I've found it can reduce minutes down to seconds for
               | large filesets.
               | 
               | If you have to process a large number of files, xargs
               | groups them and puts them directly into the argv of your
               | utility, so it gets called once per batch instead of once
               | per file.
               | 
               | Something like:
               | 
               |     # Set the setgid bit on all folders
               |     find . -type d -print0 | xargs -0 chmod g+s
               | 
               |     # Make the targets of symlinks immutable
               |     find . -type l -print0 | xargs -0 readlink -z | xargs -0 chattr +i
               | 
               | Way faster. But there are lots of caveats. Make sure your
               | programs support it. Maybe read the xargs man page.
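               | 
               | If xargs -0 isn't available, find's own "-exec ... {} +"
               | batches arguments the same way and never goes through a
               | text stream at all. A sketch (args_per_batch is a made-up
               | helper) that prints how many paths each invocation gets:

```shell
# args_per_batch DIR (hypothetical helper)
# Prints one number per command invocation: the count of file paths
# that find handed to it in that batch.
args_per_batch() {
    find "$1" -type f -exec sh -c 'echo "$#"' sh {} +
}
```

               | For a small directory that is a single line, i.e. one
               | call, whatever characters the names contain.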
        
               | folmar wrote:
               | `find` is almost always easier, but you can get quite far
               | with `ls -Q` if you can assume GNU ls.
        
           | unqueued wrote:
           | There is a pretty good syntax for dealing with nasty
           | filenames, if you must: ANSI-C quoting[1].
           | 
           | If you have to output in a shell script in this format, use
           | printf %q
           | 
           | from man printf:
           | 
           |     %q     ARGUMENT is printed in a format that can be reused
           |            as shell input, escaping non-printable characters
           |            with the proposed POSIX $'' syntax.
           | 
           | It is just $'<nasty ansi-c escaped chars>'
           | 
           |     $ touch $'\nHello\tWorld\n'
           |     $ ls
           | 
           | One thing I do like about a filesystem that fully supports
           | POSIX filenames is that at the end of the day a filesystem is
           | supposed to represent data. I think it is totally sensible to
           | exclude certain characters, but that it should be done higher
           | up in the stack if possible. Or have a flag that is set at
           | mount time. Perhaps even by subvolume/dataset.
           | 
           | One thing I haven't seen mentioned is that POSIX filenames
           | are so permissive that they allow you to have bytes as
           | filenames that are invalid UTF-8. That's why the popular
           | ncdu[2] program does NOT use json as its file format,
           | although most think it does. It's actually json but with raw
           | POSIX bytes in filename fields, which is outside of the
           | official json spec. That does not stop folks from using json
           | tools to parse ncdu output though.
           | 
           | Another standard that is also very permissive with filenames
           | is git. When I started exploring new ways to encode data into
           | a git repo, it was only natural that I encountered issues
           | with limitations of filesystems that I would check out in.
           | 
           | Try cloning this repo, and see if you are able to check it
           | out: https://github.com/benibela/nasty-files
           | 
           | It is amazing how many things it breaks.
           | 
           | If you are writing software that deals with git filenames or
           | POSIX filenames (that includes things like parsing a zip file
           | footer), you can not rely on your standard json encoding
           | function, because the input may contain invalid utf-8. So you
           | may need to do extra encoding/filtering.
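           | 
           | One way to do that filtering, as a sketch: ask iconv whether
           | the bytes survive a UTF-8 to UTF-8 conversion before handing
           | the name to a JSON encoder. is_valid_utf8 is a made-up
           | helper, and this assumes an iconv that rejects malformed
           | input (glibc's does):

```shell
# is_valid_utf8 STRING (hypothetical helper)
# Succeeds only if STRING is well-formed UTF-8, judged by whether
# iconv accepts it as UTF-8 input.
is_valid_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}
```

           | Names that fail the check would then need some escaping
           | scheme of your own choosing before they go into JSON.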
           | 
           | [1]: https://www.gnu.org/s/bash/manual/html_node/ANSI_002dC-
           | Quoti...
           | 
           | [2]: https://dev.yorhel.nl/ncdu/jsonfmt
        
         | fragmede wrote:
         | A GUI file browser will display the filename with a newline in
         | it as a new line (and an icon above it) so as to be
         | aesthetically pleasing.
        
         | xxpor wrote:
         | this is why things like `find -print0` exist, which is IMO the
         | easiest way to handle this robustly.
        
       | soupbowl wrote:
       | I wish FreeBSD replaced /bin/sh with OpenBSD's.
        
         | rollcat wrote:
         | FreeBSD made many cool moves in the 14.0 release, like finally
         | getting rid of sendmail and adopting DMA (the irony), so
         | perhaps there's a chance?
         | 
         | But FreeBSD has always been much less focused on
         | polish/cleanliness than OpenBSD; I mean - they have THREE
         | firewalls, wtf.
        
           | toast0 wrote:
           | > they have THREE firewalls, wtf.
           | 
           | I've not used ipf, but ipfw and pf have a different model and
           | different features (although in 14.0, there's more overlap).
           | I have to use them both.
        
       | chrisfinazzo wrote:
       | Related: The installer for iTunes 12.2.1 included a bug which
       | might recursively delete a volume if the path given as input
       | included incorrectly escaped spaces.
        
         | NewJazz wrote:
         | Reminds me of this...
         | 
         | https://hackaday.com/2024/01/20/how-a-steam-bug-once-deleted...
        
       | amiga386 wrote:
       | Here's the actual diff:
       | 
       | https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/bin/ksh/shf.c....
       | 
       | And it looks like that covers all parsed parts of the shell
       | script or history file, _including heredocs_. I get the feeling
       | it's going to break all shar archives with binary files (not
       | that they're particularly common). It will stop NULs being in the
       | script itself, but it won't stop them coming from other sources,
       | e.g.
       | 
       |     $ var=$(printf '\0hello')
       |     -bash: warning: command substitution: ignored null byte in input
       |     $ echo $var
       |     hello
       | 
       | It remains to be seen if this will be adopted by anyone else, or
       | if it'll be another reason to use OpenBSD only as a restricted
       | environment and not as a general computing platform.
       | 
       | > "If there is ONE THING the Unix world needs, it is for
       | bash/ksh/sh to stop diverging further"
       | 
       | > OpenBSD ksh: _diverges further_
        
         | matrix2003 wrote:
         | Eh - I actually like developing on OpenBSD _first_ , because of
         | restrictions like this. If it runs on OpenBSD, you are likely
         | to have fewer bugs around things like malloc.
         | 
         | OpenBSD is also really good about upstreaming bug fixes, which
         | is a good thing. Firefox used to be a dumpster fire of core
         | dumps on OpenBSD, and many issues were uncovered and fixed that
         | way.
        
         | chasil wrote:
         | The only thing that is _required_ to happen is that they all
         | obey the rules of the POSIX shell (when called as  /bin/sh).
         | 
         | Otherwise, anything goes.
         | 
         | https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V...
         | 
         | All the userland utilities must have the behavior (and
         | problems) specified here:
         | 
         | https://pubs.opengroup.org/onlinepubs/9699919799/utilities/
        
         | bell-cot wrote:
         | > Here's the actual diff:
         | 
         | Only 8 short, simple lines of C code. Beautiful.
        
       | whiterknight wrote:
       | Side note: tell your startup to switch its "hardware with Ubuntu
       | Linux inside" to BSD. You will have a much more stable and simple
       | platform that can last a long time.
        
         | quesera wrote:
         | The recommendation is solid, but FWIW no one looking for
         | stability would choose Ubuntu, among the Linuxen!
        
       | parasense wrote:
       | Is this going to murder those fancy shell scripts that self-
       | extract a program appended to the tail, which is really just an
       | encoded blob of some kind, presumably compressed, etc.. ???
        
         | talideon wrote:
         | Not if it was done competently. Shar files and the like
         | shouldn't contain NULs, even if they contain compressed data.
         | The appended data should be binary safe.
        
           | Thiez wrote:
           | And in case your data does contain NULs, presumably one could
           | add a layer of base64 encoding. Not nice for the filesize,
           | but also much less likely to upset a text editor when the
           | script is opened (even in the absence of NUL bytes).
        
       | enriquto wrote:
       | Great. Now forbid spaces in filenames.
        
         | ben_bai wrote:
         | Funny enough filenames are just byte sequences. So almost
         | anything goes.
         | 
         | There was just some patch that added '/' protection, because
         | that's the only character that's not allowed in filenames.
         | 
         | https://github.com/openbsd/src/commit/46f7109a9e03df89b66ada...
        
       | klooney wrote:
       | Does this break the self extracting tarball trick, where you have
       | a bootstrap shell script with a binary payload appended?
        
         | oguz-ismail wrote:
         | No, they still work.
        
       | 2snakes wrote:
       | Surprised no one has mentioned the Crowdstrike issue, which was
       | due to NUL characters, wasn't it?
        
       | raverbashing wrote:
       | > There appears to be one piece of software which is
       | misinterpreting guidance of this, and trying to depend upon
       | embedded NUL.
       | 
       | Big oof here. Why? How?
       | 
       | > If there is ONE THING the Unix world needs, it is for
       | bash/ksh/sh to stop diverging further by permitting STUPID INPUT
       | that cannot plausibly work in all other shells. We are in a post-
       | Postel world.
       | 
       | Amen
        
       | jrockway wrote:
       | I like the term post-Postel.
       | 
       | There are two reliability constraints that all software faces;
       | security and interoperability. The more lax you are about
       | validation, the more likely interoperability is. "That's weird,
       | I'll just do whatever" is doing SOMETHING, and it's often to the
       | end user's liking. But, you also enter a more and more undefined
       | state inside the software on the other side, and that's where
       | weird things happen. Weird things happening typically manifest as
       | security problems. So the more effort you go to to minimize the
       | possibility of entering a weird state, the more confidence you
       | have that your software is working as specified.
       | 
       | Postel's Law made a lot of sense to me when developing the early
       | Internet. A lot of people were reading imperfect RFCs, and it was
       | nice when your HP server could communicate with a Sun
       | workstation, even though maybe some bit in the TCP header was set
       | wrong. But now? You just gotta get it right and push a hotfix
       | when you realize you messed something up. (Sadly, I don't think
       | it's possible. Middleboxes are getting more and more popular. At
       | work, we make a product where the CLI talks to the server over
       | HTTP/2. We also install Zscaler on every workstation. Zscaler
       | simply blocks HTTP/2. So you can't use our product. Awkward.)
        
         | Thiez wrote:
         | This is also where Google went right with QUIC: encrypt as much
         | as possible to show middleboxes the least possible. This
         | combats ossification. Then again it seems likely middleboxes
         | will just block QUIC (or UDP in general).
        
       | 0xbadcafebee wrote:
       | > If there is ONE THING the Unix world needs, it is for bash/ksh/sh to
       | > stop diverging further by permitting STUPID INPUT that cannot
       | > plausibly work in all other shells.  We are in a post-Postel world.
       | >
       | > It remains possible to put arbitrary bytes *AFTER* the parts of the
       | > shell script that get parsed & executed (like some Solaris patch files
       | > do).  But you can't put arbirary bytes in the middle, ahead of shell
       | > script parsed lines, because shells can't jump to arbitrary offsets
       | > inside the input file, they go THROUGH all the 'valid shell script
       | > text lines' to get there.
       | 
       | So here it is again, an example of OpenBSD making software behavior
       | saner for all of us.
       | 
       | I don't consider use of all caps over a minor issue to be sane
       | behavior. At best it's immaturity (trying to force your point
       | rather than persuade), and at worst it's an emotional imbalance
       | that affects judgement. That said, it's ksh, on OpenBSD, so I
       | couldn't care less what they do.
        
         | PufPufPuf wrote:
         | What a weird take. There are just a few emphasized words in the
         | commit message.
        
       | opk wrote:
       | I've always found the fact that zsh copes with NUL characters in
       | variables etc to be really useful. I can see why this approach
       | makes sense for OpenBSD but they can't prevent NULs appearing in
       | certain places like piped input.
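       | 
       | Pipes, for instance, carry NULs untouched in any shell, since
       | the shell never looks at the bytes going through them. A tiny
       | sketch (count_nuls is a made-up helper):

```shell
# count_nuls (hypothetical helper): counts NUL bytes arriving on stdin.
count_nuls() {
    # Delete everything except NUL, then count what is left.
    tr -cd '\000' | wc -c | tr -d ' '
}

printf 'a\000b\000c' | count_nuls   # prints 2: both NULs got through
```

       | Only the script text itself goes through the shell's parser.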
        
       ___________________________________________________________________
       (page generated 2024-09-24 23:01 UTC)