[HN Gopher] OpenBSD now enforcing no invalid NUL characters in s...
___________________________________________________________________
OpenBSD now enforcing no invalid NUL characters in shell scripts
Author : CTOSian
Score : 152 points
Date : 2024-09-24 13:06 UTC (9 hours ago)
(HTM) web link (www.undeadly.org)
(TXT) w3m dump (www.undeadly.org)
| nubinetwork wrote:
| So I can't bury a tarball inside a shell script anymore?
| josephcsible wrote:
| You still can; it just needs to go at the end:
|
| > It remains possible to put arbitrary bytes _AFTER_ the parts
| of the shell script that get parsed & executed (like some
| Solaris patch files do).
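Here is a minimal sketch of that layout. It assumes a POSIX sh with awk, tail and tar available. The helper name make_sfx and the marker line __ARCHIVE_BELOW__ are made up for illustration. Everything the shell parses is plain text. The tarball, which is full of NUL bytes, is appended after an explicit exit, so the shell never reads it.

```shell
#!/bin/sh
# Sketch of a self-extracting tar bundle. The header that the shell parses is
# plain text; the binary tar payload is appended after `exit 0`, so the shell
# stops before it ever reaches a NUL byte.
set -eu

# make_sfx OUTPUT DIR  (hypothetical helper, not a standard tool)
make_sfx() {
    out=$1
    dir=$2
    cat > "$out" <<'EOF'
#!/bin/sh
# Locate the first line after the marker and feed the rest of this file to tar.
line=$(awk '/^__ARCHIVE_BELOW__$/ { print NR + 1; exit }' "$0")
tail -n +"$line" "$0" | tar -xf -
exit 0
__ARCHIVE_BELOW__
EOF
    # Append the archive; from here on the file may contain any bytes, NUL included.
    tar -cf - -C "$dir" . >> "$out"
    chmod +x "$out"
}
```

Running the generated file extracts the payload into the current directory. Only the text above the marker is ever parsed as shell.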
| volkadav wrote:
| Looks like you might be able to at the end of the file, reading
| the commit message, just not willy-nilly in the middle. :)
| lupusreal wrote:
| Does this break those self-extracting script/tar files? I forget
| how those are done, I haven't seen one in many years.
| zx2c4 wrote:
| From the article: "It remains possible to put arbitrary bytes
| _AFTER_ the parts of the shell script that get parsed &
| executed (like some Solaris patch files do). "
| sneela wrote:
| Are you talking about Shar?
|
| https://en.wikipedia.org/wiki/Shar_(file_format)
| ape4 wrote:
| That was a neat idea back in the day but should be disallowed
| now. Running downloaded executables considered harmful.
| Joker_vD wrote:
| Not in the "Installation: just run `docker run kekw/our-
| shiny-ai-chatbot` in your shell" world we're living today.
| nucleardog wrote:
| I think the better example is the all-too-common:
| "Installation: Just run `curl -sL
| http://goo.gl/hsjdiNgtehsn | sudo bash`"
| osmsucks wrote:
| > Running downloaded executables considered harmful
|
| Most executables are downloaded. :)
| 73kl4453dz wrote:
| They were generally uuencoded or similar
| jancsika wrote:
| If you don't know anything about OpenBSD, here's a fun thing:
|
| 1. Randomly choose "yes" or "no" to this question.
|
| 2. Read the post and get the answer.
|
| 3. Repeat until you begin to get a tingly "Spidey sense" that
| overrides your random-choice.
|
| My Spidey sense here was, "Yes, because OpenBSD would have
| already thought about and covered that use-case." And indeed,
| toward the end of the post, that contingency is covered and
| documented.
|
| Note: if you try this at your job and sense that the company
| will almost always choose the worst option, you should probably
| leave that job.
| bell-cot wrote:
| Kudos to OpenBSD!
|
| Similar to the olde-tyme "-o noexec" and "-o nosuid" options for
| `mount`, there should be easy, no-exceptions ways to blanket ban
| other types of simply obvious red-flag activity.
| sneela wrote:
| > This was in snapshots for more than 2 months, and only spotted
| one other program depending on the behaviour (and that test
| program did not observe that it was therefore depending in
| incorrect behaviour!!)
|
| Fascinating. I wonder what that program is, and why it depends on
| the NUL character.
| mcculley wrote:
| "We are in a post-Postel world" is a great way to put it. This
| needs to be repeated by everyone working with file formats or
| accepting untrusted input.
| nabla9 wrote:
| Agreed.
|
| When every implementation in wide use has their own quirks, you
| must support them all to make your program widely used. Every
| special case is yet another potential bug to chase down.
|
| It also allows the "Embrace, extend, and extinguish" strategy that
| Microsoft used so successfully to assfuck the internet for over a
| decade.
| pjmlp wrote:
| I think you mean Google.
| nabla9 wrote:
| No. The Microsoft. MS invented the term. DOJ found that MS
| used "Embrace, extend, and extinguish" in internal
| documents.
|
| Younger people don't know how absolutely ruthless and
| harmful the Wintel monopoly was under Gates. Java did not work
| on purpose. JavaScript did not work on purpose.
| <!--[if IE]>
|
| everywhere.
|
| They attempted to kill open web in the crib with their
| blackbird project. Only MSN (The Microsoft Network) for
| normal people.
| bigstrat2003 wrote:
| Agreed that it is a Microsoft term. But in my experience,
| it is older people who misjudge Microsoft's
| ruthlessness, not younger people. I am of an age where I
| remember well what Microsoft was like in those days, and
| it frankly was not as bad as people make it out to be.
| Nor was it really worse than the ruthless tech companies
| of today.
| quesera wrote:
| I was there too, and I disagree completely. Microsoft was
| not just ruthless, they were ubiquitous. They sabotaged
| any perceived competitors in anticompetitive, market- and
| industry-damaging ways.
|
| You (the generic "you") can complain all you want about
| Apple today, but you have another perfectly viable
| option. And Apple is (almost-entirely) happy to grow
| market share on merits without salting the earth of any
| rivals.
|
| In Microsoft's heyday, that was not true. Those of us who
| rejected MS back then did so at a much higher cost than
| green chat bubbles.
|
| It was worth it though. And we did win, eventually.
| chasil wrote:
| Microsoft could not win, although they tried very hard.
|
| Windows was never going to scale down to the portable
| devices that we now use (because defeating Apple would
| have been very difficult, and AOSP made it
| insurmountable).
|
| Windows was never going to scale up to the top 500
| supercomputer list (for largely economic reasons).
|
| Microsoft itself has tacitly admitted that Azure is
| better served by Linux, and we ponder why.
|
| Did the DoJ actions against Microsoft really have an
| impact? I don't know.
| specialist wrote:
| > _...without salting the earth of any rivals._
|
| Microsoft embodied the adage "It's not enough to win.
| Everyone else must lose."
| pjmlp wrote:
| Except it was Google that morphed the Web into ChromeOS, with
| the help of EVERYONE that ships it alongside their
| applications, as they can't be bothered to learn cross-platform
| frameworks.
|
| Many of them are people who used to complain about Micro$oft
| and should know better.
| IshKebab wrote:
| Anyone who was around for the IE6 era knows how much
| worse it was than the current Chrome era. It's not even
| close.
| nabla9 wrote:
| You clearly don't remember or know about how bad it was
| in IE6 era and before.
| stackghost wrote:
| "Postel" is not a term that carries any significance for me,
| and Googling that word didn't turn anything up that seemed
| relevant.
|
| Who or what is a Postel?
| teraflop wrote:
| It's a reference to "Postel's law" which is a pretty well-
| known principle in the networking world, and in software more
| broadly. Named after Jon Postel, who edited and published
| many of the RFCs describing core Internet protocols.
|
| https://en.wikipedia.org/wiki/Robustness_principle
| Ndymium wrote:
| It's a reference to Jon Postel who wrote the following in RFC
| 761[0]:
|
|   TCP implementations should follow a general principle of
|   robustness: be conservative in what you do, be liberal in
|   what you accept from others.
|
| Postel's Law is also known as the Robustness principle. [1]
|
| [0] https://datatracker.ietf.org/doc/html/rfc761#section-2.10
|
| [1] https://en.wikipedia.org/wiki/Robustness_principle
| arcanemachiner wrote:
| I've always felt that this was a misguided principle, to be
| avoided _when possible_. When designing APIs, I think about
| this principle a lot.
|
| My philosophy is more along the lines of "I will
| begrudgingly give you enough rope to hang yourself, but I
| won't give you enough to hang everybody else."
| quesera wrote:
| HTML parsing is the modern-ish layer-uplifted example of
| liberal acceptance.
|
| I won't argue that this hasn't been a disaster for
| technologists, but there are many arguments that this was
| core to the success of HTML and consequently the web.
|
| Which, yes, could be considered its own separate
| disaster, but here we are!
| pas wrote:
| It makes sense in a "customer obsessed" way. The user
| agent tries to show content, tries to send requests and
| receive the response on behalf of the client (customer),
| and _ceteris paribus_ it's better for the client if the
| system works even if there's some small error that can be
| worked around, right?
|
| but of course this leads to the tragedy of the anticommons:
| too many people have an effective "veto". Every shitty
| middlebox, every "so easy to use" 30-line library that
| got waaay too popular, now contributes to ossification of
| the stack.
|
| what's the solution? similarly careless adoption of new
| protocols? and hoping for the best? maybe putting an
| emphasis on provable correctness, and if something is not
| conformant to the relevant standard then not considering
| it "broken" for the "if it ain't broken don't touch it"
| principle?
| IshKebab wrote:
| Ironically it leads to less robust systems in the long
| term.
| thaumasiotes wrote:
| > Postel's Law is also known as the Robustness principle.
|
| Really? It seems like it's obviously just a description of
| how natural language works.+ But in that case, there's an
| enforcement mechanism (not well understood) that causes
| everyone to be conservative in what they send.
|
| We can observe, by the natural language 'analogy', that the
| consequence of following this principle is that you never
| have backwards compatibility. Otherwise things generally
| work.
|
| + Notably, it has nothing to do with how math works, making
| it a strange choice for programming.
| komon wrote:
| A reference to Postel's Law: be conservative in what you
| produce and liberal in what you accept.
|
| The law references that you should strive to follow all
| standards in your own output, but you should make a best
| effort to accept content that may break a standard.
|
| This is useful in the context of open standards and evolving
| ecosystems since it allows peers speaking different versions
| of a protocol to continue to communicate.
|
| The assertion being made here is that the world has become
| too fraught with exploiting this attitude for it to continue
| being a useful rule.
| godshatter wrote:
| What would have been the result of Jon Postel advocating
| for conservative inputs, I wonder? I suspect that if the
| most common protocols had all done this, they would have been
| bypassed by other protocols that allowed more liberal
| inputs.
| Joker_vD wrote:
| Yep. Which is why Postel's law is, sadly, more like a law
| of nature (see also "worse is better") than an
| engineering principle you may or may not follow.
| mrighele wrote:
| I know it is a single example and we shouldn't extrapolate
| much out of it, but in the case of HTML those who
| accepted more liberal input (HTML4/5) won out over those
| that were more conservative (XHTML).
| jancsika wrote:
| Am I correct that malformed pages in xhtml would have
| triggered the browser to output a red XML error and fail
| to render the page at all?
| Calavar wrote:
| Yes, but only if you served the XHTML with the proper
| MIME type of application/xhtml+xml. Nearly everyone
| served it as text/html, which would lead to the document
| being interpreted as this weird pseudo XHTML/HTML4 hybrid
| dialect with all sorts of browser idiosyncrasies [1].
|
| [1] https://www.hixie.ch/advocacy/xhtml
| 0cf8612b2e1e wrote:
| I would almost argue a failing of so many standards is
| the lack of surrounding tooling. Is this implementation
| correct? Who knows! Try it against this other version and
| see if they kind of agree. More specifications need to
| require test suites.
| edflsafoiewq wrote:
| Not really, since in the end HTML5 defined a precise
| parsing algorithm that AFAIK everyone follows.
| quesera wrote:
| HTML5 was born in an era of decent HTML authoring
| tooling. Very few people write HTML by hand nowadays.
| This was not true of earlier versions.
|
| Also note that HTML5 codified into liberal acceptance
| some of the "lazy" manual errors that people made in the
| early days (many of which were strictly and noisily
| rejected in XHTML, for example).
| mikaraento wrote:
| RFC 9413 referenced in a parent mentions HTML. It points
| out that formats meant to be human-authored may benefit
| more from being liberally accepted.
|
| I also read that XHTML made template authoring hard, as
| the template itself might not be valid XHTML and/or
| different template inputs might make output invalid. (I
| sadly can't find the source of this point right now, but
| I can't claim credit for it).
| arp242 wrote:
| HTML is rather different because it's authored by people.
| It's typically (though not always!) a good idea to not be
| too pedantic about accepting user input if you can. XHTML
| (served with the correct Content-Type) will completely
| error out if you made a typo and didn't test carefully
| enough. Useful in dev cycle? Sure. In production? Less
| so. "The entire page goes tits up because you used <br>
| instead of <br />" is just not helpful (and also:
| needlessly pedantic).
|
| But that doesn't really apply to protocols like TCP.
| Postel's "law" is best understood in the context of 1980,
| when TCP had been around for a while but without a real
| standard, everyone was kind of experimenting, and there
| were tons of little incompatibilities. In this context,
| it was reasonable and practical advice.
|
| For a lot of other things though: not so much. "Fail
| fast" is typically the better approach, which will
| benefit everyone, _especially_ the people implementing
| the protocols.
|
| This is also why Sendmail became the de-facto standard
| around the same time by the way: it was bug-compatible
| with everything else. Later this became a liability
| (sendmail.cf!), but originally it was a great feature.
| miki123211 wrote:
| Probably more convoluted protocols, because there are
| always things that you do accept and that can be used to
| negotiate protocol extensions.
|
| Imagine a protocol where both sides have to speak JSON
| with a rigidly-defined structure, and none of the sides
| is allowed to ask whether the other supports any
| extension. Such a protocol looks impossible to extend,
| but that is not the case: you can indicate that you speak
| a "relaxed" version of that protocol by e.g. following
| your first left brace by a predefined, large number of
| whitespace characters. If you see a client doing this,
| you know they won't drop the connection if you include a
| supported_extensions field, and you're still able to
| speak the rigid version to strict clients.
| quesera wrote:
| This made me laugh, because it's even _more_ terrible
| than the most ridiculous chicanery we had to vomit into
| HTML and CSS over the years (most of which was the fault
| of MSIE6).
| CoastalCoder wrote:
| Adding to the sibling comments, this is briefly covered in
| Eric Raymond's _wonderful_ book, "The Art of Unix
| Programming" [0].
|
| [0] https://en.wikipedia.org/wiki/The_Art_of_Unix_Programming
| ok123456 wrote:
| The fact that googling Postel was worthless also indicates
| we're in a post-google search world.
| AStonesThrow wrote:
| Bing had no trouble at all finding him from my device.
| Brian_K_White wrote:
| 2nd result on Kagi was about him, but in the form of another
| critique.
|
| https://datatracker.ietf.org/doc/draft-thomson-postel-was-
| wr...
|
| Hard disagree.
|
| It's a valid argument, but I say it's merely an argument,
| not an argument that wins or should win.
|
| But also, I say that detecting out of spec or unexpected
| input and handling it in any other way than crashing IS
| adhering to Postel.
|
| Refusing to process a request is better than munging the
| data according to your own creative interpretation of
| reasonable or likely, and then processing that munged data.
|
| I consider returning a nice error (or not, if that would be a
| security divulgence) to be within Postel. Failing Postel
| would be to crash or do anything unintended.
| skybrian wrote:
| Google's results for "Postel's law" and "Jon Postel" are
| fine. "Postel" is ambiguous, a fairly common surname, so
| websites of unrelated companies show up, and a
| disambiguating page on wikipedia that links to Jon Postel
| and several other people.
| ok123456 wrote:
| I thought the whole point of letting Google surveil your
| entire life was they would know that if you're interested
| in computing and networks, to the point of participating
| on news.hackernews.com, then they'd know that if you're
| searching for "Postel," you'd probably want Postel's law
| to be on the first page.
|
| We're back at pre-1998 search, where we have to specify
| more and more context just to get results that aren't
| noise.
| stackghost wrote:
| I'm actually astounded at how quickly the quality of Google
| search results has tanked in recent years.
| runjake wrote:
| Jon Postel was instrumental in making the Internet what it is
| today.
|
| https://en.wikipedia.org/wiki/Jon_Postel
|
| The Wikipedia article is kinda unclear and doesn't provide
| the proper context, so:
|
| - Ran IANA, which assigned IP addresses for the Internet.
|
| - Editor of RFCs, which are documents that defined protocols
| in use by the Internet.
|
| - He wrote a bunch of important RFCs that defined how some
| very important protocols should work.
|
| - Created or helped create SMTP, DNS, TCP/IP, ARPANET, etc.
| cesarb wrote:
| > "We are in a post-Postel world" is a great way to put it.
|
| See also RFC 9413 (https://www.rfc-
| editor.org/rfc/rfc9413.html), originally called "draft-thomson-
| postel-was-wrong" (https://datatracker.ietf.org/doc/draft-
| thomson-postel-was-wr...).
| Brian_K_White wrote:
| There is no such thing as a post Postel world. But handling the
| input in any other way than crashing or ub IS perfectly Postel.
|
| Deciding that nul is invalid data, and refusing to allow it,
| and refusing to munge the data and proceed based on the munged
| data that you essentially made up, as long as whatever you did
| do instead was graceful and intentional, to me that is
| perfectly Postel.
| chasil wrote:
| I was going to check the status of mksh (the Android system
| shell), but the project page returns:
|
| "Unavailable For Legal Reasons - Sorry, no detailled error
| message available."
|
| http://www.mirbsd.org/mksh.htm
|
| Is the Android system shell now abandoned? It is also in the
| RHEL 9 BaseOS repository.
| chaosite wrote:
| Looks fine here, maybe they're blocking your IP range for some
| reason?
| kbolino wrote:
| It's blocked for me too, but only on my home Internet
| (Xfinity), not my phone (Google Fi/T-Mobile).
| chasil wrote:
| I see it on my T-Mobile device also. Strange.
| torstenvl wrote:
| Works fine for me on Xfinity Home via WiFi, Xfinity Mobile,
| T-Mobile, and Visible by Verizon.
| kbolino wrote:
| Whatever the issue was, it seems to have been resolved
| sometime after I last checked.
| tux3 wrote:
| Works from an EU IP, so whatever it is, it's probably not GDPR?
| fragmede wrote:
| What's your browser? The server is using an old TLS version
| which is no longer supported, and some clients will try https
| and fail there and not try http.
| chasil wrote:
| I'm using Edge on my corporate desktop.
|
| Edge first tries TLS and comes back with: "SSL handshake
| error '-1' sslerr='1' sslerrdesc='error:1425F102:SSL
| routines:ssl_choose_client_version:unsupported protocol'
| sslerrfunc='607' sslerrreason='258'"
|
| Setting it to http:// results in the above error, along with
| "httpd/3.30A Server at www.mirbsd.org Port 80". I think that
| the target itself is blocking me.
| blueflow wrote:
| > Android system shell
|
| This hurt a little.
| talideon wrote:
| Fine for me. I just got a HTTP warning and nothing else.
|
| ~~I believe Android uses toybox, not mksh.~~ It does use
| toybox, but toybox doesn't appear to include a shell.
| sph wrote:
| Is this in reference to something? Judging from the comments,
| you'd think NUL bytes in shell scripts were such a common
| occurrence that everybody is celebrating this change as if it
| were groundbreaking.
|
| I mean, it's a good idea, but I wonder what I'm missing here.
| Also, what do they mean by post-Postel?
| semiquaver wrote:
| Postel's Law:
| https://datatracker.ietf.org/doc/html/rfc761#section-2.10
| JimDabell wrote:
| Postel's Law, also known as the Robustness Principle:
|
| > be conservative in what you do, be liberal in what you accept
| from others
|
| It's intended as a way to maximise compatibility, and people
| have generally followed it when designing protocols and file
| formats. However it's led to many security vulnerabilities and
| has caused a lot of compatibility problems itself. These days a
| lot of people are realising that it's more harmful than
| helpful.
| BlackFly wrote:
| Early spec of TCP had a section on the robustness principle
| that was generally known as Postel's law
| (https://datatracker.ietf.org/doc/html/rfc761#section-2.10). At
| the time and until recently this was considered good design.
| Nowadays people generally want servers to be stricter in what
| they accept since decades of experience dealing with diverging
| interpretations of a specification create problems for
| interoperability.
| eesmith wrote:
| "until recently"? More than 10 years just going by HN.
| https://news.ycombinator.com/item?id=5161214
|
| I think HTML showed the problem with Postel's principle.
| Quoting "Postel's Law is not for you" at
| http://trevorjim.com/postels-law-is-not-for-you/ from 2011
|
| > The next version of HTML, HTML5, should considerably reduce
| the problem of browser incompatibilities. It does this, in
| part, by rejecting Postel's Law for browser implementors.
| Instead of allowing browsers to be liberal when dealing with
| "flawed" markup, HTML5 requires them to parse it exactly as
| in the HTML5 specification, and that specification is given
| much more precisely than before, in the form of a
| deterministic state machine, in fact. HTML5 is trying to give
| implementors no leeway at all in this, in the name of browser
| compatibility.
| cesarb wrote:
| > "until recently"? More than 10 years just going by HN.
|
| The TCP protocol is from the 1970s (according to Wikipedia,
| it's from 1974, which is 50 years ago). Something which
| only happened 10 years ago is recent.
| eesmith wrote:
| The robustness principle dates to RFC 761 from January
| 1980, not 1974, making it only 44 years ago.
| https://www.rfc-editor.org/rfc/rfc761#section-2.10
|
| The citations I gave you were ones I knew existed. I know
| there was criticism in the early 2000s because we were
| debating it back then, but I don't have those citations
| handy.
|
| Checking now, the Wikipedia entry points to criticism
| in RFC 3117, from 2001, at
| https://datatracker.ietf.org/doc/html/rfc3117 :
|
| > Counter-intuitively, Postel's robustness principle ("be
| conservative in what you send, liberal in what you
| accept") often leads to deployment problems.
|
| That's why I knew to question what 'until recently' was
| supposed to mean.
| saagarjha wrote:
| > There appears to be one piece of software which is
| misinterpreting guidance of this, and trying to depend upon
| embedded NUL.
|
| Curious what this is
| semiquaver wrote:
| I wonder if it's https://justine.lol/ape.html / cosmopolitan
| libc
| eesmith wrote:
| Shouldn't be. See the "exit 1" in your link? That's the end
| of the shell script, and as the OpenBSD link says;
|
| > It remains possible to put arbitrary bytes _AFTER_ the
| parts of the shell script that get parsed & executed (like
| some Solaris patch files do). But you can't put arbirary
| bytes in the middle,
| oguz-ismail wrote:
| It is. Binaries generated by cosmocc have NUL in the
| middle.
| comex wrote:
| Ah, indeed. Here are the first 16 bytes of one:
|
|   4d 5a 71 46 70 44 3d 27 0a 00 00 10 00 f8 00 00  |MZqFpD='........|
|
| There are already nul bytes here, and there are a lot
| more before the single quote gets closed at offset 0x200.
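To find where the first NUL sits in a file like the one dumped above, a small helper over POSIX od and awk is enough. This is a sketch, and first_nul is a hypothetical name. It reports a zero-based byte offset, so on the 16 bytes shown above it would report 9.

```shell
# first_nul FILE: print the zero-based offset of the first NUL byte in FILE,
# or nothing if the file contains none. (Hypothetical helper.)
first_nul() {
    # od prints every byte as an unsigned decimal; awk counts until it sees 0.
    od -An -v -tu1 "$1" | awk '
        BEGIN { n = 0 }
        {
            for (i = 1; i <= NF; i++) {
                if ($i == 0) { print n; exit }
                n++
            }
        }'
}
```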
| eesmith wrote:
| And I can confirm a NUL in the 11th byte of my hello.c a.out:
|
|   >>> s[:11]
|   b"MZqFpD='\n\n\x00"
|
| Looking closer, I missed the content of "BIOS BOOT
| SECTOR".
| chubot wrote:
| I'm pretty sure it is, I remember reading something about
| this
|
| Yeah I found it here
|
| https://news.ycombinator.com/item?id=41030960
|
| 2019 bug - https://austingroupbugs.net/view.php?id=1250
|
| https://justine.lol/cosmo3/
|
| > This is an idea whose time has come; POSIX even changed
| their rules about binary in shell scripts specifically to let
| us do it.
|
| FWIW I agree with this OpenBSD change, which says more
| pointedly
|
| _All the shells are written in C, and majority of them use C
| strings for everything, which means they cannot embed a NUL,
| so this is not surprising. It is quite unbelievable there are
| people trying to rewrite history on a lark, and expecting the
| world to follow alone._
|
| i.e. it's not worth it to change a bunch of old code in order
| to allow making code more esoteric.
|
| We want systems code to be more predictable, reliable, and
| less esoteric ... not more esoteric
| asveikau wrote:
| > POSIX even changed their rules about binary in shell
| scripts specifically to let us do it.
|
| I'd seen this quote around. The fact that the standards
| were changed to allow it never struck me as a good
| indication that it should be relied upon. It seems rather
| backwards of how these standards work.
|
| I got flamed on HN once for saying cosmopolitan libc
| shouldn't be used for production because it relies on weird
| behaviors and implementation quirks that aren't really an
| ABI.
| comex wrote:
| Looking at this further, the standards change doesn't
| even match what Cosmopolitan is doing.
|
| From the 'changed their rules' link, the 'Desired Action'
| is to add this text: "The input file may be of any type,
| but the initial portion of the file intended to be parsed
| according to the shell grammar [..] shall not contain the
| NUL character."
|
| This handles things like shar archives where you have a
| shell script at the beginning, then an exit command, then
| binary gunk.
|
| But Cosmopolitan binaries are not just shell scripts with
| binary. They're hybrids of shell script and DOS
| executable. And apparently this requires putting nul
| bytes right near the beginning (see my other comment,
| https://news.ycombinator.com/item?id=41640331), in the
| "portion [..] intended to be parsed according to the
| shell grammar". Which explicitly violates the new text.
|
| I can understand why this hack is needed for what
| Cosmopolitan is trying to accomplish, but it makes no
| sense to claim POSIX blessed it.
| chubot wrote:
| Yeah exactly, that was my reading too! The claim in the
| link doesn't match what the POSIX bug says
|
| If it did, then that would be a sign that the POSIX process is
| not working well
|
| Because POSIX is supposed to be descriptive of what
| exists, not prescribe new behavior
| enriquto wrote:
| > POSIX is supposed to be descriptive of what exists, not
| prescribe new behavior
|
| Well, it can be argued that ape already existed when that
| POSIX change was written :)
|
| Everything is working as intended. Nothing to see here.
| Move along. Move along.
| tiffanyh wrote:
| Just yesterday I asked @jart, here on HN, about Cosmo &
| OpenBSD.
|
| https://news.ycombinator.com/item?id=41627889
|
| APE was mentioned and some interesting tidbits in the GitHub
| link provided in the HN comment above.
| Taikonerd wrote:
| On a similar note, I sometimes think about how newline characters
| are allowed in filenames, and how that can break simple...
| for each $filename in `ls`
|
| loops -- because in many contexts, UNIX treats newlines as a
| delimiter.
|
| Is there any legitimate use for filenames with newlines?
| Joker_vD wrote:
| Sticky notes on the desktop :) Who needs data storage when you
| can store it all in the metadata?
| IsTom wrote:
| You can also create files named e.g. '--help' (if you're not
| particularly malicious) and with globbing it'll cause e.g. 'ls
| *' to print help.
| jasonjayr wrote:
| touch -- '-f ..'
|
| (If you want to lay an evil trap)
|
| Remember that in most option parsing libraries, putting '--'
| in your arguments stops option parsing, so you can safely
| run: rm -- '-f ..'
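Both defences show up in a small demo. It is a sketch assuming a POSIX sh, and demo_dash_names is a hypothetical name. Besides `--`, prefixing a glob with ./ guarantees that no expanded name starts with a dash.

```shell
# demo_dash_names DIR: create two option-looking names in DIR, list them via a
# ./-prefixed glob, then remove them safely. (Hypothetical demo function.)
demo_dash_names() (
    cd "$1" || exit 1
    touch -- '--help' '-f ..'   # "--" ends option parsing, so these are file names
    for f in ./*; do             # every expansion starts with "./", never "-"
        printf '%s\n' "$f"
    done
    rm -- '--help'               # "--" again for a literal name
    rm ./'-f ..'                 # or qualify the path instead
)
```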
| bityard wrote:
| Well, knowing how to deal with wacky input and corner cases are
| a requirement of learning ANY programming language. Bourne-
| style shells are no exception.
|
| Your example has illegal syntax, but the biggest issue is that
| you should never parse the output of ls. The shell has built-in
| globbing. This is how you would loop over all entries (files,
| dirs, symlinks, etc) in the current directory without getting
| tripped up by whitespace:
|
|   for e in *; do echo "got: $e"; done
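Here is the same glob approach packaged as a function. It is a sketch assuming a POSIX sh, and count_entries is a hypothetical name. The extra test handles the case where nothing matches and the shell leaves the pattern unexpanded. Like a bare *, it skips dot files.

```shell
# count_entries DIR: count the non-hidden entries in DIR with a glob instead of
# parsing ls, so spaces and newlines in names are harmless.
count_entries() {
    n=0
    for e in "$1"/*; do
        # With no matches the pattern stays literal; skip it unless it is a
        # real entry (the -L test also keeps dangling symlinks).
        [ -e "$e" ] || [ -L "$e" ] || continue
        n=$((n + 1))
    done
    printf '%s\n' "$n"
}
```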
| Taikonerd wrote:
| > knowing how to deal with wacky input and corner cases are a
| requirement of learning ANY programming language.
|
| In general, I agree. But if there's a corner case that
| occasionally breaks naive code but otherwise doesn't do
| anything, then I'm going to think, "maybe we should just
| remove that corner case."
| bell-cot wrote:
| Replace "maybe" with " _OBVIOUSLY_ ". Keeping useless-but-
| hazardous "features" in any language is as idiotic as
| keeping a heap of oily rags in the furniture factory
| warehouse.
| zokier wrote:
| David Wheeler has been complaining (and suggesting fixes)
| about this for a long time:
| https://dwheeler.com/essays/fixing-unix-linux-
| filenames.html
|
| safename LSM https://lwn.net/Articles/686789/
| Taikonerd wrote:
| Thank you, I had wondered if there was something like
| safename.
| chuckadams wrote:
| > Is there any legitimate use for filenames with newlines?
|
| IMHO no, but they can exist, so you need to handle them without
| blowing up. Also, even spaces are considered delimiters here,
| which is why it's bad form to parse the output of ls.
|
|   $ touch "foo bar baz"
|   $ for f in `ls`; do echo $f; done
|   foo
|   bar
|   baz
|   # always use double quotes, though they aren't needed here
|   $ for f in *; do echo "$f"; done
|   foo bar baz
|
| At least the OS guarantees you won't run into NUL though.
| kstrauser wrote:
| I'm not in a place where I can easily check. What happens
| there if the file name contains a quote?
| chuckadams wrote:
| It's fine, the content of an expanded variable isn't parsed
| further:
|
|   $ touch "foo \"bar baz"; for f in *; do echo "$f"; done
|   foo "bar baz
|   # dropping the quotes around $f doesn't change the output here either
|   $ touch "foo \"bar baz"; for f in *; do echo $f; done
|   foo "bar baz
|
| Though once you start passing args with quotes to other
| scripts, things get ugly. Rule of thumb is to always pass
| with "$@", and if that isn't enough to preserve quoting for
| whatever use case, write them out to a tempfile instead, or
| don't use a shell script for it in the first place.
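The difference between "$@" and an unquoted $* is easy to see by counting arguments. Here is a minimal sketch in POSIX sh. The function names are made up for illustration.

```shell
# count_args: report how many arguments arrived.
count_args() { printf '%s\n' "$#"; }

# "$@" forwards each argument intact; unquoted $* re-splits on IFS (and globs).
forward_quoted()   { count_args "$@"; }
forward_unquoted() { count_args $*; }
```

With the arguments 'foo "bar baz' and x, the quoted form passes 2 arguments and the unquoted form passes 4.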
| kstrauser wrote:
| What about in the case of
|
|   for f in `ls`; do echo "$f"; done
|
| Same behavior, for the same reason?
| chuckadams wrote:
| The quotes are preserved, but backquote expansion fills
| the argument list using any whitespace as a delimiter.
|
|   $ for f in `ls`; do echo "$f"; done
|   foo
|   "bar
|   baz
|
| If you absolutely must parse ls (let's assume it's some
| other script that outputs items with spaces) and the
| output can contain spaces, you have a few options:
|
|   $ ls | while IFS= read -r f; do echo "$f"; done
|   foo "bar baz
|   # parens keep the IFS change isolated to a subshell
|   $ (IFS=$'\n'; for f in `ls`; do echo "$f"; done)
|   foo "bar baz
|
| But if your filenames contain newlines, you'll really
| want to stick with the glob expansion, or output custom
| delimiters and set IFS to that.
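If you do split captured output on newlines in a plain POSIX sh, be careful with quoting. IFS="\n" sets IFS to a backslash and the letter n, not to a newline. Older POSIX shells have no $'\n', so the usual trick is a variable holding a literal newline. Here is a sketch, with each_line as a hypothetical helper name.

```shell
# A variable holding a single literal newline (portable to any POSIX sh).
nl='
'

# each_line CMD [ARGS...]: run CMD and handle its output one line at a time,
# treating spaces, quotes and glob characters as ordinary text.
each_line() (
    IFS=$nl   # split only on newlines (runs of blank lines collapse)
    set -f    # stop the unquoted expansion from being globbed
    for item in $("$@"); do
        printf '[%s]\n' "$item"
    done
)
```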
| ChoHag wrote:
| > If you absolutely must parse ls
|
| ... stop and rethink your options. You may be able to get
| away with parsing the first columns of ls -l but even
| then a pathologically named file could make itself look
| like a line of ls output.
|
| It's simply not possible in all cases. If you can
| constrain your input then you may be able to make use of
| it, but in the general case, that's why find grew -print0
| and xargs grew -0.
|
| Or glob.
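|
| Here is a sketch of the NUL-delimited route. NUL is the one byte a
| path cannot contain, so it is a safe separator. It assumes bash,
| for `read -d ''` and process substitution, and a find that supports
| -print0, as GNU and BSD find do. The file names are made up.

```shell
#!/usr/bin/env bash
# Sketch: iterate over NUL-delimited names from find -print0.
# Assumes bash (read -d '', process substitution) and find with -print0.
dir=$(mktemp -d)
touch "$dir/plain" "$dir/with space" "$dir/$(printf 'new\nline')"

count=0
# -d '' makes read stop at NUL; IFS= and -r keep every other byte as-is.
while IFS= read -r -d '' f; do
    count=$((count + 1))
    printf 'got: %q\n' "$f"   # %q makes the embedded newline visible
done < <(find "$dir" -type f -print0)

echo "$count files"
rm -rf "$dir"
```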
| chuckadams wrote:
| Agreed when it comes to ls, but this applies to any
| script whose output you capture. I personally prefer
| "while read" loops but I'm probably screwed if someone
| smuggles in a newline.
| unqueued wrote:
| If you are iterating over a lot of files, a while-read
| loop can be a major bottleneck. As long as you use the
| null options from find and pipe into xargs, you are
| usually safe.
|
| I've found it can reduce minutes down to seconds for
| large filesets.
|
| If you have to process a large number of files, the win is
| that your utility gets called a few times instead of once
| per file: xargs groups the file names and puts them
| directly into the argv of the program.
|
| Something like:
|
|     # Set the setgid bit on all directories
|     find . -type d -print0 | xargs -0 chmod g+s
|     # Make the targets of symlinks immutable
|     # (-f resolves relative targets against the link's directory)
|     find . -type l -print0 | xargs -0 readlink -fz | xargs -0 chattr +i
|
| Way faster. But there are lots of caveats. Make sure your
| programs support it. Maybe read the xargs man page.
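|
| The batching effect can be measured with a small sketch, assuming
| bash. It substitutes `sh -c 'echo "$#"'` for a real utility, so each
| invocation just reports how many names it received and the number
| of output lines equals the number of invocations. The file names
| are made up.

```shell
#!/usr/bin/env bash
# Sketch: xargs -0 packs many NUL-delimited names into each argv, so the
# utility starts a handful of times instead of once per file.
# "sh -c 'echo $#'" is a stand-in utility that reports its argument count.
dir=$(mktemp -d)
for i in {1..200}; do touch "$dir/file $i"; done

# One output line per invocation of sh.
batched=$(find "$dir" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh | wc -l)

# -n 1 forces one invocation per file, like a loop calling the tool each time.
per_file=$(find "$dir" -type f -print0 | xargs -0 -n 1 sh -c 'echo "$#"' sh | wc -l)

# find's own "-exec ... {} +" batches the same way without a pipeline.
exec_plus=$(find "$dir" -type f -exec sh -c 'echo "$#"' sh {} + | wc -l)

echo "invocations: batched=$batched per_file=$per_file exec_plus=$exec_plus"
rm -rf "$dir"
```

| The trailing `sh` fills $0, so every file name lands in $1..$n.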
| folmar wrote:
| `find` is almost always easier, but you can get quite far
| with `ls -Q` if you can assume GNU ls.
| unqueued wrote:
| There is a pretty good syntax for dealing with nasty
| filenames, if you must: ANSI-C quoting[1].
|
| If you need to output names in this format from a shell
| script, use printf %q.
|
| From man printf:
|
|     %q     ARGUMENT is printed in a format that can be reused as shell
|            input, escaping non-printable characters with the proposed
|            POSIX $'' syntax.
|
| It is just $'<nasty ansi-c escaped chars>':
|
|     $ touch $'\nHello\tWorld\n'
|     $ ls
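|
| A bash sketch of the round trip: printf %q (the bash builtin, and
| GNU coreutils printf also accepts %q) quotes the name, and eval
| reads the quoted text back. The variable names are just for the
| demo.

```shell
#!/usr/bin/env bash
# Sketch: printf %q turns a hostile name into text that can be fed back
# to the shell. Assumes bash for $'...' and printf %q.
name=$'\nHello\tWorld\n'        # the same name as in the touch example

quoted=$(printf '%q' "$name")
echo "quoted form: $quoted"

# Re-reading the quoted text through the shell recovers the exact bytes.
eval "roundtrip=$quoted"
[ "$roundtrip" = "$name" ] && echo "round trip ok"
```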
|
| One thing I do like about a filesystem that fully supports
| POSIX filenames is that at the end of the day a filesystem is
| supposed to represent data. I think it is totally sensible to
| exclude certain characters, but that it should be done higher
| up in the stack if possible. Or have a flag that is set at
| mount time. Perhaps even by subvolume/dataset.
|
| One thing I haven't seen mentioned is that POSIX filenames
| are so permissive that a filename can be a byte sequence
| that is not valid UTF-8. That's why the popular ncdu[2]
| program does NOT use json as its file format, although
| most think it does. It's actually json but with raw POSIX
| bytes in filename fields, which is outside of the official
| json spec. That does not stop folks from using json tools
| to parse ncdu output though.
|
| Another standard that is also very permissive with filenames
| is git. When I started exploring new ways to encode data into
| a git repo, I naturally ran into the limitations of the
| filesystems I checked it out on.
|
| Try cloning this repo, and see if you are able to check it
| out: https://github.com/benibela/nasty-files
|
| It is amazing how many things it breaks.
|
| If you are writing software that deals with git filenames or
| POSIX filenames (that includes things like parsing a zip file
| footer), you cannot rely on your standard json encoding
| function, because the input may contain invalid UTF-8. So you
| may need to do extra encoding/filtering.
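|
| A sketch of that filtering step, flagging names that are not valid
| UTF-8 before they reach a json encoder. It assumes bash, a
| filesystem that accepts arbitrary bytes (typical on Linux, while
| macOS APFS rejects such names), and the POSIX iconv utility. is_utf8
| is a made-up helper.

```shell
#!/usr/bin/env bash
# Sketch: a POSIX file name is just bytes, so it need not be valid UTF-8.
# Assumes a filesystem that accepts arbitrary bytes and the iconv utility.
dir=$(mktemp -d)
touch "$dir/plain.txt" "$dir/$(printf 'caf\351.txt')"  # \351 = 0xE9, Latin-1 "e acute"

# is_utf8 is an illustrative helper: iconv exits non-zero on invalid input.
is_utf8() { printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; }

invalid=0
while IFS= read -r -d '' path; do
    if ! is_utf8 "$path"; then
        invalid=$((invalid + 1))
        # An encoder that expects UTF-8 may reject or mangle this; escape it first.
        printf 'not UTF-8: %q\n' "$path"
    fi
done < <(find "$dir" -type f -print0)

echo "$invalid name(s) need extra encoding"
rm -rf "$dir"
```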
|
| [1]: https://www.gnu.org/s/bash/manual/html_node/ANSI_002dC-Quoti...
|
| [2]: https://dev.yorhel.nl/ncdu/jsonfmt
| fragmede wrote:
| A GUI file browser will display the filename with a newline in
| it as a new line (and an icon above it) so as to be
| aesthetically pleasing.
| xxpor wrote:
| this is why things like `find -print0` exist, which is IMO the
| easiest way to handle this robustly.
| soupbowl wrote:
| I wish FreeBSD replaced /bin/sh with OpenBSD's.
| rollcat wrote:
| FreeBSD made many cool moves in the 14.0 release, like finally
| getting rid of sendmail and adopting DMA (the irony), so
| perhaps there's a chance?
|
| But FreeBSD has always been much less focused on
| polish/cleanliness than OpenBSD; I mean - they have THREE
| firewalls, wtf.
| toast0 wrote:
| > they have THREE firewalls, wtf.
|
| I've not used ipf, but ipfw and pf have a different model and
| different features (although in 14.0, there's more overlap).
| I have to use them both.
| chrisfinazzo wrote:
| Related: The installer for iTunes 12.2.1 included a bug which
| might recursively delete a volume if the path given as input
| included incorrectly escaped spaces.
| NewJazz wrote:
| Reminds me of this...
|
| https://hackaday.com/2024/01/20/how-a-steam-bug-once-deleted...
| amiga386 wrote:
| Here's the actual diff:
|
| https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/bin/ksh/shf.c....
|
| And it looks like that covers all parsed parts of the shell
| script or history file, _including heredocs_. I get the feeling
| it's going to break all shar archives with binary files (not
| that they're particularly common). It will stop NULs being in the
| script itself, but it won't stop them coming from other sources,
| e.g.
|
|     $ var=$(printf '\0hello')
|     -bash: warning: command substitution: ignored null byte in input
|     $ echo $var
|     hello
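|
| A bash sketch of that gap. Command substitution loses the NUL, and
| a file can still be checked for NUL bytes from the outside with tr
| and wc. has_nul is a made-up helper. Unlike the OpenBSD change, it
| flags NULs anywhere in the file, including after the parsed part.

```shell
#!/usr/bin/env bash
# Sketch: NULs can still arrive from outside the script, but a bash variable
# cannot hold them; bash drops the bytes (zsh would keep them).
{ var=$(printf '\0hello'); } 2>/dev/null   # hide bash's "ignored null byte" warning
echo "var is: $var"

# has_nul is an illustrative helper: tr deletes NUL bytes, so a shorter
# result means the file contained at least one.
has_nul() { [ "$(tr -d '\000' < "$1" | wc -c)" -ne "$(wc -c < "$1")" ]; }

tmp=$(mktemp)
printf 'echo hi\n\0\n' > "$tmp"
if has_nul "$tmp"; then echo "found NUL bytes in $tmp"; fi
rm -f "$tmp"
```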
|
| It remains to be seen if this will be adopted by anyone else, or
| if it'll be another reason to use OpenBSD only as a restricted
| environment and not as a general computing platform.
|
| > "If there is ONE THING the Unix world needs, it is for
| bash/ksh/sh to stop diverging further"
|
| > OpenBSD ksh: _diverges further_
| matrix2003 wrote:
| Eh - I actually like developing on OpenBSD _first_ , because of
| restrictions like this. If it runs on OpenBSD, you are likely
| to have fewer bugs around things like malloc.
|
| OpenBSD is also really good about upstreaming bug fixes, which
| is a good thing. Firefox used to be a dumpster fire of core
| dumps on OpenBSD, and many issues were uncovered and fixed that
| way.
| chasil wrote:
| The only thing that is _required_ to happen is that they all
| obey the rules of the POSIX shell (when called as /bin/sh).
|
| Otherwise, anything goes.
|
| https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V...
|
| All the userland utilities must have the behavior (and
| problems) specified here:
|
| https://pubs.opengroup.org/onlinepubs/9699919799/utilities/
| bell-cot wrote:
| > Here's the actual diff:
|
| Only 8 short, simple lines of C code. Beautiful.
| whiterknight wrote:
| Side note: tell your startup to switch its "hardware with Ubuntu
| Linux inside" to BSD. You will have a much more stable and simple
| platform that can last a long time.
| quesera wrote:
| The recommendation is solid, but FWIW no one looking for
| stability would choose Ubuntu, among the Linuxen!
| parasense wrote:
| Is this going to murder those fancy shell scripts that self-
| extract a program appended to the tail, which is really just an
| encoded blob of some kind, presumably compressed, etc.???
| talideon wrote:
| Not if it was done competently. Shar files and the like
| shouldn't contain NULs, even if they contain compressed data.
| The appended data should be binary safe.
| Thiez wrote:
| And in case your data does contain NULs, presumably one could
| add a layer of base64 encoding. Not nice for the filesize,
| but also much less likely to upset a text editor when the
| script is opened (even in the absence of NUL bytes).
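|
| Here is a sketch of the append-after-exit layout, with the payload
| base64-encoded as suggested so the file contains no NULs at all. It
| assumes bash to build it, GNU-style base64 (-d), tar with gzip
| support, and sed. installer.sh and the __PAYLOAD__ marker are
| made-up names.

```shell
#!/usr/bin/env bash
# Sketch: a self-extracting script whose payload sits after the parsed part.
# Assumes GNU-style base64 (-d), tar with -z, and sed; names are made up.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir payload && printf 'hello\n' > payload/greeting.txt

{
    # Only these lines are ever parsed; execution ends at "exit 0",
    # so the shell never parses the base64 lines that follow.
    printf '%s\n' '#!/bin/sh' \
        'sed "1,/^__PAYLOAD__$/d" "$0" | base64 -d | tar -xzf -' \
        'exit 0' \
        '__PAYLOAD__'
    # base64 keeps NULs (and every other raw byte) out of the file entirely.
    tar -czf - payload | base64
} > installer.sh

rm -r payload
sh installer.sh                    # recreates payload/greeting.txt
greeting=$(cat payload/greeting.txt)
echo "extracted: $greeting"
cd / && rm -rf "$work"
```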
| enriquto wrote:
| Great. Now forbid spaces in filenames.
| ben_bai wrote:
| Funny enough filenames are just byte sequences. So almost
| anything goes.
|
| There was just some patch that added '/' protection, because
| that's the only character besides NUL that's not allowed in
| filenames.
|
| https://github.com/openbsd/src/commit/46f7109a9e03df89b66ada...
| klooney wrote:
| Does this break the self extracting tarball trick, where you have
| a bootstrap shell script with a binary payload appended?
| oguz-ismail wrote:
| No, they still work.
| 2snakes wrote:
| Surprised no one has mentioned the CrowdStrike issue, which was
| due to NUL characters, wasn't it?
| raverbashing wrote:
| > There appears to be one piece of software which is
| misinterpreting guidance of this, and trying to depend upon
| embedded NUL.
|
| Big oof here. Why? How?
|
| > If there is ONE THING the Unix world needs, it is for
| bash/ksh/sh to stop diverging further by permitting STUPID INPUT
| that cannot plausibly work in all other shells. We are in a post-
| Postel world.
|
| Amen
| jrockway wrote:
| I like the term post-Postel.
|
| There are two reliability constraints that all software faces:
| security and interoperability. The more lax you are about
| validation, the more likely interoperability is. "That's weird,
| I'll just do whatever" is doing SOMETHING, and it's often to the
| end user's liking. But, you also enter a more and more undefined
| state inside the software on the other side, and that's where
| weird things happen. Weird things happening typically manifest as
| security problems. So the more effort you go to to minimize the
| possibility of entering a weird state, the more confidence you
| have that your software is working as specified.
|
| Postel's Law made a lot of sense to me for the early
| Internet. A lot of people were reading imperfect RFCs, and it was
| nice when your HP server could communicate with a Sun
| workstation, even though maybe some bit in the TCP header was set
| wrong. But now? You just gotta get it right and push a hotfix
| when you realize you messed something up. (Sadly, I don't think
| it's possible. Middleboxes are getting more and more popular. At
| work, we make a product where the CLI talks to the server over
| HTTP/2. We also install Zscaler on every workstation. Zscaler
| simply blocks HTTP/2. So you can't use our product. Awkward.)
| Thiez wrote:
| This is also where Google went right with QUIC: encrypt as much
| as possible to show middleboxes the least possible. This
| combats ossification. Then again it seems likely middleboxes
| will just block QUIC (or UDP in general).
| 0xbadcafebee wrote:
| > If there is ONE THING the Unix world needs, it is for
| > bash/ksh/sh to stop diverging further by permitting STUPID
| > INPUT that cannot plausibly work in all other shells. We are
| > in a post-Postel world.
| >
| > It remains possible to put arbitrary bytes *AFTER* the parts
| > of the shell script that get parsed & executed (like some
| > Solaris patch files do). But you can't put arbirary bytes in
| > the middle, ahead of shell script parsed lines, because
| > shells can't jump to arbitrary offsets inside the input file,
| > they go THROUGH all the 'valid shell script text lines' to
| > get there.
|
| > So here it is again, an example of OpenBSD making software
| > behavior saner for all of us.
|
| I don't consider use of all caps over a minor issue to be sane
| behavior. At best it's immaturity (trying to force your point
| rather than persuade), and at worst it's an emotional imbalance
| that affects judgment. That said, it's ksh, on OpenBSD, so I
| couldn't care less what they do.
| PufPufPuf wrote:
| What a weird take. There are just a few emphasized words in the
| commit message.
| opk wrote:
| I've always found the fact that zsh copes with NUL characters in
| variables etc to be really useful. I can see why this approach
| makes sense for OpenBSD but they can't prevent NULs appearing in
| certain places like piped input.
___________________________________________________________________
(page generated 2024-09-24 23:01 UTC)