hngopher.com

       [HN Gopher] What's New in POSIX 2024
       ___________________________________________________________________
        
       What's New in POSIX 2024
        
       Author : signa11
       Score  : 217 points
       Date   : 2024-10-29 00:42 UTC (22 hours ago)
        
 (HTM) web link (blog.toast.cafe)
        
       | snvzz wrote:
       | This is a surprisingly greedy POSIX update.
        
         | BoingBoomTschak wrote:
         | As someone who truly limits himself to POSIX when he can, I
         | think they needed to push it forward to not become completely
         | obsolete. I'm really sad `mktemp -d` and `set -o nullglob`
         | didn't make the cut, but that's how it is, I guess.
        
           | ykonstant wrote:
           | A bespoke `mktempd` script is one of the first things I
           | install in a new system. Fortunately, it is not too hard to
           | make a `mktemp -d` compatible script with POSIX tools. `set
           | -o nullglob` is another story :D
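For reference, here is the usual POSIX-only stand-in for `nullglob`. When a glob matches nothing, sh leaves the pattern in place as a literal word, so the loop body can skip any word that does not name an existing file. This is a minimal sketch: the helper name `count_txt` and the `*.txt` pattern are illustrative and do not come from the thread.

```sh
#!/bin/sh
# Emulate bash's nullglob in plain POSIX sh: if a glob matches nothing,
# sh passes the pattern through literally, so skip that literal word.
count_txt() {
    # $1: directory to scan (hypothetical helper for illustration)
    n=0
    for f in "$1"/*.txt; do
        # The unexpanded pattern does not exist as a file. A dangling
        # symlink also fails -e, so check -L too before skipping.
        [ -e "$f" ] || [ -L "$f" ] || continue
        n=$((n + 1))
    done
    printf '%s\n' "$n"
}
```

Unlike a real `nullglob`, the check runs on every iteration. That makes it clumsier, but it also works when a single `for` list contains several patterns.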
        
             | pxeger1 wrote:
             | It's quite hard to write mktemp securely[1]. It would be
             | great if POSIX didn't make people attempt to do that error-
             | prone task themselves.
             | 
             | [1]: There's some explanation in this recent post:
             | https://dotat.at/@/2024-10-22-tmp.html
        
               | ykonstant wrote:
               | This is correct (though of course a decent `mktempd`
               | script will deal with the listed problems or crash loudly
               | on failure), and there are even more reasons to avoid
               | /tmp.
               | 
               | Unfortunately, it is one of the very few directories that
               | are somewhat POSIX-"guaranteed" writable by a non-root
               | user and the fact that on modern systems it is usually
               | mounted on a tmpfs makes it very attractive for pure
               | POSIX usage without rich array support.
               | 
               | If you have mount permissions, of course, you should tell
               | your `mktempd` to base its directory on a private tmpfs.
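Here is a minimal sketch of what such a `mktempd` might look like. It assumes `/dev/urandom` is available, which is common but not required by POSIX. The safety comes from two things: `mkdir` refuses to reuse an existing path (including a planted symlink), and the directory is created under `umask 077`. The function name, the `tmp.` prefix, and the retry limit of 10 are all hypothetical choices.

```sh
#!/bin/sh
# mktempd: create a private temporary directory and print its path.
# Assumes /dev/urandom exists (not guaranteed by POSIX). Uses global
# variables, since POSIX sh has no "local".
mktempd() {
    base=${TMPDIR:-/tmp}
    tries=0
    while [ "$tries" -lt 10 ]; do
        # 16 random hex characters from 8 bytes of /dev/urandom.
        rand=$(od -An -N8 -tx1 /dev/urandom | tr -d ' \n') || return 1
        [ -n "$rand" ] || return 1
        dir=$base/tmp.$rand
        # mkdir fails instead of reusing the path if it already exists,
        # even as an attacker-planted symlink. umask 077 in a subshell
        # yields mode 0700 without changing the caller's umask.
        if (umask 077 && mkdir "$dir") 2>/dev/null; then
            printf '%s\n' "$dir"
            return 0
        fi
        tries=$((tries + 1))
    done
    echo "mktempd: could not create a directory in $base" >&2
    return 1
}
```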
        
       | somat wrote:
       | Hopefully nothing, posix is, or at least it should be, a
       | descriptive standard. This is why posix is so terrible, and why
       | posix is so great.
       | 
       | The way I feel posix and other descriptive standards work best
       | is when they describe what everyone is already doing. This is
       | opposed to prescriptive standards, which try to focus on the
       | "correct" way to do something; prescriptive standards tend to be
       | over-engineered and may or may not actually work.
       | 
       | see also: descriptive and prescriptive dictionaries.
       | http://www.englishplus.com/news/news1100.htm
        
         | Flimm wrote:
         | Both prescriptive standards and descriptive standards have
         | their uses. If POSIX is a prescriptive standard, then maybe
         | another standard should exist that is descriptive.
        
           | lifthrasiir wrote:
           | Keep in mind that the Web standard eventually became
           | prescriptive because descriptive standards failed to catch
           | up. Likewise it can be argued that descriptive standards for
           | the common OS interface are no longer usable.
        
             | vacuity wrote:
             | To be crass, description is only useful for existing things
             | and prescription hinders making innovative things. I think
             | social forces make it natural that standards are treated
             | both descriptively and prescriptively, and that too leads
             | to angst. Case in point, POSIX was once more descriptive,
             | but then people wanted backwards compatibility for existing
             | and new OSes, which made it more prescriptive. The takeaway
             | is that ad-hoc things become permanent once they are too
             | difficult to remove, and then people are sad. Nothing is
             | immune, so just make reasonable attempts for the standard
             | and the culture to harmonize for a specific purpose.
        
         | zelphirkalt wrote:
         | That is also a way to never progress beyond the status quo.
        
       | Flimm wrote:
       | Yes! Finally! Let's treat filenames with newlines as errors! I'm
       | so delighted with this decision.
        
         | enriquto wrote:
         | Next: spaces
        
           | lifthrasiir wrote:
           | Still much better than mojibaked names.
        
             | enriquto wrote:
             | What do you mean?
        
               | _ZeD_ wrote:
               | What is the encoding of the filenames?
        
               | Joker_vD wrote:
               | I am personally not aware of any MBCS that could have a
               | 0x20 or 0x0D as a valid trailing byte. Are you?
        
               | lifthrasiir wrote:
               | I think my comment correctly contrasted mojibake with
               | newlines or spaces for that reason.
        
         | skissane wrote:
         | The original request was to ban all bytes between 1 and 31.
         | 
         | https://www.austingroupbugs.net/view.php?id=251
         | 
         | At some point they decided to narrow the change to just ban the
         | newline character.
         | 
         | Which I personally think is a pity. Allowing escape in file
         | names is a security risk because it enables you to embed
         | ECMA-48 escape sequences in file names. Secure terminal
         | emulators shouldn't be made vulnerable by arbitrary escape
         | sequences, but there are "too smart for their own good"
         | terminal emulators out there that have escape sequences that
         | let you do crazy things like run arbitrary executables.
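Until such a restriction exists, a script can at least avoid echoing these names straight to a terminal. Below is a minimal POSIX sh sketch that uses the `[[:cntrl:]]` class in a `case` pattern. `has_cntrl` and `safe_print` are hypothetical helpers. Which bytes count as control characters (C1 in particular) depends on the current locale.

```sh
#!/bin/sh
# has_cntrl NAME: succeed if NAME contains a control character
# (ESC, newline, tab, ...), as classified by the current locale.
has_cntrl() {
    case $1 in
        *[[:cntrl:]]*) return 0 ;;
    esac
    return 1
}

# safe_print NAME: print NAME, or a placeholder if printing it raw could
# send escape sequences to the terminal.
safe_print() {
    if has_cntrl "$1"; then
        printf '%s\n' '<name with control characters>'
    else
        printf '%s\n' "$1"
    fi
}
```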
        
           | ezoe wrote:
           | There are many non-UTF-8/16/32 character encodings used in the
           | wild which use these values in multi-byte character sequences.
           | 
           | I think the decision forbidding newline in pathname is also
           | wrong. It may break tons of existing code.
        
             | skissane wrote:
             | I wish Linux/etc had a mount option and/or superblock flag
             | called "allow only sane file names". And if you had that
             | set, then attempting to create a file whose name wasn't
             | valid UTF-8, or which contained C0 or C1 controls, would
             | fail. The small minority of people who really need pre-
             | Unicode encodings such as ISO 2022 could just not turn that
             | option on. And the majority who don't need anything like
             | that could reap the benefits of eliminating a whole
             | category of potential bugs and vulnerabilities.
        
             | Joker_vD wrote:
             | > There are many non-UTF-8/16/32 character encoding used in
             | the wild which use these value in multi-byte character
             | encoding.
             | 
             | Like what? I am genuinely curious: Shift-JIS, GB2312, Big5,
             | and all of the EUC variants do _not_ use bytes that
             | correspond to C0 characters in ASCII.
        
         | IshKebab wrote:
         | Why is that an issue?
        
           | shakna wrote:
           | Run a program to list a directory. Everything that interfaces
           | with that will assume newline delimiters. Similar
           | assumptions are baked into a lot of software.
           | 
           | Enforcing that a newline isn't part of a path, ensures the
           | security of those systems that are commonly relied on.
        
             | oguz-ismail wrote:
             | Except no one's enforcing anything yet. Earlier versions of
             | POSIX allowed rejecting filenames containing newlines, the
             | newest version encourages it while mandating features
             | required to handle such filenames safely (find -print0,
             | xargs -0, read -d ''). So nothing's set in stone yet.
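For comparison, here is a sketch of newline-safe traversal that does not depend on the newly standardized options. `find -exec ... {} +` hands names to a child `sh` as separate arguments, so they never pass through a newline-delimited stream. The `count_files` helper is hypothetical. A naive `find "$dir" -type f | wc -l` would count a name containing one newline twice.

```sh
#!/bin/sh
# count_files DIR: count regular files under DIR, even if some names
# contain newlines. find passes each batch of names to the inline sh
# script as positional parameters, never as newline-separated text.
count_files() {
    find "$1" -type f -exec sh -c '
        # "$@" is one batch of file names: print one dot per file so the
        # caller can count them without ever printing a name.
        for f in "$@"; do printf .; done
    ' sh {} + | wc -c | tr -d ' '
}
```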
        
             | IshKebab wrote:
             | > Everything that interfaces with that, will assume newline
             | delimiters.
             | 
             | Well, only badly written programs. nushell handles this
             | fine, as will any program that doesn't try to do everything
             | as plain strings:
             | 
             |     ~> touch "foo\nbar"
             |     ~> ls foo* | print
             |     +---+------+------+------+----------+
             |     | # | name | type | size | modified |
             |     +---+------+------+------+----------+
             |     | 0 | foo  | file |  0 B | now      |
             |     |   | bar  |      |      |          |
             |     +---+------+------+------+----------+
             | 
             | However after reading it they're only making them illegal
             | for the posix utilities from the 70s that aren't written
             | properly, so I think that makes sense.
        
         | devit wrote:
         | That's obviously impossible since it would break backward
         | compatibility and the users' existing filesystems (and the
         | Linux kernel will rightly never accept anything like that).
         | 
         | The only reasonable fix is to enhance bash and shell IDEs to
         | track for each variable whether it could possibly include all
         | filename-valid characters (e.g. if it comes from read with no
         | options then it can't contain \n) and warn (off by default
         | unless stderr is a terminal) if they can't and it's used as a
         | filename (conservatively determined when used as arguments to
         | processes), and also warn when using find without -print0, etc.
         | noninteractively and perhaps interactively as well.
        
       | imrejonk wrote:
       | This adds `set -o pipefail` to POSIX sh, which causes a whole
       | pipeline to fail (non-zero exit code) if one or more of the
       | commands in the pipeline fail.
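Here is a minimal illustration. It assumes a shell that already implements the option: bash, ksh and zsh have it, but older releases of other shells may not. With `pipefail`, a pipeline's status is that of the rightmost command that failed, or zero if every command succeeded. `demo_pipefail` is just an illustrative name.

```sh
#!/bin/sh
# Requires a shell that implements "set -o pipefail".
demo_pipefail() {
    # Without pipefail, a pipeline's status is that of its LAST command.
    false | true
    printf 'without pipefail: %s\n' "$?"
    # In a subshell, so the option does not leak to the caller:
    # with pipefail, the failure of "false" is reported.
    (
        set -o pipefail
        false | true
        printf 'with pipefail: %s\n' "$?"
    )
}
```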
        
         | throwaway984393 wrote:
         | Sad. Use of that option is almost always a mistake. It only
         | leads to undebuggable silent failures.
        
           | Joker_vD wrote:
           | I'd rather both have this option _and_ have it work reliably.
           | It 's ridiculous that                   export VAR=$(cmd1 |
           | cmd2)
           | 
           | does _not_ count as a pipefail when cmd1 or cmd2 fail but
           | VAR=$(cmd1 | cmd2)
           | 
           | does, so the "correct" way to set an environment variable
           | from a pipeline's output is actually
           | VAR=$(cmd1 | cmd2)         export VAR
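The underlying rule is not specific to pipefail. A bare assignment takes the exit status of its last command substitution, while `export VAR=...` returns the status of `export` itself, which succeeds. In the small sketch below, `false` stands in for a failing `cmd1 | cmd2`, and the variable and function names are illustrative.

```sh
#!/bin/sh
demo_export_status() {
    # Bare assignment: its status is that of the command substitution.
    if VAR=$(false); then
        echo 'plain assignment: success'
    else
        echo 'plain assignment: failure'
    fi
    # export: its status is that of export, so the failure is masked.
    if export VAR2=$(false); then
        echo 'export assignment: success'
    else
        echo 'export assignment: failure'
    fi
    # Safer: assign (and check) first, export afterwards.
    if VAR3=$(false); then
        export VAR3
    else
        echo 'VAR3 not exported'
    fi
}
```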
        
           | ykonstant wrote:
           | Pipefail is useful and very hard to emulate on pure POSIX;
           | you need to create named fifos, break the pipeline into
           | individual redirections and check for error on each line.
           | 
           | And that is fine; but sometimes you want to treat a pipeline
           | as a "single command" and then you can use pipefail to abort
           | the pipeline on error. Then you can handle the error at the
           | granularity of the entire pipeline without caring which part
           | failed.
           | 
           | Lastly, I am confused as to the "silent" failures; maybe you
           | are thinking of combining this with `set -e`? Then yes, that
           | is bad and I recommend against the combination; but then
           | again, I and most advanced scripters recommend against
           | shotgunning `set -e` in the first place. Use it in specific
           | portions of the script when appropriate, and use proper error
           | handling otherwise.
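For the special case of a two-stage pipeline, a well-known POSIX idiom avoids fifos. The first stage writes its status to an extra file descriptor, and a command substitution captures it. This is a sketch under a few assumptions: `pipe2`, `st1` and `st2` are hypothetical names, each stage must be a single command or function name, and the helper uses file descriptors 3 and 4. Longer pipelines need more nesting, which quickly becomes as unwieldy as described above.

```sh
#!/bin/sh
# pipe2 CMD1 CMD2: run "CMD1 | CMD2" and store BOTH exit statuses in
# st1 and st2, using only POSIX sh (no pipefail, no PIPESTATUS, no fifo).
pipe2() {
    {
        # Inside $(...): fd 3 is the capture pipe, fd 4 the real stdout.
        # CMD1 writes its status to fd 3, CMD2's output bypasses the
        # capture via fd 4, and the substitution's own status is CMD2's.
        if st1=$( { { "$1"; printf '%s' "$?" >&3; } | "$2" >&4; } 3>&1 )
        then st2=0
        else st2=$?
        fi
    } 4>&1
}
```

Usage: after `pipe2 false cat`, `st1` holds 1 (the status of `false`) and `st2` holds 0 (the status of `cat`).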
        
             | zelphirkalt wrote:
             | Why does `set -e` make a pipeline fail silently?
        
               | ykonstant wrote:
               | `set -e` makes the script abort and is often used in lieu
               | of proper error handling:
               | 
               |     set -e
               |     command
               |     command [fails]
               |     command
               | 
               | Whether the above reports error or not depends on the
               | command; when you have a pipeline failing in the above
               | way, it is even sneakier:
               | 
               |     set -e
               |     command
               |     command | command | command [fails]
               |     command
               | 
               | You are reliant on _all_ commands in the pipeline being
               | verbose about failure to signal error.
               | 
               | None of the above is advisable. The advisable code is
               | 
               |     error_handler() { proper error handling; }
               | 
               |     command || error_handler "parameter"
               |     command || error_handler "parameter"
               | 
               |     { command | command | command; } || error_handler "parameter"
               | 
               |     {
               |     set -e
               |     exceptional section that needs to be bailed out
               |     set +e
               |     }
               | 
               |     command || error_handler "parameter"
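Here is a runnable instance of the pattern above, with concrete stand-ins for `command`. `error_handler`, `unique_entries` and the messages are hypothetical. It shows both the per-command `||` and a brace-grouped pipeline treated as a single unit.

```sh
#!/bin/sh
# Hypothetical error handler: report which step failed, then stop.
error_handler() {
    printf 'error: %s\n' "$1" >&2
    exit 1
}

unique_entries() {
    # $1: input file with one entry per line; lines starting with # are skipped.
    [ -r "$1" ] || error_handler "cannot read $1"
    # The braces make the pipeline one unit for error handling. Without
    # pipefail its status is that of uniq alone, so a grep failure
    # (for example, no non-comment lines) goes unnoticed here.
    { grep -v '^#' "$1" | sort | uniq; } || error_handler "filtering $1"
}
```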
        
               | skydhash wrote:
               | Error handling like that makes sense if you're writing a
               | program. But if you just want a script for an automation,
               | `set -e` is enough.
        
               | ykonstant wrote:
               | It is not; Greg's wiki further explains why, if the
               | silent failure problem above is not enough reason.
        
               | Joker_vD wrote:
               | Gee, imagine if shells with errexit option enabled wrote
               | some diagnostic output to stderr before exiting. "Add
               | your own error checking instead", how do I check which
               | piece of pipeline has failed, exactly? The PIPESTATUS
               | variable is bash-specific and was not standardized.
        
               | ykonstant wrote:
               | ? Why are you replying to me? My position was pretty
               | clear:
               | 
               | "Pipefail is useful and very hard to emulate on pure
               | POSIX; you need to create named fifos, break the pipeline
               | into individual redirections and check for error on each
               | line.
               | 
               | And that is fine; but sometimes you want to treat a
               | pipeline as a "single command" and then you can use
               | pipefail to abort the pipeline on error. Then you can
               | handle the error at the granularity of the entire
               | pipeline without caring which part failed."
               | 
               | By the way, I never script in Bash; I only script in
               | POSIX primitives using dash as my executable.
        
         | akdor1154 wrote:
         | Holy balls that's like Christmas!
        
         | rightbyte wrote:
         | Really? Won't that break piping into grep?
        
           | WJW wrote:
           | Probably, so don't `set -o pipefail` in scripts that pipe
           | into grep.
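With `pipefail` on, a `grep` that selects no lines (exit status 1) does fail the whole pipeline. One common workaround is sketched below, assuming the shell supports `pipefail`: treat status 1 from grep as success, but still treat 2 or more (a real error) as failure. `grep_ok` and `count_matches` are hypothetical helpers.

```sh
#!/bin/sh
# Requires a shell that implements "set -o pipefail".
set -o pipefail

# grep_ok ARGS...: like grep, but "no lines selected" (status 1) counts
# as success. Real errors (status >= 2) still make it return non-zero.
grep_ok() {
    grep "$@" || [ "$?" -eq 1 ]
}

count_matches() {
    # $1: pattern, $2: file (hypothetical helper)
    grep_ok -- "$1" "$2" | wc -l | tr -d ' '
}
```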
        
             | rightbyte wrote:
             | Ah ok I read it as 'sets it by default' for some reason.
        
         | zelphirkalt wrote:
         | Does it? It is not mentioned anywhere in the post. Can you post
         | a reference to your source?
        
           | noselasd wrote:
           | The post only has a few highlights. The POSIX specs are only
           | for paying IEEE customers, though, but
           | https://pubs.opengroup.org/onlinepubs/9799919799/ mentions
           | it.
        
             | arp242 wrote:
             | That _is_ the POSIX spec, no?
             | 
             | It's at: https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V...
             | 
             | (no permalink, search for "pipefail")
        
         | deskr wrote:
         | If you're writing scripts, use that and don't forget -e and -u:
         | 
         |     -e      Exit immediately if a pipeline (which may consist of a
         |             single simple command), a list, or a compound command
         |             (see SHELL GRAMMAR above), exits with a non-zero status
         |     -u      Treat unset variables and parameters other than the
         |             special parameters "@" and "*" as an error when
         |             performing parameter expansion
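Here is a short illustration of the `-u` behavior quoted above. Expanding an unset variable aborts the (sub)shell, while `${name:-default}` remains usable for values that are legitimately optional. `OUTDIR` and `show_outdir` are hypothetical names.

```sh
#!/bin/sh
# With set -u, a typo'd or missing variable aborts instead of silently
# expanding to "". The subshell keeps the option and the abort local.
show_outdir() {
    (
        set -u
        printf '%s\n' "${OUTDIR:-./out}"   # fine: a default is supplied
        printf '%s\n' "$OUTDIR"            # aborts here if OUTDIR is unset
    )
}
```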
        
           | ykonstant wrote:
           | For `set -u` I mostly agree. For `set -e` see my comment
           | below and Greg's wiki: http://mywiki.wooledge.org/BashFAQ/105
        
             | deskr wrote:
             | > and they still fail to catch even some remarkably simple
             | cases
             | 
             | I totally agree. Although I'd say that there isn't anything
             | "remarkably simple" about writing a bash script. Anything
             | in the shell scripting world that seems remarkably simple
             | is just because one hasn't realised the ghosts and horrors
             | that lurk in the shadows.
             | 
             | But I'll use -e anytime. It feels like having a protective
             | proton pack at least.
        
       | enriquto wrote:
       | > We've established that, yes, pathnames can include newlines. We
       | have not established why they can do that. After some
       | deliberation, the Austin Group could not find a single use-case
       | for newlines in pathnames besides breaking naive scripts.
       | Wouldn't it be nice if the naive scripts were just correct now?
       | Ok, that might be a bit much all at once. We're heading there
       | though!
       | 
       | Oh my god. This makes me so happy. This is the loveliest thing
       | I've read in the world of computing since the unix gods decided
       | that newlines were to be a single character.
       | 
       | The philosophy underlying the sentence "Wouldn't it be nice if
       | the naive scripts were just correct now?" is incredibly positive.
       | We are surrounded by arrogant jerks who break old code by
       | aggressively enforcing stricter compliance of some stupid rules.
       | But here come these posix heroes who do the exact opposite: make
       | old code correct! There is hope in mankind after all.
        
         | anal_reactor wrote:
         | It's a bandaid on a wider problem: the design of Unix shell is
         | bonkers and the whole thing should be deleted. Why? Because I
         | haven't seen any other tool ever have so many pitfalls. Take n
         | random languages and m random developers and tell them to loop
         | over a string array and print its contents, and count how many
         | correct programs you get on average per language. There will be
         | easy languages, then difficult languages, then a huge gap, then
         | Unix shell because in your random sample you managed to get one
         | guy who has a PhD in bash.
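For reference, POSIX sh does have one real array: the positional parameters. Below is a minimal sketch of the loop challenge above, using elements that contain spaces and glob characters. `print_all` and `demo` are illustrative names. The quotes around `"$@"` are what make it correct; without them, each element would be word-split and globbed.

```sh
#!/bin/sh
# print_all ELEMENT...: print each argument on its own line.
# Quoted "$@" expands to exactly one word per element, with no word
# splitting or globbing applied to the contents.
print_all() {
    for s in "$@"; do
        printf '%s\n' "$s"
    done
}

# "set --" replaces the current positional parameters with a new "array".
demo() {
    set -- "two words" '*' "x  y"
    print_all "$@"
}
```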
        
           | blueflow wrote:
           | Someone needs to come up with an interactive shell first, one
           | that is comparable in usability. Then we can think about
           | replacing the unix shell.
           | 
           | I tried both python and lua interactively, but they are a
           | pain when it comes to handling files. You have to type much
           | more to get the same things done.
        
             | anal_reactor wrote:
             | The bigger issue is the sheer momentum of Unix shell. Even
             | if you come up with an alternative that is better by every
             | objectively measurable metric, it's still going to be a
             | monumental task to have it packaged with commonly used
             | distros. Kinda like the "why can't the US switch to the
             | metric system" problem.
        
               | blueflow wrote:
               | People already use different shells, mksh, fish, and so
               | on. With fish there is a non-posix shell in wide use.
        
               | oguz-ismail wrote:
               | >wide use
               | 
               | Five people around the globe isn't wide use.
        
               | blueflow wrote:
               | I'm sure you might get more than 5 people on HN replying
               | to you that they are using fish right now. Say something
               | discrediting about fish and they show up.
        
               | fragmede wrote:
               | Heh, reminds me of how to get help with Linux back in the
               | day. If you directly asked for help, you'd be told to
               | RTFM. If you stated confidently that Windows could do
               | something and that Linux sucks because it can't, you'd
               | get users tripping over themselves with details and
               | instructions, just to prove you wrong.
               | 
               | Human psychology is fascinating!
        
               | azalemeth wrote:
               | There's a direct cost in money, time and lives that has
               | come from the US's adherence to their US Customary Units
               | (which are often different to the old imperial units).
               | People have literally died because of the confusion
               | caused by having multiple systems of units in common use
               | with ambiguous names (degrees, gallons, etc). Each year
               | industry worldwide spends an enormous amount of money
               | indirectly precisely because of this problem and it's
               | still incredibly unlikely to be fixed within my lifetime.
               | 
               | Bash-alternatives that are not completely compatible
               | frankly just don't have a chance.
        
               | stephenr wrote:
               | If it isn't distributed out of the box with every *nix-like
               | OS, it inherently _isn't_ "better by every
               | objectively measurable metric" - distribution of a
               | common, stable standard is a huge benefit in and of
               | itself.
        
               | blueflow wrote:
               | > distributed out of the box with every *nix-like OS,
               | 
               | Python and lua are pretty close to that.
        
               | stephenr wrote:
               | > Python and lua are pretty close to that.
               | 
               | Python may _often_ be installed by default, but it's
               | definitely not an essential/required package "out of the
               | box" on every install. Also, in a thread where one topic
               | is how POSIX shell handles whitespace in filenames, it's
               | hilarious (not in a good way) that someone suggests a
               | language that handles whitespace the wrong way in its
               | own code. Yes, significant whitespace is objectively
               | wrong.
               | 
               | What OS/distro is Lua included on _out of the box_? That
               | doesn't mean "available in a package". I mean literally
               | included in every single install and cannot reasonably be
               | omitted?
               | 
               | Regardless of the availability, the parent comment says
               | 
               | > better by every objectively measurable metric
               | 
               | Neither Python nor Lua are "better" than shell, at the
               | types of things shell is commonly used for - they're
               | objectively worse.
        
               | blueflow wrote:
               | Lua gets onto every other Linux distro as dependency of
               | some base system component. For example, rpm or pipewire
               | depend on lua. Ubuntu and Debian ship with pipewire per
               | default.
               | 
               | You should use the word "objectively" less.
        
               | consteval wrote:
               | Even outside of distribution, python and lua aren't
               | objectively better. For starters, they're much more
               | verbose.
        
               | blueflow wrote:
               | I just said that, scroll up.
        
             | nly wrote:
             | Oil shell?
             | 
             | https://www.oilshell.org/
             | 
             | Compatible with most bash scripts
        
             | throw16180339 wrote:
             | I certainly have my complaints about Powershell, but it's
             | got pretty good coverage, decent documentation, and cross
             | platform support.
        
               | felixgallo wrote:
               | if it weren't so irregular, inconsistent, spotty and
               | tasteless, it'd be a great option.
        
           | throwaway19972 wrote:
           | > the design of Unix shell is bonkers
           | 
           | Compared to what?
        
             | mdavid626 wrote:
             | Powershell?
        
               | oguz-ismail wrote:
               | Verbosity is a huge problem there
        
               | consteval wrote:
               | Modern programming language designers have a bad
               | relationship with verbosity. I don't know why they do
               | this.
               | 
               | It's a lang for an interactive shell, typing literally
               | translates to developer speed. I understand the want for
               | clarity and maybe that's nice in large scripts, but the
               | main goal is to be a shell. So, optimize for that. Also,
               | you probably shouldn't be using powershell for large
               | scripts anyway.
               | 
               | The only recent lang I've seen that has a handle on this
               | is Rust. You can tell they put a lot of thought into
               | having keywords be as short as possible while still being
               | descriptive.
        
               | ggm wrote:
               | FoundTheCamelCaseConvert.
               | 
               | My God next you will say getopt() --longform is the
               | bestest
        
               | throw16180339 wrote:
               | It's been years since I used Powershell, but IIRC there
               | are shortcuts for the common commands, e.g. cat, ls, mv,
               | rm, and such DTRT.
        
               | Diti wrote:
               | Those aliases are, I believe, only defined on Windows
               | PowerShell (the closed-source version 5; not PowerShell
               | 7). I wish those default aliases you mentioned weren't a
               | thing. Especially `curl` (people should use `iwr`
               | instead), which is an alias of `Invoke-WebRequest`,
               | because it makes the `curl.exe` shipped with Windows
               | nearly undiscoverable.
        
               | poincaredisk wrote:
               | PowerShell designers could learn from decades of
               | programming language progress and especially shell usage.
               | They could improve many aspects indeed. This doesn't mean
               | that the original design is "bonkers", only that it's not
               | perfect.
        
           | enriquto wrote:
           | > loop over a string array
           | 
           | Dear anal_reactor, what is a "string array"? I have used unix
           | shells for nearly 30 years and never heard of them. And
           | I consider myself a script-fu master!
           | 
           | There are two array-like constructions in the shell: list of
           | words (separated by spaces) and list of lines (separated by
           | newlines). Both cases are implemented as a single string, and
           | the shell makes it trivial to iterate through its components.
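Here is a sketch of the usual safe way to walk such a "list of lines" in POSIX sh. `IFS=` preserves leading and trailing blanks, and `-r` keeps backslashes literal. `each_line` is a hypothetical helper. It only works while no element contains a newline, which is the assumption the POSIX 2024 newline change is moving toward for filenames. The `while` loop may also run in a subshell (it does in bash and dash), so variables assigned inside it do not survive the pipeline.

```sh
#!/bin/sh
# each_line STRING: print each newline-separated element of STRING in
# brackets, preserving surrounding blanks and backslashes.
each_line() {
    printf '%s\n' "$1" | while IFS= read -r line; do
        printf '[%s]\n' "$line"
    done
}
```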
        
             | ManBeardPc wrote:
             | That is exactly the problem many people have with it.
             | Encoding "arrays" this way is foreign to everyone who
             | comes from "normal" programming languages. Both variants
             | lead to problems because either character can occur in
             | elements, worst case scenario they contain both at the same
             | time. I can see why this leads to confusion and bugs.
        
               | skydhash wrote:
               | It's like people saying they won't learn French because
               | it has a different grammatical structure. There's no
               | "normal" natural language. If you're used to the C-like
               | syntax, learning C-like language will be easy. But that's
               | not an argument to say Lisp is confusing.
        
               | ManBeardPc wrote:
               | That's why I put normal in quotes. There is however more
               | to it than having a different grammatical structure: It
               | works different from many commonly used languages that
               | have actual arrays/lists where elements can contain
               | anything the type allows. If you come from any of the
               | common modern programming languages (let's say Java,
               | Kotlin, C#, JS/TS, Python, Swift, Go, Rust, etc.) and
               | expect something similar (because many of them are very
               | similar) you will be confused. Using spaces or newlines
               | to encode elements in a single string is just not robust
               | and leads to easy to make mistakes.
        
               | skydhash wrote:
               | Most of these languages were created long after bash and
               | the other shells. The fact is that shell scripts allow
               | for unquoted strings and quoting is a specific operation,
               | not syntax. Also shell scripts were meant for
               | automations, not for writing general programs. The basic
               | units are commands, arguments, input, output, files,...
               | so the design makes these easy to manipulate.
               | 
               | I'm not saying that we can't improve, but I'm more in
               | favor of making the tool more apt to solve a problem than
               | making it easier to learn. Because the latter often wants
               | to forego the requirement of understanding the problem
               | space.
        
               | ManBeardPc wrote:
               | Yes, these are newer. I mainly wanted to make the point
               | that it is confusing if you are new to bash and come from
               | these newer languages with the wrong expectations. The
               | concise nature and many subtle details makes it very
               | difficult for beginners and infrequent users.
               | 
               | Compare this to the newer programming languages where you
               | explicitly call something with speaking names like
               | .Trim(), .EndsWith(), support from compiler and IDE.
               | 
               | In my experience automation and general programs often
               | are the same thing once things get more complicated. Bash
               | scripts usually grow rapidly and are a giant PITA to
               | maintain or refactor. Throw in build systems and helper
               | scripts and you quickly receive a giant pile of
               | spaghetti. Personally I just switch to one of the mentioned
               | programming languages once it goes above a simple
               | sequence of operations.
               | 
               | Personally I don't see how to improve it much without
               | becoming a full blown programming language, at which
               | point it would probably make more sense to just release a
               | library for common automation tasks that is also
               | composable. Maybe I'm just not the right target audience.
        
               | skydhash wrote:
               | The issue with your otherwise good reply is that someone
               | is bringing expectations to an expert tool (programming
               | languages, software, OS) and blindly assuming that
               | everything will work as he thinks it should. Familiarity
               | helps with learning, but shouldn't replace it. Someone
               | new to bash should probably start with a book.
               | 
               | And for bigger automation projects, there are lots of
               | projects and programming languages that can help.
        
               | ManBeardPc wrote:
               | I agree it is an issue but it is how many people work and
               | think. Most of the time they are not even wrong. "Hey, I
               | have variables and loops, I know that!".
               | 
               | I would even make the case for expert tools being as
               | unsurprising and familiar as possible unless there is a
               | very good reason for them not to. Also they should be
               | robust against misuse and guide the user towards good
               | practices. There are always beginners, people that rarely
               | need to use it, people that do programming as "just a
               | job" and people that make mistakes because they are
               | distracted, tired or just human. Something like "rm -r /"
               | is a good reminder of that for many people.
               | 
               | Plus there are already a lot of tools required. Reading a
               | book about every tool I have to use would be impractical
               | for most projects. Maybe more expert tools should just be
               | tools. The same way I can now just use Ubuntu and get a
               | working desktop system including drivers for most common
               | hardware. If I compare that to the past where I installed
               | a Linux distribution and then found out I lack a driver
               | for my network card but I need to download it from the
               | internet... I still can modify my system if I need to,
               | but it's nice that I don't have to. I think we can do
               | similar things with many parts of development and free
               | some capacity for other tasks.
        
           | dailykoder wrote:
           | Works on my machine!
        
           | akira2501 wrote:
           | > I haven't seen any other tool ever have so many pitfalls.
           | 
           | I haven't seen any other tool with so much general utility
           | and availability.
           | 
           | > to loop over a string array and print its contents
           | 
           | Is incredibly easy in bash and bash-like shells. As
           | highlighted, the issue is that tools like 'ls' don't create "a
           | string array." They create one giant string that has to be
           | parsed. The rules in the shell are different than in other
           | languages but it /will/ do most of the parsing for you, or
           | all of it, if you do it carefully.
           | 
           | This is a fine tradeoff, as evidenced by its wide usage and
           | lack of convincing replacements.
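           |
           | A minimal POSIX sh sketch of that careful approach, assuming
           | the "string array" comes from a directory listing: build it
           | with a glob into "$@", the only array POSIX sh has, instead of
           | splitting `ls` output. The function name `list_entries` is
           | hypothetical.

```sh
# Hypothetical helper: print each entry of directory "$1" on its own
# line, in brackets so embedded spaces stay visible. Dotfiles are not
# matched by "*".
list_entries() {
  set -- "$1"/*        # the glob fills "$@"; no command output is parsed
  # An unmatched glob stays as the literal pattern, so make sure the
  # first word names something (a dangling symlink also counts).
  [ -e "$1" ] || [ -L "$1" ] || return 0
  for entry in "$@"; do
    printf '[%s]\n' "$entry"
  done
}
```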
        
             | anal_reactor wrote:
             | > I haven't seen any other tool with so much general
             | utility and availability.
             | 
             | > availability
             | 
             | That's the real reason why we use Unix shell. It's not
             | good, but it's available. Like a cheap hooker.
             | 
             | > but it /will/ do most of the parsing for you, or all of
             | it, if you do it carefully.
             | 
             | "It mostly works if you're careful" doesn't sound very
             | convincing to me.
        
               | stephenr wrote:
               | > but it's available. Like a cheap hooker.
               | 
               | Username checks out.
        
               | akira2501 wrote:
               | > "It mostly works if you're careful" doesn't sound very
               | convincing to me.
               | 
               | Would you rather write your own parser?
        
           | vbezhenar wrote:
           | The main problem is using text as a common format between
           | different applications.
           | 
           | First: text is not well defined. Is it ASCII? Is it UTF-8?
           | Some programs can spew UTF-32 with the right locale
           | configured. It's a mess.
           | 
           | Second: encoding and decoding of objects to text is not
           | defined at all. Those problems with filenames are just one
           | example. Using newline as a separator is a natural thing that
           | is easy to implement, yet it is wrong.
           | 
           | In my opinion two things should be done:
           | 
           | 1. Standardise on UTF-8. No other encodings allowed.
           | 
           | 2. Standardise on JSON. It is good enough to serve as
           | universal exchange format, tools like `jq` exist for some
           | time now.
           | 
           | So any utility must read and write JSON objects with some
           | standard env set. And shells can be developed with better
           | syntax to deal with JSON. This way you can write something
           | like
           | 
           | `ps aux | while read row; do echo ${row.user} ${row.pid};
           | done`
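           |
           | Short of a JSON protocol, the usual workaround for the
           | newline-as-separator problem is NUL termination, since NUL is
           | the one byte a pathname cannot contain. A sketch assuming
           | `find -print0` and `xargs -0` are available (long supported by
           | GNU and BSD tools; check your platform); `count_files` is a
           | hypothetical name.

```sh
# Hypothetical helper: count regular files under "$1", even when their
# names contain newlines. find -print0 ends each name with a NUL byte
# and xargs -0 splits only on NUL, so a newline stays inside its name.
count_files() {
  find "$1" -type f -print0 |
    xargs -0 sh -c 'for f; do printf "%s\n" x; done' sh |
    wc -l
}
```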
        
             | anal_reactor wrote:
             | True, but this would be immensely difficult to pull off,
             | because how do you convince other people to write programs
             | that produce actual working JSON?
        
             | ezoe wrote:
             | Don't even assume UTF-something is the only character
             | encoding. Many character encodings predate Unicode, and
             | they are still widely used.
        
             | nly wrote:
             | The primary purpose of command line program output is to
             | convey information to a human, not to other programs.
             | 
             | Command line scripting is _supposed_ to be ad hoc and hacky.
        
               | mdavid626 wrote:
               | I disagree that it's _supposed_ to be ad hoc and hacky. Look
               | at PowerShell.
        
               | anthk wrote:
               | That's true under limited OSes such as DOS. Under Unix, piping
               | has been _the_ philosophy.
        
               | consteval wrote:
               | There are exchange formats that are well-defined enough
               | to be useful to many computers while also being readable
               | enough to be traversed by human eyes. There's no reason
               | to do everything ad hoc; you don't gain much by that. You
               | also control the shell itself - there's no reason you
               | can't display object representations in a pretty way.
        
             | pif wrote:
             | > The main problem is using text as a common format between
             | different applications.
             | 
             | If you can't get the immensity of the cleverness of Unix
             | foundations, you should not talk about them.
             | 
             | That idea is what made it possible for you to type that
             | sentence in the first place.
        
             | arghwhat wrote:
             | What cursed madness have you hit that spits out UTF-32
             | under normal conditions?! That can only be a bug -
             | UTF-32/UCS-4 never saw external use, and has only ever been
             | used for in-memory fixed-width character representation,
             | e.g. runes in Go.
             | 
             | You never have to worry about whether you're dealing with
             | ASCII vs. UTF-8, but rather if you're dealing with UTF-8
             | vs. ISO-8859-1, or worse, Shift JIS or similar.
        
               | vbezhenar wrote:
               | I think that I hit that with Java:
               |
               |     % java -Dfile.encoding=UTF-32 Test | hexdump -C
               |     00000000  00 00 00 48 00 00 00 65  00 00 00 6c 00 00 00 6c  |...H...e...l...l|
               |     00000010  00 00 00 6f 00 00 00 2c  00 00 00 20 00 00 00 77  |...o...,... ...w|
               |     00000020  00 00 00 6f 00 00 00 72  00 00 00 6c 00 00 00 64  |...o...r...l...d|
               |     00000030  00 00 00 0a                                       |....|
               |     00000034
               | 
               | From quick googling it seems that glibc does not support
               | it, so it should not happen.
        
             | oneeyedpigeon wrote:
             | I think a lot of tools should support json as well as plain
             | text. Probably the latter by default, and the former with a
             | "-o json" or similar option. I'm fine with wc giving me
             | `5`; I'd prefer that to `{ "characters": 5 }`.
        
             | aloisklink wrote:
             | POSIX does actually define what a "text file" is, but the
             | definition is a bit unusual:
             | 
             | See https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1...
             | 
             | > 3.387 Text File
             | 
             | > A file that contains characters organized into zero or
             | more lines. The lines do not contain NUL characters and
             | none can exceed {LINE_MAX} bytes in length, including the
             | <newline> character.
             | 
             | So, if you have some non-printable characters like
             | BEL/ASCII 0x07, that's still a text file.
             |
             | (and I believe what bytes count as a valid character depends
             | on your `LC_CTYPE`).
             | 
             | But the moment you have a line longer than {LINE_MAX} bytes
             | (which can depend on which POSIX environment you have),
             | suddenly your text file is now a binary file.
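             |
             | A rough check of that definition in POSIX sh, for
             | illustration only: `is_posix_text_file` is a hypothetical
             | helper, it reads {LINE_MAX} from `getconf`, it treats an
             | unterminated final line as failing (POSIX defines a line as
             | ending in a <newline>), and it does not validate characters
             | against `LC_CTYPE`.

```sh
# Hypothetical helper: succeed if "$1" looks like a POSIX "text file":
# no NUL bytes, no line longer than {LINE_MAX} bytes including its
# <newline>, and every line terminated by a <newline>.
is_posix_text_file() {
  max=$(getconf LINE_MAX) || return 2

  # 1. No NUL bytes: deleting them must leave the content unchanged.
  tr -d '\000' < "$1" | cmp -s - "$1" || return 1

  # 2. No line exceeds LINE_MAX bytes, counting the <newline>.
  #    LC_ALL=C makes awk's length() count bytes, not characters.
  LC_ALL=C awk -v max="$max" '
    length($0) + 1 > max { bad = 1; exit }
    END { exit bad }' "$1" || return 1

  # 3. The last byte, if any, is a <newline>: $(...) strips one, so the
  #    result is empty for a newline-terminated or empty file.
  [ -z "$(tail -c 1 "$1")" ]
}
```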
        
               | WJW wrote:
               | Kind of a weird definition indeed. One edge case: the
               | definition states the file must contain characters, so
               | presumably zero length files are out. But then how could
               | you have zero lines?
        
               | rascul wrote:
               | An empty file is not hard to make. It's just a matter of
               | creating the file and not writing to it.
        
               | WJW wrote:
               | Yes obviously. But the POSIX specification for a "text
               | file" as above is that it contains characters, which an
               | empty file by definition does not. So an empty file
               | cannot be a text file if you read that specification
               | strictly, and therefore you cannot have zero lines in a
               | text file. As soon as you have a single character there
               | is at least one line, and the amount of lines can only
               | stay the same or grow from there.
               | 
               | The definition should read "one or more lines" instead, or
               | (probably better) specify that a text file contains "zero
               | or more characters".
        
               | rascul wrote:
               | Ahh I see what you're saying. I misunderstood at first.
        
               | Ukv wrote:
               | POSIX defines a line as:
               | 
               | > 3.185 Line
               | 
               | > A sequence of zero or more non-<newline> characters
               | plus a terminating <newline> character.
               | 
               | So a file with some characters but no trailing newline is
               | reported by `wc -l` as having zero lines.
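               |
               | A quick illustration of that, and of how awk differs (it
               | treats an unterminated final fragment as a record anyway):

```sh
# wc -l counts <newline> characters, matching POSIX's "line".
printf 'abc' | wc -l                         # 0
printf 'abc\n' | wc -l                       # 1
printf 'abc\ndef' | wc -l                    # 1: "def" is not a line
# awk still reads the unterminated "def" as a second record.
printf 'abc\ndef' | awk 'END { print NR }'   # 2
```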
        
             | poincaredisk wrote:
             | >It is good enough to serve as universal exchange format,
             | tools like `jq` exist for some time now.
             | 
             | Please don't use that underdefined joke of a spec. Define
             | "PosixJson" and use that instead. Right now it's not even
             | clear what the result of parsing {"a": 1234678901234567890}
             | is. Is this a parse error? A bigint? A float/double? Quiet
             | wraparound? Something else? I've seen all these behaviors
             | in real world JSON implementations across different
             | languages.
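             |
             | For a concrete taste of the "float/double" outcome: awk, like
             | many JSON parsers, stores numbers as IEEE 754 doubles, whose
             | 53-bit significand can't hold that integer exactly. A
             | one-liner, assuming a typical awk without an
             | arbitrary-precision mode:

```sh
# Doubles represent every integer exactly only up to 2^53
# (9007199254740992); above that, nearby integers share one value.
awk 'BEGIN { printf "%.0f\n", 1234678901234567890 }'
# with a double-based awk this prints 1234678901234567936
```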
        
             | matrss wrote:
             | JSON itself is bad for a streaming interface, as is common
             | with CLI applications. You can't easily consume a JSON
             | array without first reading it in its entirety. JSONL would
             | be a better fit.
             | 
             | But then, how well would it work for ad-hoc usage, which is
             | probably one of the biggest uses of shells?
        
           | zelphirkalt wrote:
           | This should not be as downvoted as it is. In a way shell is
           | broken. The brokenness is in that it requires each command to
           | serialize and deserialize again, considering all the weird
           | things that can happen with the "all is a string" kind of
           | approach, instead of having a proper data interchange format
           | or even sending objects to next steps in the pipeline. This
           | behavior is what necessitates even thinking about the changes
           | listed in the post. We wouldn't even have that problem if
           | the design of the shell had been better thought out. Now we are
           | dealing with decades of legacy built on these shaky
           | foundations. I hate to admit it, but it seems PowerShell got
           | at least this aspect right, whatever one may think about the
           | rest of it.
        
             | chasil wrote:
             | On my rhel7 system, the Debian dash shell is this large:
             |
             |     $ ll /bin/dash
             |     -rwxr-xr-x. 1 root root 113536 Nov  5  2018 /bin/dash
             |
             | I happen to have an old powershell installed:
             |
             |     $ rpm -qi powershell | grep Size
             |     Size        : 126588370
             | 
             | A strict POSIX shell is always going to be vastly smaller,
             | for many reasons.
             | 
             | I would prefer that the POSIX shell were an LR-parsed
             | language, but you can't have everything.
        
         | nneonneo wrote:
         | Rather unfortunately, I happen to have a handful of files on my
         | machine with newlines in them (the filenames were
         | programmatically generated from a summary of their contents). I
         | loathe the possibility that my shell tools are going to
         | suddenly crash when confronted with these weird files, rather
         | than just producing some slightly silly output. I wish we'd
         | standardized the behaviour of just escaping such characters as
         | `\n/\r` or `^J/^M`...
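         |
         | POSIX `ls -q` already prints non-printable characters, newline
         | included, as `?`. For the `\n` style, a minimal sketch; the
         | helper `escape_newlines` is hypothetical, and since literal
         | backslashes are left alone, its output is ambiguous for names
         | that already contain `\n`.

```sh
# Hypothetical helper: print pathname "$1" on a single line, showing each
# embedded newline as the two characters \n. A sentinel "x" is appended
# so trailing newlines survive awk's record splitting; sed removes it.
escape_newlines() {
  printf '%sx' "$1" |
    awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 } END { print "" }' |
    sed 's/x$//'
}
```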
        
           | nasretdinov wrote:
           | The thing is, it's hard to predict what would happen to those
           | scripts regardless... E.g. try naming your files "-rf" and
           | see how many things break :)
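           |
           | The standard defenses are a `./` prefix on globbed names and
           | `--` to end option parsing. A sketch with a hypothetical helper,
           | `sizes_here`, that copes with files named "-rf" or "-lisah":

```sh
# Hypothetical helper: print the byte count of each regular file in the
# current directory, whatever its name starts with.
sizes_here() {
  for f in ./*; do            # "./" prefix: no expanded word starts with "-"
    [ -f "$f" ] || continue   # skip non-files and an unmatched "./*"
    wc -c -- "$f"             # "--" ends option parsing as a second guard
  done
}
```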
        
             | redserk wrote:
             | If one really wanted to embrace chaos, introduce this as a
             | new team file naming standard for "risk finding" files ;)
        
             | tetha wrote:
             | I do enjoy "ls *; touch -- -lisah; ls *" as a fun little
             | brainteaser for those uninitiated to this behavior.
        
             | ykonstant wrote:
             | A correct script will have no problems with "-rf" or any
             | other file name. I have (and recommend script writers make
             | their own) a directory hierarchy of "dangerous" file names
             | to test scripts.
             | 
             | For example, it contains a directory where all file and
             | subdirectory names are in unary, consisting only of
             | repetitions of the newline character. A correct script
             | should be able to enumerate, access and modify files in
             | there without issue.
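             |
             | A sketch of how such a test hierarchy might be generated; the
             | helper name `make_nasty_names` and the particular list of
             | names are assumptions, meant to be extended:

```sh
# Hypothetical helper: populate directory "$1" with awkward names for
# exercising scripts. Extend the list with whatever bites you.
make_nasty_names() {
  nl='
'
  mkdir -p -- "$1" || return
  for name in '-rf' '-lisah' ' leading' 'trailing ' 'a b' '*' '[x]' \
              '\' "'" '$HOME' "new${nl}line" "$nl" "$nl$nl"; do
    : > "$1/$name" || return   # quoted, so no globbing or splitting
  done
  # A directory whose name, and whose file's name, are only newlines.
  mkdir -p -- "$1/$nl$nl$nl" || return
  : > "$1/$nl$nl$nl/$nl"
}
```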
        
             | nneonneo wrote:
             | export TMPDIR=" / "
             | 
             | to surprise the next person or script to do "rm -rf
             | $TMPDIR/foo"...
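             |
             | Why this bites: unquoted, the expansion is field-split on
             | spaces, so `rm` would receive "/" and "/foo" as separate
             | operands. A harmless demonstration, with a hypothetical
             | `count_args` standing in for `rm` and a plain variable instead
             | of TMPDIR:

```sh
# count_args only reports how many operands it was given.
count_args() { printf '%s\n' "$#"; }

dir=' / '
count_args $dir/foo      # unquoted: prints 2 ("/" and "/foo")
count_args "$dir/foo"    # quoted: prints 1 (" / /foo")
```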
        
           | ykonstant wrote:
           | In academia, I get (and used to create) pdfs with names like:
           | 
           | "On the number of
           | 
           | associative foobars
           | 
           | of degree blah -
           | 
           | Johnson and Anderson.pdf"
           | 
           | all the time. It is very convenient for non-technical
           | academics to have a descriptive file name, and they use
           | newlines so that the whole name is visible in their file
           | browser.
        
             | oneeyedpigeon wrote:
             | Oh god. I already get upset enough by spaces in a file
             | name, although I realise that fight is basically lost now!
        
               | enriquto wrote:
               | As a fellow spaces-in-filenames-hater, the fight is not
               | lost. We are on the brink of winning it; it's just a
               | mount option away!
               | 
               | While we cannot avoid that people hit the spacebar when
               | writing a filename on a gui, this does not mean _at all_
               | that the resulting filename itself need contain a plain
               | space character. Those spaces can and should be
               | transparently translated to non-breaking space characters
               | at some point. Maybe by the gui itself, or more robustly
               | by the filesystem. This would make everybody happy: gui
               | users and naive shell script writers.
        
               | poincaredisk wrote:
               | >Those spaces can and should be transparently translated
               | to non-breaking space characters at some point
               | 
               | Why? This just introduces more complexity and
               | interoperability headaches for seemingly no reason.
        
               | enriquto wrote:
               | > Why?
               | 
               | In order to preserve the sacrosanct simplicity of naive
               | shell scripts. Seems like a very noble goal to me.
               | 
               | The only unexpected complexity arises when you want to
               | deal with filenames having mixed spaces and nbsps. But
               | I'd say that people who do that had it coming.
        
               | alexvitkov wrote:
               | If you want simple shell scripts to work, make an
               | actually good shell language without all the footguns.
               | 
               | The filesystem is way more important than /bin/sh, and
               | any complexity added there will trickle down to all
               | programs, not just shell scripts.
               | 
               | It's not worth adding hacks on the FS to patch defects in
               | poorly written shell scripts (which are being replaced en
               | masse with python/nodejs/even weirder yaml files/systemd
               | units/etc... anyways)
        
               | ksp-atlas wrote:
               | nushell uses real lists for things, which means you don't
               | need to care about separators except when dealing with
               | external system things.
        
               | consteval wrote:
               | Whitespace in filenames in general is difficult to deal
               | with. Many, maybe most, programs get it wrong. It's not
               | just about shell scripts, many GUI programs fail to
               | handle those files properly too.
        
               | alexvitkov wrote:
               | When GUI programs mishandle filenames with spaces, IME
               | it's usually because they spawn a subshell in a naive way
               | (system("rm " + filename)).
               | 
               | To mishandle spaces you have to split an input w/
               | filenames by whitespace, which is not that common of an
               | operation outside of a shell.
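               |
               | C's system() hands its string to `sh -c`, so the fix is the
               | same as in a script: pass the filename as an argument
               | instead of splicing it into the command text. A sketch with
               | a hypothetical `byte_count` helper:

```sh
# Hypothetical helper: count the bytes of file "$1" via a child shell.
# Splicing, the equivalent of system("wc -c < " + filename), would be
#   sh -c "wc -c < $1"    # breaks on "my file.txt"
# Passing the name as an argument keeps it a single word.
byte_count() {
  sh -c 'wc -c < "$1"' sh "$1"   # inner $0 is "sh", inner $1 the name
}
```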
        
               | InfiniteRand wrote:
               | My favorite file+space issues is spaces at the end of
               | file names, especially when you copy and paste text, or
               | text gets trimmed from an input box, or the person
               | forgets to trim space from an input box...
        
               | nradov wrote:
               | The vast majority of Windows and MacOS programs get it
               | right.
        
               | arp242 wrote:
               | Eh? It's really not a bother in pretty much any
               | programming language, and you don't really need to do
               | anything special for it. I don't know any program that
               | has any problems with it.
               | 
               | Even zsh has fixed this. It's just /bin/sh and bash that
               | are annoying.
        
               | lifthrasiir wrote:
               | Simplicity doesn't always mean stupidity. A simple but
               | functional shell that correctly handles whitespace
               | without much hassle has been available since the '90s,
               | namely rc, which is also found in Plan 9. Adopting rc's
               | string concatenator `^` in POSIXy shells shouldn't be too
               | hard.
        
               | vbezhenar wrote:
               | Yep, works today:
               |
               |     sh-3.2$ f='Hello world'
               |     sh-3.2$ echo $f
               |     Hello world
               |     sh-3.2$ for i in $f; do echo $i; done
               |     Hello
               |     world
               |     sh-3.2$ f='Hello\xC2\xA0world'
               |     sh-3.2$ echo $f
               |     Hello world
               |     sh-3.2$ for i in $f; do echo $i; done
               |     Hello world
        
               | stouset wrote:
               | Just always quote variable interpolation and you will
               | never have problems.
               |
               |     sh-3.2$ f='Hello world'
               |     sh-3.2$ echo "${f}"
               |     Hello world
               |     sh-3.2$ for i in "${f}"; do echo "${i}"; done
               |     Hello world
               |     sh-3.2$
        
               | chasil wrote:
               | It would be really nice if there was a mount option that
               | would quietly remove spaces in filenames, or convert them
               | to an underscore.
               | 
               | If I had it, I would use it today.
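               |
               | Short of a mount option, a userland approximation is a
               | rename pass. A sketch, assuming a hypothetical
               | `underscore_names` helper that turns spaces into underscores
               | in one directory and never overwrites an existing name:

```sh
# Hypothetical helper: in directory "$1", rename entries whose names
# contain spaces, replacing each space with "_". The body runs in a
# subshell, so the cd does not leak into the caller.
underscore_names() (
  cd -- "$1" || exit
  for f in ./*' '*; do
    [ -e "$f" ] || [ -L "$f" ] || continue   # unmatched glob
    # The sentinel "x" protects trailing newlines from $(...).
    new=$(printf '%sx' "$f" | tr ' ' _)
    new=${new%x}
    if [ -e "$new" ] || [ -L "$new" ]; then
      printf 'skipping %s: target exists\n' "$f" >&2
      continue
    fi
    mv -- "$f" "$new"
  done
)
```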
        
               | curt15 wrote:
               | Didn't Windows name "Program Files" with a space to force
               | application developers to handle spaces in paths
               | properly?
        
               | ape4 wrote:
               | Not to mention C:\Program Files (x86)
        
               | account42 wrote:
               | And C:\Programme and other localized variants to force
               | people to go through the proper APIs instead of
               | hardcoding paths.
        
               | pjmlp wrote:
               | In theory, yes. In practice, to this day many people
               | don't bother to learn how to deal with pathnames
               | properly.
        
               | inetknght wrote:
               | Top difficulties in computer science:
               | 
               | 1. naming things
               | 
               | 2. cache coherency
               | 
               | 3. off-by-one errors
               | 
               | ???
               | 
               | 4. quoting pathnames
        
               | nneonneo wrote:
               | Eh, maybe. In practice I usually do all my moderately-
               | heavy filesystem scripting in Python these days, for
               | which pathname quoting is just a complete non-issue. Of
               | course, I still use a shell for quick-and-dirty stuff,
               | but usually only for pretty simple tasks where the
               | simplest quoting setup ("$i") suffices.
        
               | hawski wrote:
               | I would replace 4 with parameter expansion rules.
        
               | mmcdermott wrote:
               | For the longest time you could get away with this in cmd:
               | 
               | > dir c:\progra~1
               | 
               | So if forcing people to handle spaces was the goal, it
               | took a long time to force it.
        
               | arp242 wrote:
               | I'm pretty sure that still works. I forgot the exact
               | scenario, but my Windows CI on GitHub Actions output
               | shorte~1 pathna~1 like that in a script just a few months
               | ago. On one hand, the backwa~1 compati~1 is nice. On the
               | other hand, there's just so much depreca~1 cruft that
               | keeps popping up even on contemp~1 systems.
        
               | InfiniteRand wrote:
               | I just got used to installing things I need to interact
               | with in a program into a folder named C:\workspace
        
               | jonhohle wrote:
               | Convince (force?) your team to use make and soon everyone
               | will forget spaces in file names are even a thing!
        
               | taneliv wrote:
               | My team already uses `make` but there's no reason for me
               | to run it in my Downloads folders. File names in there
               | are sometimes wild. Yet I expect command line tools to
               | work with them. If they will cease to do so, I will have
               | to start using non-POSIX variants of those tools, I
               | guess.
        
             | Flimm wrote:
             | How do these non-technical academics even create a PDF file
             | with a name like that?
        
               | ykonstant wrote:
               | Right click, rename, enter, enter, enter (until the
               | entire file name is visible on the box)? That's how I did
               | it when I used Windows.
               | 
               | Edit: now I remember the most basic way: open the pdf,
               | select and copy the title, click on rename and paste from
               | clipboard. Works great to get the file name with the
               | newlines exactly as they are on the title!
        
               | zelphirkalt wrote:
               | Doesn't <enter> just confirm the typed input for the
               | filename and finish the renaming? How does that insert
               | newlines?
        
               | ykonstant wrote:
               | Shrug, I last used windows with Windows 7, so you are
               | probably right. That being said, at least two of the
               | students I am currently tutoring are on XP and one of my
               | colleagues as well :D
        
               | pino82 wrote:
               | No, it was always this way.
        
               | ykonstant wrote:
               | Right, I just remembered the main way to create those
               | filenames: open the pdf, select and copy the title,
               | close, rename the file and paste from clipboard.
        
               | astroid wrote:
               | Yes - I just tested on Win10+11 because I thought "there
               | is no way I didn't do something like this by accident...
               | and I would have remembered seeing a new line in my file
               | name when I made that mistake."
               |
               | I just opened a folder in File Explorer, clicked 'rename'
               | and then tried the following combinations:
               |
               | - Enter
               | - L Ctrl + Enter
               | - L Alt + Enter
               | - Win + Enter
               | - R Ctrl + Enter
               | - R Alt + Enter
               |
               | None of them let me put new lines in the filename - it
               | either did nothing, or 'closed' the rename view.
        
               | abenga wrote:
               | I don't know if this is a Linux thing, but when renaming
               | a file, pressing Enter applies the new name; the file
               | manager doesn't add a newline.
        
             | jodrellblank wrote:
             | I don't know who "the Austin Group" mentioned in the
             | article are, but how come they "could not find a single
             | use-case for newlines in pathnames besides breaking naive
             | scripts" when legitimate use-cases are so easy to find?
             | 
             | (And if they're that incompetent, why does the article
             | imply they are worth quoting and listening to?)
        
               | gpderetta wrote:
               | It is [1] the joint working group that for the last 25+
               | years has been responsible for both the POSIX standard
               | and the Single Unix Specification. It emerged after the
               | UNIX wars as a consolidation of the various splintered
               | UNIX standardization efforts (POSIX itself, X/OPEN, OSF,
               | etc).
               | 
               | [1] https://en.wikipedia.org/wiki/Austin_Group
        
               | nilamo wrote:
               | Is that legitimate? A path name is just a unique
               | identifier for a file, IMO it doesn't make sense to put a
               | whole novel in there. If anything, a giant summary like
               | that should be in the meta tags?
        
               | jodrellblank wrote:
               | In what way is it not legitimate? It's not an accident,
               | bug or data corruption. Someone put it there for a
               | reason, and it benefits their use case. That's as
               | legitimate as it gets.
        
               | ykonstant wrote:
               | That's a core part of the problem: a path name is NOT
               | just a unique identifier for a file. Desktop operating
               | systems and their classical utilities conflate the
               | "unique identifier" and whatever "displayed title" of a
               | file through which the end user interacts with the file.
               | 
               | Users care about "titles" or "summaries" of files, not
               | "filesystem identifiers"; as long as the two are
               | conflated, non-technical users will use the identifier to
               | write titles and thus make the file easy to locate in an
               | interactive GUI. Meta tags are not even in the cognitive
               | horizon of most people.
        
             | ykonstant wrote:
             | I am interested in hearing the rationale for downvotes
             | explicitly. I am describing a reality that exists and must
             | be taken into account. Why are you downvoting?
        
           | jraph wrote:
           | They did the right thing for this: make the tools fail on
           | file creation, but not on existing files.
           | 
           | I guess it's still advisable to rename those files; I don't
           | know how things like cp, mv or rsync will behave when copying
           | such files in the future.
        
             | ykonstant wrote:
             | If your file system allows them, be careful with symlinks
             | though!
        
               | jraph wrote:
               | Why, specifically?
               | 
               | I'm convinced we will need to be careful with symbolic
               | links related to newline characters in filenames, but
               | I'm curious which specific aspect you had in mind.
        
               | ykonstant wrote:
               | Oh, nothing specific to newlines. Just, when you rename
               | files to fix newlines, you need to check if they break
               | symlinks pointing to them.
               | 
               | For instance, I had project folders for my individual
               | research projects. In order to have a central repository
               | of resources and not have copies of multi-megabyte pdfs
               | in each folder, I put all referenced papers in a single
               | directory and symlinked them for each project that needed
               | them. Later, I wanted to rename the papers to remove
               | newlines. The symlinks complicated this process quite a
               | bit!
        
             | nneonneo wrote:
             | No, they did not do the right thing:
             | 
             | > the following utilities are now either encouraged to
             | error out if they are to create a filename that contains a
             | newline, and/or encouraged to error out if they are *about
             | to print a pathname that contains a newline* in a context
             | where newlines may be used as a separator
             | 
             | It then proceeds to list a bunch of utilities including
             | diff, file, find, grep, head, du, etc., none of which
             | create files directly.
             | 
             | These utilities could be updated to reject newlines in file
             | paths if they're going to print in a "newline delimited"
             | form - but for some of these utilities, that's the _only_
             | available form.
        
               | jraph wrote:
               | > error out if they are _about to print a pathname that
               | contains a newline_ in a context where newlines may be
               | used as a separator
               | 
               | But that's already broken. This is a situation where
               | filenames with newlines in them are indistinguishable
               | from two filenames in outputs. So instead of producing
               | subtly broken output, tools are _encouraged_ (not forced)
               | to explicitly fail with a lot of noise.
               | 
               | The "in a context where newlines may be used as a
               | separator" part of this sentence is very important.
               | 
               | IIUC the tools are still allowed to succeed in non-broken
               | situations, for instance when a null separator is used
               | instead of a newline character. And I can't imagine the
               | tools you listed will start breaking in situations that
               | worked (apart from file creation - indeed _this_ will
               | likely start breaking, and newline characters in
               | filenames need to be considered deprecated and things
               | using them need to be fixed).
               |
               | This is strictly better IMHO (if one thinks that newlines
               | in filenames are not worth the trouble given how things
               | work in POSIX, especially the part where things are
               | line-based and newline characters have quite some
               | significance).
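A minimal C sketch of the behaviour being discussed. print_pathname() is a hypothetical helper, not part of POSIX or any libc. It refuses with EILSEQ to print a pathname containing a newline when newline is the separator, but succeeds when the separator is NUL (the find -print0 style).

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a POSIX API): print one pathname followed by
   sep. With sep == '\n', a name containing '\n' would be
   indistinguishable from two names, so fail loudly instead of
   producing subtly broken output. */
int print_pathname(FILE *out, const char *path, char sep)
{
    if (sep == '\n' && strchr(path, '\n') != NULL) {
        errno = EILSEQ;
        return -1;
    }
    if (fputs(path, out) == EOF || fputc(sep, out) == EOF)
        return -1;
    return 0;
}
```

A caller emitting NUL-separated output (sep '\0') never hits the error path, which matches the point that only contexts using newline as a separator are affected.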
        
           | stouset wrote:
           | Dude, just fix the filenames.
        
         | ezoe wrote:
         | Don't assume UTF-8 is the only character encoding used in the
         | wild. There are character encodings whose lead bytes are not
         | as easily detectable as UTF-8's.
        
           | arghwhat wrote:
           | In 2024, if you don't get the correct result decoding a text
           | as UTF-8, the bug is the text, not the decoding. And luckily,
           | adoption of UTF-8 in the past 30+ years has gone well enough
           | that you don't need to worry.
           |
           | Caveat: cursed hardware standards demanding two-byte
           | encodings, like USB.
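To make "decoding as UTF-8" concrete, here is a sketch of a strict UTF-8 validator. is_valid_utf8() is a hypothetical name, and the rules assumed are those of RFC 3629: no overlong forms, no UTF-16 surrogates, nothing above U+10FFFF. Latin-1 text containing accented letters usually fails it, which is how such text gets noticed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns true if s[0..n) is well-formed UTF-8 per RFC 3629. */
bool is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t len;
        unsigned long cp;
        if (c < 0x80) { i++; continue; }            /* plain ASCII */
        else if (c >= 0xC2 && c <= 0xDF) { len = 2; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
        else return false;  /* stray continuation, C0/C1 overlong, F5..FF */
        if (n - i < len)
            return false;   /* truncated sequence */
        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return false;                       /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((len == 3 && cp < 0x800) ||             /* overlong 3-byte form */
            (len == 4 && cp < 0x10000) ||           /* overlong 4-byte form */
            (cp >= 0xD800 && cp <= 0xDFFF) ||       /* UTF-16 surrogate */
            cp > 0x10FFFF)                          /* beyond Unicode */
            return false;
        i += len;
    }
    return true;
}
```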
        
             | poincaredisk wrote:
             | I hope you're happy in your ivory tower, but I personally
             | work with a lot of files in other encodings, most often
             | that weird UTF-16 (Windows), sometimes also legacy files
             | with different ANSI encoding. Declaring "my decoder is
             | fine, it's the text that is buggy" is not going to score a
             | lot of points with my boss and clients.
        
               | zelphirkalt wrote:
               | Shouldn't hurt to tell clients to fix their weird
               | encodings originating from proprietary software, though.
        
               | arghwhat wrote:
               | The only valid reason for still having files stored in
               | legacy ANSI encodings is that their only use is input to
               | software that has not been maintained for ~30 years and
               | cannot be updated. That's fine because they're just
               | binary inputs in a closed ecosystem that no one touches.
               | 
               | But if they are supposed to be treated as text, then yes
               | it's the text that's buggy - they should just be
               | converted to UTF-8 once and have the originals thrown
               | away.
               | 
               | UTF-16 is something that Microsoft has cursed us with by
               | inserting it into specifications (like USB) so that we
               | cannot get rid of it, even if it never made any sense
               | whatsoever. But those are in effect explicit protocols
               | with a hard contract, very different from something where
               | you would "assume an encoding".
        
             | 1oooqooq wrote:
             | Why do people still assume UTF-8 is the only known locale
             | encoding?
             |
             | You're probably guilty of the sin you preach against and
             | are showing wrongly decoded UTF-8 without even knowing it.
        
         | account42 wrote:
         | Their proposed solution is not compatible with reality, though:
         | POSIX does not get to define what kind of files exist on the
         | filesystems you need to work with.
         | 
         | All they did is introduce new error cases in C programs while
         | not actually fixing anything for shell scripts.
         | 
         | If anything, it's going to result in more exploits as people
         | write shell scripts with the assumption that newlines cannot
         | appear in filenames.
        
           | quotemstr wrote:
           | In the real world, nobody writes shell scripts that handle
           | newlines in filenames.
        
             | account42 wrote:
             | I do. Single files are handled with quotes around arguments
             | just fine. For lists of files you need to use NUL as a
             | separator. That's not really hard to do once you are aware
             | of the problem but ergonomics could be better - which is
             | something useful that POSIX could change.
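The NUL-separated approach for lists of files can be sketched in C as well. split_nul_list() is a hypothetical helper that splits a buffer, such as the output of find -print0, into entries in place. Because NUL cannot occur in a pathname, the separator stays unambiguous even when names contain newlines.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: store pointers to up to max NUL-terminated entries
   found in buf[0..len) into out[]. Returns the number stored. A trailing
   entry without its terminating NUL is ignored. */
size_t split_nul_list(char *buf, size_t len, char **out, size_t max)
{
    size_t count = 0;
    char *p = buf, *end = buf + len;
    while (p < end && count < max) {
        char *nul = memchr(p, '\0', (size_t)(end - p));
        if (nul == NULL)
            break;          /* unterminated trailing entry */
        out[count++] = p;   /* entry is already NUL-terminated in place */
        p = nul + 1;
    }
    return count;
}
```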
        
         | zokier wrote:
         | But they did not make old code correct. Filenames are still
         | allowed to contain newlines. Shell scripts still need to be
         | prepared to deal with that. Nothing really changed, they just
         | added a feel-good half-measure.
        
           | quotemstr wrote:
           | It's a step in the right direction. You have to understand
           | that for decades a vocal group of Unix die-hards has opposed
           | any limitations whatsoever on the bytewise content of file
           | names. The newline restriction in this latest version of
           | POSIX may be modest, but it represents a dam breaking. When
           | (obviously) the sky doesn't fall, the next version of POSIX
           | will have a lot more filename cleanup.
        
           | janderland wrote:
           | This is pretty standard for a human run system. Gotta make
           | the human feel good about an idea before they can do said
           | idea.
           | 
           | If you're not familiar with humans, there are several manuals
           | available online.
        
         | hwc wrote:
         | Now do that with all whitespace!
        
       | relistan wrote:
       | The history at the beginning of this is not correct. Two
       | examples: the assertion that there was one compatible UNIX prior
       | to United States vs AT&T, the statement that GNU and BSD started
       | that same year. Very, very off.
        
         | unixhero wrote:
         | Okay, but you would add more value if you could also state the
         | correct order of things.
        
           | relistan wrote:
           | https://en.m.wikipedia.org/wiki/History_of_Unix#/media/File%.
           | .. is a good visual of (many of, not all) the various
           | versions of UNIX and when they were released. BSD was first
           | released in 1978. United States v. AT&T was implemented in
           | 1984 (judgment 1982). GNU was first created in 1983.
        
       | johnisgood wrote:
       | > Anyway, POSIX 2024 now requires c17, and does not require c89
       | 
       | I wish it would have been c99. What does c17 add exactly, more
       | C++-esque complexity or not? Why was it not c99 (or perhaps even
       | c11) over c17? Genuine questions.
        
         | lifthrasiir wrote:
         | > What does c17 add exactly, more C++-ish bullshit or not?
         | 
         | Multithreading support and such (atomics, thread-local storage
         | and a guarantee that `errno` is in TLS), explicitly aligned
         | types and allocations, dedicated types for strings known to be
         | Unicode, _Noreturn, _Generic, _Static_assert, anonymous structs
         | and unions in the nested position, quick_exit, timespec,
         | exclusive mode ("x") in f[re]open, CMPLX macros.
         | 
         | I'm not even sure which one can be C++-ish bullshit possibly
         | except for about two points:
         | 
         | - Multithreading does seem farfetched for casual users. In
         | fact, I do think it could have been minimized without any
         | actual harm, but multithreading itself needed to be specified
         | because it greatly affects a memory model. (Before C11, C had
         | no thread-aware memory model and different threading
         | implementations were subtly different beyond what the standard
         | stated.) Even JavaScript, originally without any notion of
         | threads, eventually got a thread-aware memory model due to
         | shared web workers. But that never meant JS itself needed
         | multithreading support in its standard library, and C could
         | have done the same.
         | 
         | - `_Generic` is even more debatable, though I believe it was
         | the only way forward when we accept <tgmath.h>, which is known
         | to be a response to Fortran (other responses include
         | `restrict`) and was impossible to implement in the portable
         | manner before C11. As long as it retains its scary underline
         | and title case, I guess it's fine.
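A small sketch exercising a few of the C11 additions listed above (_Static_assert, _Generic, alignas and <stdatomic.h>), assuming a C17 compiler. type_name, struct cache_line and record_hit are illustrative names only.

```c
#include <stdalign.h>
#include <stdatomic.h>

/* _Static_assert: checked at compile time, no runtime cost. */
_Static_assert(sizeof(int) >= 2, "int must be at least 16 bits");

/* _Generic: select an expression by the static type of x. */
#define type_name(x) _Generic((x), \
    int: "int",                    \
    double: "double",              \
    char *: "char *",              \
    default: "other")

/* alignas (i.e. _Alignas): request a stricter alignment. */
struct cache_line {
    alignas(64) unsigned char bytes[64];
};

/* <stdatomic.h>: a counter that is safe to bump from several threads. */
static atomic_int hits = 0;

int record_hit(void)
{
    /* atomic_fetch_add returns the value before the increment. */
    return atomic_fetch_add(&hits, 1) + 1;
}
```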
        
           | johnisgood wrote:
           | You quoted me before my edit, but fair enough. I do like the
           | "atomics" support.
           | 
           | > "guarantee that `errno` is in TLS"
           | 
           | I suppose that does not mean that I can just avoid setting
           | errno to 0 before calling a function after which I check for
           | errno, right?
           | 
           | Yeah, I do have an issue with stuff like "_Generic" but I
           | assume I can just simply not use it.
           | 
           | What is "quick_exit" exactly and what does it solve?
           | 
           | As for multithreading, I stick to pthread. Are any of the new
           | features a replacement for that?
           | 
           | At any rate, why C17 over C11 then?
        
             | lifthrasiir wrote:
             | C17 is a bugfix version of C11 (the next major revision
             | would be C23). The exact list of fixes is available in [1].
             | Mandating C11 instead of C17 when both are available seems
             | not really useful now.
             | 
             | You have the correct insight about errno. The new
             | guarantee only means that other threads cannot mess with
             | your errno, but clearing errno is still useful within an
             | individual thread.
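The per-thread errno point in practice, sketched with strtol(). strtol() sets errno only on error, so a caller has to clear it first to tell overflow apart from a stale value. parse_long() is a hypothetical helper name.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: parse a whole decimal string into *out. */
bool parse_long(const char *s, long *out)
{
    char *end;
    errno = 0;                      /* strtol never resets errno itself */
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0')
        return false;               /* no digits, or trailing garbage */
    if (errno == ERANGE)
        return false;               /* out of range: v is LONG_MAX or LONG_MIN */
    *out = v;
    return true;
}
```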
             | 
             | exit is not guaranteed to work correctly when called
             | simultaneously from multiple threads, while quick_exit will
             | be okay even in that situation. I think this behavior was
             | not even specified before C11, and only specified after
             | observing existing implementations.
             | 
             | It is expected that libc threading routines are thin
             | wrappers around pthread in Linux. That's why I do think it
             | can be minimized; the only actual problem before C11 was
             | the lack of thread-aware memory model. No need to actually
             | be able to create threads from libc to be honest,
             | especially given that each platform now almost always has a
             | single dominant threading implementation like pthread.
             | 
             | [1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2244.htm
        
               | johnisgood wrote:
               | My last question would be: is it "OK" to use pthread in
               | my code, or are there any alternatives (i.e. a "best
               | way") when using C17?
        
               | lifthrasiir wrote:
               | No, just use pthread. There are some useful pthread APIs
               | missing from C17 anyway too.
        
               | johnisgood wrote:
               | Thank you for your answers, it is much appreciated.
               | 
               | I suppose I will not use "quick_exit" either in that
               | case. I have many workers, a job queue mutex,
               | pthread_cond_wait and pthread_mutex_{lock,unlock}, and
               | when "job_quit_flag" is set to true, that means all jobs
               | are done and I am ready to return NULL.
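A minimal sketch of the kind of pthread worker pool described above: a mutex-protected job queue, pthread_cond_wait() in a loop, and a quit flag that lets idle workers return NULL once the queue is drained. struct job_queue, run_pool(), QUEUE_CAP and MAX_WORKERS are hypothetical, the "work" is just squaring the job id, and on older toolchains it may need to be linked with -pthread.

```c
#include <pthread.h>
#include <stdbool.h>

#define QUEUE_CAP   64   /* hypothetical fixed queue size */
#define MAX_WORKERS 16   /* hypothetical cap on worker threads */

struct job_queue {
    pthread_mutex_t lock;       /* guards every field below */
    pthread_cond_t  not_empty;  /* signalled on new job or on quit */
    int  jobs[QUEUE_CAP];       /* ring buffer of job ids */
    int  head, count;
    bool quit;                  /* no more jobs will be pushed */
    long total;                 /* accumulated results */
};

static void *worker(void *arg)
{
    struct job_queue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        /* Loop, not if: pthread_cond_wait may wake spuriously. */
        while (q->count == 0 && !q->quit)
            pthread_cond_wait(&q->not_empty, &q->lock);
        if (q->count == 0) {    /* only reachable once quit is set */
            pthread_mutex_unlock(&q->lock);
            return NULL;        /* queue drained: worker exits */
        }
        int job = q->jobs[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        pthread_mutex_unlock(&q->lock);

        long result = (long)job * job;  /* stand-in for real work, unlocked */

        pthread_mutex_lock(&q->lock);
        q->total += result;
        pthread_mutex_unlock(&q->lock);
    }
}

/* Runs jobs 1..njobs on up to nworkers threads; returns the sum of squares. */
long run_pool(int nworkers, int njobs)
{
    pthread_t tids[MAX_WORKERS];
    struct job_queue q = { .quit = false };
    int started = 0;

    if (nworkers > MAX_WORKERS) nworkers = MAX_WORKERS;
    if (njobs > QUEUE_CAP) njobs = QUEUE_CAP;  /* no "not full" condition needed */
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.not_empty, NULL);

    for (int i = 0; i < nworkers; i++)
        if (pthread_create(&tids[started], NULL, worker, &q) == 0)
            started++;

    for (int j = 1; j <= njobs; j++) {
        pthread_mutex_lock(&q.lock);
        q.jobs[(q.head + q.count) % QUEUE_CAP] = j;
        q.count++;
        pthread_cond_signal(&q.not_empty);
        pthread_mutex_unlock(&q.lock);
    }

    pthread_mutex_lock(&q.lock);
    q.quit = true;
    pthread_cond_broadcast(&q.not_empty);  /* wake every idle worker */
    pthread_mutex_unlock(&q.lock);

    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    pthread_cond_destroy(&q.not_empty);
    pthread_mutex_destroy(&q.lock);
    return q.total;
}
```

Broadcasting (rather than signalling) on quit matters here: every blocked worker must re-check the flag, or some would wait forever and pthread_join would hang.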
        
           | gpderetta wrote:
           | Most importantly posix already has existing multithreading
           | facilities in POSIX threads, so it is imperative that they
           | are reformulated in terms of the C++11/C11 memory model.
        
           | cryptonector wrote:
           | > guarantee that `errno` is in TLS
           | 
           | I mean, that is already true.
        
       | ggm wrote:
       | File names with / in them
        
       | oguz-ismail wrote:
       | Nitpick re: https://blog.toast.cafe/posix2024-xcu#fn:6
       | SRC != ls *.c
       | 
       | is fine in a makefile as far as POSIX is concerned, because:
       | 
       | > _Applications shall select target names from the set of
       | characters consisting solely of slashes, hyphens, periods,
       | underscores, digits, and alphabetics from the portable character
       | set_
        
       | guerrilla wrote:
       | Why was `isascii()` removed?
       | 
       | (Listed in the Sortix article linked in OP.)
        
         | oguz-ismail wrote:
         | It would yield false-positives with non-UTF-8 encoded text.
         | Big5 <https://en.wikipedia.org/wiki/Big5#Encoding> in
         | particular was notorious for using ASCII values for trailing
         | bytes. I don't know if it's still in use or if there are
         | others.
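A sketch of the false positive described above, assuming the commonly cited Big5 encoding of 功 as the bytes 0xA5 0x5C. A byte-wise scan that applies an isascii()-style test ((c & ~0x7F) == 0) and then interprets the byte as ASCII finds a backslash that is not there. naive_backslash_count() is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical naive scanner: counts bytes that pass an isascii()-style
   test and then equal a backslash. Correct for ASCII and UTF-8, where
   every byte of a multibyte character is >= 0x80. Wrong for Big5, whose
   trailing bytes may fall in the ASCII range. */
size_t naive_backslash_count(const unsigned char *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        int c = s[i];
        if ((c & ~0x7F) == 0 && c == '\\')  /* "is ASCII", then "is backslash" */
            count++;
    }
    return count;
}
```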
        
       | EdSchouten wrote:
       | strlcpy()!
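strlcpy() is now in POSIX, but libcs that predate it lack it. The sketch below shows its usual semantics under a hypothetical name, my_strlcpy, so it cannot clash with a libc that does provide it. It copies at most size - 1 bytes, always NUL-terminates when size > 0, and returns strlen(src), so truncation shows up as a return value >= size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with strlcpy() semantics. */
size_t my_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);
    if (size > 0) {
        size_t n = len < size - 1 ? len : size - 1;  /* leave room for NUL */
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;   /* >= size means the copy was truncated */
}
```

Typical use: `if (my_strlcpy(buf, src, sizeof buf) >= sizeof buf) { /* handle truncation */ }`.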
        
       | pelorat wrote:
       | TIL the POSIX standard is still updated. Does it still suffer
       | from the issues that make Linux break POSIX compatibility in some
       | areas because they consider it a flawed standard?
        
       | chasil wrote:
       | - find(1p) now supports -print0
       | - xargs(1p) now supports the -0 argument
       | - newlines in filenames now should throw errors in many utilities
       | - a compiler implementing the c17 standard is now required
       | - ulimit is expanded
       | - renice can use relative values
       | - a timeout utility has been added
       | - make adds support for $^ $+ ::= :::= != ?= +=
       | - logger is improved
       | - gettext is adopted
       | - readlink and realpath are adopted
       | - rm now supports -d to remove empty directories and -v for
       | verbose
       | - various improvements to printf, sed, test
        
         | greyw wrote:
         | Looks like the BSD-family will have some implementing to do.
        
           | sneed_chucker wrote:
           | Strict adherence to POSIX isn't a goal of any of the current
           | BSDs is it?
        
             | bryanlarsen wrote:
             | I'm confident they'd accept patches.
        
           | chasil wrote:
           | I just booted OpenBSD 7.0 (which is a bit dated).
           | 
           | The find utility has -print0, and xargs has -0. Notably,
           | xargs also has -P for running processes in parallel.
           | 
           | rm has both -d and -v.
           | 
           | The renice command appears to be able to use relative
           | adjustments with -n.
           | 
           | There is a timeout command.
           | 
           | There is a readlink command, but no realpath (but a manual
           | page exists for it as a system call).
        
       | pabs3 wrote:
       | Since old-POSIX systems will be in use for some time, I wonder
       | how many things will be able to switch to using the new
       | capabilities. And how many OSes already support all of the new
       | changes.
        
       | donatj wrote:
       | To build an internationalized shell script I'll need to compile
       | multiple .mo language files and distribute them alongside the
       | script itself.
       | 
       | For shell scripts part of a large system, that's probably fine.
       | For small scripts, that's not very practical. You are not only
       | adding a compilation step, you're also requiring distribution of
       | multiple files. That's a pain.
       | 
       | It just kind of kills the convenience of a simple shell script. I
       | would probably end up writing a makefile to manage all of this
       | and at that point I am only a hop skip and jump away from using a
       | compiled language instead of shell.
        
       | quotemstr wrote:
       | > We've established that, yes, pathnames can include newlines. We
       | have not established why they can do that. After some
       | deliberation, the Austin Group could not find a single use-case
       | for newlines in pathnames besides breaking naive scripts.
       | Wouldn't it be nice if the naive scripts were just correct now?
       | 
       | Finally. Now let's do the rest:
       | https://dwheeler.com/essays/fixing-unix-linux-filenames.html
       | 
       | Filenames should be boring printable normalized UTF-8. I have
       | never, not once, seen a good reason that a filename should be
       | able to contain random binary gobbledygook
        
         | cryptonector wrote:
         | > Filenames should be boring printable normalized UTF-8. I have
         | never, not once, seen a good reason that a filename should be
         | able to contain random binary gobbledygook
         | 
         | Ensuring normalization is hard. Where should you do it? There's
         | only one good place: in the filesystem. But if you normalize on
         | create then you'd better use the same form that everyone else
         | uses, but, what's that? Input methods _generally_ produce NFC,
         | but there 's no guarantee that they will not produce something
         | else. HFS+ normalizes to NFD on create.
         | 
         | ZFS uses form-insensitivity -- much like case-insensitivity,
         | but for form. The reason ZFS went this way was exactly that HFS+
         | and input methods differ as to forms. I pushed hard for this
         | way back when. IMO form-insensitivity is the best way forward.
         | 
         | But as for guaranteeing that filenames are UTF-8... that's much
         | harder. The best thing to do is to not allow the use of non-
         | UTF-8, non-ASCII, non-C locales -- not a guarantee, but pretty
         | good.
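Why normalization matters for filenames, sketched in C. The same visible name "é" in NFC (U+00E9) and in NFD (U+0065 U+0301) is two different byte strings, so a filesystem that compares names byte-wise treats them as two distinct files. A form-insensitive comparison needs Unicode normalization data and is not attempted here. bytewise_same_name() is a hypothetical name.

```c
#include <stdbool.h>
#include <string.h>

/* One visible name, two encodings (UTF-8 bytes written as escapes). */
static const char nfc_e_acute[] = "\xC3\xA9";   /* U+00E9 */
static const char nfd_e_acute[] = "e\xCC\x81";  /* U+0065 U+0301 */

/* Byte-wise comparison: how a filesystem with neither normalization nor
   form-insensitivity decides whether two names are the same. */
bool bytewise_same_name(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```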
        
       | nh2 wrote:
       | > future editions will not require c17, but will simply require
       | whatever C specification version is the most modern and already
       | implemented by major toolchains
       | 
       | Is this really good?
       | 
       | If you can't rely on anything concrete being guaranteed, and it
       | is open to interpretation what "modern" or "major toolchains"
       | are, why have a standard?
        
       | cryptonector wrote:
       | > The problem is that pathnames[2] (as per section 3.254 of POSIX
       | 2024) are just strings (meaning they can contain any bytes except
       | the NUL character), [...]
       |
       | Filenames (pathname components) can contain neither NUL nor '/'.
       | 
       | Re: `find -print0` / `xargs -0`:
       | 
       | > Previous POSIX releases have considered -print0 before, but
       | never ended up adopting it because using a null terminator meant
       | that any utility that would need to process that output would
       | need to have a new option to parse that type of output.
       | 
       | What nonsense. Just add the `-0` or similar options as needed.
       | 
       | > More precisely, this approach does not resolve our original
       | problem. xargs(1p) can't sort, and therefore we still have to
       | handle that logic separately, unless sort(1p) also grows this
       | support, even after read(1p). This problem continues with every
       | other type of use-case. Importantly, it breaks the
       | interoperability that POSIX was made to uphold.
       | 
       | More nonsense.
       | 
       | > A bunch of C functions[3] are now encouraged to report EILSEQ if
       | the last component of a pathname to a file they are to create
       | contains a newline (put differently, they're to error out instead
       | of creating a filename that contains a newline).
       | 
       | Ok, that's tolerable. Ditto utilities (notice here they were able
       | to make a list of utilities).
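A sketch of the creation-side check quoted above, written as a hypothetical pre-check a program could run before open() or mkdir(). Only the last component is inspected, so existing directories with newlines in their names still work as path prefixes. check_last_component() is not a real libc function.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical pre-check: fail with EILSEQ if the last pathname
   component (the name about to be created) contains '\n'. */
int check_last_component(const char *path)
{
    const char *slash = strrchr(path, '/');
    const char *last = slash ? slash + 1 : path;
    if (strchr(last, '\n') != NULL) {
        errno = EILSEQ;
        return -1;
    }
    return 0;
}
```

A caller would report the EILSEQ and skip the open(path, O_CREAT | ...) call when this returns -1.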
        
         | chasil wrote:
         | Note that GNU sort has...
         | 
         | -z, --zero-terminated: end lines with 0 byte, not newline
        
       | InfiniteRand wrote:
       | I kind-of would like to see a POSIX-strict profile which
       | incorporates commonsense (by commonsense I mean avoiding things
       | that repeatedly over many years have tripped up programmers in
       | frustrating ways) things like no newline in file names. Operating
       | systems (or distributions) could opt into this profile, and
       | then someone programming on such an operating system could rely
       | on the constraints of the profile and additional facilities could
       | be added on that might need to rely on those constraints.
       | Hopefully, gradually the use of the profile would spread.
        
       ___________________________________________________________________
       (page generated 2024-10-29 23:01 UTC)