[HN Gopher] What's New in POSIX 2024
___________________________________________________________________
What's New in POSIX 2024
Author : signa11
Score : 217 points
Date : 2024-10-29 00:42 UTC (22 hours ago)
(HTM) web link (blog.toast.cafe)
(TXT) w3m dump (blog.toast.cafe)
| snvzz wrote:
| This is a surprisingly greedy POSIX update.
| BoingBoomTschak wrote:
| As someone who truly limits himself to POSIX when he can, I
| think they needed to push it forward to not become completely
| obsolete. I'm really sad `mktemp -d` and `set -o nullglob`
| didn't make the cut, but that's how it is, I guess.
| ykonstant wrote:
| A bespoke `mktempd` script is one of the first things I
| install in a new system. Fortunately, it is not too hard to
| make a `mktemp -d` compatible script with POSIX tools. `set
| -o nullglob` is another story :D
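For the `nullglob` half, the usual POSIX workaround is to skip the unexpanded pattern inside the loop. When nothing matches, sh passes `*.txt` through as a literal word, so the loop checks that each name exists before using it. This is a minimal sketch, and the `count_txt` function name is hypothetical:

```sh
# Count the *.txt files in directory $1 without nullglob.
# If nothing matches, the loop sees the literal word "$1/*.txt" once;
# the existence test filters it out (-L also keeps dangling symlinks).
count_txt() {
    n=0
    for f in "$1"/*.txt; do
        [ -e "$f" ] || [ -L "$f" ] || continue
        n=$((n + 1))
    done
    printf '%s\n' "$n"
}
```

Because the glob result is never word-split, names with spaces such as `b c.txt` are counted once.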
| pxeger1 wrote:
| It's quite hard to write mktemp securely[1]. It would be
| great if POSIX didn't make people attempt to do that error-
| prone task themselves.
|
| [1]: There's some explanation in this recent post:
| https://dotat.at/@/2024-10-22-tmp.html
| ykonstant wrote:
| This is correct (though of course a decent `mktempd`
| script will deal with the listed problems or crash loudly
| on failure), and there are even more reasons to avoid
| /tmp.
|
| Unfortunately, it is one of the very few directories that
| are somewhat POSIX-"guaranteed" writable by a non-root
| user and the fact that on modern systems it is usually
| mounted on a tmpfs makes it very attractive for pure
| POSIX usage without rich array support.
|
| If you have mount permissions, of course, you should tell
| your `mktempd` to base its directory on a private tmpfs.
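Here is one possible shape for such a script. It is a sketch, not the one from the thread. The name `mktempd` comes from the comment, and the script assumes `/dev/urandom` exists, which is common but not required by POSIX. It relies on plain `mkdir` (no `-p`) failing when the name already exists, so it never reuses a directory or follows something planted by another user. It also uses `-m 700` so the directory is private:

```sh
# mktempd: create a private temporary directory and print its path.
# Assumes /dev/urandom exists (not guaranteed by POSIX).
mktempd() {
    base=${TMPDIR:-/tmp}
    tries=0
    while [ "$tries" -lt 100 ]; do
        # 8 random bytes -> 16 hex digits, using only POSIX od and tr.
        suffix=$(od -A n -N 8 -t x1 /dev/urandom | tr -d ' \n')
        [ -n "$suffix" ] || return 1   # no randomness: refuse to guess
        dir=$base/tmp.$suffix
        # mkdir without -p fails if the name exists (file, dir or symlink).
        if mkdir -m 700 "$dir" 2>/dev/null; then
            printf '%s\n' "$dir"
            return 0
        fi
        tries=$((tries + 1))
    done
    printf 'mktempd: could not create a directory in %s\n' "$base" >&2
    return 1
}
```

Typical use is `dir=$(mktempd) || exit 1` followed by `trap 'rm -rf -- "$dir"' EXIT`.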
| somat wrote:
| Hopefully nothing. posix is, or at least should be, a
| descriptive standard. This is why posix is so terrible, and why
| posix is so great.
|
| I feel posix and other descriptive standards work best when
| they describe what everyone is already doing. This is as
| opposed to prescriptive standards, which try to focus on the
| "correct" way to do something; prescriptive standards tend to be
| over-engineered and may or may not actually work.
|
| see also: descriptive and prescriptive dictionaries.
| http://www.englishplus.com/news/news1100.htm
| Flimm wrote:
| Both prescriptive standards and descriptive standards have
| their uses. If POSIX is a prescriptive standard, then maybe
| another standard should exist that is descriptive.
| lifthrasiir wrote:
| Keep in mind that the Web standard eventually became
| prescriptive because descriptive standards failed to catch
| up. Likewise it can be argued that descriptive standards for
| the common OS interface are no longer usable.
| vacuity wrote:
| To be crass, description is only useful for existing things
| and prescription hinders making innovative things. I think
| social forces make it natural that standards are treated
| both descriptively and prescriptively, and that too leads
| to angst. Case in point, POSIX was once more descriptive,
| but then people wanted backwards compatibility for existing
| and new OSes, which made it more prescriptive. The takeaway
| is that ad-hoc things become permanent once they are too
| difficult to remove, and then people are sad. Nothing is
| immune, so just make reasonable attempts for the standard
| and the culture to harmonize for a specific purpose.
| zelphirkalt wrote:
| That is also a way to never progress beyond the status quo.
| Flimm wrote:
| Yes! Finally! Let's treat filenames with new lines as errors! I'm
| so delighted with this decision.
| enriquto wrote:
| Next: spaces
| lifthrasiir wrote:
| Still much better than mojibaked names.
| enriquto wrote:
| What do you mean?
| _ZeD_ wrote:
| What is the encoding of the filenames?
| Joker_vD wrote:
| I am personally not aware of any MBCS that could have a
| 0x20 or 0x0D as a valid trailing byte. Are you?
| lifthrasiir wrote:
| I think my comment correctly contrasted mojibake from new
| lines or spaces for that reason.
| skissane wrote:
| The original request was to ban all bytes between 1 and 31.
|
| https://www.austingroupbugs.net/view.php?id=251
|
| At some point they decided to narrow the change to just ban the
| newline character.
|
| Which I personally think is a pity. Allowing escape in file
| names is a security risk because it enables you to embed
| ECMA-48 escape sequences in file names. Secure terminal
| emulators shouldn't be made vulnerable by arbitrary escape
| sequences, but there are "too smart for their own good"
| terminal emulators out there that have escape sequences that
| let you do crazy things like run arbitrary executables.
| ezoe wrote:
| There are many non-UTF-8/16/32 character encodings used in
| the wild that use these values in multi-byte characters.
|
| I think the decision to forbid newlines in pathnames is also
| wrong. It may break tons of existing code.
| skissane wrote:
| I wish Linux/etc had a mount option and/or superblock flag
| called "allow only sane file names". And if you had that
| set, then attempting to create a file whose name wasn't
| valid UTF-8, or which contained C0 or C1 controls, would
| fail. The small minority of people who really need pre-
| Unicode encodings such as ISO 2022 could just not turn that
| option on. And the majority who don't need anything like
| that could reap the benefits of eliminating a whole
| category of potential bugs and vulnerabilities.
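Until such an option exists, existing trees can at least be audited. The sketch below lists names that contain control characters. The function name is hypothetical. In the C locale, `[[:cntrl:]]` covers the C0 range (newline, ESC and so on) plus DEL. It typically does not cover C1 bytes, because those are not characters in that locale:

```sh
# Print every path under $1 whose last component contains a control
# character. The output is for a human to read: the offending names
# may themselves contain newlines.
find_control_names() {
    LC_ALL=C find "$1" -name '*[[:cntrl:]]*' -print
}
```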
| Joker_vD wrote:
| > There are many non-UTF-8/16/32 character encoding used in
| the wild which use these value in multi-byte character
| encoding.
|
| Like what? I am genuinely curious: Shift-JIS, GB2312, Big5,
| and all of the EUC variants do _not_ use bytes that
| correspond to C0 characters in ASCII.
| IshKebab wrote:
| Why is that an issue?
| shakna wrote:
| Run a program to list a directory. Everything that interfaces
| with that will assume newline delimiters. Similar
| assumptions are baked into a lot of software.
|
| Enforcing that a newline isn't part of a path, ensures the
| security of those systems that are commonly relied on.
| oguz-ismail wrote:
| Except no one's enforcing anything yet. Earlier versions of
| POSIX allowed rejecting filenames containing newlines, the
| newest version encourages it while mandating features
| required to handle such filenames safely (find -print0,
| xargs -0, read -d ''). So nothing's set in stone yet.
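To make the difference concrete, consider a single file named `foo<newline>bar`. A newline-delimited listing such as `find ... | wc -l` counts it as two files. Passing names as arguments with `find -exec ... {} +`, which has long been in POSIX, never splits them. The function names below are hypothetical:

```sh
# Count regular files under $1. find hands the names to sh as separate
# arguments, sh reports "$#" per batch, and awk sums the batches, so no
# filename is ever parsed out of a text stream.
count_files() {
    find "$1" -type f -exec sh -c 'echo "$#"' sh {} + |
        awk '{ n += $1 } END { print n + 0 }'
}

# Naive version for comparison: miscounts names containing newlines.
count_files_naive() {
    find "$1" -type f | wc -l | tr -d ' '
}
```

The `-print0` / `xargs -0` pair mentioned above fixes the same problem for tools that read names from a stream.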
| IshKebab wrote:
| > Everything that interfaces with that, will assume newline
| delimiters.
|
| Well, only badly written programs. nushell handles this
| fine, as will any program that doesn't try to do everything
| as plain strings:
|
|     ~> touch "foo\nbar"
|     ~> ls foo* | print
|     +---+------+------+------+----------+
|     | # | name | type | size | modified |
|     +---+------+------+------+----------+
|     | 0 | foo  | file | 0 B  | now      |
|     |   | bar  |      |      |          |
|     +---+------+------+------+----------+
|
| However after reading it they're only making them illegal
| for the posix utilities from the 70s that aren't written
| properly, so I think that makes sense.
| devit wrote:
| That's obviously impossible since it would break backward
| compatibility and the users' existing filesystems (and the
| Linux kernel will rightly never accept anything like that).
|
| The only reasonable fix is to enhance bash and shell IDEs to
| track for each variable whether it could possibly include all
| filename-valid characters (e.g. if it comes from read with no
| options then it can't contain \n) and warn (off by default
| unless stderr is a terminal) if they can't and it's used as a
| filename (conservatively determined when used as arguments to
| processes), and also warn when using find without -print0, etc.
| noninteractively and perhaps interactively as well.
| imrejonk wrote:
| This adds `set -o pipefail` to POSIX sh, which causes a whole
| pipeline to fail (non-zero exit code) if one or more of the
| commands in the pipeline fail.
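A quick way to see the difference is shown below. Shells such as bash and ksh93 had `pipefail` long before POSIX adopted it, but an older `/bin/sh` may reject the option. For that reason, this sketch probes for support first and enables the option only in a subshell:

```sh
# Without pipefail, a pipeline's exit status is that of its last command.
false | true
echo "default: $?"              # prints "default: 0"

# With pipefail, it is the status of the rightmost command that failed.
if (set -o pipefail) 2>/dev/null; then
    (
        set -o pipefail         # confined to this subshell
        false | true
        echo "pipefail: $?"     # prints "pipefail: 1"
    )
else
    echo "this shell does not support set -o pipefail" >&2
fi
```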
| throwaway984393 wrote:
| Sad. Use of that option is almost always a mistake. It only
| leads to undebuggable silent failures.
| Joker_vD wrote:
| I'd rather both have this option _and_ have it work reliably.
| It's ridiculous that
|
|     export VAR=$(cmd1 | cmd2)
|
| does _not_ count as a pipefail when cmd1 or cmd2 fail, but
|
|     VAR=$(cmd1 | cmd2)
|
| does, so the "correct" way to set an environment variable
| from a pipeline's output is actually
|
|     VAR=$(cmd1 | cmd2)
|     export VAR
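The root cause is that `export VAR=$(...)` reports the exit status of `export` itself, which succeeds. A bare assignment instead takes the status of its command substitution. Here is a minimal demonstration using hypothetical helper names:

```sh
# Bare assignment: the status is that of the command substitution.
assign_status() { VAR=$(false); echo "$?"; }

# export with an assignment: the status is export's own (0),
# so the failure inside the command substitution is lost.
export_status() { export VAR2=$(false); echo "$?"; }

echo "assignment: $(assign_status)"   # prints "assignment: 1"
echo "export: $(export_status)"       # prints "export: 0"

# Safer: assign (and check) first, then export.
VAR3=$(printf 'hello\n' | tr a-z A-Z) || echo "pipeline failed" >&2
export VAR3
```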
| ykonstant wrote:
| Pipefail is useful and very hard to emulate on pure POSIX;
| you need to create named fifos, break the pipeline into
| individual redirections and check for error on each line.
|
| And that is fine; but sometimes you want to treat a pipeline
| as a "single command" and then you can use pipefail to abort
| the pipeline on error. Then you can handle the error at the
| granularity of the entire pipeline without caring which part
| failed.
|
| Lastly, I am confused as to the "silent" failures; maybe you
| are thinking of combining this with `set -e`? Then yes, that
| is bad and I recommend against the combination; but then
| again, I and most advanced scripters recommend against
| shotgunning `set -e` in the first place. Use it in specific
| portions of the script when appropriate, and use proper error
| handling otherwise.
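Below is a sketch of the named-fifo technique for a two-stage pipeline. It assumes each stage is passed as a string for `sh -c`, and the helper name and interface are hypothetical. The helper returns the status of the rightmost failing stage, which matches how `pipefail` reports it. For brevity, it creates the fifo under `$TMPDIR` with a predictable name. Given the `/tmp` discussion above, real code should place it in a private directory instead:

```sh
# pipe2_checked PRODUCER CONSUMER
# Runs "PRODUCER | CONSUMER" through a named fifo so that both exit
# statuses are visible. Returns the rightmost non-zero status, or 0.
pipe2_checked() {
    fifo=${TMPDIR:-/tmp}/pipe2_checked.$$
    mkfifo -m 600 "$fifo" || return 1
    sh -c "$1" > "$fifo" &          # producer writes into the fifo
    producer_pid=$!
    sh -c "$2" < "$fifo"            # consumer reads from it
    consumer_status=$?
    wait "$producer_pid"
    producer_status=$?
    rm -f "$fifo"
    if [ "$consumer_status" -ne 0 ]; then
        return "$consumer_status"
    fi
    return "$producer_status"
}
```

For example, `pipe2_checked 'exit 3' 'cat'` returns 3, whereas a plain `sh -c 'exit 3' | cat` would report 0.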
| zelphirkalt wrote:
| Why does `set -e` make a pipeline fail silently?
| ykonstant wrote:
| `set -e` makes the script abort and is often used in lieu
| of proper error handling:
|
|     set -e
|     command
|     command [fails]
|     command
|
| Whether the above reports an error or not depends on the
| command; when you have a pipeline failing in the above
| way, it is even sneakier:
|
|     set -e
|     command
|     command | command | command [fails]
|     command
|
| You are reliant on _all_ commands in the pipeline being
| verbose about failure to signal an error.
|
| None of the above is advisable. The advisable code is
|
|     error_handler() { proper error handling; }
|
|     command || error_handler "parameter"
|     command || error_handler "parameter"
|     { command | command | command; } || error_handler "parameter"
|     {
|         set -e
|         exceptional section that needs to be bailed out
|         set +e
|     }
|     command || error_handler "parameter"
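Here is a runnable rendering of that pattern. The names `error_handler` and `save_counts` are hypothetical, and exiting from the handler is only one possible policy. Unless `pipefail` is on, the braced pipeline's status still comes only from its last command (`uniq` here):

```sh
# Hypothetical handler: say which step failed, then stop.
error_handler() {
    printf 'error: %s failed\n' "$1" >&2
    exit 1
}

# Copy file $2 into directory $1 and write a line-frequency summary there.
save_counts() {
    mkdir -p -- "$1" || error_handler "mkdir $1"
    cp -- "$2" "$1/" || error_handler "copy $2"
    { sort -- "$2" | uniq -c; } > "$1/counts" || error_handler "count $2"
}
```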
| skydhash wrote:
| Error handling like that makes sense if you're writing a
| program. But if you just want a script for an automation,
| `set -e` is enough.
| ykonstant wrote:
| It is not; Greg's wiki further explains why, if the
| silent failure problem above is not enough reason.
| Joker_vD wrote:
| Gee, imagine if shells with errexit option enabled wrote
| some diagnostic output to stderr before exiting. "Add
| your own error checking instead", how do I check which
| piece of pipeline has failed, exactly? The PIPESTATUS
| variable is bash-specific and was not standardized.
| ykonstant wrote:
| ? Why are you replying to me? My position was pretty
| clear:
|
| "Pipefail is useful and very hard to emulate on pure
| POSIX; you need to create named fifos, break the pipeline
| into individual redirections and check for error on each
| line.
|
| And that is fine; but sometimes you want to treat a
| pipeline as a "single command" and then you can use
| pipefail to abort the pipeline on error. Then you can
| handle the error at the granularity of the entire
| pipeline without caring which part failed."
|
| By the way, I never script in Bash; I only script in
| POSIX primitives using dash as my executable.
| akdor1154 wrote:
| Holy balls that's like Christmas!
| rightbyte wrote:
| Really? Won't that break piping grep?
| WJW wrote:
| Probably, so don't `set -o pipefail` in scripts that pipe
| into grep.
| rightbyte wrote:
| Ah ok I read it as 'sets it by default' for some reason.
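The usual snag is that POSIX `grep` exits with status 1 when it selects no lines, and `pipefail` then reports the whole pipeline as failed. One workaround treats "no match" as success while still passing on real errors (status greater than 1). It is sketched below with a hypothetical wrapper name. This wrapper does not address a separate issue: `grep -q` can exit early and kill its writer with SIGPIPE, which `pipefail` also counts as a failure.

```sh
# Like grep, but "no lines selected" (status 1) is not treated as an error.
grep_nomatch_ok() {
    grep "$@"
    rc=$?
    if [ "$rc" -gt 1 ]; then
        return "$rc"
    fi
    return 0
}
```

With `set -o pipefail`, `printf 'a\n' | grep_nomatch_ok b | wc -l` then succeeds instead of failing.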
| zelphirkalt wrote:
| Does it? It is not mentioned anywhere in the post. Can you post
| a reference to your source?
| noselasd wrote:
| The post only has a few highlights. The POSIX specs are only
| for paying IEEE customers, though, but
| https://pubs.opengroup.org/onlinepubs/9799919799/ mentions
| it.
| arp242 wrote:
| That _is_ the POSIX spec, no?
|
| It's at: https://pubs.opengroup.org/onlinepubs/9799919799/utilities/V...
|
| (no permalink, search for "pipefail")
| deskr wrote:
| If you're writing scripts, use that and don't forget -e and -u
| -e Exit immediately if a pipeline (which may consist of a
| single simple command), a list, or a compound command (see
| SHELL GRAMMAR above), exits with a non-zero status
| -u Treat unset variables and parameters other than the
| special parameters "@" and "*" as an error when performing
| parameter expansion
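The classic case for `-u` is a path built from a variable that turns out to be unset. The sketch below uses a hypothetical `BUILD_DIR` variable and prints the target instead of deleting anything:

```sh
# Hypothetical cleanup step; BUILD_DIR is an assumed variable name.
show_target() {
    printf 'would remove %s\n' "$BUILD_DIR/out"
}

# Without -u, an unset BUILD_DIR silently expands to "", giving "/out".
( unset BUILD_DIR; show_target )          # prints "would remove /out"

# With -u, the expansion is an error and the subshell exits non-zero.
( set -u; unset BUILD_DIR; show_target ) 2>/dev/null ||
    echo "aborted: BUILD_DIR is unset" >&2
```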
| ykonstant wrote:
| For `set -u` I mostly agree. For `set -e` see my comment
| below and Greg's wiki: http://mywiki.wooledge.org/BashFAQ/105
| deskr wrote:
| > and they still fail to catch even some remarkably simple
| cases
|
| I totally agree. Although I'd say that there isn't anything
| "remarkably simple" about writing a bash script. Anything
| in the shell scripting world that seems remarkably simple
| is just because one hasn't realised the ghosts and horrors
| that lurk in the shadows.
|
| But I'll use -e anytime. It feels like having a protective
| proton pack at least.
| enriquto wrote:
| > We've established that, yes, pathnames can include newlines. We
| have not established why they can do that. After some
| deliberation, the Austin Group could not find a single use-case
| for newlines in pathnames besides breaking naive scripts.
| Wouldn't it be nice if the naive scripts were just correct now?
| Ok, that might be a bit much all at once. We're heading there
| though!
|
| Oh my god. This makes me so happy. This is the most lovely thing
| I've read in the world of computing since the unix gods decided
| that newlines were to be a single character.
|
| The philosophy underlying the sentence "Wouldn't it be nice if
| the naive scripts were just correct now?" is incredibly positive.
| We are surrounded by arrogant jerks who break old code by
| aggressively enforcing stricter compliance of some stupid rules.
| But here come these posix heroes who do the exact opposite: make
| old code correct! There is hope for mankind after all.
| anal_reactor wrote:
| It's a bandaid on a wider problem: the design of Unix shell is
| bonkers and the whole thing should be deleted. Why? Because I
| haven't seen any other tool ever have so many pitfalls. Take n
| random languages and m random developers and tell them to loop
| over a string array and print its contents, and count how many
| correct programs you get on average per language. There will be
| easy languages, then difficult languages, then a huge gap, then
| Unix shell, because in your random sample you managed to get one
| guy who has a PhD in bash.
| blueflow wrote:
| Someone needs to come up with an interactive shell first, one
| that is comparable in usability. Then we can think about
| replacing the unix shell.
|
| I tried both python and lua interactively, but they are a
| pain when it comes to handling files. You have to type much
| more to get the same things done.
| anal_reactor wrote:
| The bigger issue is the sheer momentum of Unix shell. Even
| if you come up with an alternative that is better by every
| objectively measurable metric, it's still going to be a
| monumental task to have it packaged with commonly used
| distros. Kinda like the "why can't the US switch to the
| metric system" problem.
| blueflow wrote:
| People already use different shells, mksh, fish, and so
| on. With fish there is a non-posix shell in wide use.
| oguz-ismail wrote:
| >wide use
|
| Five people around the globe isn't wide use.
| blueflow wrote:
| I'm sure you might get more than 5 people on HN replying
| to you that they are using fish right now. Say something
| discrediting about fish and they show up.
| fragmede wrote:
| Heh, reminds me of how to get help with Linux back in the
| day. If you directly asked for help, you'd be told to
| RTFM. If you stated confidently that Windows could do
| something and that Linux sucks because it can't, you'd
| get users tripping over themselves with details and
| instructions, just to prove you wrong.
|
| Human psychology is fascinating!
| azalemeth wrote:
| There's a direct cost in money, time and lives that has
| come from the US's adherence to their US Customary Units
| (which are often different to the old imperial units).
| People have literally died because of the confusion
| caused by having multiple systems of units in common use
| with ambiguous names (degrees, gallons, etc). Each year
| industry worldwide spends an enormous amount of money
| indirectly precisely because of this problem and it's
| still incredibly unlikely to be fixed within my lifetime.
|
| Bash-alternatives that are not completely compatible
| frankly just don't have a chance.
| stephenr wrote:
| If it isn't distributed out of the box with every *nix-like
| OS, it inherently _isn't_ "better by every objectively
| measurable metric" - distribution of a common, stable
| standard is a huge benefit in and of itself.
| blueflow wrote:
| > distributed out of the box with every *nix-like OS,
|
| Python and lua are pretty close to that.
| stephenr wrote:
| > Python and lua are pretty close to that.
|
| Python may _often_ be installed by default, but it's
| definitely not an essential/required package "out of the
| box" on every install. Also, in a thread where one topic
| is how POSIX shell handles whitespace in filenames, it's
| hilarious (not in a good way) that someone suggests a
| language that handles whitespace the wrong way in its
| own code. Yes, significant whitespace is objectively
| wrong.
|
| What OS/distro is Lua included on _out of the box_? That
| doesn't mean "available in a package". I mean literally
| included in every single install and cannot reasonably be
| omitted?
|
| Regardless of the availability, the parent comment says
|
| > better by every objectively measurable metric
|
| Neither Python nor Lua are "better" than shell, at the
| types of things shell is commonly used for - they're
| objectively worse.
| blueflow wrote:
| Lua gets onto every other Linux distro as dependency of
| some base system component. For example, rpm or pipewire
| depend on lua. Ubuntu and Debian ship with pipewire per
| default.
|
| You should use the word "objectively" less.
| consteval wrote:
| Even outside of distribution, python and lua aren't
| objectively better. For starters, they're much more
| verbose.
| blueflow wrote:
| I just said that, scroll up.
| nly wrote:
| Oil shell?
|
| https://www.oilshell.org/
|
| Compatible with most bash scripts
| throw16180339 wrote:
| I certainly have my complaints about Powershell, but it's
| got pretty good coverage, decent documentation, and cross
| platform support.
| felixgallo wrote:
| if it weren't so irregular, inconsistent, spotty and
| tasteless, it'd be a great option.
| throwaway19972 wrote:
| > the design of Unix shell is bonkers
|
| Compared to what?
| mdavid626 wrote:
| Powershell?
| oguz-ismail wrote:
| Verbosity is a huge problem there
| consteval wrote:
| Modern programming language designers have a bad
| relationship with verbosity. I don't know why they do
| this.
|
| It's a lang for an interactive shell, typing literally
| translates to developer speed. I understand the want for
| clarity and maybe that's nice in large scripts, but the
| main goal is to be a shell. So, optimize for that. Also,
| you probably shouldn't be using powershell for large
| scripts anyway.
|
| The only recent lang I've seen that has a handle on this
| is Rust. You can tell they put a lot of thought into
| having keywords be as short as possible while still being
| descriptive.
| ggm wrote:
| FoundTheCamelCaseConvert.
|
| My God next you will say getopt() --longform is the
| bestest
| throw16180339 wrote:
| It's been years since I used Powershell, but IIRC there
| are shortcuts for the common commands, e.g. cat, ls, mv,
| rm, and such DTRT.
| Diti wrote:
| Those aliases are, I believe, only defined on Windows
| PowerShell (the closed-source version 5; not PowerShell
| 7). I wish those default aliases you mentioned weren't a
| thing. Especially `curl` (people should use `iwr`
| instead), which is an alias of `Invoke-WebRequest`,
| because it makes the `curl.exe` shipped with Windows
| nearly undiscoverable.
| poincaredisk wrote:
| PowerShell's designers could learn from decades of
| programming language progress and especially shell usage.
| They could improve many aspects indeed. This doesn't mean
| that the original design is "bonkers", only that it's not
| perfect.
| enriquto wrote:
| > loop over a string array
|
| Dear anal_reactor, what is a "string array"? I have used unix
| shells for nearly 30 years and never heard about them. And
| I consider myself a script-fu master!
|
| There are two array-like constructions in the shell: list of
| words (separated by spaces) and list of lines (separated by
| newlines). Both cases are implemented as a single string, and
| the shell makes it trivial to iterate through its components.
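For comparison, POSIX sh also has one list that does not depend on separators at all: the positional parameters. Setting them with `set --` and iterating over `"$@"` keeps every element intact, including spaces and newlines. Here is a minimal sketch with a hypothetical function name:

```sh
# Print each argument in brackets, one element per loop iteration.
print_each() {
    for s in "$@"; do
        printf '[%s]\n' "$s"
    done
}

# set -- replaces the positional parameters with a new list.
set -- "a b" "c
d" "e"
print_each "$@"
```

The output is `[a b]`, then `[c` and `d]` on two lines, then `[e]`: three elements, none split.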
| ManBeardPc wrote:
| That is exactly the problem many people have with it.
| Encoding "arrays" this way is foreign to everyone who
| comes from "normal" programming languages. Both variants
| lead to problems because either character can occur in
| elements; in the worst case they contain both at the same
| time. I can see why this leads to confusion and bugs.
| skydhash wrote:
| It's like people saying they won't learn French because
| it has a different grammatical structure. There's no
| "normal" natural language. If you're used to the C-like
| syntax, learning C-like language will be easy. But that's
| not an argument to say Lisp is confusing.
| ManBeardPc wrote:
| That's why I put normal in quotes. There is however more
| to it than having a different grammatical structure: It
| works differently from many commonly used languages that
| have actual arrays/lists where elements can contain
| anything the type allows. If you come from any of the
| common modern programming languages (let's say Java,
| Kotlin, C#, JS/TS, Python, Swift, Go, Rust, etc.) and
| expect something similar (because many of them are very
| similar) you will be confused. Using spaces or newlines
| to encode elements in a single string is just not robust
| and leads to easy to make mistakes.
| skydhash wrote:
| Most of these languages were created long after bash and
| the other shells. The fact is that shell scripts allows
| for unquoted strings and quoting is a specific operation,
| not syntax. Also shell scripts were meant for
| automations, not for writing general programs. The basic
| units are commands, arguments, input, output, files,...
| so the design makes these easy to manipulate.
|
| I'm not saying that we can't improve, but I'm more in
| favor of making the tool more apt to solve a problem than
| making it easier to learn. Because the latter often wants
| to forego the requirement of understanding the problem
| space.
| ManBeardPc wrote:
| Yes, these are newer. I mainly wanted to make the point
| that it is confusing if you are new to bash and come from
| these newer languages with the wrong expectations. The
| concise nature and many subtle details makes it very
| difficult for beginners and infrequent users.
|
| Compare this to the newer programming languages where you
| explicitly call something with speaking names like
| .Trim(), .EndsWith(), support from compiler and IDE.
|
| In my experience automation and general programs often
| are the same thing once things get more complicated. Bash
| scripts usually grow rapidly and are a giant PITA to
| maintain or refactor. Throw in build systems and helper
| scripts and you quickly receive a giant pile of
| spaghetti. Personally I just switch to one of the mentioned
| programming languages once it goes above a simple
| sequence of operations.
|
| Personally I don't see how to improve it much without
| becoming a full blown programming language, at which
| point it would probably make more sense to just release a
| library for common automation tasks that is also
| composable. Maybe I'm just not the right target audience.
| skydhash wrote:
| The issue with your otherwise good reply is that someone
| is bringing expectations to an expert tool (programming
| languages, software, OS) and blindly assuming that
| everything will work as he thinks it should. Familiarity
| helps with learning, but shouldn't replace it. Someone
| new to bash should probably start with a book.
|
| And for bigger automation projects, there are lots of
| projects and programming languages that can help.
| ManBeardPc wrote:
| I agree it is an issue but it is how many people work and
| think. Most of the time they are not even wrong. "Hey, I
| have variables and loops, I know that!".
|
| I would even make the case for expert tools being as
| unsurprising and familiar as possible unless there is a
| very good reason for them not to. Also they should be
| robust against misuse and guide the user towards good
| practices. There are always beginners, people that rarely
| need to use it, people that do programming as "just a
| job" and people that make mistakes because they are
| distracted, tired or just human. Something like "rm -r /"
| is a good reminder of that for many people.
|
| Plus there are already a lot of tools required. Reading a
| book about every tool I have to use would be impractical
| for most projects. Maybe more expert tools should just be
| tools. The same way I can now just use Ubuntu and get a
| working desktop system including drivers for most common
| hardware. If I compare that to the past where I installed
| a Linux distribution and then found out I lack a driver
| for my network card but I need to download it from the
| internet... I still can modify my system if I need to,
| but it's nice that I don't have to. I think we can do
| similar things with many parts of development and free
| some capacity for other tasks.
| dailykoder wrote:
| Works on my machine!
| akira2501 wrote:
| > I haven't seen any other tool ever have so many pitfalls.
|
| I haven't seen any other tool with so much general utility
| and availability.
|
| > to loop over a string array and print its contents
|
| Is incredibly easy in bash and bash like shells. As
| highlighted the issue is that tools like 'ls' don't create "a
| string array." They create one giant string that has to be
| parsed. The rules in the shell are different than in other
| languages but it /will/ do most of the parsing for you, or
| all of it, if you do it carefully.
|
| This is a fine tradeoff, as evidenced by its wide usage and
| lack of convincing replacements.
| anal_reactor wrote:
| > I haven't seen any other tool with so much general
| utility and availability.
|
| > availability
|
| That's the real reason why we use Unix shell. It's not
| good, but it's available. Like a cheap hooker.
|
| > but it /will/ do most of the parsing for you, or all of
| it, if you do it carefully.
|
| "It mostly works if you're careful" doesn't sound very
| convincing to me.
| stephenr wrote:
| > but it's available. Like a cheap hooker.
|
| Username checks out.
| akira2501 wrote:
| > "It mostly works if you're careful" doesn't sound very
| convincing to me.
|
| Would you rather write your own parser?
| vbezhenar wrote:
| The main problem is using text as a common format between
| different applications.
|
| First: text is not well defined. Is it ASCII? Is it UTF-8?
| Some programs can spew UTF-32 with proper locale configured,
| it's a mess.
|
| Second: encoding and decoding of objects to text is not
| defined at all. The problems with filenames are just one
| example. Using newline as a separator is a natural thing that
| is easy to implement, yet it is wrong.
|
| In my opinion two things should be done:
|
| 1. Standardise on UTF-8. No other encodings allowed.
|
| 2. Standardise on JSON. It is good enough to serve as
| universal exchange format, tools like `jq` exist for some
| time now.
|
| So any utility must read and write JSON objects with some
| standard env set. And shells can be developed with better
| syntax to deal with JSON. This way you can write something
| like
|
| `ps aux | while read row; do echo ${row.user} ${row.pid};
| done`
| anal_reactor wrote:
| True, but this would be immensely difficult to pull off,
| because how do you convince other people to write programs
| that produce actual working JSON?
| ezoe wrote:
| Don't even assume UTF-something is the only character
| encoding. There are so many character encodings that
| predate Unicode, and they are still widely used.
| nly wrote:
| The primary purpose of command line program output is to
| convey information to a human, not to other programs.
|
| Command line scripting is _supposed_ to be adhoc and hack.
| mdavid626 wrote:
| I disagree that it's _supposed_ to be adhoc and hack. Look
| at PowerShell.
| anthk wrote:
| That's the case under limited OSes such as DOS. Under Unix,
| piping has been _the_ philosophy.
| consteval wrote:
| There are exchange formats that are well-defined enough
| to be useful to many computers while also being readable
| enough to be traversed by human eyes. There's no reason
| to do everything ad hoc; you don't gain much by that. You
| also control the shell itself - there's no reason you
| can't display object representations in a pretty way.
| pif wrote:
| > The main problem is using text as a common format between
| different applications.
|
| If you can't get the immensity of the cleverness of Unix
| foundations, you should not talk about them.
|
| That idea is what made it possible for you to type that
| sentence in the first place.
| arghwhat wrote:
| What cursed madness have you hit that spits out UTF-32
| under normal conditions?! That can only be a bug -
| UTF-32/UCS-4 never saw external use, and has only ever been
| used for in-memory fixed-width character representation,
| e.g. runes in Go.
|
| You never have to worry about whether you're dealing with
| ASCII vs. UTF-8, but rather if you're dealing with UTF-8
| vs. ISO-8859-1, or worse, Shift JIS or similar.
| vbezhenar wrote:
| I think that I hit that with Java:
|
|     % java -Dfile.encoding=UTF-32 Test | hexdump -C
|     00000000  00 00 00 48 00 00 00 65  00 00 00 6c 00 00 00 6c  |...H...e...l...l|
|     00000010  00 00 00 6f 00 00 00 2c  00 00 00 20 00 00 00 77  |...o...,... ...w|
|     00000020  00 00 00 6f 00 00 00 72  00 00 00 6c 00 00 00 64  |...o...r...l...d|
|     00000030  00 00 00 0a                                       |....|
|     00000034
|
| From quick googling it seems that glibc does not support
| it, so it should not happen.
| oneeyedpigeon wrote:
| I think a lot of tools should support json as well as plain
| text. Probably the latter by default, and the former with a
| "-o json" or similar option. I'm fine with wc giving me
| `5`, I'd prefer that to `{ "characters": 5 }`.
| aloisklink wrote:
| POSIX does actually define what a "text file" is, but the
| definition is a bit unusual:
|
| See https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/V1...
|
| > 3.387 Text File
|
| > A file that contains characters organized into zero or
| more lines. The lines do not contain NUL characters and
| none can exceed {LINE_MAX} bytes in length, including the
| <newline> character.
|
| So, if you have some non-printable characters like
| BEL (ASCII 0x07), that's still a text file.
|
| (and I believe what bytes count as a valid character depend
| on your `LC_CTYPE`).
|
| But the moment you have a line longer than {LINE_MAX} bytes
| (which can depend on which POSIX environment you have),
| suddenly your text file is now a binary file.
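The `{LINE_MAX}` part of that definition can be checked roughly with standard tools. This is a sketch, and `fits_line_max` is a hypothetical name. It forces the C locale so that awk's `length` counts bytes, and it reads the limit from `getconf`. POSIX guarantees a limit of at least 2048. The sketch does not check for NUL bytes or a missing final newline:

```sh
# Succeeds if every line of file $1, counting its <newline>,
# fits in LINE_MAX bytes as reported by getconf.
fits_line_max() {
    max=$(getconf LINE_MAX) || return 2
    LC_ALL=C awk -v max="$max" '
        length($0) + 1 > max + 0 { too_long = 1; exit }
        END { exit too_long }
    ' "$1"
}
```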
| WJW wrote:
| Kind of a weird definition indeed. One edge case: the
| definition states the file must contain characters, so
| presumably zero length files are out. But then how could
| you have zero lines?
| rascul wrote:
| An empty file is not hard to make. It's just a matter of
| creating the file and not writing to it.
| WJW wrote:
| Yes obviously. But the POSIX specification for a "text
| file" as above is that it contains characters, which an
| empty file by definition does not. So an empty file
| cannot be a text file if you read that specification
| strictly, and therefore you cannot have zero lines in a
| text file. As soon as you have a single character there
| is at least one line, and the number of lines can only
| stay the same or grow from there.
|
| The definition should read "one or more lines" instead or
| (probably better) specify that a text file contains "zero
| or more characters".
| rascul wrote:
| Ahh I see what you're saying. I misunderstood at first.
| Ukv wrote:
| POSIX defines a line as:
|
| > 3.185 Line
|
| > A sequence of zero or more non-<newline> characters
| plus a terminating <newline> character.
|
| So a file with some characters but no trailing newline is
| reported by `wc -l` as having zero lines.
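| A small Python illustration of why this matters, using only the
| standard library (helper names are hypothetical). Counting
| <newline> bytes, as `wc -l` does, disagrees with a splitter
| that also returns an unterminated final fragment, such as
| Python's `bytes.splitlines()`.

```python
def wc_l(data: bytes) -> int:
    # POSIX wc -l reports the number of <newline> characters,
    # so an unterminated last "line" is not counted.
    return data.count(b"\n")


def splitlines_count(data: bytes) -> int:
    # bytes.splitlines() also yields a trailing fragment without a newline.
    return len(data.splitlines())


for sample in (b"", b"abc", b"abc\n", b"a\nb"):
    print(repr(sample), wc_l(sample), splitlines_count(sample))
```

| For b"abc" the two counts are 0 and 1 respectively.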
| poincaredisk wrote:
| >It is good enough to serve as universal exchange format,
| tools like `jq` exist for some time now.
|
| Please don't use that underdefined joke of a spec. Define
| "PosixJson" and use that instead. Right now it's not even
| clear what the result of parsing {"a": 1234678901234567890}
| is. Is this a parse error? A bigint? A float/double? Quiet
| wraparound? Something else? I've seen all these behaviors
| in real world JSON implementations across different
| languages.
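| To make the ambiguity concrete, here is a Python sketch. Python's
| `json` module happens to parse integers exactly. Its `parse_int`
| hook can stand in for a parser that stores every number as an
| IEEE 754 double, as JavaScript does. RFC 8259 explicitly lets
| implementations limit the range and precision of numbers, so
| arguably both behaviours are conforming.

```python
import json

DOC = '{"a": 1234678901234567890}'

# Default: Python parses JSON integers with int(), so the value is exact.
exact = json.loads(DOC)["a"]

# Simulate a parser that keeps every number as a 64-bit double.
as_double = json.loads(DOC, parse_int=float)["a"]

print(exact)                   # the integer from the document
print(int(as_double))          # nearest double: the low digits change
print(exact == int(as_double))  # False
```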
| matrss wrote:
| JSON itself is bad for a streaming interface, as is common
| with CLI applications. You can't easily consume a JSON
| array without first reading it in its entirety. JSONL would
| be a better fit.
|
| But then, how well would it work for ad-hoc usage, which is
| probably one of the biggest uses of shells?
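| For comparison, here is a minimal JSON Lines reader in Python
| (the function name is hypothetical). Each line is parsed as
| soon as it arrives, so a consumer can start before the
| producer finishes. A JSON encoder escapes a newline inside a
| string value as \n, so one record always stays on one line.

```python
import io
import json
from typing import Any, Iterator, TextIO


def iter_jsonl(stream: TextIO) -> Iterator[Any]:
    """Yield one decoded JSON value per input line, lazily."""
    for lineno, line in enumerate(stream, start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: {exc}") from exc


# Usage: the second filename contains a real newline, escaped in the JSON text.
sample = io.StringIO(
    json.dumps({"name": "a.txt", "size": 3}) + "\n"
    + json.dumps({"name": "b\nc.txt", "size": 5}) + "\n"
)
for record in iter_jsonl(sample):
    print(repr(record["name"]), record["size"])
```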
| zelphirkalt wrote:
| This should not be as downvoted as it is. In a way shell is
| broken. The brokenness is in that it requires each command to
| serialize and deserialize again, considering all the weird
| things that can happen with the "all is a string" kind of
| approach, instead of having a proper data interchange format
| or even sending objects to next steps in the pipeline. This
| behavior is what necessitates even thinking about the changes
| listed in the post. We wouldn't even have that problem, if
| the design of shell was better thought out. Now we are
| dealing with decades of legacy built on these shaky
| foundations. I hate to admit it, but it seems PowerShell got
| at least this aspect right, whatever one may think about the
| rest of it.
| chasil wrote:
| On my rhel7 system, the Debian dash shell is this large:
|
|   $ ll /bin/dash
|   -rwxr-xr-x. 1 root root 113536 Nov 5 2018 /bin/dash
|
| I happen to have an old powershell installed:
|
|   $ rpm -qi powershell | grep Size
|   Size : 126588370
|
| A strict POSIX shell is always going to be vastly smaller,
| for many reasons.
|
| I would prefer that the POSIX shell was an LR-parsed
| language, but you can't have everything.
| nneonneo wrote:
| Rather unfortunately, I happen to have a handful of files on my
| machine with newlines in them (the filenames were
| programmatically generated from a summary of their contents). I
| loathe the possibility that my shell tools are going to
| suddenly crash when confronted with these weird files, rather
| than just producing some slightly silly output. I wish we'd
| standardized the behaviour of just escaping such characters as
| `\n/\r` or `^J/^M`...
| nasretdinov wrote:
| The thing is, it's hard to predict what would happen to those
| scripts regardless... E.g. try naming your files "-rf" and
| see how many things break :)
| redserk wrote:
| If one really wanted to embrace chaos, introduce this as a
| new team file naming standard for "risk finding" files ;)
| tetha wrote:
| I do enjoy "ls *; touch -- -lisah; ls *" as a fun little
| brainteaser for those uninitiated to this behavior.
| ykonstant wrote:
| A correct script will have no problems with "-rf" or any
| other file name. I have (and recommend script writers make
| their own) a directory hierarchy of "dangerous" file names
| to test scripts.
|
| For example, it contains a directory where all file and
| subdirectory names are in unary, consisting only of
| repetitions of the newline character. A correct script
| should be able to enumerate, access and modify files in
| there without issue.
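| Here is a Python sketch of such a test hierarchy. It assumes a
| POSIX filesystem that accepts these bytes; Windows would reject
| several of them. `DANGEROUS_NAMES` and both helpers are
| hypothetical. The point is that enumerating with `os.listdir`
| and passing each name as a single argument never splits or
| re-parses it.

```python
import os
from typing import List

# Hypothetical test names; extend with whatever has bitten you before.
DANGEROUS_NAMES: List[str] = [
    "-rf",            # looks like an option
    "--",             # end-of-options marker
    " leading space",
    "trailing space ",
    "new\nline",
    "*",              # glob metacharacter
    "$(echo pwned)",  # command-substitution lookalike
] + ["\n" * n for n in range(1, 4)]  # names made only of newlines ("unary")


def make_fixture(root: str) -> None:
    """Create one file per dangerous name; each file holds its own name."""
    for name in DANGEROUS_NAMES:
        with open(os.path.join(root, name), "w", encoding="utf-8", newline="") as f:
            f.write(name)


def check_fixture(root: str) -> bool:
    """Enumerate, read back and compare every file without any word splitting."""
    found = sorted(os.listdir(root))
    if found != sorted(DANGEROUS_NAMES):
        return False
    for name in found:
        with open(os.path.join(root, name), encoding="utf-8", newline="") as f:
            if f.read() != name:
                return False
    return True
```

| For example, call make_fixture() on an empty directory from
| tempfile.TemporaryDirectory(), point the script under test at
| it, and check the results afterwards.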
| nneonneo wrote:
| export TMPDIR=" / "
|
| to surprise the next person or script to do "rm -rf
| $TMPDIR/foo"...
| ykonstant wrote:
| In academia, I get (and used to create) pdfs with names like:
|
| "On the number of
|
| associative foobars
|
| of degree blah -
|
| Johnson and Anderson.pdf"
|
| all the time. Non-technical academics find a descriptive
| file name very convenient, and they use newlines so that the
| whole name is visible in their file manager.
| oneeyedpigeon wrote:
| Oh god. I already get upset enough by spaces in a file
| name, although I realise that fight is basically lost now!
| enriquto wrote:
| As a fellow spaces-in-filenames-hater, the fight is not
| lost. We are on the brink of winning it; it's just a
| mount option away!
|
| While we cannot avoid that people hit the spacebar when
| writing a filename on a gui, this does not mean _at all_
| that the resulting filename itself need contain a plain
| space character. Those spaces can and should be
| transparently translated to non-breaking space characters
| at some point. Maybe by the gui itself, or more robustly
| by the filesystem. This would make everybody happy: gui
| users and naive shell script writers.
| poincaredisk wrote:
| >Those spaces can and should be transparently translated
| to non-breaking space characters at some point
|
| Why? This just introduces more complexity and
| interoperability headaches for seemingly no reason.
| enriquto wrote:
| > Why?
|
| In order to preserve the sacrosanct simplicity of naive
| shell scripts. Seems like a very noble goal to me.
|
| The only unexpected complexity arises when you want to
| deal with filenames having mixed spaces and nbsps. But
| I'd say that people who do that had it coming.
| alexvitkov wrote:
| If you want simple shell scripts to work, make an
| actually good shell language without all the footguns.
|
| The filesystem is way more important than /bin/sh, and
| any complexity added there will trickle down to all
| programs, not just shell scripts.
|
| It's not worth adding hacks on the FS to patch defects in
| poorly written shell scripts (which are being replaced en
| masse with python/nodejs/even weirder yaml files/systemd
| units/etc... anyways)
| ksp-atlas wrote:
| nushell uses real lists for things, which means you don't
| need to care about separators except when dealing with
| external system things.
| consteval wrote:
| Whitespace in filenames in general is difficult to deal
| with. Many, maybe most, programs get it wrong. It's not
| just about shell scripts, many GUI programs fail to
| handle those files properly too.
| alexvitkov wrote:
| When GUI programs mishandle filenames with spaces, IME
| it's usually because they spawn a subshell in a naive way
| (system("rm " + filename)).
|
| To mishandle spaces you have to split an input w/
| filenames by whitespace, which is not that common of an
| operation outside of a shell.
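| A Python illustration of the difference (helper names are
| hypothetical). `shlex.split` only approximates POSIX shell word
| splitting, but it is close enough for these simple names.

```python
import shlex
from typing import List


def naive_words(filename: str) -> List[str]:
    # Roughly what /bin/sh sees for system("rm " + filename):
    # the shell re-parses the string, so each space starts a new argument.
    return shlex.split("rm " + filename)


def quoted_words(filename: str) -> List[str]:
    # If a shell command string is unavoidable, quote the name
    # and end option parsing with "--".
    return shlex.split("rm -- " + shlex.quote(filename))


def argv_words(filename: str) -> List[str]:
    # Best option: no shell at all, e.g. subprocess.run(["rm", "--", filename]).
    return ["rm", "--", filename]


for words in (naive_words, quoted_words, argv_words):
    print(words.__name__, words("My Thesis.pdf"))
```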
| InfiniteRand wrote:
| My favorite file+space issues is spaces at the end of
| file names, especially when you copy and paste text, or
| text gets trimmed from an input box, or the person
| forgets to trim space from an input box...
| nradov wrote:
| The vast majority of Windows and MacOS programs get it
| right.
| arp242 wrote:
| Eh? It's really not a bother in pretty much any
| programming language, and you don't really need to do
| anything special for it. I don't know any program that
| has any problems with it.
|
| Even zsh has fixed this. It's just /bin/sh and bash that
| are annoying.
| lifthrasiir wrote:
| Simplicity doesn't always mean stupidity. The simple but
| functional shell that correctly handles whitespace
| without much hassle has been available since the 90s,
| namely rc, which is also found in Plan 9. Adopting rc's
| string concatenator `^` in POSIXy shells shouldn't be too
| hard.
| vbezhenar wrote:
| Yep, works today:
|
|   sh-3.2$ f='Hello world'
|   sh-3.2$ echo $f
|   Hello world
|   sh-3.2$ for i in $f; do echo $i; done
|   Hello
|   world
|   sh-3.2$ f='Hello\xC2\xA0world'
|   sh-3.2$ echo $f
|   Hello world
|   sh-3.2$ for i in $f; do echo $i; done
|   Hello world
| stouset wrote:
| Just always quote variable interpolation and you will
| never have problems.
|
|   sh-3.2$ f='Hello world'
|   sh-3.2$ echo "${f}"
|   Hello world
|   sh-3.2$ for i in "${f}"; do echo "${i}"; done
|   Hello world
|   sh-3.2$
| chasil wrote:
| It would be really nice if there was a mount option that
| would quietly remove spaces in filenames, or convert them
| to an underscore.
|
| If I had it, I would use it today.
| curt15 wrote:
| Didn't Windows name "Program Files" with a space to force
| application developers to handle spaces in paths
| properly?
| ape4 wrote:
| Not to mention C:\Program Files (x86)
| account42 wrote:
| And C:\Programme and other localized variants to force
| people to go through the proper APIs instead of
| hardcoding paths.
| pjmlp wrote:
| In theory yes, in practice to this day many people don't
| bother to learn how to deal with pathnames in a proper way.
| inetknght wrote:
| Top difficulties in computer science:
|
| 1. naming things
|
| 2. cache coherency
|
| 3. off-by-one errors
|
| ???
|
| 4. quoting pathnames
| nneonneo wrote:
| Eh, maybe. In practice I usually do all my moderately-
| heavy filesystem scripting in Python these days, for
| which pathname quoting is just a complete non-issue. Of
| course, I still use a shell for quick-and-dirty stuff,
| but usually only for pretty simple tasks where the
| simplest quoting setup ("$i") suffices.
| hawski wrote:
| I would replace 4 with parameter expansion rules.
| mmcdermott wrote:
| For the longest time you could get away with this in cmd:
|
| > dir c:\progra~1
|
| So if forcing people to handle spaces was the goal, it
| took a long time to force it.
| arp242 wrote:
| I'm pretty sure that still works. I forgot the exact
| scenario, but my Windows CI on GitHub Actions output
| shorte~1 pathna~1 like that in a script just a few months
| ago. On one hand, the backwa~1 compati~1 is nice. On the
| other hand, there's just so much depreca~1 cruft that
| keeps popping up even on contemp~1 systems.
| InfiniteRand wrote:
| I just got used to installing things I need to interact
| with in a program into a folder named C:\workspace
| jonhohle wrote:
| Convince (force?) your team to use make and soon everyone
| will forget spaces in file names are even a thing!
| taneliv wrote:
| My team already uses `make` but there's no reason for me
| to run it in my Downloads folders. File names in there
| are sometimes wild. Yet I expect command line tools to
| work with them. If they will cease to do so, I will have
| to start using non-POSIX variants of those tools, I
| guess.
| Flimm wrote:
| How do these non-technical academics even create a PDF file
| with a name like that?
| ykonstant wrote:
| Right click, rename, enter, enter, enter (until the
| entire file name is visible on the box)? That's how I did
| it when I used Windows.
|
| Edit: now I remember the most basic way: open the pdf,
| select and copy the title, click on rename and paste from
| clipboard. Works great to get the file name with the
| newlines exactly as they are on the title!
| zelphirkalt wrote:
| Doesn't <enter> just confirm the typed input for the
| filename and finish the renaming? How does that insert
| newlines?
| ykonstant wrote:
| Shrug, I last used windows with Windows 7, so you are
| probably right. That being said, at least two of the
| students I am currently tutoring are on XP and one of my
| colleagues as well :D
| pino82 wrote:
| No, it was always this way.
| ykonstant wrote:
| Right, I just remembered the main way to create those
| filenames: open the pdf, select and copy the title,
| close, rename the file and paste from clipboard.
| astroid wrote:
| Yes - I just tested on Win10+11 because I thought "there
| is no way I haven't done something like this by accident...
| and I would have remembered seeing a new line in my file
| name when I made that mistake."
|
| I just opened a folder in File Explorer, clicked 'rename'
| and then tried the following combinations:
|
|   Enter
|   L Ctrl + Enter
|   L Alt + Enter
|   Win + Enter
|   R Ctrl + Enter
|   R Alt + Enter
|
| None of them let me put new lines in the filename - it
| either did nothing, or 'closed' the rename view.
| abenga wrote:
| I don't know if this is a Linux thing, but when I'm renaming
| a file and press Enter, the new name is applied; the file
| manager doesn't add a newline.
| jodrellblank wrote:
| I don't know who "the Austin Group" mentioned in the
| article are, but how come they "could not find a single
| use-case for newlines in pathnames besides breaking naive
| scripts" when legitimate use-cases are so easy to find?
|
| (And if they're that incompetent, why does the article
| imply they are worth quoting and listening to?)
| gpderetta wrote:
| It is [1] the joint working group that for the last 25+
| years has been responsible for both the POSIX standard
| and the Single Unix Specification. It emerged after the
| UNIX wars as a consolidation of the various splintered
| UNIX standardization efforts (POSIX itself, X/OPEN, OSF,
| etc).
|
| [1] https://en.wikipedia.org/wiki/Austin_Group
| nilamo wrote:
| Is that legitimate? A path name is just a unique
| identifier for a file, IMO it doesn't make sense to put a
| whole novel in there. If anything, a giant summary like
| that should be in the meta tags?
| jodrellblank wrote:
| In what way is it not legitimate? It's not an accident,
| bug or data corruption. Someone put it there for a
| reason, and it benefits their use case. That's as
| legitimate as it gets.
| ykonstant wrote:
| That's a core part of the problem: a path name is NOT
| just a unique identifier for a file. Desktop operating
| systems and their classical utilities conflate the
| "unique identifier" and whatever "displayed title" of a
| file through which the end user interacts with the file.
|
| Users care about "titles" or "summaries" of files, not
| "filesystem identifiers"; as long as the two are
| conflated, non-technical users will use the identifier to
| write titles and thus make the file easy to locate in an
| interactive GUI. Meta tags are not even in the cognitive
| horizon of most people.
| ykonstant wrote:
| I am interested in hearing the rationale for downvotes
| explicitly. I am describing a reality that exists and must
| be taken into account. Why are you downvoting?
| jraph wrote:
| They did the right thing for this: make the tools fail on
| file creation, but not on existing files.
|
| I guess it's still advisable to rename those files, I don't
| know how things like cp, mv or rsync will behave when copying
| such files in the future.
| ykonstant wrote:
| If your file system allows them, be careful with symlinks
| though!
| jraph wrote:
| Why, specifically?
|
| I'm convinced we will need to be careful with symbolic
| links related to new line characters in filenames, but
| I'm curious about which specific aspect you had in mind.
| ykonstant wrote:
| Oh, nothing specific to newlines. Just, when you rename
| files to fix newlines, you need to check if they break
| symlinks pointing to them.
|
| For instance, I had project folders for my individual
| research projects. In order to have a central repository
| of resources and not have copies of multi-megabyte pdfs
| in each folder, I put all referenced papers in a single
| directory and symlinked them for each project that needed
| them. Later, I wanted to rename the papers to remove
| newlines. The symlinks complicated this process quite a
| bit!
| nneonneo wrote:
| No, they did not do the right thing:
|
| > the following utilities are now either encouraged to
| error out if they are to create a filename that contains a
| newline, and/or encouraged to error out if they are *about
| to print a pathname that contains a newline* in a context
| where newlines may be used as a separator
|
| It then proceeds to list a bunch of utilities including
| diff, file, find, grep, head, du, etc., none of which
| create files directly.
|
| These utilities could be updated to reject newlines in file
| paths if they're going to print in a "newline delimited"
| form - but for some of these utilities, that's the _only_
| available form.
| jraph wrote:
| > error out if they are _about to print a pathname that
| contains a newline_ in a context where newlines may be
| used as a separator
|
| But that's already broken. This is a situation where
| filenames with newlines in them are indistinguishable
| from two filenames in outputs. So instead of producing
| subtly broken output, tools are _encouraged_ (not forced)
| to explicitly fail with a lot of noise.
|
| The "in a context where newlines may be used as a
| separator" part of this sentence is very important.
|
| IIUC the tools are still allowed to succeed in non broken
| situations, for instance when a null separator is used
| and not a new line character. And I can't imagine the
| tools you listed will start breaking in situations that
| worked (apart from file creation - indeed _this_ will
| likely start breaking, and newline characters in
| filenames need to be considered deprecated and things
| using them need to be fixed).
|
| This is strictly better IMHO (if one thinks that newlines
| in filenames are not worth the trouble given how things work
| in POSIX, especially the part where things are line-based
| and newline characters have quite some significance).
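| Here is a Python sketch of the behaviour described above.
| `format_paths` is a hypothetical helper, not a POSIX interface.
| It fails loudly only when a pathname contains the output
| separator. Newline-separated output therefore refuses pathnames
| with newlines, while NUL-separated output accepts everything,
| since pathnames cannot contain NUL.

```python
from typing import Iterable


def format_paths(paths: Iterable[str], sep: str = "\n") -> str:
    """Terminate each pathname with sep, refusing output that would be ambiguous."""
    out = []
    for path in paths:
        if sep in path:
            # The reader could not tell this apart from two separate pathnames.
            raise ValueError(f"pathname contains the separator {sep!r}: {path!r}")
        out.append(path + sep)
    return "".join(out)


print(format_paths(["a.txt", "b c.txt"]), end="")
print(repr(format_paths(["new\nline.txt"], sep="\0")))
```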
| stouset wrote:
| Dude, just fix the filenames.
| ezoe wrote:
| Don't assume UTF-8 is the only character encoding used in the
| wild. There are character encodings whose lead bytes are not
| as easily detectable as UTF-8's.
| arghwhat wrote:
| In 2024, if you don't get the correct result decoding a text
| as UTF-8, the bug is the text, not the decoding. And luckily,
| adoption of UTF-8 over the past 30+ years has gone well enough
| that you don't need to worry.
|
| Caveats for cursed hardware standards demanding two-byte
| encodings like USB.
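| Here is a Python sketch of a strict check (the helper name is
| hypothetical). It reports where decoding fails, which catches
| typical Latin-1 text with accented letters and typical Shift JIS
| text. Passing the check does not prove the bytes were meant as
| UTF-8, because some legacy byte sequences happen to be valid
| UTF-8 as well.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError as exc:
        return exc.start
    return None


for label, blob in [
    ("utf-8", "caf\u00e9".encode("utf-8")),
    ("latin-1", "caf\u00e9".encode("latin-1")),
    ("shift_jis", "\u3053\u3093\u306b\u3061\u306f".encode("shift_jis")),
]:
    print(label, first_invalid_utf8_offset(blob))
```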
| poincaredisk wrote:
| I hope you're happy in your ivory tower, but I personally
| work with a lot of files with other encoding, most often
| that weird utf16 (Windows), sometimes also legacy files
| with different ANSI encoding. Declaring "my decoder is
| fine, it's the text that is buggy" is not going to score a
| lot of points with my boss and clients.
| zelphirkalt wrote:
| Shouldn't hurt to tell clients to fix their weird encodings
| that originated in proprietary software, though.
| arghwhat wrote:
| The only valid reason for still having files stored in
| legacy ANSI encodings is that their only use is input to
| software that has not been maintained for ~30 years and
| cannot be updated. That's fine because they're just
| binary inputs in a closed ecosystem that no one touches.
|
| But if they are supposed to be treated as text, then yes
| it's the text that's buggy - they should just be
| converted to UTF-8 once and have the originals thrown
| away.
|
| UTF-16 is something that Microsoft has cursed us with by
| inserting it into specifications (like USB) so that we
| cannot get rid of it, even if it never made any sense
| whatsoever. But those are in effect explicit protocols
| with a hard contract, very different from something where
| you would "assume an encoding".
| 1oooqooq wrote:
| Why do people still assume UTF-8 is the only locale encoding
| in use?
|
| You're probably guilty of the sin you preach, showing
| wrongly decoded UTF-8 without even knowing it.
| account42 wrote:
| Their proposed solution is not compatible with reality though
| where POSIX does not get to define what kind of files exist on
| filesystems you need to work with.
|
| All they did is introduce new error cases in C programs while
| not actually fixing anything for shell scripts.
|
| If anything, it's going to result in more exploits as people
| write shell scripts with the assumption that newlines cannot
| appear in filenames.
| quotemstr wrote:
| In the real world, nobody writes shell scripts that handle
| newlines in filenames.
| account42 wrote:
| I do. Single files are handled with quotes around arguments
| just fine. For lists of files you need to use NUL as a
| separator. That's not really hard to do once you are aware
| of the problem but ergonomics could be better - which is
| something useful that POSIX could change.
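| Here is a minimal Python sketch of NUL-separated lists (function
| names are hypothetical). The writer mirrors `find -print0`, the
| reader mirrors `xargs -0`, and sorting the records is what GNU
| `sort -z` adds. Sorting here is by code point, roughly like
| `LC_ALL=C`, not by locale collation.

```python
import os
from typing import Iterable, List


def write_nul_list(paths: Iterable[str]) -> bytes:
    """Like find -print0: terminate every pathname with a NUL byte."""
    return b"".join(os.fsencode(p) + b"\0" for p in paths)


def read_nul_list(data: bytes) -> List[str]:
    """Like xargs -0: split NUL-terminated records back into pathnames."""
    if not data:
        return []
    if not data.endswith(b"\0"):
        raise ValueError("last record is not NUL-terminated")
    return [os.fsdecode(p) for p in data[:-1].split(b"\0")]


def sort_nul_list(data: bytes) -> bytes:
    """Like sort -z: sort NUL-terminated records and re-emit them."""
    return write_nul_list(sorted(read_nul_list(data)))


names = ["b.txt", "a\nnewline.txt", " leading space"]
blob = write_nul_list(names)
print(read_nul_list(sort_nul_list(blob)))
```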
| zokier wrote:
| But they did not make old code correct. Filenames are still
| allowed to contain newlines. Shell scripts still need to be
| prepared to deal with that. Nothing really changed, they just
| added a feel-good half-measure.
| quotemstr wrote:
| It's a step in the right direction. You have to understand
| that for decades a vocal group of Unix die-hards has opposed
| any limitations whatsoever on the bytewise content of file
| names. The newline restriction in this latest version of
| POSIX may be modest, but it represents a dam breaking. When
| (obviously) the sky doesn't fall, the next version of POSIX
| will have a lot more filename cleanup.
| janderland wrote:
| This is pretty standard for a human run system. Gotta make
| the human feel good about an idea before they can do said
| idea.
|
| If you're not familiar with humans, there are several manuals
| available online.
| hwc wrote:
| Now do that with all whitespace!
| relistan wrote:
| The history at the beginning of this is not correct. Two
| examples: the assertion that there was one compatible UNIX prior
| to United States vs AT&T, the statement that GNU and BSD started
| that same year. Very, very off.
| unixhero wrote:
| Okay, but you would add more value if you also stated what
| the correct order of things is.
| relistan wrote:
| https://en.m.wikipedia.org/wiki/History_of_Unix#/media/File%.
| .. is a good visual of (many of, not all) the various
| versions of UNIX and when they were released. BSD was first
| released in 1978. United States v. AT&T was implemented in
| 1984 (judgment in 1982). GNU was started in 1983.
| johnisgood wrote:
| > Anyway, POSIX 2024 now requires c17, and does not require c89
|
| I wish it would have been c99. What does c17 add exactly, more
| C++-esque complexity or not? Why was it not c99 (or perhaps even
| c11) over c17? Genuine questions.
| lifthrasiir wrote:
| > What does c17 add exactly, more C++-ish bullshit or not?
|
| Multithreading support and such (atomics, thread-local storage
| and a guarantee that `errno` is in TLS), explicitly aligned
| types and allocations, dedicated types for strings known to be
| Unicode, _Noreturn, _Generic, _Static_assert, anonymous structs
| and unions in the nested position, quick_exit, timespec,
| exclusive mode ("x") in f[re]open, CMPLX macros.
|
| I'm not even sure which one can be C++-ish bullshit possibly
| except for about two points:
|
| - Multithreading does seem farfetched for casual users. In
| fact, I do think it could have been minimized without any
| actual harm, but multithreading itself needed to be specified
| because it greatly affects a memory model. (Before C11, C had
| no thread-aware memory model and different threading
| implementations were subtly different beyond what the standard
| stated.) Even JavaScript, originally without any notion of
| threads, eventually got a thread-aware memory model due to
| shared web workers. But that never meant JS itself needed
| multithreading support in its standard library, and C could
| have done the same.
|
| - `_Generic` is even more debatable, though I believe it was
| the only way forward once we accepted <tgmath.h>, which is known
| to be a response to Fortran (other responses include
| `restrict`) and was impossible to implement in a portable
| manner before C11. As long as it retains its scary underscore
| and title case, I guess it's fine.
| johnisgood wrote:
| You quoted me before my edit, but fair enough. I do like the
| "atomics" support.
|
| > "guarantee that `errno` is in TLS"
|
| I suppose that does not mean that I can just avoid setting
| errno to 0 before calling a function after which I check for
| errno, right?
|
| Yeah, I do have an issue with stuff like "_Generic" but I
| assume I can just simply not use it.
|
| What is "quick_exit" exactly and what does it solve?
|
| As for multithreading, I stick to pthread. Is any of the new
| features a replacement for that or what?
|
| At any rate, why C17 over C11 then?
| lifthrasiir wrote:
| C17 is a bugfix version of C11 (the next major revision
| would be C23). The exact list of fixes is available in [1].
| Mandating C11 instead of C17 when both are available seems
| not really useful now.
|
| You have the correct insight about errno. The new
| guarantee only means that other threads cannot mess
| with your errno, but clearing errno will still be
| useful within an individual thread.
|
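| exit runs atexit handlers and flushes and closes streams,
| which can race with other threads that are still using that
| state. quick_exit skips that teardown: it runs only the
| handlers registered with at_quick_exit and then calls _Exit.
| (Calling exit or quick_exit more than once is still
| undefined.) quick_exit was new in C11, and I think it was
| only specified after observing existing implementations.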
|
| It is expected that libc threading routines are thin
| wrappers around pthread in Linux. That's why I do think it
| can be minimized; the only actual problem before C11 was
| the lack of thread-aware memory model. No need to actually
| be able to create threads from libc to be honest,
| especially given that each platform now almost always has a
| single dominant threading implementation like pthread.
|
| [1] https://www.open-
| std.org/jtc1/sc22/wg14/www/docs/n2244.htm
| johnisgood wrote:
| My last question would be: is it "OK" to use pthread in
| my code or are there any alternatives (i.e. "best way")
| when using C17?
| lifthrasiir wrote:
| No, just use pthread. There are some useful pthread APIs
| missing from C17 anyway too.
| johnisgood wrote:
| Thank you for your answers, it is much appreciated.
|
| I suppose I will not use "quick_exit" either in that
| case, I have many workers, there is a job queue mutex,
| along with pthread_cond_wait and
| pthread_mutex_{lock,unlock}, and when the "job_quit_flag"
| is set to true, that means all jobs are done and I am
| ready to return NULL.
| gpderetta wrote:
| Most importantly posix already has existing multithreading
| facilities in posix threads, so it is imperative that they
| are reformulated in terms of the C++11/C11 memory model.
| cryptonector wrote:
| > guarantee that `errno` is in TLS
|
| I mean, that is already true.
| ggm wrote:
| File names with / in them
| oguz-ismail wrote:
| Nitpick re: https://blog.toast.cafe/posix2024-xcu#fn:6
| SRC != ls *.c
|
| is fine in a makefile as far as POSIX is concerned, because:
|
| > _Applications shall select target names from the set of
| characters consisting solely of slashes, hyphens, periods,
| underscores, digits, and alphabetics from the portable character
| set_
| guerrilla wrote:
| Why was `isascii()` removed?
|
| (Listed in the Sortix article linked in OP.)
| oguz-ismail wrote:
| It would yield false-positives with non-UTF-8 encoded text.
| Big5 <https://en.wikipedia.org/wiki/Big5#Encoding> in
| particular was notorious for using ASCII values for trailing
| bytes. I don't know if it's still in use or if there are
| others.
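| Here is a Python sketch of that false positive (the helper is
| hypothetical). In Big5 the first common character, 一, is
| encoded as 0xA4 0x40, and its trail byte 0x40 is ASCII `@`.
| Characters whose trail byte is 0x5C (the backslash) are the
| notorious case. A byte-wise `isascii()` test accepts such trail
| bytes, whereas every byte of a multibyte UTF-8 sequence is 0x80
| or above.

```python
from typing import List, Tuple


def ascii_lookalike_bytes(text: str, encoding: str) -> List[Tuple[str, int, int]]:
    """List (char, byte index, byte) for bytes of multibyte chars that isascii() accepts."""
    hits = []
    for ch in text:
        encoded = ch.encode(encoding)
        if len(encoded) < 2:
            continue  # only bytes inside multibyte characters are of interest
        for index, byte in enumerate(encoded):
            if byte <= 0x7F:  # what isascii() tests for
                hits.append((ch, index, byte))
    return hits


print(ascii_lookalike_bytes("\u4e00", "big5"))   # 一 in Big5
print(ascii_lookalike_bytes("\u4e00", "utf-8"))  # 一 in UTF-8
```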
| EdSchouten wrote:
| strlcpy()!
| pelorat wrote:
| TIL the POSIX standard is still updated. Does it still suffer
| from the issues that make Linux break POSIX compatibility in some
| areas because they consider it a flawed standard?
| chasil wrote:
| - find(1p) now supports -print0
| - xargs(1p) now supports the -0 argument
| - newlines in filenames now should throw errors in many
|   utilities
| - a compiler implementing the c17 standard is now required
| - ulimit is expanded
| - renice can use relative values
| - a timeout utility has been added
| - make adds support for $^ $+ ::= :::= != ?= +=
| - logger is improved
| - gettext is adopted
| - readlink and realpath are adopted
| - rm now supports -d to remove empty directories and -v for
|   verbose
| - various improvements to printf, sed, test
| greyw wrote:
| Looks like the BSD-family will have some implementing to do.
| sneed_chucker wrote:
| Strict adherence to POSIX isn't a goal of any of the current
| BSDs is it?
| bryanlarsen wrote:
| I'm confident they'd accept patches.
| chasil wrote:
| I just booted OpenBSD 7.0 (which is a bit dated).
|
| The find utility has -print0, and xargs has -0. Notably, xargs
| also has -P for running processes in parallel.
|
| rm has both -d and -v.
|
| The renice command appears to be able to use relative
| adjustments with -n.
|
| There is a timeout command.
|
| There is a readlink command, but no realpath (but a manual
| page exists for it as a system call).
| pabs3 wrote:
| Since old-POSIX systems will be in use for some time, I wonder
| how many things will be able to switch to using the new
| capabilities. And how many OSes already support all of the new
| changes.
| donatj wrote:
| To build an internationalized shell script I'll need to compile
| multiple .mo language files and distribute them along side the
| script itself.
|
| For shell scripts part of a large system, that's probably fine.
| For small scripts, that's not very practical. You are not only
| adding a compilation step, you're also requiring distribution of
| multiple files. That's a pain.
|
| It just kind of kills the convenience of a simple shell script. I
| would probably end up writing a makefile to manage all of this
| and at that point I am only a hop, skip, and jump away from using a
| compiled language instead of shell.
| quotemstr wrote:
| > We've established that, yes, pathnames can include newlines. We
| have not established why they can do that. After some
| deliberation, the Austin Group could not find a single use-case
| for newlines in pathnames besides breaking naive scripts.
| Wouldn't it be nice if the naive scripts were just correct now?
|
| Finally. Now let's do the rest:
| https://dwheeler.com/essays/fixing-unix-linux-filenames.html
|
| Filenames should be boring printable normalized UTF-8. I have
| never, not once, seen a good reason that a filename should be
| able to contain random binary gobbledygook
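| One possible reading of "boring printable normalized UTF-8",
| sketched in Python. The policy is hypothetical; it is not what
| the linked essay or any standard specifies. It requires valid
| UTF-8, NFC form, and no control or other non-printable
| characters. It also applies the rules every POSIX filename
| already obeys: non-empty, no `/`, and not `.` or `..`.

```python
import unicodedata


def is_boring_filename(name: bytes) -> bool:
    """Hypothetical policy check for a single filename (not a whole path)."""
    if name in (b"", b".", b"..") or b"/" in name:
        return False  # not a usable POSIX filename at all
    try:
        text = name.decode("utf-8")  # strict: reject invalid UTF-8
    except UnicodeDecodeError:
        return False
    if unicodedata.normalize("NFC", text) != text:
        return False  # not in NFC form
    # Rejects NUL, newline, tab, ESC and other non-printable characters.
    return text.isprintable()


for candidate in (b"my file.txt", b"new\nline", b"caf\xe9", "cafe\u0301".encode("utf-8")):
    print(candidate, is_boring_filename(candidate))
```

| Python's str.isprintable() also rejects separators other than
| the ASCII space, such as U+00A0 NO-BREAK SPACE, so this sketch
| would refuse the nbsp-based names proposed elsewhere in the
| thread.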
| cryptonector wrote:
| > Filenames should be boring printable normalized UTF-8. I have
| never, not once, seen a good reason that a filename should be
| able to contain random binary gobbledygook
|
| Ensuring normalization is hard. Where should you do it? There's
| only one good place: in the filesystem. But if you normalize on
| create then you'd better use the same form that everyone else
| uses, but, what's that? Input methods _generally_ produce NFC,
| but there's no guarantee that they will not produce something
| else. HFS+ normalizes to NFD on create.
|
| ZFS uses form-insensitivity -- much like case-insensitivity,
| but for form. The reason ZFS went this way was exactly that HFS+
| and input methods differ as to forms. I pushed hard for this
| way back when. IMO form-insensitivity is the best way forward.
|
| But as for guaranteeing that filenames are UTF-8... that's much
| harder. The best thing to do is to not allow the use of non-
| UTF-8, non-ASCII, non-C locales -- not a guarantee, but pretty
| good.
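| Form-insensitive comparison in the spirit described above can be
| sketched in Python by normalizing both names to one form before
| comparing. The helper is hypothetical and does not reproduce
| how ZFS implements its normalization option.

```python
import unicodedata


def form_insensitive_equal(a: str, b: str) -> bool:
    # Normalize both sides to the same form; NFD would work equally well.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "caf\u00e9"     # é as one code point, which input methods usually produce
decomposed = "cafe\u0301"  # e + COMBINING ACUTE ACCENT, the decomposed form HFS+ stores

print(composed == decomposed)                        # False: different code points
print(form_insensitive_equal(composed, decomposed))  # True
```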
| nh2 wrote:
| > future editions will not require c17, but will simply require
| whatever C specification version is the most modern and already
| implemented by major toolchains
|
| Is this really good?
|
| If you can't rely on anything concrete being guaranteed, and it
| is open to interpretation what "modern" or "major toolchains"
| are, why have a standard?
| cryptonector wrote:
| > The problem is that pathnames (as per section 3.254 of POSIX
| 2024) are just strings (meaning they can contain any bytes except
| the NUL character), [...]
|
| Filenames (pathname components) can contain neither NUL nor '/'.
|
| Re: `find -print0` / `xargs -0`:
|
| > Previous POSIX releases have considered -print0 before, but
| never ended up adopting it because using a null terminator meant
| that any utility that would need to process that output would
| need to have a new option to parse that type of output.
|
| What nonsense. Just add the `-0` or similar options as needed.
|
| > More precisely, this approach does not resolve our original
| problem. xargs(1p) can't sort, and therefore we still have to
| handle that logic separately, unless sort(1p) also grows this
| support, even after read(1p). This problem continues with every
| other type of use-case. Importantly, it breaks the
| interoperability that POSIX was made to uphold.
|
| More nonsense.
|
| > A bunch of C functions are now encouraged to report EILSEQ if
| the last component of a pathname to a file they are to create
| contains a newline (put differently, they're to error out instead
| of creating a filename that contains a newline).
|
| Ok, that's tolerable. Ditto utilities (notice here they were able
| to make a list of utilities).
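| Here is a Python sketch of what that encouraged behaviour looks
| like from the caller's side. `create_new_file` is a hypothetical
| wrapper, not a libc function. It checks only the last pathname
| component, it refuses only file creation, and it fails with
| `EILSEQ` as in the text.

```python
import errno
import os


def create_new_file(path: str, mode: int = 0o666) -> int:
    """Create a new file, failing with EILSEQ if its name contains a newline."""
    last_component = os.path.basename(path.rstrip("/"))
    if "\n" in last_component:
        raise OSError(errno.EILSEQ, os.strerror(errno.EILSEQ), path)
    # O_EXCL: this wrapper only ever creates; it never opens an existing file.
    return os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)
```

| Opening an existing file that already has a newline in its name
| is unaffected, since that path never goes through this check.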
| chasil wrote:
| Note that GNU sort has...
|
| -z, --zero-terminated: end lines with 0 byte, not newline
| InfiniteRand wrote:
| I kind-of would like to see a POSIX-strict profile which
| incorporates commonsense (by commonsense I mean avoiding things
| that repeatedly over many years have tripped up programmers in
| frustrating ways) things like no newline in file names. Operating
| systems (or distributions) or could opt into this profile, and
| then someone programming on such an operating system could rely
| on the constraints of the profile and additional facilities could
| be added on that might need to rely on those constraints.
| Hopefully, gradually the use of the profile would spread.
___________________________________________________________________
(page generated 2024-10-29 23:01 UTC)