[HN Gopher] Why not parse `ls` and what to do instead
___________________________________________________________________
Why not parse `ls` and what to do instead
Author : nomilk
Score : 121 points
Date : 2024-06-23 05:24 UTC (2 days ago)
(HTM) web link (unix.stackexchange.com)
(TXT) w3m dump (unix.stackexchange.com)
| fellerts wrote:
| The title omits the final '?' which is important, because the
| rant and its replies didn't settle the matter.
|
| Shellcheck's page on parsing ls links to the article the author
| is nitpicking on, but it also links to the answer to "what to do
| instead": use find(1), unless you really can't.
| https://mywiki.wooledge.org/BashFAQ/020
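The FAQ's advice boils down to something like the sketch below (a minimal illustration using a throwaway temp directory; `-print0` with `read -d ''` is the bash-friendly null-delimited idiom):

```shell
# Iterate over files with find instead of parsing `ls` output.
# NUL delimiters survive spaces, newlines, and glob characters in names.
dir=$(mktemp -d)
touch "$dir/plain.txt" "$dir/with space.txt"

find "$dir" -maxdepth 1 -type f -print0 |
while IFS= read -r -d '' f; do
    printf 'found: %s\n' "$f"
done
```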
| noobermin wrote:
| Posts like these are like the main character threads on twitter
| where someone says, "men don't do x" or "women aren't like y." It
| just feels like people outside of you who have no understanding
| of your context seem intent on making up rules for how you should
| code things.
|
| Perhaps it would help to translate this into something more like,
| "what pitfalls do you run into if you parse `ls`" but it's hard
| to get past the initial language.
| qsort wrote:
| When we say "don't do X" we mean "the obvious way is wrong". If
| you have enough knowledge to ignore the advice, you likely are
| already aware of the problems with the obvious solution.
|
| I'm pretty sure you can come up with scenarios where parsing
| the output of "ls" is indeed the simplest solution, but that
| kind of article is supposed to discourage people who _don't_
| know better from going "oh, I know, I'll just parse the output
| of ls". As general advice, people should indeed be pointed
| towards "man find" or "man 3 opendir".
| g15jv2dp wrote:
| What to do instead: use pwsh to completely obviate all these
| issues.
| Qwertious wrote:
| Or, once it's API-stable, use nushell.
| DonHopkins wrote:
| Python has had an API-stable module for listing directories
| for decades, you know.
| Aerbil313 wrote:
| Do you recommend it? I feel like I'd get RSI from pressing
| shift when using it.
| https://learn.microsoft.com/en-us/powershell/module/microsof...
| gfv wrote:
| Powershell is mostly case-insensitive, and most of the core
| cmdlets have short aliases. Try `Get-Alias` (or `gal`) to
| learn more.
| shiandow wrote:
| It's case-insensitive, for what it's worth. My main problem is
| trying to figure out which utilities they've bundled into
| which command.
| g15jv2dp wrote:
| Yes, I absolutely recommend it. I use it every day.
|
| Commands and flags are case-insensitive.
| DonHopkins wrote:
| Isn't it ironic that PowerShell from Microsoft is so
| vastly superior to bash, not because it's great or even
| better than Python, but because bash is such a terribly low bar
| to beat, that it totally undermines the "Unix Philosophy"?
|
| Who would have thought that little old Microsoft, purveyors of
| MSDOS CMD.EXE, would have leapfrogged Unix and come out with
| something so important and fundamental as a shell that was
| superior to all of Unix's "standard" sh/csh/bash/whatever
| shells in so many ways, all of which historically used to be
| and ridiculously still are touted by Unix Supremacists as one
| of its greatest strengths?
|
| You see, Microsoft is willing to look at the flaws in their own
| software, and the virtues of their competitors' software, then
| admit that they made mistakes, and their competitors did
| something right, and finally fix their own shit, unlike so many
| fanatical monolinguistic Unix evangelists.
|
| They did the exact same thing to Java and JavaScript, leaving
| Visual Basic and CMD.EXE behind in the dustbin of history --
| just like Unix should leave bash behind -- resulting in great
| cross platform languages like C# and TypeScript.
|
| Edit: that reinforces my point that taking so long to get there
| is a hell of a lot better than taking MUCH LONGER to NOT get
| there.
|
| Maybe bash's legacy inertia is a problem, not a virtue. It
| certainly isn't getting a JSON parser in the foreseeable
| future. The ironic point is that even Microsoft's PowerShell
| has much less legacy inertia, and is therefore so much better
| after so much less time.
| crispyambulance wrote:
| > Isn't it ironic that PowerShell from Microsoft is so
| vastly superior to bash
|
| I agree that powershell is _now_ better than bash. But it
| took SO LONG to get there. Moreover, bash has had a 12 year
| head-start (ok, 30 if you count earlier unix shells). Bash
| has legacy inertia. Even though you can now supposedly run
| powershell in linux, I don't know anyone who does. Does
| anybody?
|
| That said, I think powershell is great for utility-knife uses
| on windows machines.
| g15jv2dp wrote:
| > Even though you can now supposedly run powershell in
| linux, I don't know anyone who does. Does anybody?
|
| I do. I replaced all of the automation scripts on my rpi
| with pwsh scripts, and I'm not regretting it. Not having to
| deal with decades of cruft in argument parsing and string
| handling, learning little DSLs for every command, etc. is
| so worth it.
| kbolino wrote:
| At this point, all PowerShell has accomplished is creating a
| separate ecosystem. The designers set out to make a "better"
| shell and yet refused to ever learn the things they were
| allegedly "improving".
|
| Basic features are still lacking from PowerShell that have
| been in UNIX shells since the very beginning:
| https://github.com/PowerShell/PowerShell/issues/3316
|
| But hey, that's a fixable problem, right? No, because
| PowerShell is so suffused with arrogance about its
| superiority that anything, no matter how simple it was to do
| in a UNIX shell, has to be cross-examined, re-imagined, and
| bent over the wheel of PowerShell's superiority, before
| ultimately getting ignored or rejected anyway.
|
| PowerShell is a language unto itself. It is not a replacement
| for bash/zsh/etc because nobody who knows the latter well can
| easily migrate to the former, and that's _by design_.
| g15jv2dp wrote:
| Some very strong sentiments about a shell...
| kbolino wrote:
| I want there to be something better than the UNIX shells,
| at least when it comes to error handling and data
| parsing. PowerShell was _supposed_ to be that tool, but
| it seems to have lost sight of that goal somewhere along
| the way.
| alganet wrote:
| Alternative shells or higher-level languages don't solve _all_ the
| issues.
|
| I won't install a new shell to generate a file list on my CI
| server. I won't install a new shell on remote machines. Ever.
|
| These structured shells also require commands to be aware of
| them, either via some plugin that structures their raw I/O
| output or some convention. They solve _some_ command output
| structuring but not _all_ the general problem.
|
| So, the answer is good. It promotes the idea that one should be
| careful when machine parsing output meant for humans.
| g15jv2dp wrote:
| > I won't install a new shell to generate a file list on my
| CI server. I won't install a new shell on remote machines.
| Ever.
|
| Uh... that's on you? Why do you intentionally hinder
| yourself?
|
| > These structured shells also require commands to be aware
| of them, either via some plugin that structures their raw I/O
| output or some convention. They solve _some_ command output
| structuring but not _all_ the general problem.
|
| Okay. It doesn't solve literally every single problem, that
| is true. It's still miles ahead. And when interfacing with
| non-pwsh commands, you just fall back to text parsing/output.
| alganet wrote:
| > Uh... that's on you? Why do you intentionally hinder
| yourself?
|
| Hinder myself? An ephemeral cloud machine would not keep my
| custom shell anyway. By having to install it _every single
| time I connect_ I just lose precious time.
|
| I want to be familiar with tools that are _already_
| installed everywhere.
|
| The shell is supposed to be a bottom feeder, lowest common
| denominator, barely usable tool. That way, it can build
| soon and get stable real fast. That (unintentional)
| strategy placed it as a core infrastructural piece...
| everywhere.
|
| Of course, there's scripting and using it on the terminal.
| But we're talking about scripting, right? Parsing ls and
| stuff. I want the fast, lean, simple `dash` to parse my
| fast, lean simple scripts. pwsh is fine for the terminal
| leather seats.
| Kwpolska wrote:
| Ephemeral cloud machines are created from images. Build
| your own image with the tools you need.
| eikenberry wrote:
| If you're going to skip using the standard shell that is
| installed everywhere by default, then you should go ahead and
| use a full language with easily distributed binaries.
| hawski wrote:
| I think that when someone uses ls instead of a glob it means they
| most probably don't understand shell. I don't see any advantage
| of parsing ls output when glob is available. Shell is finicky
| enough to not invite more trouble. Same with word splitting, one
| of the reasons to use shell functions, because then you have "$@"
| which makes sense and any other way to do it is something I can't
| comprehend.
|
| Maybe I also don't understand shell, but as it was said before:
| when in doubt switch to a better defined language. Thank heavens
| for awk.
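To make the glob-vs-`ls` point concrete, a small sketch (filenames made up for illustration):

```shell
dir=$(mktemp -d)
touch "$dir/plain.txt" "$dir/with space.txt"

# Parsing `ls`: word splitting breaks "with space.txt" into two words.
for f in $(ls "$dir"); do printf '<%s>\n' "$f"; done    # three mangled words

# A glob expands directly to whole filenames; nothing is parsed.
for f in "$dir"/*; do printf '<%s>\n' "${f##*/}"; done  # two intact names
```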
| DonHopkins wrote:
| When you don't want to waste your time and sanity and happiness
| being in doubt and then throwing away all you've done and
| switching to a new language in mid stream, just don't even
| start using a terribly crippled shell scripting language in the
| first place, also and especially including awk.
|
| The tired old "stick to bash because it's already installed
| everywhere" argument is just as weak and misleading and
| pernicious as the "stick to Internet Explorer because it's
| already installed everywhere" argument.
|
| It's not like it isn't trivial to install Python on any system
| you'll encounter, unless you're programming an Analytical
| Engine or Jacquard Loom with punched cards.
| cassianoleal wrote:
| In most places where I run shell scripts, there is no Python.
| There could be if I really wanted it but it's generally
| unnecessary waste.
|
| On top of it, shell is better than Python for many things,
| not to mention faster.
|
| It's also, as you mentioned, ubiquitous.
|
| In the end, choose the tool that makes more sense. For me, a
| lot of the time, that's a shell script. Other times it may be
| Python, or Go, or Ruby, or any of the other tools in the box.
| DonHopkins wrote:
| A waste of what, disk space? I'd much rather waste a few
| megabytes of disk space than hours or days of my time,
| which is much more precious. And what are you doing on
| those servers, anyway? Installing huge amounts of software,
| I bet. So install a little more!
|
| For decades, on most Windows computers I run web browsers,
| there's always Internet Explorer. So do you still always
| use IE because installing Chrome is "wasteful"? It's a hell
| of a lot bigger and more wasteful than Python. As I already
| said, that is a weak and misleading and pernicious
| argument.
|
| So what exactly is bash better than Python at, besides just
| starting up, which only matters if you write millions of
| little bash and awk and sed and find and tr and jq and curl
| scripts that all call each other, because none of them are
| powerful or integrated enough to solve the problem on their
| own?
|
| Bash forces you to represent everything as strings, parsing
| and serializing and re-parsing them again and again. Even
| something as simple as manipulating json requires forking
| off a ridiculous number of processes, and parsing and
| serializing the JSON again and again, instead of simply
| keeping and manipulating it as efficient native data
| structures.
|
| It makes absolutely no sense to choose a tool that you know
| is going to hit the wall soon, so you have to throw out
| everything you've done and rewrite it in another language.
| And you don't seem to realize that when you're duct-taping
| together all these other half-assed languages with their
| quirky non-standard incompatible byzantine flourishes of
| command line parameters and weak antique domain specific
| languages, like find, awk, sed, jq, curl, etc, you're ping-
| ponging between many different inadequate half-assed
| languages, and paying the price for starting up and
| shutting down each of their interpreters many times over,
| and serializing and deserializing and escaping and
| unescaping their command line parameters, stdin, and
| stdout, which totally blows away bash's quick start-up
| advantage.
|
| You're arguing for learning and cobbling together a dozen
| or so different half-assed languages and flimsy tools, none
| of which you can also use to do general purpose
| programming, user interfaces, machine learning, web servers
| and clients, etc.
|
| Why learn the quirks and limitations of all those shitty
| complex tools, and pay the cognitive price and resource
| overhead of stringing them all together, when you can
| simply learn one tool that can do all of that much more
| efficiently in one process, without any quirks and
| limitations and duct tape, and is much easier to debug and
| maintain?
| nolist_policy wrote:
| Plus jq and curl might not even be installed. And I never
| got warm with jq, so if I need to parse json from shell I
| reach for... python. Really.
| radiator wrote:
| Alternatively, maybe you can get warmer with JMESPath,
| which has jp as its command line interface
| https://github.com/jmespath/jp .
|
| The good thing about the JMESPath syntax is that it is
| the standard one when processing JSON in software like
| Ansible, Grafana, perhaps some more.
| mywittyname wrote:
| I'm an avid jq user. There are certainly situations where
| it's better to use python because it's just more sane and
| easier to read/write, but jq does a few things extremely
| well, namely, compressing json, and converting json files
| consisting of big-ass arrays into line delimited json
| files.
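Those two jq uses look like this (assuming jq is installed):

```shell
# jq -c emits compact single-line JSON; '.[]' streams the array's
# elements, turning one big array into line-delimited JSON (NDJSON).
echo '[{"a": 1}, {"a": 2}]' | jq -c .      # compact: [{"a":1},{"a":2}]
echo '[{"a": 1}, {"a": 2}]' | jq -c '.[]'  # one object per line
```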
| cassianoleal wrote:
| > For decades, on most Windows computers I run web
| browsers, there's always Internet Explorer. As I already
| said, that is a weak and misleading and pernicious
| argument.
|
| On its own, I agree. But you glossed over everything else
| I said, so I'm not going to entertain your weak argument.
|
| You seem to ignore that different users, different use
| cases, different environments, etc. all need to be taken
| into account when choosing a tool.
|
| Like I said, for most of my use cases where I use shell
| scripting, it's the best tool for the job. If you don't
| believe me, or think you know better about my
| circumstances than I do, all the power to you.
| sqeaky wrote:
| > You seem to ignore that different users, different use
| cases, different environments, etc. all need to be taken
| into account when choosing a tool.
|
| I have worked on projects that are extremely sensitive to
| extra dependencies and projects that aren't.
|
| Sometimes I am in an underground bunker and each
| dependency goes through an 18 month Department of Defense
| vetting process, and "Just install python" is equivalent
| to "just don't do the project". Other times I have worked
| on projects where tech debt was an afterthought because
| we didn't know if the code would still be around in a
| week and re-writing was a real option, so bringing in a
| dependency for a single command was worthwhile if we
| could solve the problem _now_.
|
| There is appetite for risk, desire for control, need for
| flexibility, and many other factors just as you stated
| that DonHopkins is ignoring or unaware of.
| Pesthuf wrote:
| People new to *nix make the mistake of thinking this stuff is
| well designed, makes sense and that things work well together.
|
| They learn. We all do.
| larodi wrote:
| People new to the internet think alike. Still, not a day
| passes and we are once again reminded how fragile yet amazing
| this all information theory stuff is.
| lukan wrote:
| Coincidently I discovered the unix haters handbook today:
|
| https://web.mit.edu/~simsong/www/ugh.pdf
| pikzel wrote:
| "The Macintosh on which I type this has 64MB: Unix was not
| designed for the Mac. What kind of challenge is there when
| you have that much RAM?"
|
| Love it.
| Narishma wrote:
| I don't understand what they mean in that quote. Neither
| Unix nor the Mac were designed for that much RAM.
| II2II wrote:
| Judging from the context, the user interface was fine in
| the days of limited resources (a 16 kiloword PDP-11 was
| cited) but then modern computers have the resources for
| better user interfaces.
|
| They clearly didn't realize that even more modern Unix
| kernels would require hundreds of megabytes just to boot.
| nsguy wrote:
| OT ... I worked with Simson briefly ages ago. Smart dude.
| This book happened later and I've never seen it before.
| Small world I guess.
| boricj wrote:
| People new to *nix don't realize that it's a 55 year old
| design that keeps accumulating cruft.
| hifromwork wrote:
| Of course, but the same (with a bit lower number of years)
| can be said about Windows, or HTTP, or the web with its
| HTML+JS+CSS unholy trinity, or email, or anything old and
| important really. It's scary how much of our modern
| infrastructure hinges on hacks made tens of years ago.
| com2kid wrote:
| One of the original demos showing off PowerShell was well
| structured output from its version of ls.
|
| That was 17 years ago!
| CJefferson wrote:
| Sometimes I want all filenames from a subdirectory, without the
| subdirectory name.
|
| I can do (ignoring parsing issues):
|
|     for name in $(cd subdir; ls); do echo "$name"; done
|
| This isn't easy to do with globbing (as far as I know)
| rascul wrote:
| One alternative:
|
|     for name in subdir/*; do basename "$name"; done
| Izkata wrote:
| Also since subdir is hardcoded, you can reliably type it a
| second time to chop off however much of the start you want:
|
|     for name in subdir/subsubdir/*; do
|         echo "${name#subdir/}"  # subsubdir/foo
|     done
| xolox wrote:
| Note this string replacement is not anchored (right?)
| which can end up biting you badly (depending on
| circumstances of course).
| chuckadams wrote:
| It's anchored on the left. ${name#subdir/} will turn
| 'subdir/abc' into 'abc', but will not touch
| foo/subdir/bar. I don't think bash even has syntax to
| replace in the middle of an expansion, I always pull out
| sed for that.
| xolox wrote:
| Thanks for clarifying, I learned something new today!
|
| Edit: It turns out that Bash does substitutions in the
| middle of strings using the
| ${string/substring/replacement} and
| ${string//substring/replacement} syntax, for more details
| see https://tldp.org/LDP/abs/html/string-manipulation.html
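A quick illustration of the difference between the left-anchored `#` form and the unanchored `${var/…}` substitutions (bash syntax, not POSIX sh):

```shell
path='foo/subdir/bar/subdir/baz'
echo "${path/subdir/X}"   # first match only:  foo/X/bar/subdir/baz
echo "${path//subdir/X}"  # every match:       foo/X/bar/X/baz
echo "${path#subdir/}"    # unchanged: '#' only strips a matching prefix
```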
| silvestrov wrote:
| I'd really like it if the "find" command made this much
| easier, so that if I write
|
|     find some/dir/here -name '*.gz'
|
| then I could get the filenames without the "some/dir/here"
| prefix.
|
| It would also be nice if "find" (and "stat") could output the
| full info for a file in JSON format so I could use "jq" to
| filter and extract the needed info safely instead of having
| to split whitespace separated columns.
| ykonstant wrote:
| Why would you do this work when stat (and GNU find) can
| `printf` the exact needed information without any parsing?
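For example, with the GNU tools (these `-printf`/`--printf` flags are GNU extensions, not POSIX):

```shell
# Ask find/stat for exactly the fields you need; no column splitting.
find . -maxdepth 1 -type f -printf '%s\t%p\n'  # size <TAB> path per file
stat --printf '%s\t%n\n' -- *                  # same idea, via GNU stat
```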
| silvestrov wrote:
| If I need filesize and filename then I still need to
| parse a filename that might contain all kinds of weird
| ascii control characters or weird unicode.
|
| JSON makes that a lot less fragile.
| ykonstant wrote:
| I don't get it; I need a concrete example.
| williamcotton wrote:
| What about:
|
|     find . -name '*.hs' -exec basename {} \;
| rascul wrote:
| You could get mixed up here because find is recursive by
| default and basename won't show that files might be in
| different subdirectories.
| hawski wrote:
| If you are gonna do a subshell (cd subdir; ls) you can wrap
| the whole loop:
|
|     (cd subdir
|     for name in *; do
|         echo "$name"
|     done)
|
| But I prefer:
|
|     for name in subdir/*; do
|         name="${name#*/}"
|         echo "$name"
|     done
| chasil wrote:
| This is really easy to do with a shell pattern.
|
|     $ x=/some/really/long/path/to/my/file.txt
|     $ echo "${x##*/}"
|     file.txt
| englishspot wrote:
| I just use find. it's a little longer but gives me the full
| paths and is more consistent. also works well if you need to
| recurse. Something like:
|
|     find . -type f | while read -r filepath; do whatever "${filepath}"; done
| gnuvince wrote:
| Is there a reason to prefer `while read; ...;done` over
| find's -exec or piping into xargs?
| xolox wrote:
| Both `find -exec` and xargs expect an executable command
| whereas `while read; ...; done` executes inline shell code.
|
| Of course you can pass `sh -c '...'` (or Bash or $SHELL) to
| `find -exec` or xargs but then you easily get into quoting
| hell for anything non-trivial, especially if you need to
| share state from the parent process to the (grand) child
| process.
|
| You can actually get `find -exec` and xargs to execute a
| function defined in the parent shell script (the one that's
| running the `find -exec` or xargs child process) using
| `export -f` but to me this feels like a somewhat obscure
| use case versus just using an inline while loop.
| retrogeek wrote:
| I will sometimes use the "| while read" syntax with find.
| One reason for doing so is that the "-exec" option to find
| uses {} to represent the found path, and it can only be
| used ONCE. Sometimes I need to use the found path more than
| once in what I'm executing, and capturing it via a read
| into a reusable variable is the easiest option for that.
| I'd say I use "-exec" and "| while read" about equally,
| actually. And I admittedly almost NEVER use xargs.
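Worth noting: another common idiom for reusing the found path with `-exec` is to hand it to an inline shell as a positional parameter (a general technique, not specific to this thread):

```shell
# "$1" can be repeated as often as needed inside the inline script;
# the `_` fills $0. Quoting stays correct for arbitrary filenames.
find . -name '*.txt' -exec sh -c '
    cp -- "$1" "$1.bak" && printf "backed up %s\n" "$1"
' _ {} \;
```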
| ykonstant wrote:
| This will fail for files with newlines.
| michaeldh wrote:
| How common are they?
| anamexis wrote:
| This whole post is about uncommon things that can break
| naive file parsing.
| chasil wrote:
| Wow, you people are really young.
|
| http://www.etalabs.net/sh_tricks.html
| stouset wrote:
| I love this example, because it highlights how absolutely
| cursed shell is if you ever want to do anything correctly or
| robustly.
|
| In your example, newlines and spaces in your filenames will
| ruin things. Better is:
|
|     find ... -print0 | while read -r -d $'\0'; do ...; done
|
| This works in most cases, but it can still run into problems.
| Let's say you want to modify a variable inside the loop (this
| is a toy example, please don't nit that there are easier ways
| of doing _this specific_ task).
|
|     declare -a list=()
|     find ... -print0 | while read -r -d $'\0' filename; do
|         list+=("$filename")
|     done
|
| The variable `list` isn't updated at the end of the loop,
| because the loop is done in a subshell and the subshell
| doesn't propagate its environment changes back into the outer
| shell. So we have to avoid the subshell by reading in from
| process substitution instead.
|
|     declare -a list=()
|     while read -r -d $'\0' filename; do
|         list+=("$filename")
|     done < <(find ... -print0)
|
| Even this isn't perfect. If the command inside the process
| substitution exits with an error, that error will be
| swallowed and your script won't exit even with `set -o
| errexit` or `shopt -s inherit_errexit` (both of which you
| should always use). The script will continue on as if the
| command inside the subshell succeeded, just with no output.
| What you have to do is read it into a variable first, and
| then use that variable as standard input.
|
|     files="$(find ... -print0)"
|     declare -a list=()
|     while read -r -d $'\0' filename; do
|         list+=("$filename")
|     done <<< "${files}"
|
| I think there's an alternative to this that lets you keep the
| original pipe version when `shopt -s lastpipe` is set, but I
| couldn't get it to work with a little experimentation.
|
| Also be aware that in all of these, standard input inside the
| loop is redirected. So if you want to prompt a user for
| input, you need to _explicitly_ read from ` /dev/tty`.
|
| My point with all this isn't that you should use the above
| example every single time, but that all of the (mis)features
| of shell compose _extremely_ badly. Even piping to a loop
| causes weird changes in the environment that you now have to
| work around with other approaches. I wouldn 't be surprised
| if there's something still terribly broken about that last
| example.
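For what it's worth, the `lastpipe` variant does work in non-interactive bash scripts (it needs job control off, which is the script default); a minimal sketch:

```shell
#!/usr/bin/env bash
shopt -s lastpipe            # run the last pipeline stage in this shell
declare -a list=()
find . -maxdepth 1 -type f -print0 |
while IFS= read -r -d '' f; do
    list+=("$f")
done
echo "collected ${#list[@]} files"   # the array survives the loop
```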
| dotancohen wrote:
| > I think that when someone uses ls instead of a glob it means
| they most probably don't understand shell.
|
| In 25 years of using Bash, I've picked up the knowledge that I
| shouldn't parse the output of ls. I suppose that it has
| something to do with spaces, newlines, and non-printing
| characters in file names. I really don't know.
|
| But I do know that when I'm scripting, I'm generally wrapping
| what I do by hand, in a file. I'm codifying my decisions with
| ifs and such, but I'm using the same tools that I use by hand.
| And ls is the only tool that I use to list files by hand - so I
| find it natural that people would (naively) pick ls as the tool
| to do that in scripts.
| notnmeyer wrote:
| exactly--well said
| PaulHoule wrote:
| I went through a phase when I really enjoyed writing shell
| scripts like:
|
|     ls *.jpg | awk '{print "resize 200x200 " $1 " thumbnails/" $1}' | bash
|
| because I never got to the point where I could remember the
| strange punctuation that the shell requires for loops without
| looking up the info pages for bash whereas I've thoroughly
| internalized awk syntax.
|
| Word is you should never write something like that because
| you'll never get the escaping right and somebody could craft
| inputs that would cause arbitrary code execution. I mean, they
| try to scare you into using xargs, but I find xargs so foreign
| I have to read the whole man page every time I want to do
| something with it.
| hifromwork wrote:
| I encourage you to give it a try again. Almost every use of
| xargs that I ever did looked like this:
|
|     ls *.jpg | xargs -i,, resize 200x200 ,, thumbnails/,,
|
| I just always define the placeholder to ,, (you can pick
| something else but ,, is nice and unique) and write commands
| like you do.
| kstrauser wrote:
| I'm more likely to write that like:
|
|     for i in *.jpg; do resize 200x200 "$i" "thumbnails/$i"; done
| TylerE wrote:
| Does that not fail when you hit the maximum command line
| length? Doesn't the entirety of the directory get
| splatted? Isn't this the whole reason xargs exists?
| PaulHoule wrote:
| Exactly, there are so many limits in the shell that I
| don't want to be bothered to think about. When I get
| serious I just write Python.
| genrilz wrote:
| The for loop only runs resize once per file. So no, the
| entire directory does not get splatted. It is unlikely
| you'd hit maximum command length.
|
| At least on mac, the max command length is 1048576 bytes,
| while the maximum path length in the home directory is
| 1024 bytes. There might be some unix variant where the
| max path length is close enough to the max command length
| to cause an overflow, but I doubt that is the case for
| common ones.
|
| xargs exists in an attempt to be able to parse command
| output. You could for instance have awk output xargs
| formatted file names to build up a single command
| invocation from arbitrary records read by awk. Note that
| xargs still has to obey the command line length limit
| though, because the command line needs to get passed to
| the program. Thus, in a situation where this for loop
| overflows the command line, it would cause xargs to also
| fail. Thus I would always use globbing if I have the
| choice.
|
| EDIT: If you mean that the directory is splatted in the
| for loop, then in a theoretical sense it is. However,
| since "for" is a shell builtin, it does not have to care
| about command line length limits to my knowledge.
| TylerE wrote:
| Yes, this is an issue, absolutely.
|
| I've seen some image directories with more than a million
| _files_ in them.
| genrilz wrote:
| This shouldn't overrun the command line length for
| resize, since resize only gets fed one filename at a
| time. I do think that the for loop would need to hold all
| the filenames in a naive shell implementation. (I would
| assume most shells are naive in this respect) The for
| loop's length limit is probably the amount of ram
| available though. I find it improbable that one could
| overflow ram with purely pathnames on a PC, since a
| million files times 100 chars per file is still less than
| a gig of ram. If that was an issue though, one would
| indeed have to use "find" with "-exec" instead to make
| sure that one was never holding all file names in memory
| at the same time.
| lelandbatey wrote:
| No, it does not fail. Maximum command line length exists
| in the operating system, not the shell; you can't launch
| a program with too many arguments, or with an argv whose
| total size exceeds the OS limit.
|
| But when you execute a for loop in bash/sh, the 'for'
| command is not a program that is launched; it's a keyword
| that's interpreted, and the glob is also interpreted.
|
| Thus, no, that does not fail when you hit the maximum
| command line length (which is 4096 on most *nix). It'll
| fail at other limits, but those limits exist in bash and
| are much larger. If you want to move to a stream-
| processing approach to avoid any limits, then that is
| possible, while probably also being a sign you should not
| use the shell.
| projektfu wrote:
| Better is something like:
|
|     find . -maxdepth 1 -name "*.jpg" -exec resize 200x200 "{}" "thumbnails/{}" \;
|
| which works for spaces and probably quotes in filenames; I am
| not sure about other special characters.
| mjevans wrote:
| It's tough to be portable and have a one-liner. See
| https://stackoverflow.com/questions/45181115/portable-way-to...
|
| I switched the command to a graphics magick based resize
| since that's the tool these days, default quality is 75%
| (for JPEG), but is included as a commonly desired
| customization. ,, is from a different comment in this
| thread; it seems more self-documenting than the single ,
| I'd traditionally use.
|
|     find . -maxdepth 1 -name "*.jpg" -print0 | \
|         xargs -0 -P "$(nproc --all)" -I,, \
|         gm convert -resize '200x200^>' -quality 75 ,, "thumbnails/,,"
| chrsig wrote:
| Commands can have a maximum number of arguments. Try globbing
| on a directory with millions of files.
| anthk wrote:
| Sane people will just use find and/or xargs.
| stephenr wrote:
| One advantage: `ls -i` gives you the file's inode in a POSIX
| portable way. If you glob and then look it up individually for
| each file, you'll need to be aware of which tool (and whether
| it's GNU or BSD in origin) you use on which platform.
|
| In _general_ yes globbing is better for iterating through
| files. But parsing `ls` doesn 't necessarily mean the author
| doesn't know shell. It might mean they know it well enough to
| use the tools that are made available to them.
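Combining the two approaches: glob for the names, then let `ls -di` report each inode one file at a time (a sketch; `ls -i` output is the inode number followed by the name):

```shell
# POSIX-portable inode listing without parsing a full `ls` dump.
for f in *; do
    set -- $(ls -di -- "$f")   # unquoted on purpose: field 1 is the inode
    printf '%s\t%s\n' "$1" "$f"
done
```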
| amelius wrote:
| I feel like Unix utilities should provide a standardized way to
| generate machine-readable output, perhaps using JSON.
| db48x wrote:
| The same information is already available in a machine-readable
| format. Just call readdir. You don't need to run ls, have ls
| call readdir and convert the output into JSON, and then finally
| parse the JSON back into a data structure. You can just call
| readdir!
| amelius wrote:
| I know, but it would be so great if __every__ Unix utility
| just had the same type of output. By the way, ls does more
| than just readdir.
| masklinn wrote:
| `find` is also an option, or shell globs.
| db48x wrote:
| Right, globs are syntactic sugar on top of readdir.
| Definitely use them when you are in a shell. But in general
| the solution is to call readdir, or some language facility
| built directly on top of it. Calling ls and asking it for
| JSON is the stupid way to do things.
| amelius wrote:
| Just curious, how would you approach getting output from
| utilities like "df", "mount" and "parted"?
| solardev wrote:
| Generally speaking, can't you limit/define the output of
| those commands and parse them that way? like df
| --portability or --total or --output
|
| And/or use their return codes to verify that something
| worked or didn't
|
| Or hope your higher level programming language contains
| built-ins for file system manipulations
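| For `df` specifically, a sketch of that approach (GNU df is
| assumed; `--output` selects exact columns, which makes the result
| predictable enough to split on whitespace):

```shell
# Ask GNU df for only the columns we need, then drop the header line.
# What remains is one well-defined record per filesystem.
df --output=target,avail / | tail -n +2
```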
| amelius wrote:
| How is that any easier than just giving a standardized
| --json flag?
| solardev wrote:
| It doesn't require trying to organize a small revolution
| across dozens of GNU tools, many authors, and numerous
| distros...?
|
| I'd love to see standard JSON output across these tools.
| I just don't see a realistic way to get that to happen in
| my lifetime.
|
| Maybe a unified parsing layer is more realistic, like an
| open source command output to JSON framework that would
| automatically identify the command variant you're running
| based on its version and your shell settings, parse the
| output for you, and format it in a standard JSON schema?
| Even that would be a huge undertaking though.
|
| There are a lot, LOT of command variants out there. It's
| one thing to tweak the output to make it parseable for
| your one-off script on your specific machine. Not so easy
| to make it reusable across the entire *nix world.
| xolox wrote:
| With regards to parted, if you only want to query for
| information, there is "partx" whose output was
| purposefully designed to be parsed. I have good
| experiences with it.
| emmelaich wrote:
| Can you call readdir() from a shell easily?
|
| WRT format, I'd prefer csv.
| db48x wrote:
| Certainly. Just do `for f in *`. See how easy that is?
| zokier wrote:
| Here is a trivial program to dump dents to stdout, suitable
| for shell pipelines. Example usage: `./getdents64 . | xargs
| -0 printf "%q\n"`
|         #define _GNU_SOURCE
|         #include <dirent.h>
|         #include <fcntl.h>
|         #include <malloc.h>
|         #include <stdio.h>
|         #include <stdlib.h>
|         #include <string.h>
|         #include <unistd.h>
|
|         #define BUF_SIZE 32768
|
|         struct linux_dirent64 {
|             ino64_t        d_ino;    /* 64-bit inode number */
|             off64_t        d_off;    /* Not an offset; see getdents() */
|             unsigned short d_reclen; /* Size of this dirent */
|             unsigned char  d_type;   /* File type */
|             char           d_name[]; /* Filename (null-terminated) */
|         };
|
|         int writeall(char *buf, size_t len) {
|             ssize_t wres = 0;
|             wres = write(1, buf, len);
|             if (wres == -1) {
|                 perror("write");
|                 return -1;
|             }
|             if (((size_t)wres) < len) {
|                 return writeall(buf + wres, len - wres);
|             }
|             return 0;
|         }
|
|         int main(int argc, char **argv) {
|             if (argc != 2) {
|                 return EXIT_FAILURE;
|             }
|             int fd = open(argv[1], O_DIRECTORY | O_RDONLY);
|             if (fd == -1) {
|                 perror("open");
|                 return EXIT_FAILURE;
|             }
|             void *buf = malloc(BUF_SIZE);
|             ssize_t res = 0;
|             do {
|                 res = getdents64(fd, buf, BUF_SIZE);
|                 if (res == -1) {
|                     perror("getdents64");
|                     return EXIT_FAILURE;
|                 }
|                 void *it = buf;
|                 while (it < (buf + res)) {
|                     struct linux_dirent64 *elem = it;
|                     it += elem->d_reclen;
|                     size_t len = strlen(elem->d_name);
|                     if (writeall(elem->d_name, len + 1) == -1) {
|                         return EXIT_FAILURE;
|                     }
|                 }
|             } while (res > 0);
|             return EXIT_SUCCESS;
|         }
| db48x wrote:
| You're still doing unnecessary work. You're turning a
| list of files into a string, then parsing the string back
| into words.
|
| Your shell already provides a nice abstraction over
| calling readdir directly. A glob gives you a list, with
| no intermediate stage as a string that needs to be
| parsed. You can iterate directly over that list.
|
| Every language provides either direct access to the C
| library, so that you can call readdir, or it provides
| some abstraction over it to make the process less
| annoying. In Common Lisp the function `directory` takes a
| pathname and returns a list of pathnames for the files in
| the named directory. In Rust there is the
| `std::fs::read_dir` that gives you an iterator that
| yields `io::Result<std::fs::DirEntry>`, allowing easy
| handling of io errors and also neatly avoiding an extra
| allocation. Raku has a function `dir` that returns a
| similar iterator, but with the added feature that it can
| match the names against a regex for you and only yield
| the matches. You can fill in more examples from your
| favorite languages if you want.
| emmelaich wrote:
| Wow, these replies. I was being a little sarcastic as there
| is no 'readdir' shell command. That is all.
| DonHopkins wrote:
| That doesn't solve the problem that bash is completely useless
| for manipulating JSON.
|
| It certainly would make writing Python scripts that need to
| interact with other programs easier. But Python doesn't
| desperately NEED to interact with so many other programs for
| such simple tasks like enumerating files or making http
| requests or parsing json, the way bash does.
| Kwpolska wrote:
| Bash is useless at JSON _now_. There's nothing stopping Bash
| from introducing native JSON parsing.
| bpshaver wrote:
| https://kellyjonbrazil.github.io/jc/docs/parsers/ls
| Aerbil313 wrote:
| What to do instead: Use Nushell.
|
| I finally started really using my shell after switching to it. I
| casually write multiple scripts and small functions per day to
| automate my stuff. I'm writing scripts I'd otherwise write in
| python in nu. All because the data needs no parsing. I'm not even
| annotating my data with types even though Nushell supports it
| because it turns out structured data with inferred types is more
| than you need day-to-day. I'm not even talking about all the
| other nice features other shells simply don't have. See this
| custom command definition:
|         # A greeting command that can greet the caller
|         def greet [
|             name: string      # The name of the person to greet
|             --age (-a): int   # The age of the person
|         ] {
|             [$name $age]
|         }
|
| Here's the auto-generated output when you run `help greet`:
|         A greeting command that can greet the caller
|
|         Usage:
|           > greet <name> {flags}
|
|         Parameters:
|           <name>  The name of the person to greet
|
|         Flags:
|           -h, --help: Display this help message
|           -a, --age <integer>: The age of the person
|
| It's one of those pieces of software that only empowers you, immediately,
| without a single downside. Except the time spent learning it, but
| that was about a week for me. Bash or fish is there if I ever
| need it to paste some shell commands.
| db48x wrote:
| Parsing, or the lack thereof, is not the point. The point is
| that standard shells already provide all the tools you need for
| dealing with lists of files. Want to do something for every
| file? Write this:
|         shopt -s nullglob
|         for f in *; do
|             ...
|         done
|
| But never this:
|         for f in $(ls); do
|             ...
|         done
|
| They look similar, but the latter runs ls to turn the list of
| files into a string, then has the shell parse the string back
| into a list. Even if the parsing was done correctly (and it
| isn't), this is still extra work. Looping over the glob avoids
| the extra work.
| Aerbil313 wrote:
| I have to say this is very unintuitive. In Nushell, you'd do:
|         ls | each { ... }
|
| Other examples I don't need to explain, which would be far
| harder in stringly typed shells:
|         ls | where type == file and size <= 5MiB | sort-by size | reverse | first 10
|         ps | where cpu > 10 and mem > 1GB | kill $in.pid
|
| It's immediately obvious what you need to do when you can
| easily visualize your data:
|         > ls
|         +----+-----------------+------+---------+------------+
|         | #  | name            | type | size    | modified   |
|         +----+-----------------+------+---------+------------+
|         |  0 | 404.html        | file | 429 B   | 3 days ago |
|         |  1 | CONTRIBUTING.md | file | 955 B   | 8 mins ago |
|         |  2 | Gemfile         | file | 1.1 KiB | 3 days ago |
|         |  3 | Gemfile.lock    | file | 6.9 KiB | 3 days ago |
|         |  4 | LICENSE         | file | 1.1 KiB | 3 days ago |
|         |  5 | README.md       | file | 213 B   | 3 days ago |
|         ...
| db48x wrote:
| I didn't say that nushell is bad, I said that it's not
| relevant to the discussion. nushell provides typed data in
| pipelines, which is cool. But standard shells already have
| typed data for this particular use case, thus parsing
| untyped data is unnecessary. Of course it would be nice if
| that typed data could be used in a pipeline, but everything
| had to start somewhere.
| acureau wrote:
| Who are you to decide what's relevant to the discussion?
| It's very clearly on topic. I had never heard of nushell
| and I'm glad it was mentioned
| CJefferson wrote:
| How do I replace:
|         for f in $(cd subdir; ls); do
|             ...
|         done
| ?
| ykonstant wrote:
| Either
|         for f in subdir/*; do
|             ...
|         done
|
| or
|         (
|             cd subdir || exit 1
|             for f in *; do
|                 ...
|             done
|         )
|
| work fine. However, I must insist against using `for` loops
| in favor of `find`.
| probably_wrong wrote:
| I think there's a middle point where you want to do something
| that's complex enough that a glob won't cut it but simple enough
| that switching languages is not worth it.
|
| I think the example of "exclude these two types of files" is a
| good case. I often have to write stuff like `ls P* | grep -Ev
| "wav|draft"` which doesn't solve a problem I don't have (such as
| filenames with newlines in them) but does solve the one I do
| (keeping a subset of files that would be tricky to glob
| properly).
|
| In my experience 95% of those scripts are going to be discarded
| in a week, and bringing Python into it means I need to deal with
| `os.path` and `subprocess.run`. My rule of thumb: if it's not
| going to be version controlled then Bash is fine.
| OrderlyTiamat wrote:
| You might enjoy a variety of `find` based commands, e.g. `find
| -maxdepth 1 -iregex ".*\\.(wav|draft)" | xargs echo "found
| file:"`
|
| This uses regex to match files ending in .wav or .draft (which
| is what I interpreted you to want). Xargs then processes the
| file. You could use flags to have xargs pass the file names in
| a specific place in the command, which can even be a one liner
| shell call or some script.
|
| So the "find <regex> - xarg <command>" pattern is almost fully
| generally applicable to any problem where you want to execute a
| oneliner on a number of files with regular names. (I think gnu
| find has no extended regex, which is just as well- thats not a
| "regular expression" at that point)
| bheadmaster wrote:
| > You might enjoy a variety of `find` based commands, e.g.
| `find -maxdepth 1 -iregex ".*\\.(wav|draft)" | xargs echo
| "found file:"`
|
| Find can even execute commands itself without using `xargs`:
|         find -maxdepth 1 -iregex '.*\.\(wav\|draft\)' \
|             -exec echo "found file:" {} \;
| Izkata wrote:
| Definitely do it this way if you want to stick to the pre-
| filtered version (I recommend the cousin comment, filter
| inside the loop). GP's version is buggy in the same way as
| the post misunderstands, particularly with files that
| somehow got newlines in the filename (xargs is newline-
| delimited by default).
|
| If for some reason you do need the "find | xargs" combo
| (maybe for concurrency), you can get it to work with "find
| -print0" and "xargs -0". Nulls can't be in filenames so a
| null-delimited list should work.
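| Spelled out as a sketch (GNU or BSD find/xargs; the files created
| here are hypothetical):

```shell
# Null-delimited pipeline: survives spaces and even embedded
# newlines, because NUL is the one byte a filename cannot contain.
cd "$(mktemp -d)"
touch 'a.png' $'evil\nname.png' 'keep.txt'
find . -maxdepth 1 -type f -name '*.png' -print0 | xargs -0 rm --
ls   # only keep.txt remains
```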
| ykonstant wrote:
| As an addendum, note that `-print0` and `-0` for find and
| xargs respectively are now in the latest POSIX standard,
| so their use is compliant.
| genrilz wrote:
| The latest standard I know of is SuS 2018, which I have
| the docs for, and does not include either switch. I
| searched around a bit and it doesn't seem like there is a
| new one. Are you referring to some draft? I sure wish
| this was true.
|
| That being said, I would interpret "-exec printf '%s\0'
| {} +" as being a posix compliant way for find to output
| null delimited files. I say this since the docs for the
| octal escape for printf allows zero digits. However, most
| posix tools operate on "text" input "files", which are
| defined as not having null characters. Thus I don't think
| outputting nulls could be easily used in a posix
| complaint way. In practice, I would expect many posix
| implementations to also not handle nulls well because C
| uses null to mean end of string, so lots of C library
| calls for dealing with strings will not correctly deal
| with null characters.
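| A sketch of that `-exec printf` idea (POSIX find and printf; the
| consumer must be NUL-aware, and `tr` is used here only to make the
| output visible):

```shell
# POSIX-portable NUL-delimited output: find batches file names into
# printf, which emits each one followed by a literal NUL (octal \0).
cd "$(mktemp -d)"
touch a.log b.log notes.txt
find . -name '*.log' -exec printf '%s\0' {} + | tr '\0' '\n' | sort
# prints ./a.log and ./b.log
```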
| PhilipRoman wrote:
| >xargs is newline-delimited by default
|
| Even worse, it is whitespace delimited (with its own
| rules for escaping with quotes and backslashes)
| hifromwork wrote:
| It is not, but (for reasons unknown to me) it doesn't
| quote parameters in the default mode. Consider:
|
|         touch "a b"
|         ls | xargs rm         # this won't work, rm gets two parameters
|         ls | xargs -i,, rm ,, # this will work
| PhilipRoman wrote:
| https://pubs.opengroup.org/onlinepubs/9699919799/utilities/x...
|
| >[..] arguments in the standard input are separated by
| unquoted <blank> characters [..]
|
| As for -i, it is documented to be the same as -I, which,
| among other things, makes it so that "unquoted blanks do
| not terminate input items; instead the separator is the
| newline character."
| hifromwork wrote:
| >GP's version is buggy in the same way as the post
| misunderstands, particularly with files that somehow got
| newlines in the filename
|
| I understand this caveat, but I never had a file with
| newline that I cared about. Everyone keeps repeating this
| gotcha but I literally don't care. When I do "ls | grep
| [.]png\$ | xargs -i,, rm ,," (yes, stupid example) there
| is 0% chance that a png file with a newline in the name
| found itself in my Downloads folder. Or my project's
| source code. Or my photo library. It just won't happen,
| and the bash oneliner only needs to run once. In my 20
| years of using xargs I didn't have to use -0 even once.
| bheadmaster wrote:
| It's not _necessary_ to bring Python into it, Bash can handle
| filenames with weird characters properly if you know how to use
| it.
|
| E.g. instead of `ls | grep -Ev 'wav|draft'`, you'd have to do
| something like
|         for filename in *; do
|             if grep -E 'wav|draft' >/dev/null <<< "$filename"
|             then
|                 : # ...
|             fi
|         done
|
| Of course, it's more convoluted, but when you're writing
| scripts that might be used for a long time and by many people,
| it helps to know that it is _possible_ to write robust things.
| Tools like shellcheck certainly help.
| PeterWhittaker wrote:
| grep -q and you won't need the redirect of stdout.
| ykonstant wrote:
| The above is perfectly fine for small directories, but in
| general the preferred way to loop over files is with find:
|         find . ! -name . -prune \
|             -exec grep -qE 'wav|draft' {} \; \
|             -exec "${action}" \; ;
|
| Edit: I missed the herestring in the original code, so the
| above is wrong as mentioned in the comments; if your find has
| regex, you can use it to save one grep:
|         find . ! -name . -prune \
|             -regex '.*wav.*\|.*draft.*' \
|             -exec "${action}" \; ;
|
| Otherwise you can call sh to printf the filename into a grep.
|
| However, the point of my post is that find can perform _seek_,
| _filter_ and _execute_, and should be used for all three
| unless it is really impossible (which is unlikely).
| tux1968 wrote:
| Your example is grepping the file contents, where GP is
| using grep to select the filenames.
| ykonstant wrote:
| D'oh!
| some_random wrote:
| At that point I think you need to ask yourself why you're
| using Bash to begin with. If it's just meant to be a quick
| script that's run occasionally then this is good but probably
| overkill. If it's going into prod to be run regularly as part
| of business critical, then it should be in a language that
| has a less convoluted way to _ls a directory_. There's an
| inflection point somewhere in there, where it is depends on
| you.
| mywittyname wrote:
| Am I monitoring the execution?
|
| Yes: bash is probably fine.
|
| No: real programming language time.
| some_random wrote:
| Before you write anything, you need to think about the cost of
| it breaking and the chance of it breaking, and Bash scripts in
| VC tend to maximize both. I like that heuristic a lot.
| mcc1ane wrote:
| https://mywiki.wooledge.org/BashPitfalls#for_f_in_.24.28ls_....
| teddyh wrote:
| Many people turn to globbing to save them, which is usually
| better, but has some problems in case of no matches. But, for
| Bash, you can do this to fix it:
|         shopt -s failglob
| billpg wrote:
| Why do you want to put LF bytes into filenames?
|
| Using magic, I've renamed any files you have to remove control
| characters in the name and made it impossible to make any new
| ones. (You can thank me later.)
|
| What can't you do now?
| tredre3 wrote:
| Preventing certain characters in filenames would solve a _lot_
| of issues, from security issues to wasted time all around.
|
| But for whatever reason, when it is suggested, you get many
| people chiming in that "filenames should be dumb bytes,
| anything allowed except / !"
| badgersnake wrote:
| Some front end clown is about to suggest all tools should output
| json by default aren't they. They'll probably add that it's fine
| to start up 17 v8 engines just to parse all this json because
| modern laptops have loads of ram and anyway it's modern and cross
| platform.
| wslh wrote:
| An extension of this could be to also input everything in json:
| {"command": "ls", "parameters": ["-l"]}, etc.
| solardev wrote:
| Didn't Microsoft try to define something like that with
| Powershell, with parameters being objects (though not JSON)?
| solardev wrote:
| But hey, at least it's not YAML!
| emmelaich wrote:
| The funny thing is that so so many bits of info come in very
| much like but not quite yaml.
|
| e.g. /proc/cpuinfo
| badgersnake wrote:
| Of course, with some k8s yaml we can run all our cli tools in
| separate containers each with their own userland.
| lukan wrote:
| I do not think parsing json requires a full blown javascript
| engine.
| williamcotton wrote:
| Ever heard of jq?
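| For anyone who hasn't: a minimal sketch, assuming `jq` is installed
| (the JSON document here is made up):

```shell
# jq does the JSON parsing, so the shell only ever sees plain fields.
echo '{"name": "404.html", "type": "file", "size": 429}' |
    jq -r '.name'    # prints 404.html
```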
| amelius wrote:
| What do you suggest? Have 100 different ways to parse output?
| Think about the resulting code bloat.
|
| And no, you don't need V8 to parse JSON.
| afiori wrote:
| sadly JSON does not handle non-utf8 strings
| scintill76 wrote:
| Is that a problem for this application, though? Don't most
| people encode their file names in utf8, or is that an ASCII-
| centric falsehood?
| MaxMatti wrote:
| Json is maybe a bit heavy, but using a machine readable format
| such as tsv or csv (including configuring your terminal
| emulator to properly display it) would be a big step up from
| the status quo.
| CJefferson wrote:
| Do you really think outputting a stream of JSON, as opposed to
| plain text, would add any measurable overhead to all command
| line tools?
|
| Honestly, I'd love this. Output one JSON object per file, bash
| already has hash tables and lists, so it has all the types we
| need for JSON already.
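| Those built-ins are real; a sketch with a bash 4+ associative array
| standing in for one decoded JSON object (the field values are
| invented):

```shell
# bash's own "objects": one associative array per record.
declare -A entry=([name]="404.html" [type]="file" [size]=429)
printf '%s is a %s of %s bytes\n' \
    "${entry[name]}" "${entry[type]}" "${entry[size]}"
# prints: 404.html is a file of 429 bytes
```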
| badsectoracula wrote:
| I guess this is for shell scripts that need to work with "unsafe"
| filenames?
|
| I've been using Linux since 1999 and i never came across a
| filename with newlines. On the other hand, pretty much all "ls
| parsing" i've done was on the command-line to pipe it to other
| stuff in files i was 100.1% sure would be fine.
| kemitchell wrote:
| When teaching beginners shell, it's natural to teach `ls` for
| listing directory contents. It's also natural to extend from
| `ls` to `ls | ...` for processing lists of files
|
| The important point to get across is that pipes let us build
| bigger commands _from the commands we already know_. If needed,
| you can back up later to teach patterns like `find [...]
| -exec`, `find [...] -print0 | xargs -0 [...]`, `find [...] |
| while read -r file; do [...] done` and so on.
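| The last pattern, spelled out as a sketch (bash/zsh only, since
| `read -d ''` is not POSIX; the files are hypothetical):

```shell
# NUL-delimited find output consumed one entry at a time; IFS= and
# -r keep read from mangling leading whitespace or backslashes.
cd "$(mktemp -d)"
touch 'plain' 'with space'
find . -type f -print0 |
while IFS= read -r -d '' file; do
    printf 'got: %s\n' "$file"
done
```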
|
| There are all kinds of prerequisites to creating files with
| unusual names. Those barriers tend to mean beginners won't run
| into file name processing edge cases for a while. The exception
| will be files they download from the Internet. But the
| complexity there will usually be quote and non-ASCII Unicode
| characters, not newlines or other control codes.
|
| In teaching, the one filename complexity I would try to get
| ahead of, preventively, is spaces. There was a time, way back
| when, when newbies seemed to expect to stick with short, simple
| filenames. These days the people I've helped tend to be
| used to using spaces in file names in Finder and Explorer for
| office or school work.
| zokier wrote:
| I wonder if anyone has implemented kernel module or smth to limit
| filenames to sane set. Just ensuring that they are valid utf8 and
| do not contain any non-printables would be huge improvement. Sure
| some niche applications might break so its not something that can
| be made default, but still I think it would help on systems I
| control.
| tmtvl wrote:
| Today I learned how neat find is:
|         find ~/Music -iname 'p*' -not -iname '*age*' -not -iname '*etto*'
|         find ~/Music -iname 'p*' -not -iregex '.*\(age\|etto\).*'
|         find ~/Music -regextype posix-extended -iname 'p*' -not -iregex '.*(age|etto).*'
|
| Not that I'm likely to ever use any of that in anger, but it's
| good to know if ever I do wind up needing it.
| InsideOutSanta wrote:
| Files and directories, once a reference to them is obtained,
| should not be identified by their path. This causes all kinds of
| problems, like the reference breaking when the user moves or
| renames things, and issues like the ones described in the
| article, where some "edge case" (and I'm using that term very
| loosely, because it includes common situations like a space in a
| file name) causes problems down the line.
|
| You might say that people don't move or rename things while files
| are open, but they absolutely do, and it absolutely breaks
| things. Even something as simple as starting to copy a directory
| in Explorer to a different drive, and then moving it while the
| copy is ongoing, doesn't work. That's pathetic! There is no
| technical reason this should not be possible.
|
| And who can forget the case where an Apple installer deleted
| people's hard disk contents when they had two drives, one with a
| space character, and another one whose name was the string before
| the first drive's space character?
|
| Files and directories need to have a unique ID, and references to
| files need to be that ID, not their path, in almost all cases.
| MFS got that right in 1984, it's insane that we have failed to
| properly replicate this simple concept ever since, and actually
| gone backwards in systems like Mac OS X, which used to work
| correctly, and now no longer consistently do.
| Kwpolska wrote:
| IDs don't really solve many problems. The issues with scripts
| removing all your files were either caused by the absurd bash
| spaces and quotes rules, or by bash silently ignoring
| nonexistent variables. Those scripts would still need paths,
| since the ID of ~/.steam will be different for everyone.
| Scripts that need to work on more than one system, and human-
| authored config files, would still have paths. There are cases
| where you want to depend on the path, not the identity of the
| folder, and potentially swap the folder with something else
| without editing configuration.
|
| Explorer needs to support local drives, with a lot of
| filesystems, including possibly third-party ones, but also
| network drives, FTP, WebDAV, and a bunch of other niche things.
| Not all of them have IDs and might not be possible to be
| extended. The cost is massive, solving it everywhere is
| impossible, and the benefit seems negligible to me (even though
| I fairly recently managed to eject a disk image (vhdx) in the
| middle of copying files onto it...)
| InsideOutSanta wrote:
| Earlier versions of Mac OS had APIs to retrieve the IDs of
| directories and files relevant for things like installing
| applications (such as the System directory). It
| effectively never used paths to identify any files; if users
| opened a file, they'd use the system file picker, which would
| provide the application a file ID, not a path.
|
| Similarly, things like config files would be identified by
| their name, not their path, because the directory containing
| configs was a directory the system knew about. As a result,
| no application needed to know the path to its own config
| files.
|
| This meant there was no action that the system prevented you
| from doing to an open file, other than actually deleting that
| file. There was also no way for an installer to accidentally
| break your system because its code didn't take your drive,
| file, or directory names into account.
|
| And, of course, there _are_ file systems that don't use
| paths at all, like HashFS, a bunch of modern document
| management systems, or the Newton's Soup.
|
| I get your point about interoperability with existing file
| systems, but I think it's perfectly acceptable to offer
| better solutions where possible, and fall back to paths for
| situations where that is not possible.
| 7bit wrote:
| Or use PowerShell where LS returns a bunch of objects, and say
| goodbye to string parsing forever.
| chickenimprint wrote:
| nushell is the superior structured data shell and it's cross-
| platform. https://www.nushell.sh/
| redserk wrote:
| I've only used Powershell a little bit on Linux and Mac but
| it seems reasonably cross-platform.
|
| On the surface, it looks like I'd be giving up the decently
| sized ecosystem of Powershell libraries for a new ecosystem
| without much support?
|
| I'm interested in knowing what Nushell does differently since
| I'm wanting to find a better shell.
| ericfrederich wrote:
| Wait until you realize that "giving up the decently sized
| ecosystem of Powershell libraries" is a net positive ;-)
| chickenimprint wrote:
| I'm probably not the best person to ask, since the last
| time I touched Powershell, it was Windows only, but I'd say
| nushell is likely a lot more platform-agnostic, has sane
| syntax and follows a functional paradigm. Plugins are
| written in Rust. It's probably not worth it if all you do
| is Windows sysadmin work, as you'd have to serialize and
| deserialize data when interacting with Powershell from nu.
| TacticalCoder wrote:
| Now of course having scripts and pre-commit hooks enforcing
| simple rules so that files _must_ only use a subset of Unicode
| are a thing and do help.
|
| Do you really think that, say, all music streaming services are
| storing their songs with names allowing Unicode HANGUL fillers
| and control characters allowing to modify the direction of
| characters?
|
| Or... Maybe just maybe that Unicode characters belong to metadata
| and that a strict rule of "only visible ASCII chars are allowed
| and nothing else or you're fired" does make sense.
|
| I'm not saying you always have control on every single filename
| you'll ever encounter. But when you've got power over that and
| can enforce saner rules, sometimes it's a good idea to use it.
|
| You'll thank me later.
| jcalvinowens wrote:
| Not sure how portable it is, but gnu ls has a flag to solve this
| problem trivially:
|         --zero    end each output line with NUL, not newline
| renewiltord wrote:
| I just solve this by not having files like that on my computer.
| No spaces. No null chars.
| waffletower wrote:
| Borkdude has a wonderful Clojure/Babashka solution in this space:
| https://github.com/babashka/fs
| geophile wrote:
| I wrote a pipe-objects-instead-of-strings shell:
| https://marceltheshell.org.
|
| Not piping strings avoids this issue completely. Marcel's ls
| produces a stream of File objects, which can be processed without
| worrying about whitespace, EOL, etc.
|
| In general, this approach avoids parsing the output of any
| command. You always get a stream of Python values.
| fooker wrote:
| > In general, this approach avoids parsing the output of any
| command.
|
| Somewhere, there has to be validation phases. Just because you
| have objects, doesn't mean they are well formed.
|
| https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...
|
| It turns out proper validation is way harder than parsing.
| There is a reason text based interfaces and formats are so
| pervasive.
| geophile wrote:
| As long as Python does the right thing with globs, there is
| really no room for marcel to get it wrong. Not sure what
| additional validation you are thinking of.
| lostmsu wrote:
| This is a problem I faced recently on Linux. You can use ip addr
| to see the list of your IPv6 addresses and their types (temporary
| or not, etc). But doing it programmatically from a non-C codebase
| is way more involved.
| Nimitz14 wrote:
| These sorts of pedantic exchanges are so pointless to me. We are
| programmers. We can control what characters are used in
| filenames. Then you can use the simplest tool for the job and
| move on with your life to focus on the stuff that actually
| matters. Fix the root cause instead of creating workarounds for
| the symptom.
| tremon wrote:
| Most of the time I avoid parsing ls, but I haven't found a
| reliable way to do this one:
|         latest="$(ls -1 $pattern | sort --reverse --version-sort | head -1)"
|
| Anyone got a better solution?
| chenxiaolong wrote:
| This should work with any arbitrary filename:
|         latest=$(printf '%s\0' <glob> | sort -zrV | head -zn1)
|
| or with long args:
|         latest=$(printf '%s\0' <glob> | sort --zero-terminated --reverse --version-sort | head --zero-terminated --lines 1)
| genrilz wrote:
| What unix is this on? Neither the mac nor gnu manpages have a
| -z or --zero-terminated option for head.
| tremon wrote:
| Debian's head (from GNU coreutils) does:
| https://manpages.debian.org/bookworm/coreutils/head.1.en.htm...
| genrilz wrote:
| Yay! Glad to see zero termination flags in more places.
|
| EDIT: The linux manpages I read were from die.net, which
| it looks like were from 2010, guess I'll have to avoid
| them in the future. I checked FreeBSD, OpenBSD, and Mac
| man page to make sure, and unfortunately none of them
| support the -z flag yet.
| genrilz wrote:
| This one's a hard one. Since "--version-sort" isn't standard
| anyways, lets assume we can use flags which are common to BSD
| and GNU. Furthermore, lets assume bash or zsh so we can use
| "read -d ''".
|
| In that case, how about:
|         IFS='' read -d '' latest < <(find $pattern -prune -print0 | sort -z --reverse --version-sort)
| midjji wrote:
| The bash code that writes a C file which lists the null-terminated
| files in a directory, then compiles and runs it, is easier to write
| and understand. Bash is a lousy language to do anything in; Python
| is almost always available, and if not, then a C compiler is.
| cess11 wrote:
| I don't know, this seems like a lot of words to avoid coming to
| the conclusion that there are many ways to skin a directory.
|
| Most of the time it's fine to just suck in ls and split it on \n
| and iterate away, which I do a lot because it's just a nice and
| simple way forward when names are well-formed. Sometimes it's
| nicer to figure out a 'find at-place thing -exec do-the-stuff {}
| \;'. And sometimes one needs some other tool that scours the file
| system directly and doesn't choke on absolutely bizarre file
| names and gives a representation that doesn't explode in the
| subsequent context, whatever that may be, which is quite rare.
|
| A more common issue than file names consisting of line breaks is
| unclean encodings, non-UTF-8 text that seeps in from lesser
| operating systems. Renaming makes the problem go away, so one
| should absolutely do that and then crude techniques are likely
| very viable again.
___________________________________________________________________
(page generated 2024-06-25 23:01 UTC)