[HN Gopher] Desed: Demystify and debug your sed scripts
       ___________________________________________________________________
        
       Desed: Demystify and debug your sed scripts
        
       Author : asicsp
       Score  : 146 points
       Date   : 2024-09-05 04:46 UTC (18 hours ago)
        
        
       | hiAndrewQuinn wrote:
       | I feel we're witnessing a resurgence of interest in 'nix default
       | programs such as `sed` and `awk` in part because LLMs make it so
       | much easier to get started in them, and because they really do
       | exist _everywhere_ you might look. (The fact they were designed
       | to be performant in bygone decades and are super-performant now
       | as a result is also nice!)
       | 
       | There is just something incredibly freeing about knowing you can
       | sit down at a freshly-reinstalled box and do productive work
       | without having to install a single thing on the box itself first.
       | 
       | EDIT:
       | https://hiandrewquinn.github.io/til-site/posts/what-programm...
       | might be of interest if you want to know what you can work with
       | right out of the box on Debian 12. Other distros might differ.
        
         | WizardClickBoy wrote:
         | 100% agree. I'm currently preparing several 10s of GBs of HTML
         | in nested directories for static hosting via S3 and was
         | floundering until Gippity recommended find + exec sed to me.
         | I'm now batch fixing issues (think 'not enough "../" in 60000
         | relative hrefs in nested directories') with a single command
         | rather than writing scripts and feel like a wizard.
         | 
         | These tools are things I've used before but always found
         | painful and confusing. Being able to ask Gippity for detailed
         | explanations of what is happening, in particular being able to
         | paste a failing command and have it explain what the problem
         | is, has been a game changer.
         | 
         | In general, for those of us who never had a command line wizard
         | colleague or mentor to show what is possible, LLMs are an
         | absolute game changer both in terms of recommending tools and
         | showing how to use them.
        
           | godelski wrote:
           | Primeagen detected
           | 
           | I find him hard to listen to when he does things like this
        
             | WizardClickBoy wrote:
             | Primeagen is some kind of Youtuber? I am not familiar and
             | don't understand what you are trying to convey here.
        
               | 000ooo000 wrote:
               | Guessing 'gippity' has been used by primeagen recently,
               | so now you're gonna be tarred with the 18-23 React
               | bootcamp graduate brush (at least that's who I imagine
               | find him watchable).
        
               | WizardClickBoy wrote:
               | It's a case of convergent evolution - I don't know where
               | I heard it first, but I asked GPT if it minded and it
               | said "Of course, you can call me Gippity!", so I do,
               | because it's more fun.
        
               | poulpy123 wrote:
               | yes, and a cringy one
        
           | barrkel wrote:
           | If you have a lot of files, consider find piped to xargs with
           | -P for parallelism and -n to limit the number of files per
           | parallel invocation.
           | 
           | Only a tiny bit more complex but often an order of magnitude
           | faster with today's CPUs.
           | 
           | Use -print0 on find with -0 on xargs to handle spaces in
           | filenames correctly.
           | 
           | GNU parallel is another step up, but xargs is generally
           | always to hand.
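
A minimal sketch of the find + xargs combination described above. The sample directory, the substitution, and the `-P 4` / `-n 50` values are illustrative assumptions, not anything from the thread; `-i.bak` is used because that spelling is accepted by both GNU and BSD sed.

```shell
#!/bin/sh
# Sketch: parallel in-place sed over many files with find + xargs.
# The directory layout, pattern, and -P/-n values are assumptions.
set -eu

workdir=$(mktemp -d)
mkdir -p "$workdir/a dir/nested"
printf '<a href="../../x.html">x</a>\n' > "$workdir/a dir/one.html"
printf '<a href="../../y.html">y</a>\n' > "$workdir/a dir/nested/two file.html"

# -print0 / -0 : NUL-separated names, so spaces in paths survive
# -P 4         : run up to 4 sed processes at once
# -n 50        : hand at most 50 files to each sed invocation
# -i.bak       : edit in place, keeping a .bak copy of each file
find "$workdir" -type f -name '*.html' -print0 |
  xargs -0 -P 4 -n 50 sed -i.bak -e 's|\.\./\.\./|../../../|g'

result_one=$(cat "$workdir/a dir/one.html")
result_two=$(cat "$workdir/a dir/nested/two file.html")
printf '%s\n%s\n' "$result_one" "$result_two"
```

Whether the parallelism pays off depends on file count and disk speed; for a few thousand small files, `-exec ... {} +` may be just as fast.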
        
             | WizardClickBoy wrote:
             | Thanks! Gippity did suggest the xargs approach as an
             | alternative, but I found that
             | 
             | find [...] -exec [...] {} +
             | 
             | as opposed to
             | 
             | find [...] -exec [...] {} \;
             | 
             | worked fine and was performant enough for my use-case. An
             | example command was
             | 
             | find . -type f -name "*.html" -exec sed -i '' -e \
             |   's/\.\.\/\.\.\/\.\.\//\.\.\/\.\.\/\.\.\/source\//g' {} +
             | 
             | which took about 20s to run
        
               | mdaniel wrote:
               | One can express your sed in less Leaning Toothpick
               | Syndrome[1] via:
               | 
               |     find . -type f -name "*.html" -exec sed -i '' -e \
               |       's|\.\./\.\./\.\./|../../../source/|g' {} +
               | 
               | Using "/" as the delimiter character for "s" patterns
               | that include "/" drives me batshit - almost as much as
               | scripts that use the doublequote for strings that contain
               | no variables but also contain doublequotes (looking at
               | you, json literals in awscli examples)
               | 
               | If your sed is GNU, or otherwise sane, one can also `sed
               | -Ee` and then use `s|\Q../../../|` getting rid of almost
               | every escape character. I got you half way there because
               | one need not escape the "." in the replacement pattern
               | because "." isn't a meta character in the replacement
               | space - what would that even mean?
               | 
               | 1:
               | https://en.wikipedia.org/wiki/Leaning_toothpick_syndrome
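
To make the delimiter point concrete, here is a small sketch that runs the same substitution twice, once with `/` and once with `|` as the `s` delimiter, on a made-up sample line. `\Q` is documented for Perl regexes; whether a given sed accepts it is an assumption worth testing, so the sketch sticks to plain escapes.

```shell
#!/bin/sh
# Sketch: one substitution, two delimiters. The sample href is hypothetical.
set -eu

sample='<a href="../../../img/logo.png">'

# "/" as delimiter: every "/" in pattern and replacement needs a backslash.
slashes=$(printf '%s\n' "$sample" |
  sed -e 's/\.\.\/\.\.\/\.\.\//..\/..\/..\/source\//g')

# "|" as delimiter: only the regex dots still need escaping.
pipes=$(printf '%s\n' "$sample" |
  sed -e 's|\.\./\.\./\.\./|../../../source/|g')

printf '%s\n%s\n' "$slashes" "$pipes"
```

Both commands should print the same line; any character that does not occur in the pattern or replacement can serve as the delimiter.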
        
             | skydhash wrote:
             | Parallel is nice when doing music conversion with ffmpeg.
        
         | dools wrote:
         | I needed some scripts to run a little "factory" for flashing an
         | operating system onto some IoT devices. Lots of the work was
         | running various shell commands but it is nonetheless something
         | I would have traditionally written in PHP or Python but I
         | thought "what the hell" and did the whole thing in bash with
         | ChatGPT and it was a totally mind blowing experience.
         | 
         | Now I use bash for all sorts of stuff. I've been working with
         | *nix for 20 years but bash is so arcane and my needs always so
         | immediate that I never did anything other than use it to run
         | commands in sequence with maybe a $1 or a $2 in there
        
         | godelski wrote:
         | I've gotten into it recently but actually not because of LLMs.
         | Actually I find them unhelpful here. The reason I've gotten
         | into it is because I wanted to make a bunch of install scripts
         | for programs I want on fresh boxes. Mostly it's been fun.
         | Seeing what I can do with curl, sed, awk, regex, and bash
         | scripting. I'm often finding that I can do a ton of things in a
         | single line where I would have done a lot more if I wrote it in
         | python or something else. Idk, there's just something very fun
         | about this.
         | 
         | Though what's been a little frustrating is that there's anti
         | scraping measures and they break things. But they're always
         | trivial to get around, so it's just annoying.
         | 
         | A big reason LLMs end up failing is that I need my scripts to
         | work on osx and nix machines. So it's always suggesting things
         | to me that work on one but not the other. It seems to not want
         | to listen to my constraints and grep is problematic for them in
         | particular. Luckily man pages are great. I think they're often
         | overlooked.
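
One concrete instance of the macOS-versus-Linux problem is in-place editing: GNU sed accepts a bare `-i`, while the BSD sed shipped with macOS expects a suffix argument (as in the `-i ''` used elsewhere in this thread). A hedged sketch of one way around it is to avoid `-i` entirely; `sed_inplace` is a hypothetical helper name, not a standard tool.

```shell
#!/bin/sh
# Sketch: in-place editing without sed -i, so the same script runs
# against GNU and BSD sed. sed_inplace is a made-up helper name.
set -eu

# sed_inplace SCRIPT FILE: apply SCRIPT to FILE via a temporary copy.
sed_inplace() {
  tmp=$(mktemp)
  # Writing back with cat keeps the original file's inode and permissions.
  sed -e "$1" "$2" > "$tmp" && cat "$tmp" > "$2"
  rm -f "$tmp"
}

conf=$(mktemp)
printf 'port=80\n' > "$conf"
sed_inplace 's/^port=.*/port=8080/' "$conf"

updated=$(cat "$conf")
printf '%s\n' "$updated"
```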
        
           | asicsp wrote:
           | If you are able to install specific implementations of the
           | tools, go with GNU tools on all the machines. That way, you'd
           | get more features and work the same everywhere.
           | 
           | If that is not an option, go with Perl. It'd be a little
           | slower, but you'll get consistent results. Plus, Perl has
           | powerful regex, lots of standard libraries, etc.
        
             | godelski wrote:
             | Well the fun is, as I was trying to convey, building the
             | tools automatically from fresh boxes. Sure, I can bootstrap
             | my way by first installing gnu coreutils but if this was
             | about doing things the easy way I'd just use the relevant
             | package manager and ansible like everyone else
        
         | keybored wrote:
         | I resent this combination.
         | 
         | - We never figured out how to package programs properly (Nix
         | needs to become easier to use)
         | 
         | - For all kinds of smaller tasks we practically need to use
         | those Unix tools
         | 
         | - Those everywhere tools are for hysterical raisins hard to use
         | in a larger context (The Unix Philosophy in practice: use these
         | five different tools but keep in mind that they are each
         | different from each other across six dimensions and also they
         | have defaults from the 70's or 80's)
         | 
         | - For a lot of "simple" things you need to remember the simple
         | thing plus eight comments (on the StackOverflow answer which
         | has 166 votes but that's just because it was the first to
         | answer the question) with nuance like "this won't work for your
         | coworker on Mac"
         | 
         | - So you don't: you go to SO (see previous) and use snippets
         | (see first point: we don't know how to package programs, this
         | is the best we got)
         | 
         | - This works fine until Google Search decides that you are too
         | reliant on it for it to have to work _well_
         | 
         | - Now you don't use "random stuff from StackOverflow" which can
         | at least have an audit trail: now you use random weights from
         | your LLM in order to make "simple" solutions (six Unix tools in
         | a small Bash script which you can't read because Bash is hard)
         | 
         | This is pretty much the opposite of what inspired me when
         | studying computer science and programming.
        
           | skydhash wrote:
           | > We never figured out how to package programs properly
           | 
           | What's the issue with apt, pacman, and the others? I think
           | they're doing their job fine.
           | 
           | > For all kinds of smaller tasks we practically need to use
           | those Unix tools
           | 
           | I mean, they're good for what they do
           | 
           | > Those everywhere tools are for hysterical raisins hard to
           | use in a larger context
           | 
           | Because each does a universal task you may want to do in the
           | unix world of files and streams of text.
           | 
           | > For a lot of "simple" things you need to remember the
           | simple thing plus eight comments
           | 
           | No, you just need the manuals. And there are books too. And
           | yes the difference between BSD and GNU is not obvious at
           | first glance. But they're different software worked on by
           | different people.
        
       | leetrout wrote:
       | Related, `sd` is a great utility worth the install which makes
       | simple sed-type operations more obvious / easier (for some value
       | of easy).
       | 
       | https://github.com/chmln/sd
        
         | oguz-ismail wrote:
         | It uses a different syntax though. Hardly worth anyone's time
        
           | Etheryte wrote:
           | Not sure if I agree. Sed is widely known and much of the
           | value comes from that, just being around for a long while,
           | but I wouldn't really say that the syntax is all that
           | straightforward. As a thought experiment, try explaining how
           | to use sed to a fresh graduate who's never seen it. Not
           | saying sd is better or anything, but rather that just because
           | the syntax is different doesn't make it bad.
        
             | oguz-ismail wrote:
             | sed is widely known because it's available everywhere and
             | is used in every shell script. I just don't see the point
             | in learning a new utility that does the same thing as sed
             | but with different syntax. In this case the new utility
             | doesn't even honor my language settings and just errors out
             | if I enter a non-English letter. It's ridiculous
        
               | wolletd wrote:
               | How? Shouldn't it just all be UTF-8? Or do you use a
               | different encoding on your system?
        
             | wolletd wrote:
             | > try explaining how to use sed to a fresh graduate who's
             | never seen it
             | 
             | Well, for starters, you just `s/<regex>/<replacement>/` and
             | try to use that in your everyday work. Just forget about
             | the syntax. It's a search-and-replace tool.
             | 
             | That's the only way I used sed for years. I've learned more
             | since then, but it's still the command I use the most. And
             | that's also what `sd` focuses on.
             | 
             | Also, if you want to replace newlines, just use `tr`, to
             | hook onto the examples of sd. It may seem annoying to use a
             | different tool, but there are two major advantages: 1.
             | you're learning about the existence, capabilities and
             | limitations of more tools 2. both `sed` and `tr` are
             | probably available in your next shitty embedded busybox-
             | driven device, while `sd` probably is not
             | 
             | As you said, the value comes from being around for a long
             | time and, probably more importantly, still being present on
             | nearly any Unix-like system.
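
As a starting point along those lines, a short sketch of the `s/<regex>/<replacement>/` form plus the `tr` fallback for newlines. The input strings are made up for illustration.

```shell
#!/bin/sh
# Sketch: the search-and-replace core of sed, and tr for newlines.
set -eu

# Without flags, s replaces the first match on each line.
first=$(printf 'foo bar foo\n' | sed 's/foo/baz/')

# The g flag replaces every match on the line.
all=$(printf 'foo bar foo\n' | sed 's/foo/baz/g')

# sed works line by line, so newlines are simpler to handle with tr.
joined=$(printf 'a\nb\nc\n' | tr '\n' ',')

printf '%s\n%s\n%s\n' "$first" "$all" "$joined"
```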
        
             | ta1243 wrote:
             | 99% of the time I use sed to mangle the output of a text
             | file into something else.
             | 
             | Earlier I did this
             | 
             |     cat as1 | grep " 65" |
             |       sed -e 's/.* 0 65/65/' -e 's/[^ 0-9]//' |sort|uniq
             | 
             | Now some twat will come along and say my process should
             | have been
             | 
             |     cat as1
             |     grep " 65" as1
             |     grep " 65" as1 | sed -e (various different tries until
             |       the data looks useful)
             |     grep " 65" as1 | sed -e (options) | sort|uniq
             | 
             | Because otherwise it's a "useless use of cat" and
             | reformatting my line is well worth the time and cognitive
             | load to save those extra forks.
        
               | Etheryte wrote:
               | I think the concept of useless use of cat is one of the
               | few things I strongly disagree with in software
               | development. Most things have their trade-offs, pros and
               | cons, but using cat to start a pipe makes everything
               | composable and easy to work with, it's pretty much
               | universally good. The moment you drop it because of the
               | small redundancy, you have to make sure you don't mess up
               | the params for whatever comes next, and that overhead is
               | in my opinion never worth what you gain by dropping cat.
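
For reference, a small sketch of the three spellings being argued about, run against a made-up stand-in for the `as1` file. A leading `< file` redirection keeps the input at the front of the pipeline without the extra `cat` process, which may be a middle ground.

```shell
#!/bin/sh
# Sketch: three equivalent ways to start the pipeline.
# The file contents are hypothetical sample data.
set -eu

as1=$(mktemp)
printf '%s\n' 'x 0 65001 a' 'y 0 64999 b' 'z 0 65002 c' > "$as1"

# cat first: the input is named at the start of the pipeline.
with_cat=$(cat "$as1" | grep ' 65' | sort)

# File as an argument: one process fewer, input named mid-command.
without_cat=$(grep ' 65' "$as1" | sort)

# Leading redirection: input named first, no cat process.
redirected=$(< "$as1" grep ' 65' | sort)

printf '%s\n' "$with_cat"
rm -f "$as1"
```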
        
           | GolDDranks wrote:
           | sd has very much proven to be worth my time. It's both
           | faster and way easier to use.
        
           | keybored wrote:
           | Middle-brow dismissal. Hardly worth anyone's consideration.
           | 
           | Just go straight to the point that this isn't available on a
           | proprietary Unix that had its EOL fifteen years ago and that
           | five people still use.
        
             | oguz-ismail wrote:
             | >this isn't available on a proprietary Unix
             | 
             | Skill issue. It's not necessary in the first place anyway
        
         | ReleaseCandidat wrote:
         | As soon as there is a _complete_ regex reference in the readme,
         | it may be worth a try. The main problem with _any_ regex tool
         | or programming language or ... is the subtle and not so subtle
         | differences between the various regex implementations - like
         | the "normal" and "extended" mode of sed.
         | 
         | This phrase:
         | 
         |     sd uses regex syntax that you already know from JavaScript
         |     and Python.
         | 
         | says it all.
         | 
         | I still haven't found a better short overview of various regex
         | engines than that:
         | https://web.archive.org/web/20130830063653/http://www.regula...
        
           | ptman wrote:
           | Indeed. It's different from Python, maybe JavaScript as well.
           | https://docs.rs/regex/latest/regex/#syntax
        
         | gregwebs wrote:
         | There's also sad that lets you review find and replace changes
         | to files before making them: https://github.com/ms-jpq/sad
        
       | aidos wrote:
       | sed, awk, grep and friends are just so effective at trawling
       | through text.
       | 
       | I dump about 150GB of Postgres logs a day (I know, it's over the
       | top but I only keep a few days worth and there have been several
       | occasions where I was saved by being able to pick through them).
       | 
       | At that size you even need to give up on grepping, really. I've
       | written a tiny bash script that uses the fact that log lines
       | start with a timestamp and `dd` for immediate extraction. This
       | allows me to quickly binary search for the location I'm
       | interested in.
       | 
       | Then I can `dd` to dump the region of the file I want. After that
       | I have a little awk script that lets me collapse the sql lines
       | (since they break across multiple lines) to make grepping really
       | easy.
       | 
       | All in all it's a handful of old-school scripts that make an
       | almost impossible task easy.
        
         | porridgeraisin wrote:
         | Can you explain how you used dd here? I've never seen it used
         | this way, curious
        
           | fwip wrote:
           | dd lets you specify an offset to start reading the file at,
           | with `skip`. This would let you perform a binary search by
           | picking an offset in the file, reading a small chunk (say, a
           | kilobyte), and scanning for the date/time string within it.
           | Each read should be O(1) in terms of the size of the file, so
           | the binary search needs O(log(n)) reads, whereas a grep-based
           | approach is O(n).
           | 
           | (The datetime in the log message is presumably sorted, or
           | nearly so).
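
The idea described above can be sketched roughly as follows. This is not the script from the gist; it assumes every line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, that lines are sorted, and that lines are much shorter than the 1024-byte chunk. The function names and the generated sample log are made up for illustration.

```shell
#!/bin/sh
# Sketch: binary-search a timestamp-sorted log by byte offset using dd.
# Assumptions: lines start with "YYYY-MM-DD HH:MM:SS", are sorted, and
# are much shorter than $chunk. All names here are hypothetical.
set -eu

chunk=1024

# read_at FILE COUNT OFFSET: print COUNT bytes of FILE from byte OFFSET.
read_at() {
  dd if="$1" bs=1 skip="$3" count="$2" 2>/dev/null
}

# ts_key LINE: turn "2024-09-05 00:16:40 ..." into 20240905001640.
ts_key() {
  printf '%s\n' "$1" | cut -c1-19 | tr -cd '0-9'
}

# first_at_or_after FILE TIMESTAMP: print the first line at or after it.
first_at_or_after() {
  file=$1
  target=$(ts_key "$2")
  lo=0
  hi=$(( $(wc -c < "$file") ))
  while [ $((hi - lo)) -gt "$chunk" ]; do
    mid=$(( (lo + hi) / 2 ))
    # Line 1 of the chunk is usually partial; line 2 is the next full line.
    line=$(read_at "$file" "$chunk" "$mid" | sed -n '2p')
    if [ "$(ts_key "$line")" -lt "$target" ]; then lo=$mid; else hi=$mid; fi
  done
  # Linear scan of the small remaining window.
  if [ "$lo" -gt 0 ]; then skip_first=1; else skip_first=0; fi
  read_at "$file" $((hi - lo + chunk)) "$lo" |
    awk -v t="$target" -v skip="$skip_first" '
      NR == 1 && skip { next }
      { key = substr($0, 1, 19); gsub(/[^0-9]/, "", key) }
      key + 0 >= t + 0 { print; exit }'
}

# Build a sample log: one line per second, 2000 lines.
log=$(mktemp)
awk 'BEGIN {
  for (i = 0; i < 2000; i++)
    printf "2024-09-05 %02d:%02d:%02d query %d\n",
      int(i / 3600), int(i / 60) % 60, i % 60, i
}' > "$log"

found=$(first_at_or_after "$log" '2024-09-05 00:16:40')
printf '%s\n' "$found"
```

`bs=1` keeps the offsets in bytes at the cost of one read per byte, which is tolerable for kilobyte-sized chunks; on a multi-gigabyte file the number of chunks read still grows only logarithmically.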
        
           | aidos wrote:
           | Sure! I've created a gist so you can see for yourself but the
           | basic idea is as described. Read a chunk, find the first date
           | in it and then decide if you want to read further forward or
           | back in the file.
           | 
           | https://gist.github.com/aidos/5a6a3fa887f41f156b282d72e1b79f...
           | 
           | For anyone else, here's the awk for combining lines in the
           | log files for making them greppable too:
           | https://gist.github.com/aidos/44a9dfce3c16626e9e7834a83aed91...
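
Since the gist link is truncated, here is an independent sketch of what such a line-collapsing step might look like. It is an assumption, not the author's script: it treats any line that does not start with a `YYYY-MM-DD ` date as a continuation of the previous entry. `collapse_sql` and the sample log lines are hypothetical.

```shell
#!/bin/sh
# Sketch: join continuation lines onto the timestamped line before them.
# Not the gist's code; the log format is an assumption.
set -eu

collapse_sql() {
  awk '
    # A new entry starts with a date; flush the previous one.
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
      if (have) print buf
      buf = $0; have = 1
      next
    }
    # Anything else is a continuation: trim indentation and append.
    {
      sub(/^[ \t]+/, "")
      buf = have ? buf " " $0 : $0
      have = 1
    }
    END { if (have) print buf }'
}

collapsed=$(printf '%s\n' \
  '2024-09-05 00:00:01 LOG: statement: SELECT a' \
  '    FROM t' \
  '    WHERE x = 1' \
  '2024-09-05 00:00:02 LOG: duration: 3 ms' | collapse_sql)

printf '%s\n' "$collapsed"
```

With each statement on one line, a plain `grep` for a table or column name returns the whole query together with its timestamp.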
        
       | qwertox wrote:
       | > Why sed??
       | 
       | > Sed is the perfect programming language, especially for graph
       | problems. It's plain and simple and doesn't clutter your screen
       | with useless identifiers like if, for, while, or int. Furthermore
       | since it doesn't have things like numbers, it's very simple to
       | use.
       | 
       | "useless identifiers like if, for, while, or int"? Useless
       | identifiers?
        
         | ReleaseCandidat wrote:
         | That's about as serious as
         | 
         |     Some of the notable features include:
         |     Preview variable values, both of them!
         |     ...
         |     Its name is a palindrome
        
           | 082349872349872 wrote:
           | To be fair, IBM actually had a commercial product that was
           | simpler (cheaper) because it didn't have things like numbers:
           | https://en.wikipedia.org/wiki/IBM_1620#Transferred_to_San_Jo...
        
         | russfink wrote:
         | For me, the question of why is because it's already installed
         | in the environment and available on every UNIX system I have
         | used. This is a case of conforming myself to the tool, rather
         | than the other way around. If you are of a certain vintage like
         | I am, you got used to doing these things early on because we
         | could not just apt install foo on our platforms anytime we
         | needed something.
         | 
         | I do not mean to sound like "kids these days... " I really like
         | these modern systems that allow you to install a wide range of
         | packages. It is a huge step forward. I just want to explain my
         | perspective, perhaps others share that perspective. It probably
         | also explains why such tools continue to exist.
        
       | JoelJacobson wrote:
       | I wish there was a similar tool for relational algebraic
       | expressions, to make relational database research papers more
       | accessible.
        
       | sylware wrote:
       | I am done with regular expressions languages and engines. Each
       | time I wanted to do a not so trivial usage of it, I had to re-
       | learn the language(s) and debug it, not to mention the editing
       | operations on top of them (sed...).
       | 
       | This has been quite annoying. So now I code it in C or assembly
       | fusing common-cases code templates and ready build scripts to
       | have a comfortable dev loop.
       | 
       | In the end, I get roughly the same results and I don't need those
       | regular expressions languages and engines.
       | 
       | It is a clear win in that case.
        
       | mlegendre wrote:
       | Amusingly, in French, "desed" sounds like "decede", which means
       | die / decease. That's quite a fitting name for a tool one would
       | use in "I need to debug a sed script" situations!
        
         | 082349872349872 wrote:
         | `sed` in Latin is often used to contrast two things, "not this,
         | but that", eg
         | 
         |  _Amicitia non semper intellegitur sed sentitur._ (Friendship
         | is not always understood, but it is felt.)
         | 
         | which I'm always reminded of when using sed(1) in a script to
         | provide, not this pattern, but that replacement.
        
       | russfink wrote:
       | No Debian (Ubuntu, Mint and friends) version?
        
       | trey-jones wrote:
       | Once in HN comments I saw `sed` referred to as a one-way hashing
       | function, and that's always stuck with me - not just for sed, but
       | for any type of operation that ends up being sort of a "black
       | box". Input becomes output reliably, but it's hell to understand
       | how. My big takeaway was: These types of operations are OK, when
       | necessary, but it's a good idea to take the time to write some
       | comments/documentation so the next person who looks at it
       | (including self) has somewhere to start.
       | 
       | That said, debugging is definitely a thing, and tools like this
       | are awesome!
        
       | mifydev wrote:
       | Oh, I definitely need to run this one on
       | https://github.com/chebykinn/sedmario
        
       | ok123456 wrote:
       | This is built into perl:
       | 
       | perl -MO=Deparse -w -naF: -le 'print $F[2]'
        
       | tqwhite wrote:
       | IMPOSSIBLE!!! God made sed as a test for humans to prove their
       | humility. It is intrinsically mysterious.
        
       ___________________________________________________________________
       (page generated 2024-09-05 23:01 UTC)