hngopher.com

       [HN Gopher] CLI text processing with GNU awk
       ___________________________________________________________________
        
       CLI text processing with GNU awk
        
       Author : asicsp
       Score  : 369 points
       Date   : 2023-08-28 06:02 UTC (16 hours ago)
        
 (HTM) web link (learnbyexample.github.io)
 (TXT) w3m dump (learnbyexample.github.io)
        
       | rottc0dd wrote:
       | confession: plug
       | 
       | I once wrote a diff2html script ported from bash and it was much,
       | much faster (for obvious reasons). And awk makes it much more
       | readable than bash script. And I could learn the language, debug,
       | understand bugs and fix them in a night.
       | 
       | Not sure if it is the idiomatic way to write awk, but I have to say
       | it is a really nice language.
       | 
       | https://github.com/berry-thawson/diff2html/blob/master/diff2...
        
       | thangngoc89 wrote:
       | I have been using ChatGPT for generating small CLI one-liners
       | like these. My prompts look like this:
       | 
       |     - use jq to count a nested array "a.b.c.d"
       |     - find and delete empty folders using `find`
       |     - find and replace text using sed/awk
       | 
       | I found that using ChatGPT for these purposes boosted my
       | productivity tremendously.
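A minimal sketch of the last prompt's task (find and replace with awk), assuming a POSIX shell with `mktemp`; the helper name `replace_in_file` is hypothetical. It writes to a temporary file first, so a failed awk run leaves the original file untouched.

```shell
# Hypothetical helper: replace every match of an awk regex in FILE.
# Usage: replace_in_file REGEX REPLACEMENT FILE
replace_in_file() {
  regex=$1 repl=$2 file=$3
  tmp=$(mktemp) || return 1
  # gsub() edits $0 in place; the trailing "1" is an always-true pattern,
  # so every line (changed or not) is printed to the temp file.
  if awk -v re="$regex" -v rep="$repl" '{ gsub(re, rep) } 1' "$file" > "$tmp"; then
    cat "$tmp" > "$file"   # overwrite in place, keeping FILE's permissions
    status=$?
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}
```

Note that `-v` interprets backslash escapes in its value, and an `&` in the replacement stands for the matched text in `gsub`, so both need escaping if they are meant literally.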
        
         | simonw wrote:
         | My usage of tools like awk, sed and Bash scripting has
         | increased an enormous amount thanks to ChatGPT/GPT-4.
         | 
         | I'm using those on a weekly basis now, because I don't have to
         | memorize details of entirely new programming languages in order
         | to apply them to small problems.
         | 
         | Smaller languages that I never took the time to learn are no
         | longer something I avoid. I even use AppleScript now!
         | https://til.simonwillison.net/gpt3/chatgpt-applescript
        
         | dotancohen wrote:
         | ChatGPT is a great time saver for those who already know how to
         | use awk. But it should not be used by those who are unfamiliar.
         | 
         | Just an example: I saw someone come up with a great awk line to
         | change some text in a nested directory. He then pasted it into
         | bash. Only once the server went down did anybody realize that
         | he forgot to cd into the proper directory, and he wiped out not
         | only the server config but also all the user-uploaded data.
         | 
         | The server config was not version controlled and the user data
         | had not been backed up in almost a week.
        
           | swores wrote:
           | That's not really a ChatGPT issue. People pasting in slightly
           | wrong commands (or right commands in the wrong folder) is a
           | tale as old as time - well, as old as Linux at least. Short
           | of saying that nobody who isn't already an expert should ever
           | touch a CLI, the lesson from that story is "be as careful as
           | possible, then be more careful, and also have backups of
           | everything", not "don't use an LLM to help".
        
             | simonw wrote:
             | Yeah, that exact same problem could easily affect someone
             | who spent hours cobbling together the same awk script from
             | Google searches and StackOverflow.
        
             | mistercheph wrote:
             | What "care" do you suggest that someone pasting in a
             | script they don't understand should take?
        
               | swores wrote:
               | I'm far from an expert, so you should probably ask someone
               | other than me. But my two cents would be: don't paste any
               | code until you have understood it, unless it's written by
               | a source you trust; or, alternatively, only paste it
               | somewhere you don't care about. When I'm playing around
               | testing stuff I might not fully understand on a Linux
               | server, I do it on a VPS that's unimportant to me, so that
               | if I mess it up I can very easily restore it to a clean OS
               | install, and I have a bash script ready to reinstall all
               | the software I want and all my profile customisations etc.
        
       | mplanchard wrote:
       | I love awk, and I find myself reaching for it a fair bit. One of
       | the main things I use it for is "sed with state," so for things
       | like matching on a line, but only if it was preceded by some
       | other line. I find this to be really useful for creating one-off
       | linters, for example I made one recently to check all our
       | migration files for CREATE INDEX without CONCURRENTLY on a
       | particular set of very large tables where it would cause issues.
       | Since SQL statements can be spread over multiple lines, it was
       | difficult to write a straightforward match, but awk can track
       | state like "I'm in a create statement," "I'm creating an index,"
       | etc. across multiple lines, which allowed me to cobble together
       | something that has worked well for about a year now.
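A minimal sketch of that kind of stateful linter, assuming each SQL statement ends with `;` and using the hypothetical table names `big_table` and `huge_table` for the large tables. It is not the code from the gist linked in the reply; it only shows how an awk variable carries "I'm inside a CREATE INDEX statement" across lines.

```shell
# Hypothetical linter: report CREATE INDEX statements on large tables that
# lack CONCURRENTLY, even when a statement spans several lines.
# Assumes every statement ends with ";" and BIG lists the large tables.
lint_indexes() {
  awk -v BIG='big_table|huge_table' '
    # A CREATE INDEX starts a new statement: remember where it began.
    !in_stmt && toupper($0) ~ /CREATE[ \t]+(UNIQUE[ \t]+)?INDEX/ {
      in_stmt = 1; start = FNR; stmt = ""
    }
    # While inside the statement, collect its text until the ";".
    in_stmt {
      stmt = stmt " " $0
      if ($0 ~ /;/) {
        up = toupper(stmt)
        if (up !~ /CONCURRENTLY/ && up ~ ("ON[ \t]+(" toupper(BIG) ")"))
          printf "%s:%d: CREATE INDEX without CONCURRENTLY\n", FILENAME, start
        in_stmt = 0
      }
    }
  ' "$@"
}
```

The table match is a plain substring after `ON`, so `big_table_archive` would also be flagged; a real linter would want word boundaries and would need to handle comments and quoted identifiers.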
        
         | noloblo wrote:
         | Can you share this example of tracking SQL state with awk?
        
           | mplanchard wrote:
           | Sure! I posted a gist here, stripped of anything particular
           | to our company:
           | https://gist.github.com/mplanchard/07229d61bd32ce73624d9003c...
        
         | jmholla wrote:
         | One of these days I need to get around to learning awk. In the
         | meantime, I've learned some of the deeper, stateful features
         | of sed. For instance, you mentioned wanting to only output a
         | line if it was preceded by another. Here's a sed command that
         | does so:
         | 
         |     sed -ne 'x' -e '/PREV/ {x; /CURR/ p; x}'
         | 
         |     > echo -e "PREV\nCURR\nCURR\nCURR\nPREV\nRED" | sed -ne 'x' -e '/PREV/ {x; /CURR/ p; x}'
         |     CURR
         | 
         | This uses sed's hold buffer. I'll break it down:
         | 
         |     sed -n
         | 
         | The `-n` tells sed not to print anything. By default, sed
         | prints the pattern space after processing each line. We'll
         | tell it when to print with the `p` command.
         | 
         |     sed -ne 'x'
         | 
         | `-e` indicates we are specifying one of the scripts sed will
         | execute. The command `x` swaps the current line (the pattern
         | space) with whatever is in the hold buffer. We do this on every
         | line, so afterwards the pattern space holds the previous line
         | and the hold buffer holds the current one.
         | 
         |     sed -ne 'x' -e '/PREV/
         | 
         | The next command only runs when the pattern space contains
         | `PREV`. Because of the swap, that means it runs only on lines
         | that directly follow a line containing `PREV`.
         | 
         |     sed -ne 'x' -e '/PREV/ { ... }'
         | 
         | The braces group the commands that should run when this match
         | succeeds.
         | 
         |     sed -ne 'x' -e '/PREV/ { x; ... }'
         | 
         | First, we swap the hold buffer and the pattern space back, so
         | the current line is in the pattern space again.
         | 
         |     sed -ne 'x' -e '/PREV/ { x; /CURR/ p; ... }'
         | 
         | Then, we only print the line if it contains `CURR`.
         | 
         |     sed -ne 'x' -e '/PREV/ {x; /CURR/ p; x}'
         | 
         | Finally, we swap them again so the hold buffer holds the
         | current line when the next line arrives; otherwise a line
         | containing PREV would stay in the hold buffer and keep matching.
         | (Give `echo -e "PREV\nPREVCURR\nCURR\nCURR\nPREV\nRED"` a try
         | with and without the final `x`.)
         | 
         | All that said, I'm pretty sure the `awk` script is much simpler
         | and more direct, but I wanted to share how one might accomplish
         | this with sed.
         | 
         | The time I spent learning this probably would've been better
         | spent on awk, but this tutorial[0] was so good and so easy that
         | it taught me nearly everything I know about sed.
         | 
         | [0]: https://www.grymoire.com/Unix/Sed.html
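For comparison, a sketch of the same filter in awk, where an ordinary variable plays the role of the hold buffer (the function name `after_prev` is only for illustration). Because `prev` is updated after the test on every line, the overlapping input from the comment works without any extra swapping:

```shell
# Print a line only if it contains CURR and the line before it contained PREV.
# prev starts out empty, so the first line can never match.
after_prev() {
  awk 'prev ~ /PREV/ && /CURR/ { print } { prev = $0 }'
}

printf 'PREV\nCURR\nCURR\nCURR\nPREV\nRED\n' | after_prev       # prints CURR
printf 'PREV\nPREVCURR\nCURR\nCURR\nPREV\nRED\n' | after_prev   # prints PREVCURR then CURR
```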
        
           | mbivert wrote:
           | > One of these days I need to get around to learning awk
           | 
           | Plan9's awk(1)[0] man page provides a precise and concise (a
           | few paragraphs) presentation of the core features of all awk
           | implementations.
           | 
           | Tutorials bring practical knowledge, but often lack complete
           | and self-contained descriptions of those nifty little tools.
           | 
           | [0]: https://man.cat-v.org/plan_9/1/awk
        
             | btschaegg wrote:
             | I still maintain that "The AWK Programming Language" [1] is
             | one of, if not _the_ best programming language book I've
             | read so far.
             | 
             | It's short and to the point, has good examples, and cuts
             | most of the usual fluff like "what is a variable?". Its
             | base assumptions are: You know how to program, and you're
             | here to learn AWK. Let's get to it.
             | 
             | I dearly wish there'd be more books like it for other
             | languages.
             | 
             | [1]: https://archive.org/details/pdfy-MgN0H1joIoDVoIC7
        
           | mplanchard wrote:
           | This is nifty, thanks for sharing! I had no idea that sed had
           | a hold buffer, and it's very cool that you can swap it in and
           | out within the sed command like that. It's funny, because I
           | went essentially the opposite way that you did: I used to
           | know sed and awk basics, but then I properly learned awk.
           | Since then my sed has atrophied a bit, and I still only know
           | the basics. I'll have to run through that tutorial you linked.
        
       | tyingq wrote:
       | One lesser-known thing about gawk is that it typically
       | ships with some useful extensions that give you access to things
       | like readdir(), ord(), chr(), gettimeofday(), sleep(), etc.
       | 
       | https://www.gnu.org/software/gawk/manual/html_node/Extension...
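A small sketch of loading one of those bundled extensions, assuming GNU awk 4.1 or later is installed as `gawk` with its standard extension libraries (`@load` and `-l` are gawk-only; the helper name `char_codes` is hypothetical). The `time` and `readdir` extensions are loaded the same way, with `@load "time"` and `@load "readdir"`.

```shell
# Requires GNU awk: @load and the bundled "ordchr" extension are gawk-only.
# Prints the character code of every character on each input line.
char_codes() {
  gawk '
    # ordchr provides ord(string) and chr(number)
    @load "ordchr"
    {
      out = ""; sep = ""
      for (i = 1; i <= length($0); i++) {
        out = out sep ord(substr($0, i, 1))
        sep = " "
      }
      print out
    }'
}

# Usage: echo Hi | char_codes        prints: 72 105
# chr() goes the other way: gawk -l ordchr 'BEGIN { print chr(65) }'  prints: A
```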
        
       | cb321 wrote:
       | Of possible interest - instead of making a whole new programming
       | language like awk, you can also just systematize generating code
       | for an existing one with a command-line harness.
       | 
       | This can even stay terse & keep a fairly fast edit-test
       | turnaround in a fully statically typed language like Nim:
       | https://github.com/c-blake/bu/blob/main/doc/rp.md
        
         | macintux wrote:
         | Whole new language? Awk is 45 years old.
        
           | cb321 wrote:
           | I agree that "writing/learning a different" (what I meant) is
           | clearer wording than "making a whole new".
           | 
           | EDIT: and it is a fair counterpoint that any command with
           | options is, in _some_ sense, also a different language one
           | must learn. Learning API calls is _also_ a different language
           | (at least nouns & verbs, if not syntax). But that is partly
           | the point: awk is a programming language with different
           | syntax in cases where other alternatives might be enough.
        
         | BaseballPhysics wrote:
         | If you think that's a good idea, you don't understand why tools
         | like awk/perl/sed/etc exist and are popular. They are, by
         | design, optimized toward specific kinds of use cases.
         | 
         | In fact, their dynamically typed nature is a perfect example of
         | that since it's much easier to quickly manipulate strings in a
         | language that isn't so strict, as they'll do more heavy lifting
         | for you via automatic coercion while limiting extra
         | syntax/boilerplate (which, granted, is less of a problem with
         | modern type inference). That makes it a lot easier to toss
         | together quick one-liners and glue code, which is where these
         | tools shine in the first place.
         | 
         | Hell, even something like python or ruby is just a little too
         | structured for my taste when doing something quick and dirty,
         | which is why I love perl as it can be unstructured if that's
         | all I need, or I can create a more structured program if that's
         | what the problem requires.
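To make the coercion point concrete, a short sketch of how awk treats the same field as a number or a string depending on the operator applied to it (the function name `coercion_demo` is illustrative):

```shell
coercion_demo() {
  echo "3 apples, 4.5 pears" | awk '{
    print $1 + 2      # "3" is used as a number because of "+"
    print $1 $3       # juxtaposition concatenates, so "3" and "4.5" join
    print $0 + 0      # only the leading numeric prefix of the line counts
  }'
}

coercion_demo   # prints 5, 34.5 and 3 on separate lines
```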
        
           | cb321 wrote:
           | It's just a different & in my experience often neglected
           | point in a similar design space (as that initial, linked text
           | argues). Your tastes & use cases are your own. Almost
           | everything "all depends" upon so very much in computer
           | systems & in life.
           | 
           | To add some more color, Nim is also a very adaptable
           | programming language. I believe there are converts from Perl
           | in its fan base. Nim's creator long ago recreated some Perl in
           | Nim:
           | https://nim-lang.org/araq/perlish.html
           | 
           | Anyway, it's a different set of trade-offs to consider which
           | I thought some reading about learning awk with open minds
           | might find interesting. That's all, really.
        
         | kazinator wrote:
         | So Awk is a whole _new_ language, but Nim isn't?
        
           | cb321 wrote:
           | I never said Nim was unique or older than awk. While I cannot
           | make you read my cousin comment to understand I meant "new"
           | as clarified-"different" [1] or click through any links, I
           | can perhaps non-redundantly emphasize that the mentioned
           | approach "works" not just for Nim, but for _any language_:
           | _C_ & _Go_ (implementations referenced in the mentioned
           | `rp.md`), and _Python_ in another comment in this thread:
           | https://news.ycombinator.com/item?id=37295399 (maybe even
           | with `eval` there!)
           | 
           | Only, the approach "works" with differing levels of "success"
           | for different use cases / contexts. It is true (whichever)
           | _shell language_ is still there to differ in shell 1-liner
           | cases. That is _also_ true of sed / awk / perl / ... If you
           | don't want to click through on `rp.md`, you could also read
           | Ben Hoyt's article on his Prig if you like:
           | https://benhoyt.com/writings/prig/ discussed on HN a while
           | back https://news.ycombinator.com/item?id=30498735
           | 
           | It's not actually _that_ different from your `cppawk` that
           | you mention elsethread.. just maybe rotated 27 degrees away
           | in "idea space". ;-)
           | 
           | [1] https://news.ycombinator.com/item?id=37293475
        
       | kazinator wrote:
       | I maintain a minor side interest in Awk, alongside Lisp and
       | other things.
       | 
       | I developed cppawk in 2022:
       | https://www.kylheku.com/cgit/cppawk/about/
       | 
       | cppawk extends Awk with preprocessing.
       | 
       | There is a loop macro that supports a vocabulary of clauses.
       | Clauses can be combined for parallel and cross-product iteration.
       | And they are user-extensible. By writing five simple macros, you
       | can define a new clause.
       | 
       | Something potentially useful if you use Awk.
       | 
       | Cppawk is documented with multiple man pages, and covered by unit
       | tests which run with gawk and mawk.
        
       | e63f67dd-065b wrote:
       | Perhaps my old sysadmin hat is showing through, but I don't quite
       | see what the advantage of awk is over just writing the same thing
       | in perl. I've seen my fair share of horrendous shell scripts from
       | junior sysadmins, and every time I think to myself "the text
       | processing portion would be so much cleaner in Perl".
        
         | asicsp wrote:
         | If you are comparing Awk vs Perl for scripts, I'd prefer Perl
         | (or Python).
         | 
         | This post is about short one-liners for ad hoc use cases. I
         | prefer sed/awk over Perl for such cases. Though, if you already
         | know Perl, you could continue using it instead of having to
         | learn more tools.
        
           | SOLAR_FIELDS wrote:
           | Do all systems still come with Perl baked in these days? If
           | so I could see reaching for that over awk/sed. If I have to
           | install a runtime I may as well just reach for Python.
        
             | thesuperbigfrog wrote:
             | >> Do all systems still come with Perl baked in these days?
             | 
             | If you use Git for Windows (https://gitforwindows.org/), it
             | includes Perl.
        
               | btschaegg wrote:
               | ...and gawk :)
        
               | thesuperbigfrog wrote:
               | Yes. Frequently any tool set that has gawk will also
               | include sed, perl, cut, head, tail, less, vi / vim, etc.
               | 
               | It is nice that Git for Windows includes bash and all
               | these tools.
        
             | pphysch wrote:
             | A bunch of the default git extensions are written in
             | Perl, so you will find some version of it available on most
             | modern Linux systems.
        
         | whartung wrote:
         | Awk's superpower, and the reason I mostly use it, is its free
         | read loop, free field splitting, and the pattern/condition
         | matching model.
         | 
         | As a LANGUAGE, it's "eh". It just happens to be "good enough".
         | 
         | You can, of course, do all of that with Perl. But then I have
         | to write all the boilerplate I get with awk for free. And the
         | gains in Perl's language aren't enough, for me, to dump awk. And
         | I don't use it for "scripting", I use it for data processing,
         | tearing up files for mostly one-off tasks. So I don't miss
         | Perl's depth. If I want depth, I'll go somewhere else.
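A sketch of those three features in one small program, assuming a made-up input format of a name and a byte count per line. There is no explicit loop or split call: awk reads each line, splits it into `$1`, `$2`, ..., and runs every `pattern { action }` whose pattern is true, with `END` running once after the last line. The helper name `sum_large` is hypothetical.

```shell
# Hypothetical input: "name bytes" per line; lines starting with # are comments.
sum_large() {
  awk '
    /^#/        { next }               # skip comment lines entirely
    $2 > 1000   { total += $2; n++ }   # action runs only where pattern holds
    END         { printf "%d rows over 1000 bytes, %d bytes total\n", n, total }
  '
}

printf 'a 500\nb 2000\n# c 9999\nd 1500\n' | sum_large
# prints: 2 rows over 1000 bytes, 3500 bytes total
```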
        
           | e63f67dd-065b wrote:
           | > free read loop
           | 
           | perl -n
           | 
           | > free field splitting
           | 
           | perl -a ... $F[1/2/3/etc]
           | 
           | > pattern/condition matching model
           | 
           | Not quite sure what you mean, but `perl -lane 'print if
           | /abc/'` might be what you're looking for.
           | 
           | The boilerplate can be mostly eliminated with the magic
           | incantation of `perl -lane`. The trick that makes all this
           | work is that perl defines a whole bunch of pre-defined
           | variables and populates them with things that might be
           | helpful (see $_, @F, etc).
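A side-by-side sketch of the same filter in both tools (the sample input and function names are made up). One detail to watch when translating: `@F` is zero-indexed, so awk's `$2` is Perl's `$F[1]`. Perl's `-a` splits on whitespace much like awk's default field splitting.

```shell
# Print the second field of every line that matches /abc/.
awk_version()  { awk '/abc/ { print $2 }'; }
perl_version() { perl -lane 'print $F[1] if /abc/'; }
# perl switches: -n wraps the code in a read loop, -a splits each line
# into @F, -l chomps input and appends "\n" to print, -e supplies the code.

printf 'abc one\nxyz two\nabc three\n' | awk_version
# prints "one" and "three"; perl_version produces the same lines.
```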
        
         | zeteo wrote:
         | If you're in perl all the time that probably makes a lot of
         | sense. For me, awk is one of the few languages that I can
         | safely set aside for months and then I'm back up to speed in 10
         | minutes. There's just something very intuitive about it, and it
         | somehow fits very naturally with other common command line
         | tools.
        
           | BaseballPhysics wrote:
           | Weird, I feel the same way about Perl.
        
         | SomeoneFromCA wrote:
         | awk is more intuitive for sure, for a regular JavaScript coder
        
         | ajross wrote:
         | There aren't any technical advantages, no. Perl's features are
         | a proper superset of awk (by design!).
         | 
         | What's happened is that Kids Today (tm) never learned perl. So
         | they're discovering awk as someone new to the idea of stream
         | processing. And awk was a great idea for that, and it
         | represented a genuine innovation worth emulating.
         | 
         | In the late 1970s. Then of course perl did emulate and surpass
         | it. But then got forgotten. So kids are discovering awk
         | instead. It's a little cringe, really.
        
           | arp242 wrote:
           | I'm one of those "kids these days" but did actually learn to
           | program Perl at some point, and I generally prefer AWK. Perl
           | is a large and complex language, I don't need it that often,
           | and I'm not smart enough to keep remembering all of it.
           | 
           | Now, if I were hired as a full-time Perl developer and
           | spent 2 years developing Perl, it would perhaps be different.
           | But that's not the case, and isn't for most people.
           | 
           | For better or worse, Perl sees a lot less usage than it once
           | did; I rarely encounter it "in the wild" and don't even have
           | it on my laptop because nothing needs it.
        
           | macintux wrote:
           | For simple use cases, I find awk simpler than Perl. I love
           | Perl, have written tens of thousands of lines, but on the CLI
           | I prefer awk. I'm sorry I "cringe" you.
        
           | hereonout2 wrote:
           | Perl came out in the late 80s and by the mid-2000s was
           | really on its way out? I hired for my last perl position in
           | around 2006.
           | 
           | Just saying, your definition of "kids today" could well
           | include a decent portion of developers under 45 years old.
           | Referring to this cohort repeatedly as "kids" is also a
           | little cringe.
        
             | ajross wrote:
             | Did you really just reply to a comment that used the phrase
             | "Kids Today (tm)" and try to interpret it as a genuine
             | insult? The inability of this community to understand
             | straightforward humor amazes me. Dude, it was a joke. And
             | _yes_, I was calling mid-career professionals "kids".
             | Deliberately. Because I'm old. And it's funny.
        
           | sureglymop wrote:
           | Why is that cringe? They genuinely probably came across awk
           | before perl (I know I did, I read "The AWK Programming
           | Language" and then went on to "The C Programming Language").
           | That said, awk is great and it's been the same for
           | decades and available on every system (the same can't really
           | be said about perl).
        
             | thesuperbigfrog wrote:
             | >> awk is great and it's been the same for decades and
             | available on every system (the same can't really be said
             | about perl).
             | 
             | The only issue with AWK is that there are many
             | implementations and they are not always compatible with one
             | another:
             | 
             | https://www.gnu.org/software/gawk/manual/html_node/Other-Ver...
             | 
             | I have ported AWK scripts from legacy Unix systems to Linux
             | and ran into incompatibilities that required some
             | adjustments to the scripts.
             | 
             | Curious: what systems have AWK, but do not have Perl?
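One common incompatibility of this kind, as a sketch: gawk's `match()` accepts a third argument that receives capture groups, while POSIX awk only defines the two-argument form, which sets `RSTART` and `RLENGTH`. The `id=` input and the function names are illustrative.

```shell
# Portable: two-argument match() sets RSTART (start position of the match)
# and RLENGTH (its length); "+ 3" and "- 3" skip the "id=" prefix.
extract_id_portable() {
  awk 'match($0, /id=[0-9]+/) { print substr($0, RSTART + 3, RLENGTH - 3) }'
}

# gawk-only: the third argument receives the capture groups;
# other awks reject this program.
extract_id_gawk() {
  gawk 'match($0, /id=([0-9]+)/, m) { print m[1] }'
}

echo "user id=42 ok" | extract_id_portable   # prints 42
```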
        
           | coliveira wrote:
           | Awk is a useful language that you can learn in one afternoon,
           | after reading the man page and a few examples. And then you
           | can spend your whole life using it for several projects. You
           | cannot do that with perl. That's why awk has a longer shelf
           | life than perl.
        
             | skinkestek wrote:
             | > And then you can spend your whole life using it for
             | several projects. You cannot do that with perl. That's why
             | awk has a longer shelf life than perl.
             | 
             | A Perl developer would of course say you have this
             | completely backwards and even if I haven't programmed Perl
             | much, or even at all for the last decade I would tend to
             | agree.
        
         | pletnes wrote:
         | Remove perl, have fewer security issues. Some scanning tools
         | flag it, too. Awk is found in more places, in my experience.
        
           | ajross wrote:
           | Sorry, but that's ridiculous. Any general purpose programming
           | language is a vector for bugs and security problems, but come
           | on: you're genuinely trying to say that a kludge of
           | bash+sed+awk is objectively more "secure" than a single perl
           | script to solve the same problem?
        
             | coliveira wrote:
             | In the case of awk, actually yes, it is safer. The reason
             | is that awk is a very limited language. It has only enough
             | functionality to provide text matching and substitution. It
             | is very difficult to use awk to do anything of high
             | security risk, compared to a language like perl.
        
               | rascul wrote:
               | > It has only enough functionality to provide text
               | matching and substitution.
               | 
               | Gawk at least can do a lot more than that. Reading and
               | writing files, network communications, and running arbitrary
               | shell commands, for example. It's certainly not as
               | powerful as perl but it's also not limited to just text
               | matching and substitution.
               | 
               | Edit: figured I would provide some examples. Here's an
               | http server and a first person shooter in gawk. Maybe not
               | so practical but they show some of gawk's capabilities.
               | 
               | https://github.com/kevin-albert/awkserver
               | 
               | https://github.com/TheMozg/awk-raycaster
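For instance, even POSIX awk, not just gawk, can run shell commands through `system()` and the command form of `getline`. A minimal sketch of the latter (the function name is illustrative):

```shell
# "cmd" | getline var runs cmd through the shell and reads one line of output.
awk_runs_commands() {
  awk 'BEGIN {
    cmd = "echo from-the-shell"
    if ((cmd | getline out) > 0)   # 1 = got a line, 0 = EOF, -1 = error
      print "got: " out
    close(cmd)                     # close the pipe so cmd could be rerun
  }'
}

awk_runs_commands   # prints: got: from-the-shell
```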
        
               | vbrandl wrote:
               | There is a virus written in awk that infects other awk
               | scripts[0]. And according to wikipedia, the language is
               | Turing complete.
               | 
               | [0]: https://github.com/SPTHvx/ezines/tree/main/dc5/CODES/Perfori...
        
               | ajross wrote:
               | But awk is never used alone. You don't solve whole
               | problems with awk, you squish it into a script with a
               | bunch of other junk. My point is that you're making an
               | apples-to-oranges comparison. Sure, "awk" isn't the
               | problem, but "bash" is, and bash is _undeniably_ a more
               | error-prone language than perl. You surely agree with
               | that much, right?
               | 
               | And if you disallow "bash" for security reasons, where
               | does that leave "awk" in the category of useful tools?
               | See my point?
        
               | pletnes wrote:
               | You're right. But the alternative might be bash + perl or
               | just bash. Or none of them. Perl is anyway the first one
               | to go.
        
               | TeMPOraL wrote:
               | Is it completely gone, or rather _just for you_, blocked
               | by sysadmins who know Perl is the magic pixie dust for
               | total control, and want to keep it for themselves?
               | 
               | In Windows-land, compare how PowerShell access may be
               | restricted, and you won't be allowed to run macros in
               | Office, all while your computer is "managed" by a
               | horrible hodge-podge of PowerShell and VBA scripts that
               | make Perl code look like high literature.
        
               | coliveira wrote:
               | Just use awk for what it was designed: text search and
               | substitution. You can run shell scripts along with awk,
               | but that is clearly not what you should be doing if you
               | want to design secure systems. The first rule of security
               | is not to abuse your tools.
        
         | jsyolo wrote:
         | Not being snarky, why not python over perl? what makes perl
         | better for scripts?
        
           | inejge wrote:
           | _what makes perl better for scripts?_
           | 
           | Having an implicit line- and field-splitting loop for
           | standard input with a couple of command-line switches. (Awk
           | doesn't even need switches, but is cumbersome if you need
           | initial state.) This covers a _lot_ of use-cases. Also, very
           | compact and powerful regular expressions.
        
           | e63f67dd-065b wrote:
           | Perl is much more terse for one-liners and has much more
           | built-in for doing text processing in scripts. Stuff like
           | implicit read loop, field separation, etc. I would say
           | they're suitable for different jobs: if a perl script grows
           | beyond a hundred lines (you can do a surprising amount in
           | that space!), then Python may be the right tool.
           | 
           | Perl is also much more of a known target: some version of it
           | exists on basically every single Unix, and the language
           | really hasn't changed that much in the past decade. I have
           | SSH'ed into multiple CentOS 6/SLES 11 (released 2009, and
           | granted mostly to rescue data off them) servers in the past 2
           | years, and perl is just much more of a known target to write
           | things against than whatever python release is on that
           | system.
        
           | reddit_clone wrote:
           | Perl is a progression of that particular environment. It is a
           | superset of shell/grep/awk/sed.
           | 
           | A shell command works exactly as you would expect copied
           | literally inside a backquote. With all the other goodies of a
           | real programming language.
           | 
           | Doing this in Python (to me at least) seems unnatural.
        
       | qalmakka wrote:
       | Awk is fine and dandy but, like with sed, I think that it's
       | almost always replaceable with Perl, which is way nicer to use
       | and ubiquitous. Every OS (except Windows) I laid my hands on in
       | the last 15 years has had Perl installed in either its default
       | install or pulled in as a dependency almost immediately (a LOT of
       | stuff depends on Perl in any Unix system).
       | 
       | That is, unless you are running in an embedded environment, but
       | in that case you are stuck with something like busybox's Awk,
       | which is way more limited than gawk...
        
         | coliveira wrote:
         | The difference is that learning pearl is an ordeal that will
         | take several weeks at the minimum. Learning awk can be done in
         | one afternoon, after reading a man page and a few examples. And
         | it really works for the tasks it was designed for. So I think awk
         | is superior to perl for the purpose it was created.
        
           | thesuperbigfrog wrote:
           | >> The difference is that learning pearl is an ordeal that
           | will take several weeks at the minimum.
           | 
           | I am not sure about pearl, but Perl is not that different
           | from most other programming languages. If you are familiar
           | with Javascript or Python, learning the basics of Perl is
           | pretty easy:
           | 
           | https://perldoc.perl.org/perlintro
           | 
           | https://www.perltutorial.org/
           | 
           | Perl is designed for text processing, so it has a powerful
           | regular expression engine. Writing regular expressions can be
           | difficult, but it is a great skill to have in your toolkit.
           | 
           | Fun Fact: If the programming language you are using has
           | support for regular expressions, they are almost certainly
           | Perl-compatible regular expressions because Perl's regular
           | expression syntax is more widely used and more popular than
           | other regular expression syntaxes (e.g. POSIX, etc.).
        
         | [deleted]
        
         | hereonout2 wrote:
         | The obligatory perl replaced awk decades ago comment.
         | 
         | Soon to be followed by the ones saying nobody should be writing
         | shell scripts at all anymore.
        
         | empthought wrote:
         | Perl is not nicer to use.
        
         | chasil wrote:
         | I was actually surprised to find mktime() in Busybox awk.
         | 
         | The big thing lacking there is GAWK's networking extension.
        
         | delta_p_delta_x wrote:
         | > except Windows
         | 
         | On Windows, you use PowerShell.
        
           | layer8 wrote:
           | I prefer Cygwin.
        
             | horse_dung wrote:
             | I just finished writing a "dumb stuff with containers"
             | internal blog post, which included:
             | 
             |     C:\> type somefile.txt | docker run --rm -i ubuntu awk 'something' > output.txt
        
           | swores wrote:
           | They were talking about whether or not the OS comes with Perl
           | by default, not whether it has a CLI at all.
        
           | thesuperbigfrog wrote:
           | >> On Windows, you use PowerShell.
           | 
           | If you use Git for Windows (https://gitforwindows.org/), it
           | includes Perl.
           | 
           | Or you could install Strawberry Perl which is made for
           | Windows: https://strawberryperl.com/
        
         | asicsp wrote:
         | I wouldn't call Perl easier to use, but it certainly is more
         | powerful and has a vast ecosystem. And it is more portable,
         | since there's no need to worry about GNU/BSD/etc. variations.
         | 
         | And I wrote a book for Perl one-liners as well
         | (https://learnbyexample.github.io/learn_perl_oneliners/), which
         | I'm currently revising (like I did for the grep/sed/awk
         | ebooks).
        
         | temp_gnuser wrote:
         | I have a good story about this: My first time really working
         | with a great scientist, we were taking genetics papers and
         | turning them into code to improve analysis. I spent two days
         | writing a Perl script before I finally got frustrated enough to
         | ask for help.
         | 
         | The first question he asked was "Did you email the author(s)?"
         | I said I hadn't and didn't want to bother this seemingly very
         | important scientist. He said nonsense, that most of them
         | don't mind responding, but he warned me to be terse and to the
         | point. I emailed the gentleman, told him what I was doing and
         | what my issues were, and asked him for some guidance. He sent me
         | back a one-line awk script that did everything all that Perl was
         | failing to do!
         | 
         | Of course, all that proves is that I'm horrible at Perl, but it was
         | an important moment in my life that showed me that even very
         | smart and important people are still just people, and that just
         | asking is often a great way to learn new things yourself, and
         | that sometimes you just need to step back and reconsider what
         | tools you are using. I am forever grateful that an awesome
         | geneticist who needed help bootstrapping tech infra took the
         | time to teach me, a greybeard sysadmin type, practical,
         | reproducible science, from paper to implementation. I learned a
         | lot but the biggest downside is, after being heavily surrounded
         | by scientists in the workplace in most jobs since then, I find
         | companies without that difficult to work for.
        
       | ycombinete wrote:
       | Last week ChatGPT spat out some awk for me for a generic Linux
       | request. Was quite a pleasant surprise!
        
         | [deleted]
        
       | asicsp wrote:
       | Hello! Author here.
       | 
       | I am pleased to announce a new version of my "CLI text processing
       | with GNU awk" ebook.
       | 
       | Learn the `GNU awk` command step-by-step from beginner to
       | advanced levels with hundreds of examples and exercises. This
       | book will dive deep into field processing, show examples for
       | filtering features, multiple file processing, how to construct
       | solutions that depend on multiple records, how to compare records
       | and fields between two or more files, how to identify duplicates
       | while maintaining input order and so on. Regular Expressions will
       | also be discussed in detail.
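       | 
       | Two of the techniques above have well-known awk idioms. A rough
       | sketch follows (common idioms, not excerpts from the book; the
       | sample data and temporary files are made up):
       | 
```shell
# Drop duplicate lines but keep the first occurrence, in input order.
# seen[$0]++ evaluates to 0 the first time a line appears, so
# !seen[$0]++ is true exactly once per distinct line.
printf 'b\na\nb\nc\na\n' | awk '!seen[$0]++'
# b
# a
# c

# Compare two files: print records of the second file whose first
# field also appears as a first field in the first file.
# NR == FNR holds only while reading the first file (assumes it is non-empty).
keys=$(mktemp)
data=$(mktemp)
printf 'alice\ncarol\n' > "$keys"
printf 'alice 10\nbob 20\ncarol 30\n' > "$data"
awk 'NR == FNR { want[$1]; next } $1 in want' "$keys" "$data"
# alice 10
# carol 30
rm -f "$keys" "$data"
```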
       | 
       | Links:
       | 
       | * PDF/EPUB versions: https://learnbyexample.gumroad.com/l/gnu_awk
       | (free till 31-August-2023)
       | 
       | * Web version: https://learnbyexample.github.io/learn_gnuawk/
       | 
       | * Markdown source, example files, etc:
       | https://github.com/learnbyexample/learn_gnuawk
       | 
       | * Interactive TUI app for exercises:
       | https://github.com/learnbyexample/TUI-apps/blob/main/AwkExer...
       | 
       | Bundle offers:
       | 
       | * Magical one-liners
       | (https://learnbyexample.gumroad.com/l/oneliners/new_awk_relea...)
       | is $5 (normal price $15) -- grep, sed, awk, perl and ruby one-
       | liners bundle
       | 
       | * All Books Bundle
       | (https://learnbyexample.gumroad.com/l/all-books/new_awk_relea...)
       | is $12 (normal price $32) -- all my 13 programming ebooks
       | 
       | I would highly appreciate it if you'd let me know how you felt
       | about this book. It could be anything from a simple thank you,
       | pointing out a typo, mistakes in code snippets, which aspects of
       | the book worked for you (or didn't!) and so on. Reader feedback
       | is essential and especially so for self-published authors. Happy
       | learning :)
       | 
       | ---
       | 
       | Previous discussions:
       | 
       | * Learn to use Awk with hundreds of examples
       | (https://news.ycombinator.com/item?id=15549318) -- _478 points,
       | Oct 2017, 116 comments_
       | 
       | * Show HN: An eBook with hundreds of GNU Awk one-liners
       | (https://news.ycombinator.com/item?id=22758217) -- _539 points,
       | April 2020, 48 comments_
        
         | mbwgh wrote:
         | Thanks for sharing this, it is pleasantly obvious you put a lot
         | of work into this. I especially like the TUI application!
         | 
         | Will the web version of this book remain free even after August
         | 31st?
        
           | asicsp wrote:
           | You're welcome and thanks for the feedback :)
           | 
           | Yeah, the web version is always free for all of my ebooks.
           | And you can find the markdown source on GitHub, for example: 
           | https://github.com/learnbyexample/learn_gnuawk/blob/master/g...
           | 
           | I use `pandoc` to generate the PDF/EPUB versions from
           | markdown. See my blog post
           | https://learnbyexample.github.io/customizing-pandoc/ for
           | details.
        
         | swores wrote:
         | I'm curious to know how many people you get paying for
         | something like your "Magical one-liners", and also whether
         | you've ever experimented with a "choose to pay after" model?
         | 
         | I ask because it's the kind of thing that I can imagine finding
         | useful enough to pay $5 (or $15) for, but I can also imagine it
         | being something that contains nothing I don't already have
         | saved in my personal "one liners" file, so I'm not really
         | interested in paying to find out.
        
           | asicsp wrote:
           | I use Gumroad/Leanpub to sell my ebooks. As far as I know,
           | they don't support the "choose to pay after" model.
           | 
           | You can see the number of paid sales for the bundles under
           | the "I want this!" button. When the price is 0, it shows the
           | total of both paid/free users.
           | 
           | I started selling ebooks about 5 years back. Where I live, my
           | monthly living cost is just $150. While the first two years
           | of sales were just about enough to cover my costs, the last
           | three years have been much better - I can continue being
           | self-employed :)
        
             | Bjartr wrote:
             | > Where I live, my monthly living cost is just $150.
             | 
             | That's quite low! Mind if I ask where in the world that is?
        
               | asicsp wrote:
               | Outskirts of a second-tier city in southern India. I live
               | a modest lifestyle - no vehicles, desktop instead of
               | laptop, live alone etc.
        
       | swores wrote:
       | I suspect that anyone reading this thread is likely to be equally
       | interested in "Ask HN: Share a shell script you like" from a
       | fortnight ago (though at 78 comments, it didn't get as much
       | traction / comments as I hoped it would when I saw it)
       | 
       | https://news.ycombinator.com/item?id=37112991
        
         | asicsp wrote:
         | There was a similar discussion 5 months back:
         | https://news.ycombinator.com/item?id=35122780 _(332 points |
         | 328 comments)_
         | 
         | And here's one from last year:
         | https://news.ycombinator.com/item?id=32467957 _(374 points |
         | 294 comments)_
        
           | swores wrote:
           | Thanks :)
        
       | mpalmer wrote:
       | A few years back I decided to just get as capable as I could with
       | jq, which is fast and functional enough to cover 99% of awk/sed
       | use cases, plus cases you'd never want to touch with awk/sed.
       | 
       | No regrets!
        
       | ra1231963 wrote:
       | Never learned awk or committed esoteric cli incantations to
       | memory. Don't get me wrong, I can get around on the cli, but sed,
       | awk, etc just didn't seem like a good cost/benefit investment.
       | I'm also not a sysadmin.
       | 
       | Thankfully I waited long enough and LLMs can write them for me
       | better than I ever could.
        
       | healeycodes wrote:
       | Awesome! I've been meaning to replace my usage of
       | Python/JavaScript for tasks that (I believe) are more
       | awk-shaped.
        
         | SOLAR_FIELDS wrote:
         | As long as you don't care about unit-testability. Usually, if
         | you bothered to write them in Python or JS, you don't
         | want to regress back to shell stuff. You're already in a place
         | where you have a runtime available, so you can do way more
         | stuff.
         | 
         | Usually you want to go in the opposite direction from the one
         | you mentioned. You one-liner some shell like awk to quickly get
         | shit done without worrying about a runtime being available to
         | you, and then if you need it to be more robust and legible
         | because of testing etc. or production grade, you move to a
         | proper dynamic scripting environment.
        
       | nologic01 wrote:
       | awk one-liners are a slam dunk. The tough question is whether to
       | invest in more complex awk programming. Invariably some
       | processing task requires more complex logic and awk provides
       | that, but in the terse and arcane ways of early computing. Yet
       | reaching for a modern alternative is also an overhead, may not be
       | particularly intuitive either (hello pandas) and may even have
       | performance issues...
        
         | martincmartin wrote:
         | For me, the big problem is libraries. Even a personal file of
         | common functions doesn't seem that well supported, and there
         | just doesn't seem to be a way to get third party libraries.
         | 
         | When I start needing helper functions and splitting it across
         | multiple lines, that's usually when I reach for Python instead. And
         | then sigh, because my program will be 2 to 3 times bigger. Ruby
         | is a great awk replacement, but unless other people at your job
         | know it, you can't expect others to maintain it.
        
           | cb321 wrote:
           | I'd bet you could do a harness like
           | https://news.ycombinator.com/item?id=37292882 mentions but
           | with Python in like an hour and then you could stay in both
           | one syntax and more significantly in one library ecosystem.
           | Why, 3 such things may even already exist. :) The
           | syntax/semantics is not as optimized for 1-liner brevity, but
           | everything has trade-offs.
        
           | nmz wrote:
           | The problem with libraries is namespaces: since everything is
           | global, it's not worth it, and it's also incredibly problematic
           | (especially since match() actually sets two globals, RSTART
           | and RLENGTH). Fortunately gawk offers namespaces, and it even
           | has a flag for loading libraries from a path; `gawk -i inplace`
           | has replaced `sed -i` for me a couple of times. But yeah, it's
           | still lacking.
           | 
           | PS: This is gawk-only, though; but
           | `awk -f $awkmodules/mymodule.awk -f <(echo '')` is an OK
           | replacement, even though it's just concatenating files.
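           | 
           | A minimal sketch of the portable trick from the PS, assuming a
           | POSIX awk (it concatenates multiple -f program files into one
           | program). Temporary files stand in for bash's <(...) so it also
           | runs under plain sh; run_with_lib and trim() are hypothetical
           | names for illustration:
           | 
```shell
# Hypothetical wrapper: run_with_lib LIBFILE 'PROGRAM TEXT'
# Writes PROGRAM TEXT to a temp file and passes both files with -f, so
# functions defined in LIBFILE are visible to the main program.
run_with_lib() {
    main=$(mktemp) || return 1
    printf '%s\n' "$2" > "$main"
    awk -f "$1" -f "$main"
    status=$?
    rm -f "$main"
    return "$status"
}

# A tiny "library" file with one helper function.
lib=$(mktemp)
printf '%s\n' \
    'function trim(s) {' \
    '    sub(/^[ \t]+/, "", s)' \
    '    sub(/[ \t]+$/, "", s)' \
    '    return s' \
    '}' > "$lib"

printf '   padded text  \n' | run_with_lib "$lib" '{ print "[" trim($0) "]" }'
# [padded text]
rm -f "$lib"
```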
        
         | SOLAR_FIELDS wrote:
         | I used to not like them pre-ChatGPT. But nowadays, when you
         | can paste an arcane, illegible awk/sed one-liner into an AI and
         | have it describe step by step what it's doing, I'm totally fine
         | with it. I still don't like them as much as a few lines of
         | Python for testability reasons, but sometimes you just
         | straight up don't need unit tests for some quick data munging
         | task.
        
           | kagevf wrote:
           | > have it describe step by step what it's doing
           | 
           | And then building on the original "update the script to do
           | $thing" where $thing isn't obvious/trivial. It saves a lot of
           | time.
        
       | rubicks wrote:
       | I love awk. Enough to shill for this:
       | 
       | https://www.oreilly.com/library/view/effective-awk-programmi...
       | 
       | If TFA is an excerpt for a book forthcoming on dead-tree media,
       | then I'll be buying that one as well.
        
         | [deleted]
        
       | Galanwe wrote:
       | 99.9% of my awk usage is splitting a line (a la `cut -d' ' -f`)
       | while discarding successive spaces.
       | 
       | e.g.:
       | 
       |     $ echo "key:     value" | awk '{print $1}'
       |     value
       | 
       | Open to a simpler replacement :-)
        
         | cb321 wrote:
         | You might consider:
         | https://github.com/c-blake/bu/blob/main/doc/cols.md
         | 
         | That's in Nim, though that may not be much of a barrier. (There
         | may also be other tools in bu/ of interest.)
        
         | asicsp wrote:
         | Check out https://github.com/sstadick/hck and
         | https://github.com/theryangeary/choose - both are alternatives
         | to cut/awk and allow regex-based splitting as well. Though they
         | don't remove leading/trailing whitespace IIRC.
         | 
         | I wrote a script (https://github.com/learnbyexample/regexp-cut)
         | that uses `awk` to provide a `cut`-like tool with regex-based
         | split, negative index, etc. And this will take care of
         | starting/ending whitespaces as that's the default `awk`
         | behavior.
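         | 
         | For the negative-index part, plain awk can already count fields
         | from the end via NF. A minimal sketch (field_from_end is a
         | hypothetical helper, not how regexp-cut is implemented):
         | 
```shell
# Hypothetical helper: field_from_end N prints the Nth field counting
# from the end of each line (1 = last field). NF is the number of
# fields in the current record, so $(NF - N + 1) is the Nth-from-last.
# Lines with fewer than N fields are skipped.
field_from_end() {
    awk -v n="$1" 'NF >= n { print $(NF - n + 1) }'
}

echo 'a b c d' | field_from_end 1    # d
echo 'a b c d' | field_from_end 2    # c
echo '  x   y  ' | field_from_end 2  # x
```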
        
         | llimllib wrote:
         | You can do it with cut too:
         | 
         |     $ echo "key:     value" | cut -wf 2
         |     value
         | 
         | but whether it's actually "simpler" is open to debate
         | 
         | edit: actually GNU cut lacks -w, so this is BSD-only. lol
         | computers, stick with awk
        
           | cb321 wrote:
           | Only with FreeBSD `cut`... coreutils `cut` (Linux) is missing
           | -w, as is at least OpenBSD's?
           | 
           | EDIT: I see you discovered this. lol computers indeed. ;-)
        
           | rigelina wrote:
           | I can't tell you how many times I pipe in rev to put my text
           | where I want it for cut (then rev it again).
           | 
           | Abbreviated example, getting the service names from a k8s
           | cluster looks roughly like (actual command does a bit more
           | processing):
           | 
           | kubectl get deployments -o wide | rev | cut -d'=' -f1 | rev
           | 
           | But if it's just gobbling whitespace, xargs without a command
           | can be your friend.
           | 
           | $ echo "key: value" | cut -d: -f2 | xargs
           | 
           | value
           | 
           | My brain generally goes "rev sed head tail xargs cut tr ...
           | screw it, I'll use python ... someday I shall learn awk."
           | There's a young engineer on my team that knows awk, and I'm
           | envious.
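           | 
           | Both pipelines above have short awk equivalents: $NF is the
           | last field for any separator, and a regex FS can swallow the
           | padding. The input lines are made-up stand-ins, not real
           | kubectl output:
           | 
```shell
# Last '='-separated field, like `rev | cut -d'=' -f1 | rev`.
echo 'app=web,tier=frontend' | awk -F'=' '{ print $NF }'   # frontend

# Value after "key:", with the separator regex ': *' eating the spaces,
# similar to `cut -d: -f2 | xargs` for this input.
echo 'key:     value' | awk -F': *' '{ print $2 }'         # value
```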
        
             | SOLAR_FIELDS wrote:
             | You don't even need to know awk these days. Just ask ChatGPT
             | "how do I do x munging task" and you'll get a one-liner
             | that will be just as good as if you'd sat there squinting
             | at man pages for 30 minutes.
        
               | llimllib wrote:
               | this is exactly the sort of case where you get non-
               | portable bullshit you don't understand out of it! It
               | spits out something that works on BSD but not on GNU, you
               | put it in your script and _boom_ wonder why the thing
               | blew up in prod, and oh btw you also lack the ability to
               | debug it because you never understood it in the first
               | place
        
               | Galanwe wrote:
               | More importantly, it's just mega slow.
               | 
               | The whole point of bash one-liners is not to write short
               | bash; it's to type them into your shell to get a result
               | quickly.
               | 
               | In the time it takes me to type the ChatGPT URL into my
               | browser, I could have written my one-liner in the shell XD
        
               | SOLAR_FIELDS wrote:
               | Unless you are writing shell all day every day, I sincerely
               | doubt that you will get a decently complex, working
               | one-liner out faster than GPT-4.
        
               | SOLAR_FIELDS wrote:
               | Meh, I don't think that is a problem endemic to shell
               | specifically. That's more a problem of putting untested code
               | in production. The thing about it being a one-liner
               | is... if that happens and it's not portable, who cares?
               | You just turn around, paste that sucker back in, and say
               | make it work on OsFlavor2.06 and you get back something
               | that works. You don't even have to fully understand why
               | it isn't portable, you can just ask the AI and have it
               | explain why. If you wanted something battle tested in
               | prod that was readable and understandable you wouldn't be
               | using one line shell scripts in the first place,
               | regardless of whether they were written by an AI or not
        
             | Galanwe wrote:
             | Neat, but both your tricks (rev and xargs) are more for
             | getting the last word than getting the nth word.
             | 
             | For the sake of argument, say I have the following
             | fixed output and want the sizes:
             | 
             |     $ ls -l
             |     -rw-rw-r-- 1 userAAA group      588 Aug 29 00:25 file1
             |     -rw-rw-r-- 1 userAA  groupB   11870 Aug 29 00:24 file2
             |     -rw-rw-r-- 1 userA   groupBB   1166 Aug 28 23:56 file3
             |     -rw-rw-r-- 1 user    groupBBB   195 Aug 28 23:56 file4
             | 
             | I would just do:
             | 
             |     $ ls -l | awk '{print $5}'
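             | 
             | One caveat when pointing this at a real directory: `ls -l`
             | prints a "total N" line first, which makes `{print $5}` emit
             | an empty line. A sketch of the tweak, plus summing the same
             | column:
             | 
```shell
# Skip the "total N" line that `ls -l` prints for a directory listing,
# then print column 5 (the size in bytes).
ls -l | awk '$1 != "total" { print $5 }'

# Same filter, but add the sizes up and print the sum at the end
# (sum + 0 prints 0 instead of an empty line for an empty directory).
ls -l | awk '$1 != "total" { sum += $5 } END { print sum + 0 }'
```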
        
           | version_five wrote:
           | What does -w do? This works without it, no?
           | 
           | Edit, found it, "use whitespace as the delimiter"
           | 
           | https://www.unix.com/man-page/FreeBSD/1/cut/
           | 
           | For most cases like the OP you'd know the delimiter anyway so
           | I don't think the absence is a big deal, and if not it would
           | be easy to use tr or sed to make it consistent
        
             | llimllib wrote:
             | The important thing is that it uses _consecutive_
             | whitespace as the delimiter, so you'd have to use sed to
             | collapse all the whitespace down to one tab.
             | 
             | At that point, awk is vastly simpler
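             | 
             | To make that concrete with one line of the `ls -l` sample
             | above: `tr -s ' '` squeezes runs of spaces (not tabs) so GNU
             | cut can split on a single space, but a leading blank still
             | leaves an empty first field, which awk's default splitting
             | ignores:
             | 
```shell
line='-rw-rw-r-- 1 userAA  groupB   11870 Aug 29 00:24 file2'

# Squeeze repeated spaces to one, then cut on a single space.
printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f5   # 11870
printf '%s\n' "$line" | awk '{ print $5 }'          # 11870

# With leading blanks, cut sees an empty first field; awk skips it.
printf '  a   b\n' | tr -s ' ' | cut -d' ' -f2      # a
printf '  a   b\n' | awk '{ print $2 }'             # b
```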
        
         | Narishma wrote:
         | I think you mean
         | 
         |     echo "key:     value" | awk '{print $2}'
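         | 
         | That works because with the default FS, awk skips leading and
         | trailing blanks and treats any run of spaces or tabs as a single
         | separator, so "key:" is $1 and "value" is $2. A quick check:
         | 
```shell
echo "key:     value" | awk '{ print $2 }'            # value
echo "key:     value" | awk '{ print NF }'            # 2
printf '  a \t  b   c\n' | awk '{ print $2 "|" NF }'  # b|3
```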
        
       | auselen wrote:
       | I did this golfing a while back: Drawing a heart with AWK -
       | https://gist.github.com/auselen/906a53b47a7d616b080dbef85eb8...
        
       | ymgch wrote:
       | Which is better to start with, awk or sed?
        
         | pphysch wrote:
         | I'd start with AWK because it covers more use cases
        
         | asicsp wrote:
         | Depends on the task. Sed is typically used for search and
         | replace and Awk is better suited for field based processing.
         | Both these tools also have filtering features (regexp based,
         | line number based, range, etc).
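         | 
         | A small side-by-side sketch on made-up CSV input: substitution
         | in sed, a field in awk, and the same regex and line-number
         | filters written in both:
         | 
```shell
data='id,name
1,alice
2,bob'

# Search and replace: sed's home turf (only the alice line changes).
printf '%s\n' "$data" | sed 's/alice/carol/'

# Field-based processing: awk splits each line on FS (here a comma);
# this prints alice and bob, skipping the header line.
printf '%s\n' "$data" | awk -F, 'NR > 1 { print $2 }'

# Regex filter: print lines matching /bob/.
printf '%s\n' "$data" | sed -n '/bob/p'   # 2,bob
printf '%s\n' "$data" | awk '/bob/'       # 2,bob

# Line-number filter: print only line 2.
printf '%s\n' "$data" | sed -n '2p'       # 1,alice
printf '%s\n' "$data" | awk 'NR == 2'     # 1,alice
```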
         | 
         | See also: When to use grep, sed, awk, perl, etc
         | https://unix.stackexchange.com/q/303044
        
         | _ZeD_ wrote:
         | While each has its own scope and idiosyncrasies... they're
         | pretty similar, at least in the pattern-matching part, and both
         | of them have an internal "core" of functionality that is pretty
         | easy to grasp. So the honest answer IMHO is... "both".
        
         | chasil wrote:
         | A sed binary is usually much smaller than an awk binary, either
         | POSIX or GNU. The memory footprint of sed will be much more
         | compact.
         | 
         | However, sed has grown out of the command language used by the
         | tty editors, and is more difficult to program (although it is
         | Turing-complete).
         | 
         | The awk language implements much of the syntax of C, and it is
         | not difficult to write a very slow and inefficient script. This
         | inefficiency is harder to reach in sed, because it takes more
         | effort to abuse it.
         | 
         | O'Reilly's book on sed and awk is available free online, both
         | to browse and to download as a ZIP.
         | 
         | https://docstore.mik.ua/orelly/unix/index.htm
        
           | Anthony-G wrote:
           | For those who care about copyright, that URL is not from
           | O'Reilly; it's a copy of a book-set that O'Reilly used to
           | distribute via CD-ROM - with a nice user interface that used
           | web technologies (even included a search feature). O'Reilly
           | could make it available for free - as they've done for
           | other books such as _Apache Security_ [1] or _Using Samba_ [2] -
           | but they still (as is their right) expect you to pay for the
           | _sed & awk_ book [3].
           | 
           | [1] https://blog.ivanristic.com/2015/02/apache-security-ten-year...
           | 
           | [2] https://www.oreilly.com/openbook/samba/book/
           | 
           | [3] https://www.oreilly.com/library/view/sed-awk/1565922255/
        
         | alpaca128 wrote:
         | If you're familiar with Vim's search/replace syntax you already
         | know how to use "sed -e" to replace text, that's how I got into
         | it.
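         | 
         | For example, Vim's :%s/old/new/g maps directly onto sed's s
         | command, including the \( \) groups and \1 back-references of
         | basic regular expressions (sample input made up):
         | 
```shell
# Vim: :%s/colour/color/g   ->   sed:
echo 'colour, colours' | sed -e 's/colour/color/g'       # color, colors

# Groups and back-references use the same BRE syntax as Vim's default mode.
echo 'key: value' | sed -e 's/\(key\): *\(.*\)/\2=\1/'   # value=key
```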
        
         | e63f67dd-065b wrote:
         | The old sysadmin in me says to forget both and just learn Perl.
        
         | whartung wrote:
         | Depends on your use case. I can't speak to sed, I don't know it
         | very well. Awk is my SAK.
         | 
         | But I learned awk while sitting in an office at a client site.
         | I forget the specific scenario, but I wanted to split up some
         | files into some other files. I didn't even know awk, but
         | grokked enough from the man page to let me do what I wanted to
         | do. I can't even say what provoked me to turn to awk in the
         | first place. I do know I ran into some internal open file
         | limits, but worked around that.
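         | 
         | For anyone who hits the same open-file limits: one workaround,
         | sketched below, is to group the input by key first so awk needs
         | only one output file open at a time and can close() it when the
         | key changes. The function name and comma-separated layout are
         | assumptions for illustration:
         | 
```shell
# Hypothetical helper: split_by_first_field OUTDIR
# Writes each comma-separated line to OUTDIR/<first field>.csv.
# Input must already be grouped by the first field (e.g. via sort), so
# every output file is opened once and closed before the next one opens.
split_by_first_field() {
    awk -F, -v dir="$1" '
        NR == 1 || $1 != prev {
            if (out != "") close(out)   # free the previous file descriptor
            out = dir "/" $1 ".csv"     # ">" truncates on open; later prints append
            prev = $1
        }
        { print > out }
    '
}

# Usage (assumes data.csv exists and outdir/ has been created):
#   sort -t, -k1,1 data.csv | split_by_first_field outdir
```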
         | 
         | If you want to tear files apart, or summarize them in some way,
         | or push the fields around, awk is much better. sed is an
         | editor. If I have a sed scenario, I'm more apt to just do it in
         | vi and save the result than stitch together some pipeline with
         | sed.
         | 
         | Most of my use cases are one off processing and analysis. I've
         | never had any workflows that relied on awk or most anything
         | like that. It was almost all throw away code, a tool on the
         | workbench, not the production line.
        
         | radiator wrote:
         | Start with sed. It can take you a long way and it is more
         | succinct.
        
       | [deleted]
        
       ___________________________________________________________________
       (page generated 2023-08-28 23:01 UTC)