[HN Gopher] CLI text processing with GNU awk
___________________________________________________________________
CLI text processing with GNU awk
Author : asicsp
Score : 369 points
Date : 2023-08-28 06:02 UTC (16 hours ago)
(HTM) web link (learnbyexample.github.io)
(TXT) w3m dump (learnbyexample.github.io)
| rottc0dd wrote:
| confession: plug
|
| I once wrote a diff2html script ported from bash and it was much,
| much faster (for obvious reasons). And awk makes it much more
| readable than a bash script. And I could learn the language, debug,
| understand bugs and fix them in a night.
|
| Not sure if it is the idiomatic way to write awk, but I have to say
| it is a really nice language.
|
| https://github.com/berry-thawson/diff2html/blob/master/diff2...
| thangngoc89 wrote:
| I have been using ChatGPT for generating these kinds of small CLI
| commands. My prompts look like this:
|
| - use jq to count a nested array "a.b.c.d"
| - find and delete empty folders using `find`
| - find and replace text using sed/awk
|
| I found that using ChatGPT for these purposes boosted my
| productivity tremendously.
| simonw wrote:
| My usage of tools like awk, sed and Bash scripting has
| increased an enormous amount thanks to ChatGPT/GPT-4.
|
| I'm using those on a weekly basis now, because I don't have to
| memorize details of entirely new programming languages in order
| to apply them to small problems.
|
| Smaller languages that I never took the time to learn are no
| longer something I avoid. I even use AppleScript now!
| https://til.simonwillison.net/gpt3/chatgpt-applescript
| [deleted]
| dotancohen wrote:
| ChatGPT is a great time saver for those who already know how to
| use awk. But it should not be used by those who are unfamiliar.
|
| Just an example, I saw someone come up with a great awk line to
| change some text in a nested directory. He then pasted into
| bash. Only once the server went down did anybody realize that
| he forgot to cd into the proper directory and he wiped out not
| only the server config but also all the user-uploaded data as
| well.
|
| The server config was not version controlled and the user data
| had not been backed up in almost a week.
| swores wrote:
| That's not really a ChatGPT issue, people pasting in slightly
| wrong commands (or right commands in the wrong folder) is a
| tale as old as time - well, as old as linux at least. Short
| of saying that nobody who isn't already an expert should ever
| touch a CLI, the lesson from that story is "be as careful as
| possible, then be more careful, and also have backups of
| everything" not "don't use a LLM to help".
| simonw wrote:
| Yeah, that exact same problem could easily affect someone
| who spent hours cobbling together the same awk script from
| Google searches and StackOverflow.
| mistercheph wrote:
| What "care" do you suggest that someone pasting in a
| script they don't understand should take?
| swores wrote:
| I'm far from an expert so you should probably ask someone
| other than me. But my two cents would be not to paste any
| code until you have understood it, or unless it's written
| by a source you trust, or alternatively only paste it
| somewhere you don't care - when I'm playing around
| testing stuff I might not fully understand on a linux
| server I do it on a VPS that's unimportant to me, and
| that if I mess it up I can very easily restore it back to
| a clean OS install and I have a bash script ready to
| reinstall all software I want & all the profile
| customisations etc.
| [deleted]
| mplanchard wrote:
| I love awk, and I find myself reaching for it a fair bit. One of
| the main things I use it for is "sed with state," so for things
| like matching on a line, but only if it was preceded by some
| other line. I find this to be really useful for creating one-off
| linters, for example I made one recently to check all our
| migration files for CREATE INDEX without CONCURRENTLY on a
| particular set of very large tables where it would cause issues.
| Since sql statements can be spread over multiple lines, it was
| difficult to write a straightforward match, but awk can track
| state like "I'm in a create statement," "I'm creating an index,"
| etc. across multiple lines, which allowed me to cobble together
| something that has worked well for about a year now.
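The "sed with state" idea above can be sketched in a few lines of awk. This is not the linter from the comment; it is a simplified illustration that assumes every statement ends with a `;` and that semicolons never appear inside string literals or comments. The function name `check_index_sql` is hypothetical.

```shell
# Hypothetical helper: flag CREATE INDEX statements that lack CONCURRENTLY.
# Reads the named files, or standard input when no file is given.
check_index_sql() {
  awk '
    # State: collect lines until the statement-ending semicolon shows up.
    { stmt = stmt " " $0 }

    /;/ {
      s = toupper(stmt)
      if (s ~ /CREATE[ \t]+(UNIQUE[ \t]+)?INDEX/ && s !~ /CONCURRENTLY/)
        printf "line %d: CREATE INDEX without CONCURRENTLY\n", FNR
      stmt = ""   # reset state for the next statement
    }
  ' "$@"
}
```

The reported line number is the line on which the statement ends. A real linter would probably also restrict the check to specific table names, as the comment describes.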
| noloblo wrote:
| Can you share this example of tracking state of sql with awk?
| mplanchard wrote:
| Sure! I posted a gist here, stripped of anything particular
| to our company: https://gist.github.com/mplanchard/07229d61bd32ce73624d9003c...
| jmholla wrote:
| One of these days I need to get around to learning awk. In the
| meantime, I've learned some of the deeper, stateful, features
| of sed. For instance, you mentioned wanting to only output a
| line if it was preceded by another. Here's a sed command that
| does so:
|
|     sed -ne 'x' -e '/PREV/ {x; /CURR/ p; x}'
|
|     > echo -e "PREV\nCURR\nCURR\nCURR\nPREV\nRED" | sed -ne 'x' -e '/PREV/ {x; /CURR/ p; x}'
|     CURR
|
| This uses sed's hold buffer. I'll break it down:
|
|     sed -n
|
| The `-n` tells sed not to print anything out. By default, sed
| prints out whatever is left when processing. We'll tell it with
| the `p` command when to do so.
|
|     sed -ne 'x'
|
| `-e` indicates we are specifying one of the scripts sed will
| execute. The command `x` switches the current line with
| whatever is in the hold buffer. We'll do this on every line.
|
|     sed -ne 'x' -e '/PREV/
|
| The next command will only run on lines that contain `PREV`.
| But, because we've been putting lines in the hold buffer, we'll
| only execute on lines after `PREV` when it has been switched
| out of the hold buffer.
|
|     sed -ne 'x' -e '/PREV/ { ... }'
|
| The braces indicate all commands should be run when we see this
| match.
|
|     sed -ne 'x' -e '/PREV/ { x; ... }'
|
| First, we switch the hold buffer with the line buffer.
|
|     sed -ne 'x' -e '/PREV/ { x; /CURR/ p; ... }'
|
| Then, we only print out the line if it contains CURR.
|
|     sed -ne 'x' -e '/PREV/ {x; /CURR/ p; x}'
|
| Finally, we switch them back in case there is overlap in our
| matches. (Give `echo -e "PREV\nPREVCURR\nCURR\nCURR\nPREV\nRED"`
| a try with this.)
|
| All that said, I'm pretty sure the `awk` script is much simpler
| and more direct, but I wanted to share how one might accomplish
| this with sed.
|
| The time I spent learning this probably would've been better
| spent on awk, but this tutorial[0] was so good and so easy, it
| taught me nearly everything I know about sed.
|
| [0]: https://www.grymoire.com/Unix/Sed.html
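For comparison, the same "print CURR only when the previous line matched PREV" check can be sketched in awk by remembering the previous record in a variable. The function name `curr_after_prev` and the variable `prev` are made up for this illustration.

```shell
# Print a line matching CURR only when the line before it matched PREV.
# Hypothetical function name; reads files or standard input.
curr_after_prev() {
  awk '
    prev ~ /PREV/ && /CURR/   # pattern with no action: print the record
    { prev = $0 }             # remember this line for the next record
  ' "$@"
}

printf 'PREV\nCURR\nCURR\nCURR\nPREV\nRED\n' | curr_after_prev
# prints: CURR
```

Because `prev` is updated after the test on every line, overlapping input such as `PREV`, `PREVCURR`, `CURR` prints both `PREVCURR` and `CURR`, which matches what the sed version does.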
| mbivert wrote:
| > One of these days I need to get around to learning awk
|
| Plan9's awk(1)[0] man page provides a precise and concise (a
| few paragraphs) presentation of the core features of all awk
| implementations.
|
| Tutorials bring practical knowledge, but often lack complete
| and self-contained descriptions of those nifty little tools.
|
| [0]: https://man.cat-v.org/plan_9/1/awk
| btschaegg wrote:
| I still maintain that "The AWK Programming Language" [1] is
| one of, if not _the_ best programming language book I've
| read so far.
|
| It's short and to the point, has good examples, and cuts
| most of the usual fluff like "what is a variable?". Its
| base assumptions are: You know how to program, and you're
| here to learn AWK. Let's get to it.
|
| I dearly wish there'd be more books like it for other
| languages.
|
| [1]: https://archive.org/details/pdfy-MgN0H1joIoDVoIC7
| mplanchard wrote:
| This is nifty, thanks for sharing! I had no idea that sed had
| a hold buffer, and it's very cool that you can swap it in and
| out within the sed command like that. It's funny, because I
| went essentially the opposite way that you did: I used to
| know sed and awk basics, but then I properly learned awk.
| Since then my sed has atrophied a bit, and I still only know
| the basics. I'll have to run through that tutorial you linked
| tyingq wrote:
| One somewhat not-well-known thing with gawk is that it typically
| ships with some useful extensions that give you access to things
| like readdir(), ord(), chr(), gettimeofday(), sleep(), etc.
|
| https://www.gnu.org/software/gawk/manual/html_node/Extension...
| cb321 wrote:
| Of possible interest - instead of making a whole new programming
| language like awk, you can also just systematize generating code
| for an existing one with a command-line harness.
|
| This can even stay terse & keep a fairly fast edit-test
| turnaround in a fully statically typed language like Nim:
| https://github.com/c-blake/bu/blob/main/doc/rp.md
| macintux wrote:
| Whole new language? Awk is 45 years old.
| cb321 wrote:
| I agree that "writing/learning a different" (what I meant) is
| more clear wording than "making a whole new".
|
| EDIT: and it is a fair counterpoint that any command with
| options is, in _some_ sense, also a different language one
| must learn. Learning API calls is _also_ a different language
| (at least nouns & verbs if not syntax). But that is all
| partly the point. awk did/does a programming language with
| different syntax where other alternatives might be enough.
| BaseballPhysics wrote:
| If you think that's a good idea, you don't understand why tools
| like awk/perl/sed/etc exist and are popular. They are, by
| design, optimized toward specific kinds of use cases.
|
| In fact, their dynamically typed nature is a perfect example of
| that since it's much easier to quickly manipulate strings in a
| language that isn't so strict, as they'll do more heavy lifting
| for you via automatic coercion while limiting extra
| syntax/boilerplate (which, granted, is less of a problem with
| modern type inference). That makes it a lot easier to toss
| together quick one-liners and glue code, which is where these
| tools shine in the first place.
|
| Hell, even something like python or ruby is just a little too
| structured for my taste when doing something quick and dirty,
| which is why I love perl as it can be unstructured if that's
| all I need, or I can create a more structured program if that's
| what the problem requires.
| cb321 wrote:
| It's just a different & in my experience often neglected
| point in a similar design space (as that initial, linked text
| argues). Your tastes & use cases are your own. Almost
| everything "all depends" upon so very much in computer
| systems & in life.
|
| To add some more color, Nim is also a very adaptable
| prog.lang. I believe there are converts from Perl in its fan
| base. Nim's creator long ago recreated some Perl in Nim:
| https://nim-lang.org/araq/perlish.html
|
| Anyway, it's a different set of trade-offs to consider which
| I thought some reading about learning awk with open minds
| might find interesting. That's all, really.
| kazinator wrote:
| So Awk is a whole _new_ language, but Nim isn't?
| cb321 wrote:
| I never said Nim was unique | older than awk. While I cannot
| make you read my cousin comment to understand I meant "new"
| as clarified-"different" [1] or click through any links, I
| can perhaps non-redundantly emphasize that the mentioned
| approach "works" not just for Nim, but for _any language_,
| _C_ & _Go_ (impls refd in mentioned `rp.md`), and _Python_ in
| another comment in this comment thread:
| https://news.ycombinator.com/item?id=37295399 (maybe even
| with `eval` there!)
|
| Only, the approach "works" with differing levels of "success"
| for different use cases / contexts. It is true (whichever)
| _shell language_ is still there to differ in shell 1-liner
| cases. That is _also_ true of sed / awk / perl / ... If you
| don't want to click through on `rp.md`, you could also read
| Ben Hoyt's article on his Prig if you like:
| https://benhoyt.com/writings/prig/ discussed on HN a while
| back https://news.ycombinator.com/item?id=30498735
|
| It's not actually _that_ different from your `cppawk` that
| you mention elsethread.. just maybe rotated 27 degrees away
| in "idea space". ;-)
|
| [1] https://news.ycombinator.com/item?id=37293475
| kazinator wrote:
| I maintain a minor side interest in Awk, alongside Lisp and
| other things.
|
| I developed cppawk in 2022:
| https://www.kylheku.com/cgit/cppawk/about/
|
| cppawk extends Awk with preprocessing.
|
| There is a loop macro that supports a vocabulary of clauses.
| Clauses can be combined for parallel and cross-product iteration.
| And they are user-extensible. By writing five simple macros, you
| can define a new clause.
|
| Something potentially useful if you use Awk.
|
| Cppawk is documented with multiple man pages, and covered by unit
| tests which run with gawk and mawk.
| e63f67dd-065b wrote:
| Perhaps my old sysadmin hat is showing through, but I don't quite
| see what the advantage of awk is over just writing the same thing
| in perl. I've seen my fair share of horrendous shell scripts from
| junior sysadmins, and every time I think to myself "the text
| processing portion would be so much cleaner in Perl".
| asicsp wrote:
| If you are comparing Awk vs Perl for scripts, I'd prefer Perl
| (or Python).
|
| This post is about short one-liners for ad hoc use cases. I
| prefer sed/awk over Perl for such cases. Though, if you already
| know Perl, you could continue using it instead of having to
| learn more tools.
| SOLAR_FIELDS wrote:
| Do all systems still come with Perl baked in these days? If
| so I could see reaching for that over awk/sed. If I have to
| install a runtime I may as well just reach for Python
| thesuperbigfrog wrote:
| >> Do all systems still come with Perl baked in these days?
|
| If you use Git for Windows (https://gitforwindows.org/), it
| includes Perl.
| btschaegg wrote:
| ...and gawk :)
| thesuperbigfrog wrote:
| Yes. Frequently any tool set that has gawk will also
| include sed, perl, cut, head, tail, less, vi / vim, etc.
|
| It is nice that Git for Windows includes bash and all
| these tools.
| pphysch wrote:
| A bunch of the default git extensions are written with
| Perl, so you will find some version of it available on most
| modern Linux systems
| whartung wrote:
| Awk's superpower, and the reason I mostly use it, is its free
| read loop, free field splitting, and the pattern/condition
| matching model.
|
| As a LANGUAGE, it's "eh". It just happens to be "good enough".
|
| You can, of course, do all of that with Perl. But then I have
| to write all that boilerplate I get with awk for free. And the
| gains in Perl's language aren't enough, for me, to dump awk. And
| I don't use it for "scripting", I use it for data processing,
| tearing up files for mostly one-off tasks. So I don't miss
| Perl's depth. If I want depth, I'll go somewhere else.
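A small sketch of the three "free" features mentioned above: awk reads the input line by line, splits each line into `$1`, `$2`, ... and runs an action only where the pattern matches. The log-like input format and the function name `sum_get_bytes` are invented for the example.

```shell
# Sum the third column of lines whose first field is "GET".
# The input format (method, path, bytes) is made up for illustration.
sum_get_bytes() {
  awk '
    $1 == "GET" { total += $3 }     # pattern selects records, action runs on them
    END         { print total + 0 } # runs once after the last record
  ' "$@"
}

printf 'GET /a 100\nPOST /b 50\nGET /c 25\n' | sum_get_bytes
# prints: 125
```

There is no explicit loop, no call to split the line, and no `if`; `total + 0` only makes sure a number is printed when nothing matched.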
| e63f67dd-065b wrote:
| > free read loop
|
| perl -n
|
| > free field splitting
|
| perl -a ... $F[1/2/3/etc]
|
| > pattern/condition matching model
|
| Not quite sure what you mean, but `perl -lane 'print if
| /abc/'` might be what you're looking for
|
| The boilerplate can be mostly eliminated with the magic
| incantation of `perl -lane`. The trick that makes all this
| work is that perl defines a whole bunch of pre-defined
| variables and populates them with things that might be
| helpful (see $_, @F, etc).
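To make the mapping concrete, here is the same small task written both ways. One detail worth noting is that Perl's `@F` array is zero-based, so `$F[1]` lines up with awk's `$2`. The function names are hypothetical and the sample input is made up.

```shell
# Print the second whitespace-separated field of lines containing "abc".
second_field_awk()  { awk '/abc/ { print $2 }' "$@"; }

# Same task with perl: -n loops over lines, -a splits each line into @F,
# -l handles line endings, -e supplies the code. @F is zero-based,
# so $F[1] here corresponds to $2 in awk.
second_field_perl() { perl -lane 'print $F[1] if /abc/' "$@"; }

printf 'abc one\nxyz two\nabc three\n' | second_field_awk
# prints:
# one
# three
```

Piping the same input into `second_field_perl` should give the same two lines on a system where perl is installed.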
| zeteo wrote:
| If you're in perl all the time that probably makes a lot of
| sense. For me, awk is one of the few languages that I can
| safely set aside for months and then I'm back up to speed in 10
| minutes. There's just something very intuitive about it, and it
| somehow fits very naturally with other common command line
| tools.
| BaseballPhysics wrote:
| Weird, I feel the same way about Perl.
| SomeoneFromCA wrote:
| awk is more intuitive for sure, for a regular javascript coder
| ajross wrote:
| There aren't any technical advantages, no. Perl's features are
| a proper superset of awk (by design!).
|
| What's happened is that Kids Today (tm) never learned perl. So
| they're discovering awk as someone new to the idea of stream
| processing. And awk was a great idea for that, and it
| represented a genuine innovation worth emulating.
|
| In the late 1970's. Then of course perl did emulate and surpass
| it. But then got forgotten. So kids are discovering awk
| instead. It's a little cringe, really.
| arp242 wrote:
| I'm one of those "kids these days" but did actually learn to
| program Perl at some point, and I generally prefer AWK. Perl
| is a large and complex language, I don't need it that often,
| and I'm not smart enough to keep remembering all of it.
|
| Now, if I would get hired as a full-time Perl developer and
| spent 2 years developing Perl: it would perhaps be different.
| But that's not the case, and isn't for most people.
|
| For better or worse, Perl sees a lot less usage than it once
| did; I rarely encounter it "in the wild" and don't even have
| it on my laptop because nothing needs it.
| macintux wrote:
| For simple use cases, I find awk simpler than Perl. I love
| Perl, have written tens of thousands of lines, but on the CLI
| I prefer awk. I'm sorry I "cringe" you.
| hereonout2 wrote:
| Perl came out in the late 80s and by the mid 2000's was
| really on its way out? I hired for my last perl position in
| around 2006.
|
| Just saying, your definition of "kids today" could well
| include a decent portion of developers under 45 years old.
| Referring to this cohort repeatedly as "kids" is also a
| little cringe.
| ajross wrote:
| Did you really just reply to a comment that used the phrase
| "Kids Today (tm)" and try to interpret it as a genuine
| insult? The inability of this community to understand
| straightforward humor amazes me. Dude, it was a joke. And
| _yes_, I was calling mid-career professionals "kids".
| Deliberately. Because I'm old. And it's funny.
| sureglymop wrote:
| Why is that cringe? They genuinely probably came across awk
| before perl (I know I did, I read "The AWK Programming
| Language" and then went on to "The C Programming Language").
| That said, awk is great and it's been the same for
| decades and available on every system (the same can't really
| be said about perl).
| thesuperbigfrog wrote:
| >> awk is great and it's been the same for decades and
| available on every system (the same can't really be said
| about perl).
|
| The only issue with AWK is that there are many
| implementations and they are not always compatible with one
| another:
|
| https://www.gnu.org/software/gawk/manual/html_node/Other-Ver...
|
| I have ported AWK scripts from legacy Unix systems to Linux
| and ran into incompatibilities that required some
| adjustments to the scripts.
|
| Curious: what systems have AWK, but do not have Perl?
| coliveira wrote:
| Awk is a useful language that you can learn in one afternoon,
| after reading the man page and a few examples. And then you
| can spend your whole life using it for several projects. You
| cannot do that with perl. That's why awk has a longer shelf
| life than perl.
| skinkestek wrote:
| > And then you can spend your whole life using it for
| several projects. You cannot do that with perl. That's why
| awk has a longer shelf life than perl.
|
| A Perl developer would of course say you have this
| completely backwards and even if I haven't programmed Perl
| much, or even at all for the last decade I would tend to
| agree.
| pletnes wrote:
| Remove perl, have fewer security issues. Some scanning tools
| flag it, too. Awk is found in more places, in my experience.
| ajross wrote:
| Sorry, but that's ridiculous. Any general purpose programming
| language is a vector for bugs and security problems, but come
| on: you're genuinely trying to say that a kludge of
| bash+sed+awk is objectively more "secure" than a single perl
| script to solve the same problem?
| coliveira wrote:
| In the case of awk, actually yes, it is safer. The reason
| is that awk is a very limited language. It has only enough
| functionality to provide text matching and substitution. It
| is very difficult to use awk to do anything of high
| security risk, compared to a language like perl.
| rascul wrote:
| > It has only enough functionality to provide text
| matching and substitution.
|
| Gawk at least can do a lot more than that. Reading and
| writing files, network communications, and running arbitrary
| shell commands, for example. It's certainly not as
| powerful as perl but it's also not limited to just text
| matching and substitution.
|
| Edit: figured I would provide some examples. Here's an
| http server and a first person shooter in gawk. Maybe not
| so practical but they show some of gawk's capabilities.
|
| https://github.com/kevin-albert/awkserver
|
| https://github.com/TheMozg/awk-raycaster
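As a small illustration of the point about running shell commands, the sketch below uses only constructs that are in POSIX awk (`cmd | getline` and `system()`), so it does not depend on gawk. The networking mentioned above is gawk-specific and is not shown here. The function name `run_and_capture` is hypothetical.

```shell
# Run shell commands from inside awk and use their results.
run_and_capture() {
  awk 'BEGIN {
    cmd = "echo hello from the shell"
    cmd | getline line         # read one line of the command output
    close(cmd)
    print toupper(line)

    status = system("exit 3")  # run a command and get its exit status
    print "exit status:", status
  }'
}

run_and_capture
# prints:
# HELLO FROM THE SHELL
# exit status: 3
```

Anything that reaches `system()` or a command pipe is handed to the shell, which is why untrusted input should not be interpolated into those strings.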
| vbrandl wrote:
| There is a virus written in awk that infects other awk
| scripts[0]. And according to wikipedia, the language is
| Turing complete.
|
| [0]: https://github.com/SPTHvx/ezines/tree/main/dc5/CODES/Perfori...
| ajross wrote:
| But awk is never used alone. You don't solve whole
| problems with awk, you squish it into a script with a
| bunch of other junk. My point is that you're making an
| apples-to-oranges comparison. Sure, "awk" isn't the
| problem, but "bash" is, and bash is _undeniably_ a more
| error-prone language than perl. You surely agree with
| that much, right?
|
| And if you disallow "bash" for security reasons, where
| does that leave "awk" in the category of useful tools?
| See my point?
| pletnes wrote:
| You're right. But the alternative might be bash + perl or
| just bash. Or none of them. Perl is anyway the first one
| to go.
| TeMPOraL wrote:
| Is it completely gone, or rather _just for you_, blocked
| by sysadmins who know Perl is the magic pixie dust for
| total control, and want to keep it for themselves?
|
| In Windows-land, compare how PowerShell access may be
| restricted, and you won't be allowed to run macros in
| Office, all while your computer is "managed" by a
| horrible hodge-podge of PowerShell and VBA scripts that
| make Perl code look like high literature.
| coliveira wrote:
| Just use awk for what it was designed for: text search and
| substitution. You can run shell scripts along with awk,
| but that is clearly not what you should be doing if you
| want to design secure systems. The first rule of security
| is not to abuse your tools.
| jsyolo wrote:
| Not being snarky, why not python over perl? what makes perl
| better for scripts?
| inejge wrote:
| _what makes perl better for scripts?_
|
| Having an implicit line- and field-splitting loop for
| standard input with a couple of command-line switches. (Awk
| doesn't even need switches, but is cumbersome if you need
| initial state.) This covers a _lot_ of use-cases. Also, very
| compact and powerful regular expressions.
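For the "initial state" case mentioned above, the usual awk tools are `-v` assignments on the command line and a `BEGIN` block that runs before any input is read. A minimal sketch, with made-up CSV data and a hypothetical function name `above_threshold`:

```shell
# Print names whose score (second CSV column) exceeds a threshold.
# The threshold is passed in with -v; FS is set in BEGIN before any input.
above_threshold() {
  awk -v min="$1" 'BEGIN { FS = "," } $2 + 0 > min + 0 { print $1 }'
}

printf 'ann,12\nbob,7\ncy,30\n' | above_threshold 10
# prints:
# ann
# cy
```

The `+ 0` on both sides forces a numeric comparison regardless of how the values were read in.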
| e63f67dd-065b wrote:
| Perl is much more terse for one-liners and has much more
| built-in for doing text processing in scripts. Stuff like
| implicit read loop, field separation, etc. I would say
| they're suitable for different jobs: if a perl script grows
| beyond a hundred lines (you can do a surprising amount in
| that space!), then Python may be the right tool.
|
| Perl is also much more of a known target: some version of it
| exists on basically every single Unix, and the language
| really hasn't changed that much in the past decade. I have
| SSH'ed into multiple CentOS 6/SLES 11 (released 2009, and
| granted mostly to rescue data off them) servers in the past 2
| years, and perl is just much more of a known target to write
| things against than whatever python release is on that
| system.
| reddit_clone wrote:
| Perl is a progression of that particular environment. It is a
| superset of shell/grep/awk/sed.
|
| A shell command works exactly as you would expect copied
| literally inside a backquote. With all the other goodies of a
| real programming language.
|
| Doing this in Python (to me at least) seems unnatural.
| qalmakka wrote:
| Awk is fine and dandy but, like with Sed, I think that it's
| almost always replaceable with Perl which is way nicer to use,
| and ubiquitous. Every OS (except Windows) I laid my hands on in
| the last 15 years has had Perl installed in either its default
| install or pulled in as a dependency almost immediately (a LOT of
| stuff depends on Perl in any Unix system).
|
| This is, unless you are running on an embedded environment, but
| in that case you are stuck with something like busybox's Awk
| which is way more limited than gawk...
| coliveira wrote:
| The difference is that learning pearl is an ordeal that will
| take several weeks at the minimum. Learning awk can be done in
| one afternoon, after reading a man page and a few examples. And
| it really works for the tasks it was designed for. So I think awk
| is superior to perl for the purpose it was created.
| thesuperbigfrog wrote:
| >> The difference is that learning pearl is an ordeal that
| will take several weeks at the minimum.
|
| I am not sure about pearl, but Perl is not that different
| from most other programming languages. If you are familiar
| with Javascript or Python, learning the basics of Perl is
| pretty easy:
|
| https://perldoc.perl.org/perlintro
|
| https://www.perltutorial.org/
|
| Perl is designed for text processing, so it has a powerful
| regular expression engine. Writing regular expressions can be
| difficult, but it is a great skill to have in your toolkit.
|
| Fun Fact: If the programming language you are using has
| support for regular expressions, they are almost certainly
| Perl-compatible regular expressions because Perl's regular
| expression syntax is more widely used and more popular than
| other regular expression syntaxes (e.g. POSIX, etc.).
| [deleted]
| hereonout2 wrote:
| The obligatory perl replaced awk decades ago comment.
|
| Soon to be followed by the ones saying nobody should be writing
| shell scripts at all anymore.
| empthought wrote:
| Perl is not nicer to use.
| chasil wrote:
| I was actually surprised to find mktime() in Busybox awk.
|
| The big thing lacking there are the GAWK networking extensions.
| delta_p_delta_x wrote:
| > except Windows
|
| On Windows, you use PowerShell.
| layer8 wrote:
| I prefer Cygwin.
| horse_dung wrote:
| I just finished writing a "dumb stuff with containers"
| internal blog which included:
|
|     C:\> type somefile.txt | docker run --rm -i ubuntu awk 'something' > output.txt
| swores wrote:
| They were talking about whether or not the OS comes with Perl
| by default, not whether it has a CLI at all.
| thesuperbigfrog wrote:
| >> On Windows, you use PowerShell.
|
| If you use Git for Windows (https://gitforwindows.org/), it
| includes Perl.
|
| Or you could install Strawberry Perl which is made for
| Windows: https://strawberryperl.com/
| asicsp wrote:
| I wouldn't put Perl as easier to use, but it certainly is more
| powerful and has a vast ecosystem. And it is more portable,
| since there's no need to worry about GNU/BSD/etc variations.
|
| And I wrote a book for Perl one-liners as well
| (https://learnbyexample.github.io/learn_perl_oneliners/), which
| I'm currently revising (like I did for the grep/sed/awk
| ebooks).
| temp_gnuser wrote:
| I have a good story about this: My first time really working
| with a great scientist, we were taking genetic papers and
| making them code for improving analysis. I spent two days
| writing a perl script before I finally got frustrated enough to
| ask for help.
|
| The first question he asked was "Did you email the author(s)?"
| I said I hadn't and didn't want to bother this seemingly very
| important scientist. He told me nonsense, that most of them
| don't mind responding but he warned me to be terse and to the
| point. I emailed the gentleman and told him what I was doing
| and my issues, and asked him for some guidance. He sent me back
| a one line awk-script that did everything all that perl was
| failing to do!
|
| Of course all that proves is I'm horrible at perl, but it was
| an important moment in my life that showed me that even very
| smart and important people are still just people, and that just
| asking is often a great way to learn new things yourself, and
| that sometimes you just need to step back and reconsider what
| tools you are using. I am forever grateful that an awesome
| geneticist who needed help bootstrapping tech infra took the
| time to teach me, a greybeard sysadmin type, practical,
| reproducible science, from paper to implementation. I learned a
| lot but the biggest downside is, after being heavily surrounded
| by scientists in the workplace in most jobs since then, I find
| companies without that difficult to work for.
| ycombinete wrote:
| Last week chat-gpt spat out some Awk for me for a generic linux
| request. Was quite a pleasant surprise!
| [deleted]
| asicsp wrote:
| Hello! Author here.
|
| I am pleased to announce a new version of my "CLI text processing
| with GNU awk" ebook.
|
| Learn the `GNU awk` command step-by-step from beginner to
| advanced levels with hundreds of examples and exercises. This
| book will dive deep into field processing, show examples for
| filtering features, multiple file processing, how to construct
| solutions that depend on multiple records, how to compare records
| and fields between two or more files, how to identify duplicates
| while maintaining input order and so on. Regular Expressions will
| also be discussed in detail.
|
| Links:
|
| * PDF/EPUB versions: https://learnbyexample.gumroad.com/l/gnu_awk
| (free till 31-August-2023)
|
| * Web version: https://learnbyexample.github.io/learn_gnuawk/
|
| * Markdown source, example files, etc:
| https://github.com/learnbyexample/learn_gnuawk
|
| * Interactive TUI app for exercises:
| https://github.com/learnbyexample/TUI-apps/blob/main/AwkExer...
|
| Bundle offers:
|
| * Magical one-liners
| (https://learnbyexample.gumroad.com/l/oneliners/new_awk_relea...)
| is $5 (normal price $15) -- grep, sed, awk, perl and ruby
| one-liners bundle
|
| * All Books Bundle
| (https://learnbyexample.gumroad.com/l/all-books/new_awk_relea...)
| is $12 (normal price $32) -- all my 13 programming ebooks
|
| I would highly appreciate it if you'd let me know how you felt
| about this book. It could be anything from a simple thank you,
| pointing out a typo, mistakes in code snippets, which aspects of
| the book worked for you (or didn't!) and so on. Reader feedback
| is essential and especially so for self-published authors. Happy
| learning :)
|
| ---
|
| Previous discussions:
|
| * Learn to use Awk with hundreds of examples
| (https://news.ycombinator.com/item?id=15549318) -- _478 points,
| Oct 2017, 116 comments_
|
| * Show HN: An eBook with hundreds of GNU Awk one-liners
| (https://news.ycombinator.com/item?id=22758217) -- _539 points,
| April 2020, 48 comments_
| mbwgh wrote:
| Thanks for sharing this, it is pleasantly obvious you put a lot
| of work into this. I especially like the TUI application!
|
| Will the web version of this book remain free even after August
| 31st?
| asicsp wrote:
| You're welcome and thanks for the feedback :)
|
| Yeah, the web version is always free for all of my ebooks.
| And you can find the markdown source on GitHub, for example:
| https://github.com/learnbyexample/learn_gnuawk/blob/master/g...
|
| I use `pandoc` to generate the PDF/EPUB versions from
| markdown. See my blog post
| https://learnbyexample.github.io/customizing-pandoc/ for
| details.
| swores wrote:
| I'm curious to know how many people you get paying for
| something like your "Magical one-liners", and also whether
| you've ever experimented with a "choose to pay after" model?
|
| I ask because it's the kind of thing that I can imagine finding
| useful enough to pay $5 (or $15) for, but I can also imagine it
| being something that contains nothing I don't already have
| saved in my personal "one liners" file, so I'm not really
| interested in paying to find out.
| asicsp wrote:
| I use Gumroad/Leanpub to sell my ebooks. As far as I know,
| they don't support the "choose to pay after" model.
|
| You can see the number of paid sales for the bundles under
| the "I want this!" button. When the price is 0, it shows the
| total of both paid/free users.
|
| I started selling ebooks about 5 years back. Where I live, my
| monthly living cost is just $150. While the first two years
| of sales were just about enough to cover my costs, the last
| three years have been much better - I can continue being
| self-employed :)
| Bjartr wrote:
| > Where I live, my monthly living cost is just $150.
|
| That's quite low! Mind if I ask where in the world that is?
| asicsp wrote:
| Outskirts of a second-tier city in southern India. I live
| a modest lifestyle - no vehicles, desktop instead of
| laptop, live alone etc.
| swores wrote:
| I suspect that anyone reading this thread is likely to be equally
| interested in "Ask HN: Share a shell script you like" from a
| fortnight ago (though at 78 comments, it didn't get as much
| traction / comments as I hoped it would when I saw it)
|
| https://news.ycombinator.com/item?id=37112991
| asicsp wrote:
| There was a similar discussion 5 months back:
| https://news.ycombinator.com/item?id=35122780 _(332 points |
| 328 comments)_
|
| And here's one from last year:
| https://news.ycombinator.com/item?id=32467957 _(374 points |
| 294 comments)_
| swores wrote:
| Thanks :)
| mpalmer wrote:
| A few years back I decided to just get as capable as I could with
| jq, which is fast and functional enough to cover 99% of awk/sed
| use cases, plus cases you'd never want to touch with awk/sed.
|
| No regrets!
| ra1231963 wrote:
| Never learned awk or committed esoteric cli incantations to
| memory. Don't get me wrong, I can get around on the cli, but sed,
| awk, etc just didn't seem like a good cost/benefit investment.
| I'm also not a sysadmin.
|
| Thankfully I waited long enough and LLMs can write them for me
| better than I ever could.
| healeycodes wrote:
| Awesome! I've been meaning to replace my usage of
| Python/JavaScript for tasks which (I believe) are more
| awk-shaped.
| SOLAR_FIELDS wrote:
| As long as you don't care about unit testability. If you
| bothered to write them in Python or JS, you usually don't
| want to regress back to shell stuff. You're already in a place
| where you have a runtime available so you can do way more
| stuff.
|
| It's usually the opposite direction from the one you mentioned
| that you want to go. You one-liner some shell like awk to quickly
| get shit done without worrying about a runtime being available to
| you, and then if you need it to be more robust and legible
| because of testing etc., or production grade, you move to a proper
| dynamic scripting environment.
| nologic01 wrote:
| awk one-liners are a slam dunk. The tough question is whether to
| invest in more complex awk programming. Invariably some
| processing task requires more complex logic and awk provides
| that, but in the terse and arcane ways of early computing. Yet
| reaching for a modern alternative is also an overhead, may not be
| particularly intuitive either (hello pandas) and may even have
| performance issues...
| martincmartin wrote:
| For me, the big problem is libraries. Even a personal file of
| common functions doesn't seem that well supported, and there
| just doesn't seem to be a way to get third party libraries.
|
| When I start needing helper functions and splitting it into
| multiple lines is usually when I reach for Python instead. And
| then sigh, because my program will be 2 to 3 times bigger. Ruby
| is a great awk replacement, but unless other people at your job
| know it, you can't expect others to maintain it.
| cb321 wrote:
| I'd bet you could do a harness like
| https://news.ycombinator.com/item?id=37292882 mentions but
| with Python in like an hour and then you could stay in both
| one syntax and more significantly in one library ecosystem.
| Why, 3 such things may even already exist. :) The
| syntax/semantics is not as optimized for 1-liner brevity, but
| everything has trade-offs.
| nmz wrote:
| The problem with libraries is namespaces: since everything is
| global, it's not worth it, and it's also incredibly problematic
| (especially since match() actually sets 2 globals).
| Fortunately gawk offers namespaces, and it even has a flag for
| loading libraries from a path: { gawk -i inplace } has
| replaced sed -i for me a couple of times. But yeah, it's still
| lacking.
|
| PS: This is gawk only though, but awk -f
| $awkmodules/mymodule.awk -f <(echo '') is an OK replacement,
| even though it's just concatenating files
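|
| A minimal sketch of the multiple `-f` approach mentioned above,
| using only POSIX awk features. The file names `lib.awk` and
| `main.awk` and the `trim` helper are made up for illustration.
| gawk additionally has `@include` and `-i`, which are not used
| here.
|

```shell
# Scratch directory for the two hypothetical awk files.
dir=$(mktemp -d)

# "Library" file: only function definitions, no rules.
cat > "$dir/lib.awk" <<'EOF'
function trim(s) {
    gsub(/^[ \t]+|[ \t]+$/, "", s)
    return s
}
EOF

# Main program: calls trim() from the library file.
cat > "$dir/main.awk" <<'EOF'
BEGIN { FS = ":" }
{ print "[" trim($2) "]" }
EOF

# awk reads the -f files in order, as if they were one program.
result=$(printf 'key:   value  \n' | awk -f "$dir/lib.awk" -f "$dir/main.awk")
printf '%s\n' "$result"

rm -rf "$dir"
```

|
| Since the files are simply concatenated, every function and
| variable still lives in one global scope, which is the
| limitation described above.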
| SOLAR_FIELDS wrote:
| I used to not like them pre-ChatGPT. But nowadays, when you can
| paste an arcane, illegible awk/sed one-liner into an AI and have
| it describe step by step what it's doing, I'm totally fine with
| them. I still don't like them as much as a few lines of
| Python for unit testability reasons, but sometimes you just
| straight up don't need unit tests for some quick data munging
| task.
| kagevf wrote:
| > have it describe step by step what it's doing
|
| And then building on the original "update the script to do
| $thing" where $thing isn't obvious/trivial. It saves a lot of
| time.
| rubicks wrote:
| I love awk. Enough to shill for this:
|
| https://www.oreilly.com/library/view/effective-awk-programmi...
|
| If TFA is an excerpt for a book forthcoming on dead-tree media,
| then I'll be buying that one as well.
| [deleted]
| Galanwe wrote:
| 99.9% of my awk use case is to split a line (a la `cut -d' ' -f`)
| while discarding successive spaces.
|
| e.g.:
|
| $ echo "key: value" | awk '{print $1}'
| value
|
| Open to a simpler replacement :-)
| cb321 wrote:
| You might consider:
| https://github.com/c-blake/bu/blob/main/doc/cols.md
|
| That's in Nim, though that may not be much of a barrier. (There
| may also be other tools in bu/ of interest.)
| asicsp wrote:
| Check out https://github.com/sstadick/hck and
| https://github.com/theryangeary/choose - both are alternatives
| for cut/awk and allows regex based split as well. Though, they
| don't remove starting/ending whitespaces IIRC.
|
| I wrote a script (https://github.com/learnbyexample/regexp-cut)
| that uses `awk` to provide a `cut`-like tool with regex-based
| split, negative index, etc. And this will take care of
| starting/ending whitespaces as that's the default `awk`
| behavior.
| llimllib wrote:
| You can do it with cut too:
|
| $ echo "key: value" | cut -wf 2
| value
|
| but whether it's actually "simpler" is open to debate
|
| edit: actually gnu cut lacks -w, so this is bsd-only. lol
| computers, stick with awk
| cb321 wrote:
| Only with FreeBSD `cut`.. coreutils `cut` (Linux) is missing
| -w as is at least OpenBSD?
|
| EDIT: I see you discovered this. lol computers indeed. ;-)
| rigelina wrote:
| I can't tell you how many times I pipe in rev to put my text
| where I want it for cut (then rev it again).
|
| Abbreviated example, getting the service names from a k8s
| cluster looks roughly like (actual command does a bit more
| processing):
|
| kubectl get deployments -o wide | rev | cut -d'=' -f1 | rev
|
| But if it's just gobbling whitespace, xargs without a command
| can be your friend.
|
| $ echo "key: value" | cut -d: -f2 | xargs
|
| value
|
| My brain generally goes "rev sed head tail xargs cut tr ...
| screw it, I'll use python ... someday I shall learn awk."
| There's a young engineer on my team that knows awk, and I'm
| envious.
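|
| For comparison with the rev/cut/rev pipeline above, here is a
| hedged sketch of the same "last field" idea in awk. The sample
| line is a made-up stand-in, not real kubectl output.
|

```shell
# Hypothetical line whose last "="-separated piece is the part we want.
line='web  1/1  1  1  5d  nginx  nginx:1.25  app=web'

# -F= splits on "=", and $NF is the last field, whatever its position.
last=$(printf '%s\n' "$line" | awk -F= '{print $NF}')
printf '%s\n' "$last"
```

|
| `$NF` avoids reversing the line twice, and `$(NF-1)` would give
| the field before it.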
| SOLAR_FIELDS wrote:
| You don't even need to know awk these days. Just say "how
| to do x munging task" in ChatGPT and you'll get a one liner
| that will be just as good as if you'd sat there squinting
| at man pages for 30 minutes
| llimllib wrote:
| this is exactly the sort of case where you get non-
| portable bullshit you don't understand out of it! It
| spits out something that works on BSD but not on GNU, you
| put it in your script and _boom_ wonder why the thing
| blew up in prod, and oh btw you also lack the ability to
| debug it because you never understood it in the first
| place
| Galanwe wrote:
| More importantly it's just mega slow.
|
| The whole point of bash one liners is not to write short
| bash, it's to type it in your shell to get a result
| quickly.
|
| Just typing the url to chatgpt in my browser I would have
| had the time to write my one liner in shell XD
| SOLAR_FIELDS wrote:
| I sincerely doubt unless you are writing shell all day
| every day that you will get a decently complex working
| one liner out faster than GPT4.
| SOLAR_FIELDS wrote:
| Meh, I don't think that is a problem endemic to shell
| specifically. That's more of putting untested code in
| production. The thing about it being a one liner
| is... if that happens and it's not portable, who cares?
| You just turn around, paste that sucker back in, and say
| make it work on OsFlavor2.06 and you get back something
| that works. You don't even have to fully understand why
| it isn't portable, you can just ask the AI and have it
| explain why. If you wanted something battle tested in
| prod that was readable and understandable you wouldn't be
| using one line shell scripts in the first place,
| regardless of whether they were written by an AI or not
| Galanwe wrote:
| Neat, but both your tricks (rev and xargs) are more for
| getting the last word than getting the nth word.
|
| For the sake of the argument, say I have the following
| fixed output and want the sizes:
|
| $ ls -l
| -rw-rw-r-- 1 userAAA group 588 Aug 29 00:25 file1
| -rw-rw-r-- 1 userAA groupB 11870 Aug 29 00:24 file2
| -rw-rw-r-- 1 userA groupBB 1166 Aug 28 23:56 file3
| -rw-rw-r-- 1 user groupBBB 195 Aug 28 23:56 file4
|
| I would just do:
|
| $ ls -l | awk '{print $5}'
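|
| A runnable sketch of the size extraction above, using the
| listing as canned input instead of a live `ls -l`. The column
| padding is an assumption about how the original output was
| aligned, and the sum is an extra added for illustration.
|

```shell
# Sample listing from the comment above, with column padding assumed.
listing='-rw-rw-r-- 1 userAAA group      588 Aug 29 00:25 file1
-rw-rw-r-- 1 userAA  groupB   11870 Aug 29 00:24 file2
-rw-rw-r-- 1 userA   groupBB   1166 Aug 28 23:56 file3
-rw-rw-r-- 1 user    groupBBB   195 Aug 28 23:56 file4'

# Field 5 is the size, regardless of how many spaces pad the columns.
sizes=$(printf '%s\n' "$listing" | awk '{print $5}')

# The same field can be summed in one pass.
total=$(printf '%s\n' "$listing" | awk '{sum += $5} END {print sum}')

printf '%s\n' "$sizes"
printf 'total=%s\n' "$total"
```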
| version_five wrote:
| What does -w do? This works without it, no?
|
| Edit, found it, "use whitespace as the delimiter"
|
| https://www.unix.com/man-page/FreeBSD/1/cut/
|
| For most cases like the OP you'd know the delimiter anyway so
| I don't think the absence is a big deal, and if not it would
| be easy to use tr or sed to make it consistent
| llimllib wrote:
| the important thing is that it uses _consecutive_
| whitespace as the delimiter, so you'd have to use sed to
| collapse all the whitespace down to one tab.
|
| At that point, awk is vastly simpler
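|
| To make the comparison concrete, here is a small sketch. It
| squeezes runs of spaces with `tr -s` (swapped in for the sed
| step described above, as an assumption about what is simplest)
| before handing the line to `cut`, next to the plain awk version.
|

```shell
# Hypothetical sample line; the run of spaces is deliberate.
line='key:      value'

# cut treats every single space as a delimiter, so squeeze runs first.
with_cut=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f2)

# awk needs no preprocessing for the same result.
with_awk=$(printf '%s\n' "$line" | awk '{print $2}')

printf 'cut: %s\nawk: %s\n' "$with_cut" "$with_awk"
```

|
| Leading spaces would still give `cut` an empty first field,
| whereas awk's default splitting ignores them.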
| Narishma wrote:
| I think you mean:
|
| $ echo "key: value" | awk '{print $2}'
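|
| A quick sketch of the difference between the two field numbers,
| with a run of spaces added to the sample line for illustration:
|

```shell
# Hypothetical sample line; the run of spaces is deliberate.
line='key:      value'

# awk's default field splitting treats any run of blanks as one separator,
# so $1 is the key (with its colon) and $2 is the value.
first=$(printf '%s\n' "$line" | awk '{print $1}')
second=$(printf '%s\n' "$line" | awk '{print $2}')

printf 'first=%s second=%s\n' "$first" "$second"
```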
| auselen wrote:
| I did this golfing a while back: Drawing a heart with AWK -
| https://gist.github.com/auselen/906a53b47a7d616b080dbef85eb8...
| ymgch wrote:
| What is better? Starting with awk or sed?
| pphysch wrote:
| I'd start with AWK because it covers more use cases
| asicsp wrote:
| Depends on the task. Sed is typically used for search and
| replace and Awk is better suited for field based processing.
| Both these tools also have filtering features (regexp based,
| line number based, range, etc).
|
| See also: When to use grep, sed, awk, perl, etc
| https://unix.stackexchange.com/q/303044
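|
| A rough sketch of that division of labour, on made-up
| three-line input (the fruit names and counts are purely
| illustrative):
|

```shell
# Hypothetical three-line input: name and count.
data='apple 3
banana 12
cherry 7'

# sed: search and replace on each line.
replaced=$(printf '%s\n' "$data" | sed 's/banana/mango/')

# awk: field-based filtering, keep rows whose second field exceeds 5.
filtered=$(printf '%s\n' "$data" | awk '$2 > 5 {print $1}')

# Both can select by line number: lines 2 through 3.
sed_range=$(printf '%s\n' "$data" | sed -n '2,3p')
awk_range=$(printf '%s\n' "$data" | awk 'NR >= 2 && NR <= 3')

printf '%s\n---\n%s\n' "$replaced" "$filtered"
```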
| _ZeD_ wrote:
| While each with their scope and idiosyncrasies ... they're
| pretty similar, at least in the pattern matching part, and both
| of them have a pretty internal "core" of functionalities that
| is easy to grasp. so the honest answer IMHO is ... "both"
| chasil wrote:
| A sed binary is usually much smaller than an awk binary, either
| POSIX or GNU. The memory footprint of sed will be much more
| compact.
|
| However, sed has grown out of the command language used by the
| tty editors, and is more difficult to program (although it is
| Turing-complete).
|
| The awk language implements much of the syntax of C, and it is
| not difficult to write a very slow and inefficient script. This
| inefficiency is harder to reach in sed, because it takes more
| effort to abuse it.
|
| O'Reilly's book on sed and awk is available free online, both
| to browse and to download as a ZIP.
|
| https://docstore.mik.ua/orelly/unix/index.htm
| Anthony-G wrote:
| For those who care about copyright, that URL is not from
| O'Reilly; it's a copy of a book-set that O'Reilly used to
| distribute via CD-ROM - with a nice user interface that used
| web technologies (even included a search feature). O'Reilly
| could make it available for free - as they've done for
| other books such as _Apache Security_ [1] or _Using Samba_ [2] -
| but they still (as is their right) expect you to pay for the
| _sed & awk_ book [3].
|
| [1] https://blog.ivanristic.com/2015/02/apache-security-ten-year...
|
| [2] https://www.oreilly.com/openbook/samba/book/
|
| [3] https://www.oreilly.com/library/view/sed-awk/1565922255/
| alpaca128 wrote:
| If you're familiar with Vim's search/replace syntax you already
| know how to use "sed -e" to replace text, that's how I got into
| it.
| e63f67dd-065b wrote:
| The old sysadmin in me says to forget both and just learn Perl.
| whartung wrote:
| Depends on your use case. I can't speak to sed, I don't know it
| very well. Awk is my SAK.
|
| But I learned awk while sitting in an office at a client site.
| I forget the specific scenario, but I wanted to split up some
| files into some other files. I didn't even know awk, but
| grokked enough from the man page to let me do what I wanted to
| do. I can't even say what provoked me to turn to awk in the
| first place. I do know I ran into some internal open file
| limits, but worked around that.
|
| If you want to tear files apart, or summarize them in some way,
| or push the fields around, awk is much better. sed is an
| editor. If I have a sed scenario, I'm more apt to just do it in
| vi and save the result than stitch together some pipeline with
| sed.
|
| Most of my use cases are one off processing and analysis. I've
| never had any workflows that relied on awk or most anything
| like that. It was almost all throw away code, a tool on the
| workbench, not the production line.
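|
| For the file-splitting case mentioned above, one common way to
| stay under open-file limits is to close each output file after
| writing to it. This is only a sketch and not necessarily the
| workaround used at the time; the input lines and the `.txt`
| naming are invented for illustration.
|

```shell
dir=$(mktemp -d)

# Hypothetical input: the first field decides the output file.
printf '%s\n' 'a 1' 'b 2' 'a 3' 'c 4' |
awk -v dir="$dir" '{
    out = dir "/" $1 ".txt"
    print $2 >> out   # append, because the file is reopened for each line
    close(out)        # release the descriptor before the next line
}'

a_contents=$(cat "$dir/a.txt")
file_count=$(ls "$dir" | wc -l | tr -d ' ')
printf '%s\n' "$a_contents"

rm -rf "$dir"
```

|
| Closing after every line trades speed for safety; with only a
| handful of distinct keys the `close()` call can be dropped.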
| radiator wrote:
| Start with sed. It can take you a long way and it is more
| succinct.
| [deleted]
___________________________________________________________________
(page generated 2023-08-28 23:01 UTC)