[HN Gopher] Awk: The Power and Promise of a 40-Year-Old Language
___________________________________________________________________
Awk: The Power and Promise of a 40-Year-Old Language
Author : jangid
Score : 215 points
Date : 2021-09-07 07:13 UTC (15 hours ago)
(HTM) web link (www.fosslife.org)
(TXT) w3m dump (www.fosslife.org)
| zeteo wrote:
| My company mandates Windows but Git Bash has been a backdoor into
| Unix tools and I've recently learned sed and awk to take full
| advantage of it. You need to think a bit about your one liners
| and they'll always feel very hacky, but sed/awk (with a bit of
| sort thrown in) are an amazingly powerful combination for dealing
| with all sorts of messy data dumps. In 10 minutes I can craft a
| one liner that replaces a 2-hour C# console app and runs just as
| fast. And, surprisingly, I often find it easier to go back months
| later and understand the messy looking one liner than the nicely
| formatted, well commented, unit tested console app.
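|
| A minimal sketch of that kind of sed/awk/sort pipeline. The dump
| format (`name ; amount` lines with stray whitespace and `#` comment
| lines) and the function name `summarize` are assumptions made up
| for illustration, not something from the comment above.

```shell
# Hypothetical messy dump: "name ; amount" with inconsistent spacing.
summarize() {
  # sed: normalise the separator and drop comment lines
  # awk: sum the amounts per name
  # sort: make the output order deterministic
  sed -e 's/[[:space:]]*;[[:space:]]*/;/g' -e '/^#/d' |
    awk -F';' 'NF == 2 { total[$1] += $2 }
               END { for (k in total) print k, total[k] }' |
    sort
}

printf '%s\n' '# export' 'alice ; 10' 'bob;5' 'alice;  7' | summarize
```

| Each stage does one job, which is probably why such one-liners stay
| readable months later.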
| phkahler wrote:
| I had never used Awk until last year. I wanted to monitor an embedded
| device with little more than busybox and python on it. There was
| quite a bit of information in the log files (I had already
| written a custom log file viewer with some highlighting) but I
| wanted to monitor in real-time. Somehow I decided to use Awk to
| monitor the tail of the log file and do realtime bar-graphs by
| generating appropriate cursor control sequences. In the end I had
| about 50 lines of Awk to upload to the board and run a command to
| pipe the log into it - very minimally invasive and very
| informative.
|
| Would recommend learning Awk with some kind of real-world use of
| your own. BTW it reminded me of using XSLT which I think is
| another often overlooked "good thing".
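|
| A rough sketch of the bar-graph idea, assuming log lines of the
| made-up form `load=NN`. The cursor control sequences the comment
| mentions are left out, so each reading simply prints one bar per
| line; the `bars` name and the log path in the comment are
| hypothetical.

```shell
# Draw one '#' per 10 units for every "load=NN" line; ignore the rest.
bars() {
  awk -F'=' '
    $1 == "load" {
      bar = ""
      for (i = 0; i < int($2 / 10); i++) bar = bar "#"
      printf "%3d %s\n", $2, bar
    }'
}

printf 'load=30\nnoise\nload=75\n' | bars
# Live use would look something like (path is an assumption):
#   tail -f /var/log/app.log | bars
```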
| cogman10 wrote:
| The biggest reason to learn AWK, IMO, is that it's on pretty
| much every single linux distribution.
|
| You might not have perl or python. You WILL have AWK. Only the
| most minimal of minimal linux systems will exclude it. Even
| busybox includes awk. That's how essential it's viewed.
| jejones3141 wrote:
| Something fun in that regard, speaking of minimal...the
| TRS-80 Color Computer community now has a version of awk that
| runs on NitrOS-9, a variant of OS-9/6809 originally written
| for the Hitachi 6309. (64K address space, no separate I and D
| space.)
| michaelcampbell wrote:
| I'm curious what linux distros don't have either some version
| of perl or python.
|
| I like awk, mind, but this is not necessarily (IME) a good
| argument for it.
| cogman10 wrote:
| You'll find this a lot in the embedded space. As well,
| you'll see a bunch of docker images that don't have
| perl/python.
| selfhoster11 wrote:
| Building a Docker image gives basically full freedom over
| the choice of a runtime. If your Dockerized application
| is written in Java or Python or PHP or C#, why not just
| write the tooling and scripts in the same language too?
| Or at least install a suitable runtime just for the
| scripts? Or if starting from an empty container, why not
| build the script into a statically-linked binary to be
| placed next to the application?
| cogman10 wrote:
| Typically, you want docker images as slim as possible.
| Both to make it faster to distribute and to prevent
| attacks if something escapes your application. The less
| in the image, the less exploitable your image is.
|
| Beyond keeping the images slim, the times I'd reach for
| awk when dealing with a docker container would be when
| I'm debugging problems within that container. I might
| need to do some quick text parsing or finagling in order
| to troubleshoot why the application is sucking.
|
| I'd rather not need to upload a Java script into my
| docker container just for quick troubleshooting.
| selfhoster11 wrote:
| I agree on the slimness of Docker images, but if you e.g.
| have some kind of video or photo CMS written in PHP, then
| any housekeeping or export scripts etc are better off
| being written in PHP as well (or even integrated into the
| application) given how close they're already bound with
| the rest of the application.
|
| For anything beyond that, I would very greatly prefer to
| have "black box", extremely verbose log dumps and
| database dumps that I could analyse over at my actual dev
| machine, or a good debugger that lets me step through the
| code to figure out what's going wrong.
|
| I do realise that not all languages have good tooling, or
| that some people prefer to use `printf` style debugging,
| so it may not apply to all.
| jolmg wrote:
| > I'm curious what linux distros don't have either some
| version of perl or python.
|
| I imagine that DamnSmallLinux or TinyCoreLinux possibly
| don't have them by default. Their focus is to be as small
| as possible in order to download quickly and fit in a USB
| drive or CD. Their small size was more important back when
| speeds were slower and drives were smaller. They were also
| good for when you had a limited number of storage options
| and you wanted the running OS to fit completely in RAM
| (back when RAM was smaller).
| selfhoster11 wrote:
| I don't think I ever ran TinyCore without immediately
| connecting it to the Internet to grab a bunch of
| packages. Puppy Linux included Perl in its base install
| at one time (I don't know if it still does), and Damn
| Small Linux was supposed to have a cut-down version of
| Perl included as well.
| jolmg wrote:
| Python definitely not, though.
| selfhoster11 wrote:
| Yeah, but if you are happy to program in Perl, that's
| basically every major Linux distro covered. Anything
| using DEB or RPM packaging, any machine with Git
| installed (which includes Windows), plus the ones I
| already mentioned, already have access to Perl. This is a
| formidable installed base with no effort needed to
| install a runtime.
| jolmg wrote:
| I agree, but michaelcampbell's point seemed to be: why
| learn a language for its ubiquity, when more commonly
| used languages seem to be just as ubiquitous? So, I
| focused on how they're not _that_ ubiquitous.
| selfhoster11 wrote:
| I see what you mean. I guess what I was trying to say is
| that my position is close to that of michaelcampbell's,
| and that I wanted to emphasize how little portability is
| sacrificed by adopting this position on most environments
| one will ever work in.
| baktubi wrote:
| If you're using DamnSmallLinux etc I'd imagine you can
| package your own awk quite easily! Perl would require a
| lot more packages. But all you need to do is copy a
| couple binaries right?
| jolmg wrote:
| Haven't used these distros since a decade or so ago.
|
| Not sure why I'd have to package awk. Busybox's is
| probably sufficient for most uses, if the need ever
| arose, which I don't think it normally does when using
| these distros.
| baktubi wrote:
| Agreed. Not having enough space for awk would be daaaaamn
| small indeed.
| greggyb wrote:
| The POSIX specification includes awk, but not perl or
| python. The world of UNIX and UNIX-likes is larger than
| just Linux distributions. Depending on the utility you plan
| on building and the platforms you expect it to run on, it
| may be wiser to reach for awk than other PLs.
| pcwalton wrote:
| Modern BSDs, macOS, and Solaris certainly have Perl and
| Python. (iOS and Android don't, but they don't have awk
| either.) What other Unixes are you thinking of? AIX,
| HP/UX, IRIX, UnixWare, etc. should be considered
| retrocomputing at this point and not relevant to modern
| compatibility discussions.
|
| Linux distros based on busybox, as mentioned elsewhere in
| this thread, are a more compelling reason for considering
| awk than considerations involving other Unixes.
| cogman10 wrote:
| Wasn't awk added to android in 9?
| Aptrug wrote:
| Yep, https://android.googlesource.com/platform/system/core/+/mast...
| bdk0 wrote:
| You can install python and perl on BSDs, but it's
| different than awk, where it's part of the core OS and
| guaranteed to be there without needing to install extra
| stuff.
| BuildTheRobots wrote:
| The better question might be "which Linux distros don't
| have perl or python installed by default" as a lot of
| people are working on systems where they can't just add
| additional packages.
|
| Perl has been getting cut from minimal builds of distros
| for a while. Default installed version of python is a bit
| of a crap-shoot, nevermind which modules you might happen
| to have available.
| abecedarius wrote:
| A nice thing about awk vs. Perl/Python: there's a small
| focused set of things to learn. Once you learn them you're
| done.
|
| This suggests an opening for a Perl/Python intro focused on
| the exact same tasks, admittedly. That seems more realistic
| for Perl -- unless there's someone who writes Python one-
| liners at the shell?
| r-bar wrote:
| I don't think true python "one liners" are a thing, but
| the awkward thing about awk is that it sits in this place where
| what you are doing is complicated enough you need awk,
| but simple enough you need a one liner? Those cases have
| been exceedingly few and far between for me enough that
| every time I want to reach for awk I have to go lookup
| how to do anything more complex than printing fields.
| That completely defeats the point of the quick one liner.
|
| May as well open up vim, write my 7 lines of python, and
| run it. Because I use it everyday and didn't have to look
| anything up it ends up far faster. Then when I am done I
| either delete it, throw it in a scripts directory, or
| make it part of some existing infrastructure repo. Now if
| I keep it because I used python it is much more readable
| than the awk 1 liner would have been.
|
| I have tried in earnest to memorize awk's idiosyncrasies
| multiple times now. By the time I go to use what I
| learned the last time it is months later and I have
| forgotten enough that I need to go look stuff up.
|
| So in a way, here I am: The guy that writes "one liners"
| in python.
| abecedarius wrote:
| Yeah, it's a different world from when I learned Awk. You
| might enjoy the (very short) book by the creators just
| because it's a great focused expression of the Unix way.
| But nobody _needs_ to learn it.
| lbhdc wrote:
| I think that is a good point: writing a short
| python script is often the best solution.
|
| I use awk (and python) daily at work. I work with a lot
| of flat files, and I use awk when I am doing data quality
| checks. One of the "sweet spots" it hits for me is when I
| need to group data by value, or other relatively simple
| aggregations.
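|
| For illustration, a grouping sketch along those lines, assuming a
| tab-separated flat file with a key in column 1 and a number in
| column 2 (the layout and the name `group_sum` are assumptions):

```shell
# Count rows and sum column 2 for each distinct value of column 1.
group_sum() {
  awk -F'\t' '
    { count[$1]++; sum[$1] += $2 }
    END { for (k in count) printf "%s\t%d\t%.2f\n", k, count[k], sum[k] }' |
    sort   # "for (k in ...)" order is unspecified, so sort the result
}

printf 'a\t1.5\nb\t2\na\t2.5\n' | group_sum
```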
| kragen wrote:
| Anything busybox-based. I'm not sure busybox awk is very
| complete, either.
| selfhoster11 wrote:
| IMO, unless you're doing embedded work or building minimal
| containers, you'll pretty much always have access to a decent
| runtime (or several).
|
| Python: almost every conventional server. Python dependencies
| are so ubiquitous that you aren't likely to find a Linux
| install without it.
|
| Perl: every DEB and RPM machine, and anything with Git
| installed. You can't really escape it, unless you're
| embedded.
|
| PowerShell (yeah, I know): every Windows machine from XP
| onwards (though usable only from 7 onwards), and some Linux
| computers if installed.
|
| Java: lots and lots of places will have this available.
|
| Dockerized runtime of your choice: not ubiquitous, but I
| expect more and more developer machines and servers to gain
| Docker or Docker-like container support.
|
| There really isn't any reason to stick to AWK, unless you're
| working directly on embedded devices or just like using it.
| [deleted]
| zeveb wrote:
| > Very few people still code with the legacies of the 1970s: ML,
| Pascal, Scheme, Smalltalk.
|
| Arguably, the software world would be better off if more people
| _did_ code with those 1970s languages, than with the ones we are
| stuck with now.
|
| And that applies to Awk, too. As the author quotes Neil Ormos
| stating, Awk is well suited for _personal computing_ , something
| which we have gotten further and further from at the same time as
| computers have become more distributed. At what point in history
| have such a large fraction of the human race had the ability to
| calculate to such an amazing order of magnitude, and at what
| point in history have such a large fraction of the same human
| race not bothered with calculation?
|
| Awk is a great tool precisely because it puts quite a lot of
| expressive power in the hands of an average user on a Unix
| system. Sure, on a Lisp machine or Smalltalk machine there really
| isn't the same need for Awk: the systems languages on such
| machines are safe enough and expressive enough to do what Awk
| does. But in the Unix context -- which is basically what we're
| all living in, with even the VMS-derived Windows more-or-less
| adhering to the Unix model -- Awk is a godsend.
|
| edit: correct typo
| gompertz wrote:
| Oh man, you sound like a long lost friend. As someone who
| struggles to adopt really anything post ~1995 in the
| programming world, I couldn't agree more. I've worked for
| Fortune 100s my whole career; mostly in big data problem-
| spaces, before it ever was cool (if it even is now?), and I
| really feel all the problems people perceive today were solved
| all the way back to the 1960s (i.e. Snobol4). I understand for
| modern web and mobile contexts, sure there are new fancy tools
| for that; but as you said, in the personal computing space, the
| proper tools have existed for decades.
| ketanmaheshwari wrote:
| My own shameless plug:
| https://ketancmaheshwari.github.io/posts/2020/05/24/SMC18-Da...
| mukundesh wrote:
| awk is great for data analysis - usually, I start with cut, then
| move to awk as complexity increases and finally to python.
| tyingq wrote:
| Gawk's ability to extend it with C code is interesting as well,
| and pretty straightforward.
|
| Here's the source for the fork() extension that ships with
| gawk...it's ~150 lines or so:
| https://git.savannah.gnu.org/cgit/gawk.git/tree/extension/fo...
|
| I was able to make a (terrible/joke/but-it-kinda-works) web
| server with gawk using the extensions that ship with it:
| https://gist.github.com/willurd/5720255#gistcomment-3143007
| tgv wrote:
| My opinion that belongs to me is as follows. This is how it
| goes. The next thing I'm going to say is my opinion.
|
| The C interop and name-spaces (also in gawk) is a bridge too
| far for me. By the time you need one of those, it's time to
| look for another language. Awk is just not enough of a language
| to write serious programs in. And I really like awk. It has
| enabled great scripting not only for log files, but also for
| dictionaries, back in the day when it was still hard to load
| one in memory.
|
| That is my opinion, it is mine, and belongs to me and I own it,
| and what it is too.
| gompertz wrote:
| It's good you're unapologetic. At the same time, these sort
| of features are what I love as they avoid me having to move
| onwards to something new, and start near ground zero. Living
| by the mantra "Do 2 things 1000 times, not 1000 things 2
| times."
| melling wrote:
| I no longer use it, but Perl was always the better solution when
| one thought AWK was the answer.
|
| Perl will do those things where AWK really shines and if the
| problem got bigger, Perl was easier to deal with.
| coliveira wrote:
| The problem is that awk is a very simple language, which you
| can learn in an afternoon. Perl is a very complex language, and
| is not used anymore, so you're just spending your time on
| something you'll rarely use.
| selfhoster11 wrote:
| It's used in Debian system tools and in Git, so it's still in
| wide use.
| chasil wrote:
| OpenBSD's binary package system is written in perl.
| throwawayboise wrote:
| Probably as much for legacy reasons as anything else. Perl
| was the chosen scripting language for utilities, it works,
| they understand it, and they've kept with it. Sort of how
| they stay with CVS for their source repository.
|
| Python isn't even installed on a base OpenBSD system.
| chasil wrote:
| Mark Espie rewrote the entire package system in perl in
| 2010, which is a bit late to be classed as legacy.
|
| https://undeadly.org/cgi?action=article;sid=20100323141307
|
| I'm not sure what was used for the version before this,
| but the original BSD package system was written in C.
| throwawayboise wrote:
| But perl was already the "standard" for other
| system/config utilities, no?
| chasil wrote:
| I don't know what we mean by "standard," but I found a
| number of perl references with the following shell
| fragment:
|
|     $ for x in $(echo $PATH|sed 's/:/ /g'); do file $x/*|grep perl;done
|
| All but two hits were in /usr/sbin, and /usr/bin. I
| isolated those files with:
|
|     $ file /usr/sbin/* | awk '/perl/{sub(/:.*/,"");sub(/^.*[/]/,"");printf "%s, ", $0}';echo ''
|
| The sbin results are:
|
|     adduser, fw_update, pkg_add, pkg_check, pkg_create, pkg_delete,
|     pkg_info, pkg_mklocatedb, pkg_sign, rmuser,
|
| There are more in /bin:
|
|     $ file /usr/bin/* | awk '/perl/{sub(/:.*/,"");sub(/^.*[/]/,"");printf "%s, ", $0}';echo ''
|     c2ph, corelist, cpan, enc2xs, encguess, h2ph, h2xs, instmodsh,
|     libnetcfg, libtool, perl, perlbug, perldoc, perlivp, piconv,
|     pkg-config, pl2pm, pod2html, pod2man, pod2text, pod2usage,
|     podchecker, podselect, prove, pstruct, skeyprune, splain,
|     streamzip, xsubpp,
|
| A perl script can't pledge() or unveil(), so I am
| guessing that anything sensitive has moved to C.
| boogies wrote:
| > A perl script can't pledge() or unveil()
|
| It doesn't seem to support all of OpenBSD's privilege
| separation, but there are OpenBSD::Unveil(3p),
| OpenBSD::Pledge(3p), and
| https://github.com/rfarr/Unix-Pledge
|
| https://bronevichok.ru/posts/pledge.html
| chasil wrote:
| Did not know that, thanks.
| mhd wrote:
| The part that's equivalent to what you'd use for your regular
| awk isn't very different. Sure, you can do full-scale OO
| programs, but that doesn't have a large impact on small
| string munging. I get that you might not learn it to fluff up
| your CV.
|
| Also, it's usually the same kind of Perl, so you don't have
| to worry about whether awk is the "one true" one, or mawk, or
| gawk...
| sigzero wrote:
| Perl is very much still used. lol
| thesuperbigfrog wrote:
| >> Perl is a very complex language, and is not used anymore,
| so you're just spending your time on something you'll rarely
| use.
|
| Perl is no more complex than Python, Ruby, or Powershell. If
| you use any of those you can be productive with Perl in a few
| hours.
|
| Perl is still used, it is just not as popular as it was in
| the past. Do you use Git? Parts of it are written in perl.
| Large parts of Git were originally written in Perl, but have
| been migrated to C over time.
| forinti wrote:
| If you work a lot with Linux, you can pretty much count on
| Perl and awk always being there. So it comes in quite handy
| to know them.
| zeteo wrote:
| Perl was built initially as a sed/awk killer but got distracted
| into trying to take over the world. The interpreter for a
| language with 100x the number of features will always be
| slower. Also there's a very clear boundary for when I should
| use awk by itself, as part of a pipeline, or switch to a better
| tool. I feel like Perl has the potential to suck me
| imperceptibly into a huge mess where I spend 80% of my time
| refactoring everything.
| tyingq wrote:
| I found that to be the case many times as well. But awk also
| often outperforms Perl, especially mawk.
| Scarbutt wrote:
| Yes but you can't learn perl as quickly as you can learn awk.
| jfk13 wrote:
| Though you can learn just enough perl to do awk-like things
| fairly easily. And then grow from there as needed.
| throwawayboise wrote:
| IDK. On my OpenBSD system the awk man page is under 500
| lines, and it pretty much covers the subject.
|
| I've tried to get started in Perl a few times, and just
| found it weird. It doesn't click. Awk is kind of weird too
| but it's so simple it doesn't matter.
|
| I'm sure I would eventually get Perl if I _had_ to use it.
| But for me, awk and sed and shell scripting have covered my
| needs.
| linuxlizard wrote:
| I use awk to auto-generate C header files from other header
| files. I work with $vendor's huge complicated kernel driver
| codebase. I need small pieces of $vendor's interconnected header
| files in order to make kernel calls to their drivers without
| pulling in all their code.
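|
| One way such a generator might look, as a sketch only: it keeps the
| `#define` lines whose macro name starts with a given prefix and
| wraps them in an include guard. The `VND_` prefix, the `MINI_H`
| guard and the function name are invented for the example; a real
| vendor header would likely also need structs, typedefs and
| multi-line macros handled.

```shell
# $1: macro-name prefix to keep (hypothetical naming scheme)
extract_defines() {
  awk -v prefix="$1" '
    BEGIN { print "#ifndef MINI_H"; print "#define MINI_H" }
    # keep single-line #defines whose name begins with the prefix
    $1 == "#define" && index($2, prefix) == 1 { print }
    END { print "#endif" }'
}

printf '%s\n' '#define VND_REG_A 0x10' '#define OTHER 1' \
  'int f(void);' '#define VND_REG_B 0x14' | extract_defines VND_
```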
| cb321 wrote:
| When you have a standardized problem setting like the implicit
| loop in awk, an alternative to a whole new programming language is
| a simple < 100 lines of code program generator [1].
|
| This design lets you retain easy access to large sets of pre-
| existing libraries as well as have a "compiled/statically typed"
| situation, if you want. It also leverages familiarity with your
| existing programming languages. I adapted a similar small program
| like this to emit a C program, but anything else is obviously
| pretty easy. Easy is good. Familiar is good.
|
| Interactivity-wise, with a TinyC/tcc fast running compiler
| backend my `rp` programs run sub-second from ENTER to completion
| on small data. Even with non-optimizing tcc, they still run
| faster than byte-compiled/VM interpreted mawk/gawk on a per
| input-byte basis. If you take the time to do an optimized build
| with gcc -O3/etc., they can run much faster.
|
| And I leave the source code around if you want to just use the
| program generator as a way to save keystrokes/get a fast start on
| a row processing program.
|
| Anyway, I'm not trying to start a language holy war, but just
| exhibit how if you rotate the problem (or your head looking at
| the problem) ever so slightly another answer exists in this space
| and is quite easy. :-)
|
| [1]
| https://github.com/c-blake/cligen/blob/master/examples/rp.ni...
| gompertz wrote:
| And let's not forget about the amazing commercial offering of
| Awk, known as Tawk (by Thompson Automation). To this day some
| features from Tawk cannot be found in Gawk.
| dugmartin wrote:
| My first and only real use of awk was around 1995. I was working
| at a new job doing embedded software work at GE and we had a lot
| of documentation in SGML, written/viewed using Interleaf.
| Interleaf was super slow on the HP-UX workstations we had and
| iirc search was even slower. I got the idea to convert all the
| SGML files into a single HTML file and I reached for awk as I had
| used it for some one-liners previously. I ended up writing an awk
| script that generated a frameset with one sidebar frame that was
| a treeish table of contents and the other frame the mondo html
| file with anchors for the table of contents. It loaded pretty
| fast in the HP-UX browser and search was really fast.
| torcete wrote:
| I use awk constantly in bioinformatics, for many of the file
| formats designed to store genomic data, awk is the easiest tool
| you can use for processing.
| jhbadger wrote:
| There's even a version of awk specifically designed for
| bioinformatics that natively knows how to handle fasta, fastq,
| and sam files, among other formats.
|
| https://github.com/lh3/bioawk
| unemphysbro wrote:
| I did the exact same thing!
|
| quickly looking at averages/errors, a simple awk one-liner will
| do.
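|
| For instance, a mean and sample standard deviation over the first
| column can be sketched like this (the `mean_sd` name is made up,
| and the one-pass sum-of-squares formula is fine for a quick look
| but can lose precision on large or nearly constant data):

```shell
# One pass: accumulate count, sum and sum of squares of column 1.
mean_sd() {
  awk '{ n++; s += $1; ss += $1 * $1 }
       END {
         if (n > 1) {
           m = s / n
           printf "mean=%.3f sd=%.3f\n", m, sqrt((ss - n * m * m) / (n - 1))
         }
       }'
}

printf '2\n4\n4\n4\n5\n5\n7\n9\n' | mean_sd
```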
| shp0ngle wrote:
| awk is fast and really useful.
|
| It's also generally unreadable.
| coliveira wrote:
| I don't agree. Awk is very readable for people used to c-like
| languages like javascript. And it is much cleaner than Perl.
| gpderetta wrote:
| It is certainly more readable than sed for example.
| throwawayboise wrote:
| Yeah I use sed not infrequently but try to keep things
| simple. Anything more complicated than a "standard" sed
| one-liner (google it) I will start looking for something
| else.
| forinti wrote:
| sed is pretty ancient too. I've used it a lot with Docker to
| alter parameters during builds.
| dekhn wrote:
| I've used Python almost my entire career, but started out with
| the UNIX tools. I never found awk interesting, then took a peek
| at it recently and understood: this was _the_ pre-perl! it had
| scripting-language hash tables!
| Anon84 wrote:
| PERL was originally advertised as a replacement for "awk and
| sed"
| dekhn wrote:
| yep- and I went straight to perl after learning sed, and
| ignoring awk. awk looked even weirder than perl (I wasn't a
| big fan of the pattern matching style). In retrospect, I
| think awk is a massively underappreciated (for its time and
| context). I can't say I'd want to work with it regularly
| (same for perl; in the long run, I prefer variants of C
| style).
| asicsp wrote:
| HN discussion threads for some of the links mentioned in the
| article:
|
| * Using AWK and R to parse 25TB -
| https://news.ycombinator.com/item?id=20293579
|
| * Command-line Tools can be 235x Faster than a Hadoop Cluster -
| https://news.ycombinator.com/item?id=17135841
|
| * The State of the AWK -
| https://news.ycombinator.com/item?id=23240800
|
| For awk alternative implementations, I'm keeping an eye on frawk
| [0]. Aims to be faster, supports csv, etc.
|
| [0] https://github.com/ezrosent/frawk
| nmz wrote:
| CSV is a complicated format but that does not mean awk is
| incapable of dealing with it.
|
| https://www.gnu.org/software/gawk/manual/html_node/Splitting...
|
| https://github.com/e36freak/awk-libs/blob/master/csv.awk
|
| https://raw.githubusercontent.com/Nomarian/Awk-Batteries/mas...
| boogies wrote:
| > CSV is a complicated format
|
| Surprisingly and unnecessarily so:
|
| > ["DSV"] is to Unix what CSV (comma-separated value) format
| is under Microsoft Windows and elsewhere outside the Unix
| world. CSV (fields separated by commas, double quotes used to
| escape commas, no continuation lines) is rarely found under
| Unix.
|
| > In fact, the Microsoft version of CSV is a textbook example
| of how not to design a textual file format. Its problems
| begin with the case in which the separator character (in this
| case, a comma) is found inside a field. The Unix way would be
| to simply escape the separator with a backslash, and have a
| double escape represent a literal backslash. This design
| gives us a single special case (the escape character) to
| check for when parsing the file, and only a single action
| when the escape is found (treat the following character as a
| literal). The latter conveniently not only handles the
| separator character, but gives us a way to handle the escape
| character and newlines for free. CSV, on the other hand,
| encloses the entire field in double quotes if it contains the
| separator. If the field contains double quotes, it must also
| be enclosed in double quotes, and the individual double
| quotes in the field must themselves be repeated twice to
| indicate that they don't end the field.
|
| > The bad results of proliferating special cases are twofold.
| First, the complexity of the parser (and its vulnerability to
| bugs) is increased. Second, because the format rules are
| complex and underspecified, different implementations diverge
| in their handling of edge cases. Sometimes continuation lines
| are supported, by starting the last field of the line with an
| unterminated double quote -- but only in some products!
| Microsoft has incompatible versions of CSV files between its
| own applications, and in some cases between different
| versions of the same application (Excel being the obvious
| example here).
|
| -- _The Art of Unix Programming_
| http://www.catb.org/~esr/writings/taoup/html/ch05s02.html
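|
| To make the quoted rule concrete, here is a small sketch of a
| backslash-escaped, colon-separated parser in awk. It handles only
| the single special case described above (a backslash makes the
| next character literal); escaped newlines are not handled because
| awk has already split the input into lines. The `dsv_fields` name
| and the `N=[...]` output format are assumptions for the example.

```shell
# Split each line on unescaped ':'; "\x" always yields a literal x.
dsv_fields() {
  awk '{
    n = 1; field[1] = ""
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c == "\\") {            # escape: take the next char literally
        i++
        field[n] = field[n] substr($0, i, 1)
      } else if (c == ":") {      # unescaped separator: start a new field
        n++
        field[n] = ""
      } else {
        field[n] = field[n] c
      }
    }
    for (k = 1; k <= n; k++) printf "%d=[%s]\n", k, field[k]
  }'
}

printf '%s\n' 'a\:b:c\\d:e' | dsv_fields
```

| The whole parser is one loop with one special case, which is the
| point the quoted passage is making.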
| SjorsVG wrote:
| I find it very unpleasant to read Awk code. It looks as bad as
| regex to me.
| nesuse wrote:
| There's a free awk course here for anyone interested
| https://www.udemy.com/course/awk-tutorial/
| justin_oaks wrote:
| I only recently learned Awk enough to be useful. But I still
| don't reach for it when I probably should.
|
| What are the most common cases where you reach for Awk instead of
| some other tools?
|
| I recently used it to parse and recombine data from the OpenVPN
| status file. That file has a few differently formatted tables in
| the same file. Using Awk, I was able to change a variable as each
| table was encountered, so I could change the Awk program
| behavior depending on which table it was operating on.
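|
| The state-variable trick can be sketched as follows. The input
| here is NOT the real OpenVPN status format; the `[clients]` and
| `[routes]` section markers and the column layout are invented to
| keep the example short.

```shell
# A header line switches "section"; later rules act on the current one.
by_section() {
  awk -F',' '
    /^\[clients\]$/ { section = "clients"; next }
    /^\[routes\]$/  { section = "routes";  next }
    section == "clients" { print "client " $1 " from " $2 }
    section == "routes"  { print "route " $1 " via " $2 }'
}

printf '%s\n' '[clients]' 'alice,10.0.0.5' \
  '[routes]' '192.168.1.0,alice' | by_section
```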
| coliveira wrote:
| Anything that is command line based and needs small changes to
| text input can be done with awk. It is a very competent
| language for scripts.
| throwawayboise wrote:
| I use it a lot to filter, slice, and dice CSV (or other
| delimited) or fixed-format files. Sometimes I'll use q[1] if my
| needs are more complex. Or awk piped to q. It can be used as a
| fairly decent report generator for plain-text or HTML reports.
|
| Any time I want to process a bunch of lines in a text file, awk
| is my first consideration.
|
| [1] http://harelba.github.io/q/
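|
| As an illustration of the report-generator use, a sketch that
| filters a simple CSV and emits an HTML table. The column names,
| the threshold of 100 and the `html_report` name are assumptions,
| and field values are not HTML-escaped here.

```shell
# Header row becomes <th> cells; data rows are kept only if col 2 >= 100.
html_report() {
  awk -F',' '
    BEGIN { print "<table>" }
    NR == 1 { printf "<tr><th>%s</th><th>%s</th></tr>\n", $1, $2; next }
    $2 + 0 >= 100 { printf "<tr><td>%s</td><td>%s</td></tr>\n", $1, $2 }
    END { print "</table>" }'
}

printf '%s\n' 'host,requests' 'web1,250' 'web2,40' | html_report
```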
| jedimastert wrote:
| From what I can tell, Awk really shines in two places,
| transformation and collation, both of which require some form
| of structured file. You can transform one structure into
| another and you can process record by record to some form of
| collation or summary.
| chasil wrote:
| Here is a script that I use to send SMTP mail, via the gawk
| networking extensions. I have a few different versions, but
| this is the most basic:
|
|     #!/bin/gawk -f
|     BEGIN { smtp="/inet/tcp/0/smtp.yourhost.com/25";
|     ORS="\r\n"; r=ARGV[1]; s=ARGV[2]; sbj=ARGV[3]; # /usr/local/bin/awkmail to from subj < in
|     print "helo " ENVIRON["HOSTNAME"] |& smtp; smtp |& getline j; print j
|     print "mail from: " s |& smtp; smtp |& getline j; print j
|     if(match(r, ","))
|     { split(r, z, ",")
|       for(y in z) { print "rcpt to: " z[y] |& smtp; smtp |& getline j; print j }
|     }
|     else { print "rcpt to: " r |& smtp; smtp |& getline j; print j }
|     print "data" |& smtp; smtp |& getline j; print j
|     print "From: " s |& smtp; ARGV[2] = "" # not a file
|     print "To: " r |& smtp; ARGV[1] = "" # not a file
|     if(length(sbj)) { print "Subject: " sbj |& smtp; ARGV[3] = "" } # not a file
|     print "" |& smtp
|     while(getline > 0) print |& smtp
|     print "." |& smtp; smtp |& getline j; print j
|     print "quit" |& smtp; smtp |& getline j; print j
|     close(smtp) }
|     # /inet/protocol/local-port/remote-host/remote-port
|
| This allows me to bypass the local MTA (if present). The
| message ID is also returned, which can be useful to log.
| mellavora wrote:
| try running this: awk '{cmd="rm " FILENAME; print cmd;
| system(cmd) }' file*
|
| best results if you do 'sudo' first
|
| ymmv
| [deleted]
| generalizations wrote:
| At least add a /s to your comment. I like learning from the
| stuff people comment on here, and while there's an element of
| "that would be an important lesson" to what you posted, it's
| mostly just an unnecessary landmine.
| exdsq wrote:
| I had to take large CSV files like {question, right_ans,
| wrong_ans1, wrong_ans2, wrong_ans3} and convert them into SQL
| insert files. Few caveats - some could be duplicates, some
| characters were not allowed, and some had formatting issues.
| The first issue was avoided by upserting, but the other two I
| used Awk and Sed for and put together a fairly robust script
| far quicker than if I reached for Python. I probably would have
| reached for Python if I realised how many edge cases there were
| but I didn't know that at the start so the script just sort of
| grew as I went along, but now they're my go-to tools for
| similar tasks.
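|
| A cut-down sketch of that kind of conversion, assuming a simple
| two-column CSV with no quoted commas. The `questions` table, its
| column names and the `csv_to_sql` name are hypothetical; the only
| clean-up shown is doubling single quotes and skipping rows with
| the wrong field count, and the upsert mentioned above is omitted
| because its syntax depends on the database.

```shell
# Emit one INSERT per well-formed row; quote() doubles embedded quotes.
csv_to_sql() {
  awk -F',' -v sq="'" '
    function quote(s) { gsub(sq, sq sq, s); return sq s sq }
    NF == 2 {
      printf "INSERT INTO questions (question, right_ans) VALUES (%s, %s);\n",
             quote($1), quote($2)
    }'
}

printf '%s\n' "What's 2+2?,4" 'bad line' | csv_to_sql
```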
| WhatIsDukkha wrote:
| """I probably would have reached for Python if I realised how
| many edge cases there were"""
|
| This is the counter for all the "success" stories of awk
| users that walked away with an underspecced and
| underdeveloped 5 minute solution.
| throwawayboise wrote:
| Most people reach for what they know best. I'm not sure it
| really proves anything about relative merits.
| chasil wrote:
| Awk is not really very good at reading complex CSVs (as
| defined in RFC-4180), where newlines (record separators) can
| appear within quoted strings. It can be done, but sometimes
| it's tricky.
|
| The PHP fgetcsv function has been more convenient when I have
| had more exotic examples.
|
| If the CSV is simple, awk remains a very good tool.
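|
| To illustrate the difference: a plain -F, split breaks as soon as a
| quoted field contains a comma. The sketch below is a hypothetical
| helper (csv_field is not a standard tool) that walks one record
| character by character in portable awk. It handles quoted commas and
| doubled quotes, but still assumes one record per line, so the
| embedded-newline case from RFC 4180 is not covered.

```shell
# csv_field N: print field N of each CSV record read from stdin.
csv_field() {
    awk -v want="$1" '
    # Split one record into out[], honouring double-quoted fields.
    # Returns the number of fields found.
    function csv_split(line, out,    n, i, c, field, inq) {
        n = 0; field = ""; inq = 0
        for (i = 1; i <= length(line); i++) {
            c = substr(line, i, 1)
            if (inq) {
                if (c != "\"") {
                    field = field c
                } else if (substr(line, i + 1, 1) == "\"") {
                    field = field c; i++          # doubled quote -> literal quote
                } else {
                    inq = 0                       # closing quote
                }
            } else if (c == "\"") {
                inq = 1                           # opening quote
            } else if (c == ",") {
                out[++n] = field; field = ""      # field separator
            } else {
                field = field c
            }
        }
        out[++n] = field
        return n
    }
    { n = csv_split($0, f); print (want <= n ? f[want] : "") }'
}
```

| For the record 1,"Smith, John","say ""hi""" the naive
| awk -F, '{ print $2 }' prints "Smith (with the stray quote), while
| csv_field 2 prints Smith, John.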
| throwawayboise wrote:
| CSVs with quoted fields and embedded newlines can be
| troublesome in awk. Years ago I had found a script that
| worked for me, I'm not sure but I think it was this:
|
| http://lorance.freeshell.org/csv/
|
| There's also https://github.com/dbro/csvquote which is more
| unix-like in philosophy: it sits in a pipeline, and only
| handles transforming the CSV data into something that awk
| (or other utilities) can more easily deal with. I haven't
| used it but will probably try it next time I need something
| like that.
| nmz wrote:
| if the csv is RFC-4180 then it can handle it[0]. the only
| caveat is that you can't disable FS="" correctly. but a
| gawk -i ./csv.awk -e '{print $5}' would work on most csv
| files I've tried.
|
| https://raw.githubusercontent.com/Nomarian/Awk-
| Batteries/mas...
| cturner wrote:
| Have found static builds of awk useful in low-dependency work.
| I bundled it with a windows installer to do some wrangling we
| needed at install time. Another time I was sending packages to
| a unix cluster, but did not have access myself. Used awk as
| part of the bootstrap for the package.
|
| I used to write event-driven scripts off it - each line is a
| message, interpreted by awk. Something I was not able to get
| working with any of the awks I tried was where you append
| messages to the file as you are consuming it (this is kind of
| like code generation). I ended up doing this in python
| (https://github.com/cratuki/interface_script_py).
| jrochkind1 wrote:
| My first job getting paid to program was in awk. Processing log
| files.
|
| In the middle of that job, my supervisor said: you know what, we're
| doing increasingly complicated things with awk and it's getting
| increasingly hacky... I've heard that Perl is like awk but
| better, do you want to learn Perl and switch to that?
|
| And so we did. My thought then was there was little that was
| easier in awk than Perl, you could use Perl very much like awk if
| you wanted, you can even use the right command-line args to have
| Perl have an "implied loop" like awk... but then you can do a lot
| more with Perl too.
|
| I don't use Perl anymore. Or awk.
| linuxlizard wrote:
| I think I remember reading somewhere Larry Wall was inspired to
| create Perl in order to combine awk+sed functionality. He was
| sick of awk+sed being almost powerful enough to do what he
| needed. (I can't find a reference to this though.)
| arendtio wrote:
| Learning awk is actually pretty simple. For years I just used the
| '{print $2}' version to extract fields, but after reading some
| short book I felt pretty confident of having understood the
| basics.
|
| Sadly I don't remember which book it was, but this page looks
| like a good start: https://ferd.ca/awk-in-20-minutes.html
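|
| A small sketch of what "the basics" amount to: an awk program is a
| list of pattern { action } rules run against every input line, plus
| optional BEGIN and END blocks. The sample data and the score_report
| name are invented for illustration.

```shell
score_report() {
    awk '
    BEGIN  { print "name score" }          # runs once, before any input
    $2 > 8 { print $1, $2; total += $2 }   # runs for lines whose 2nd field exceeds 8
    END    { print "total", total }        # runs once, after the last line
    '
}

printf 'alice 10\nbob 25\ncarol 7\n' | score_report
```

| This prints the header, the alice and bob rows, and "total 35"; the
| carol row is filtered out by the pattern.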
| abecedarius wrote:
| Likely the one by A, W, and K.
| https://news.ycombinator.com/item?id=13451454
| arendtio wrote:
| Yes, this looks like it. Thanks :-)
| vyuh wrote:
| "A good programmer uses the most powerful tool to do a job. A
| great programmer uses the least powerful tool that does the job."
| I believe this, and I always try to find the combination of
| simple and lightweight tools which does the job at hand
| correctly.
|
| Awk sometimes proves surprisingly powerful. Just look at the
| concision of this awk one liner doing a fairly complex job:
| zcat large.log.gz | awk '{print $0 | "gzip -v9c >
| large.log-"$1"_"$2".gz"}' # Breakup compressed log by syslog date
| and recompress. #awksome
|
| Taken from:
| https://mobile.twitter.com/climagic/status/61415389723039744...
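|
| A sketch of the same idea as a shell function, with two changes that
| are my assumptions rather than part of the original one-liner: the
| output prefix is a parameter, and the previous pipe is closed once
| the date changes, so that a log spanning many days does not keep one
| gzip process and file descriptor open per day. This relies on the
| input being in chronological order; if a date reappeared after its
| pipe was closed, the > redirection would overwrite the earlier file.

```shell
# split_log_by_day PREFIX: read syslog-style lines on stdin and write one
# gzip file per (month, day) pair, named PREFIX-<month>_<day>.gz.
# PREFIX is assumed not to contain double quotes or shell metacharacters.
split_log_by_day() {
    awk -v prefix="$1" '
    {
        cmd = "gzip -9c > \"" prefix "-" $1 "_" $2 ".gz\""
        # The date changed: the old pipe is finished, so close it.
        if (prev != "" && cmd != prev) close(prev)
        print | cmd          # reuses the open pipe while cmd is unchanged
        prev = cmd
    }
    END { if (prev != "") close(prev) }'
}
```

| Usage would look like: zcat large.log.gz | split_log_by_day large.log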
| zdwolfe wrote:
| I really love that quote "..A good programmer...", do you have
| a source?
| dunefox wrote:
| Ehh. Until the 'job' gets extended and then your simple tool
| makes it exponentially more complex and you have to rewrite it
| with the more powerful tool.
| klyrs wrote:
| The nice thing about a 1-liner is you only lose a few minutes
| to throwing it out entirely and rewriting it to fit a new
| purpose. Dwelling on what might be needed is of limited
| utility, because of the very real possibility that what's
| actually needed in the future is wildly different from what
| you spent all that time planning for.
| selfhoster11 wrote:
| This is fine. I often "prototype" my automations as shell
| scripts, to explore what I actually want the tool to handle.
| Once it gets longer than 20 or so lines, it's time to move to
| a better language, but I don't mind rewriting. This is a
| chance to add error handling, config, proper arguments,
| built-in help texts and whatever else.
| teknopaul wrote:
| I started to add error handling to my shell scripts and
| often never rewrite them. Defo agree with the sentiment
| that you should always be happy (and able) to rewrite a
| shell script; don't let its scope creep. I don't mind
| long(ish) shell scripts as long as the program flow is
| fairly linear. Too many function calls is the smell that
| makes me rewrite.
| inanutshellus wrote:
| Choosing a "good enough for the medium term with minimal
| effort now" is a winner in my book, even if it's likely to be
| rewritten in the long term.
| selfhoster11 wrote:
| Exactly. I end up re-implementing my scripts if they
| outgrow the original scripting language anyway, because
| it's a good time to add proper argument and error handling,
| logging, etc.
| Folcon wrote:
| Surely that isn't a weakness of a simple tool?
|
| A 5 min job that probably won't get extended, and that saves you
| from spending 20 mins coding something up, is better than feeling
| annoyed that you spent the 20 mins coding up the original
| implementation and then had to extend it anyway.
|
| Hopefully, you also get the benefit of additional knowledge
| on that future implementation as well. Why wouldn't this just
| be a net win?
|
| Unless you're talking about writing hack after hack after
| hack, eventually leaving yourself with some incomprehensible
| eldritch monstrosity, in which case, don't do that?
| rakoo wrote:
| If I understand this correctly, it will gzip every line
| separately instead of gzipping them together... it's not really
| the most effective but it does work
| aidenn0 wrote:
| It does not. Printing to a command opens a pipe that stays open,
| and later prints to an identical command string reuse that same
| pipe until it is explicitly closed.
|
| [edit]
|
| Here's the link to the gawk documentation, but most flavors
| of AWK work similarly:
| https://www.gnu.org/software/gawk/manual/gawk.html#Close-
| Fil...
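|
| A quick way to see this behaviour, using sort -r as a stand-in for
| gzip (the function names are only for illustration). In the first
| function all three lines reach a single sort process; in the second,
| close() ends the pipe after every line, so each line gets its own
| sort:

```shell
# One long-lived pipe: every line goes to the same "sort -r" process,
# which only produces output when awk exits and the pipe is closed.
shared_pipe()   { awk '{ print | "sort -r" }'; }

# Closing after each record starts a fresh "sort -r" for every line.
per_line_pipe() { awk '{ print | "sort -r"; close("sort -r") }'; }
```

| With the input lines a, b, c the first prints c, b, a (one sort saw
| everything) and the second prints a, b, c (three one-line sorts).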
| rakoo wrote:
| Wow, this is amazing. It really shows how complexity should
| be managed in the tool so that the user can do the naive
| thing and have it be accidentally optimal
| aidenn0 wrote:
| It is surprising to people who expect them to behave like
| shell pipelines and redirections though. I somehow never
| got bit by it, but have definitely corrected others' awk scripts
| when the authors didn't know about this feature.
___________________________________________________________________
(page generated 2021-09-07 23:01 UTC)