[HN Gopher] Awk: The Power and Promise of a 40-Year-Old Language
       ___________________________________________________________________
        
       Awk: The Power and Promise of a 40-Year-Old Language
        
       Author : jangid
       Score  : 215 points
       Date   : 2021-09-07 07:13 UTC (15 hours ago)
        
 (HTM) web link (www.fosslife.org)
 (TXT) w3m dump (www.fosslife.org)
        
       | zeteo wrote:
       | My company mandates Windows but Git Bash has been a backdoor into
       | Unix tools and I've recently learned sed and awk to take full
       | advantage of it. You need to think a bit about your one liners
       | and they'll always feel very hacky, but sed/awk (with a bit of
       | sort thrown in) are an amazingly powerful combination for dealing
       | with all sorts of messy data dumps. In 10 minutes I can craft a
        | one liner that replaces a 2-hour C# console app and runs just as
       | fast. And, surprisingly, I often find it easier to go back months
       | later and understand the messy looking one liner than the nicely
       | formatted, well commented, unit tested console app.
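        | 
        | A typical (made-up) example of the kind of thing I mean: a
        | pipe-delimited dump with a user in field 1 and an amount in
        | field 3, totalled per user and sorted biggest first:
        | 
        |     sed 's/ *| */|/g' dump.txt |                # tidy up sloppy delimiters
        |         awk -F'|' '{ sum[$1] += $3 }
        |              END  { for (u in sum) print u, sum[u] }' |
        |         sort -k2,2 -rn                          # biggest totals first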
        
       | phkahler wrote:
        | I never used Awk until last year. I wanted to monitor an embedded
        | device with little more than busybox and python on it. There was
       | quite a bit of information in the log files (I had already
       | written a custom log file viewer with some highlighting) but I
       | wanted to monitor in real-time. Somehow I decided to use Awk to
       | monitor the tail of the log file and do realtime bar-graphs by
       | generating appropriate cursor control sequences. In the end I had
       | about 50 lines of Awk to upload to the board and run a command to
       | pipe the log into it - very minimally invasive and very
       | informative.
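        | 
        | Roughly the shape of it (a cut-down sketch with a made-up
        | "cpu=NN" log field; the real script was ~50 lines and tracked
        | more than one value):
        | 
        |     tail -f device.log | awk '
        |         match($0, /cpu=[0-9]+/) {
        |             n = substr($0, RSTART + 4, RLENGTH - 4)   # the number after "cpu="
        |             bar = ""
        |             for (i = 0; i < n; i++) bar = bar "#"
        |             printf "\033[2K\rcpu %3d %s", n, bar      # clear the line, redraw in place
        |             fflush()
        |         }'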
       | 
       | Would recommend learning Awk with some kind of real-world use of
       | your own. BTW it reminded me of using XSLT which I think is
       | another often overlooked "good thing".
        
         | cogman10 wrote:
         | The biggest reason to learn AWK, IMO, is that it's on pretty
         | much every single linux distribution.
         | 
         | You might not have perl or python. You WILL have AWK. Only the
         | most minimal of minimal linux systems will exclude it. Even
          | busybox includes awk. That's how essential it's considered to be.
        
           | jejones3141 wrote:
           | Something fun in that regard, speaking of minimal...the
           | TRS-80 Color Computer community now has a version of awk that
           | runs on NitrOS-9, a variant of OS-9/6809 originally written
           | for the Hitachi 6309. (64K address space, no separate I and D
           | space.)
        
           | michaelcampbell wrote:
           | I'm curious what linux distros don't have either some version
           | of perl or python.
           | 
           | I like awk, mind, but this is not necessarily (IME) a good
           | argument for it.
        
             | cogman10 wrote:
             | You'll find this a lot in the embedded space. As well,
             | you'll see a bunch of docker images that don't have
             | perl/python.
        
               | selfhoster11 wrote:
               | Building a Docker image gives basically full freedom over
               | the choice of a runtime. If your Dockerized application
               | is written in Java or Python or PHP or C#, why not just
               | write the tooling and scripts in the same language too?
               | Or at least install a suitable runtime just for the
               | scripts? Or if starting from an empty container, why not
               | build the script into a statically-linked binary to be
               | placed next to the application?
        
               | cogman10 wrote:
               | Typically, you want docker images as slim as possible.
               | Both to make it faster to distribute and to prevent
               | attacks if something escapes your application. The less
               | in the image, the less exploitable your image is.
               | 
               | Beyond keeping the images slim, the times I'd reach for
               | awk when dealing with a docker container would be when
               | I'm debugging problems within that container. I might
               | need to do some quick text parsing or finagling in order
               | to troubleshoot why the application is sucking.
               | 
               | I'd rather not need to upload a Java script into my
               | docker container just for quick troubleshooting.
        
               | selfhoster11 wrote:
               | I agree on the slimness of Docker images, but if you e.g.
               | have some kind of video or photo CMS written in PHP, then
               | any housekeeping or export scripts etc are better off
               | being written in PHP as well (or even integrated into the
                | application) given how closely they're already bound to
               | the rest of the application.
               | 
               | For anything beyond that, I would very greatly prefer to
               | have "black box", extremely verbose log dumps and
               | database dumps that I could analyse over at my actual dev
               | machine, or a good debugger that lets me step through the
               | code to figure out what's going wrong.
               | 
               | I do realise that not all languages have good tooling, or
               | that some people prefer to use `printf` style debugging,
               | so it may not apply to all.
        
             | jolmg wrote:
             | > I'm curious what linux distros don't have either some
             | version of perl or python.
             | 
             | I imagine that DamnSmallLinux or TinyCoreLinux possibly
             | don't have them by default. Their focus is to be as small
             | as possible in order to download quickly and fit in a USB
             | drive or CD. Their small size was more important back when
             | speeds were slower and drives were smaller. They were also
             | good for when you had a limited number of storage options
             | and you wanted the running OS to fit completely in RAM
             | (back when RAM was smaller).
        
               | selfhoster11 wrote:
               | I don't think I ever ran TinyCore without immediately
               | connecting it to the Internet to grab a bunch of
               | packages. Puppy Linux included Perl in its base install
               | at one time (I don't know if it still does), and Damn
               | Small Linux was supposed to have a cut-down version of
               | Perl included as well.
        
               | jolmg wrote:
               | Python definitely not, though.
        
               | selfhoster11 wrote:
               | Yeah, but if you are happy to program in Perl, that's
               | basically every major Linux distro covered. Anything
               | using DEB or RPM packaging, any machine with Git
               | installed (which includes Windows), plus the ones I
               | already mentioned, already have access to Perl. This is a
               | formidable installed base with no effort needed to
               | install a runtime.
        
               | jolmg wrote:
               | I agree, but michaelcampbell's point seemed to be: why
               | learn a language for its ubiquity, when more commonly
               | used languages seem to be just as ubiquitous? So, I
               | focused on how they're not _that_ ubiquitous.
        
               | selfhoster11 wrote:
               | I see what you mean. I guess what I was trying to say is
               | that my position is close to that of michaelcampbell's,
               | and that I wanted to emphasize how little portability is
               | sacrificed by adopting this position on most environments
               | one will ever work in.
        
               | baktubi wrote:
               | If you're using DamnSmallLinux etc I'd imagine you can
               | package your own awk quite easily! Perl would require a
               | lot more packages. But all you need to do is copy a
               | couple binaries right?
        
               | jolmg wrote:
               | Haven't used these distros since a decade or so ago.
               | 
               | Not sure why I'd have to package awk. Busybox's is
               | probably sufficient for most uses, if the need ever
                | arose, which I don't think it normally does when using
               | these distros.
        
               | baktubi wrote:
               | Agreed. Not having enough space for awk would be daaaaamn
               | small indeed.
        
             | greggyb wrote:
             | The POSIX specification includes awk, but not perl or
             | python. The world of UNIX and UNIX-likes is larger than
             | just Linux distributions. Depending on the utility you plan
             | on building and the platforms you expect it to run on, it
             | may be wiser to reach for awk than other PLs.
        
               | pcwalton wrote:
               | Modern BSDs, macOS, and Solaris certainly have Perl and
               | Python. (iOS and Android don't, but they don't have awk
               | either.) What other Unixes are you thinking of? AIX,
               | HP/UX, IRIX, UnixWare, etc. should be considered
               | retrocomputing at this point and not relevant to modern
               | compatibility discussions.
               | 
               | Linux distros based on busybox, as mentioned elsewhere in
               | this thread, are a more compelling reason for considering
               | awk than considerations involving other Unixes.
        
               | cogman10 wrote:
               | Wasn't awk added to android in 9?
        
               | Aptrug wrote:
               | Yep, https://android.googlesource.com/platform/system/cor
               | e/+/mast...
        
               | bdk0 wrote:
                | You can install python and perl on BSDs, but it's
                | different from awk, which is part of the core OS and
               | guaranteed to be there without needing to install extra
               | stuff.
        
             | BuildTheRobots wrote:
              | The better question might be "which Linux distros don't
             | have perl or python installed by default" as a lot of
             | people are working on systems where they can't just add
             | additional packages.
             | 
              | Perl has been getting cut from minimal builds of distros
              | for a while. The default installed version of python is a
              | bit of a crap-shoot, never mind which modules you might happen
             | to have available.
        
             | abecedarius wrote:
             | A nice thing about awk vs. Perl/Python: there's a small
             | focused set of things to learn. Once you learn them you're
             | done.
             | 
             | This suggests an opening for a Perl/Python intro focused on
             | the exact same tasks, admittedly. That seems more realistic
             | for Perl -- unless there's someone who writes Python one-
             | liners at the shell?
        
               | r-bar wrote:
                | I don't think true python "one liners" are a thing, but
                | the awkward thing about awk is that it sits in this place
                | where what you are doing is complicated enough that you
                | need awk, but simple enough that you want a one liner.
                | Those cases have been exceedingly few and far between for
                | me, so every time I want to reach for awk I have to go
                | look up how to do anything more complex than printing
                | fields. That completely defeats the point of the quick
                | one liner.
               | 
               | May as well open up vim, write my 7 lines of python, and
                | run it. Because I use it every day and didn't have to look
               | anything up it ends up far faster. Then when I am done I
               | either delete it, throw it in a scripts directory, or
                | make it part of some existing infrastructure repo. And if
                | I keep it, then because I used python it is much more
                | readable than the awk one liner would have been.
               | 
               | I have tried in earnest to memorize awk's idiosyncrasies
               | multiple times now. By the time I go to use what I
                | learned the last time, it is months later and I have
                | forgotten enough that I need to go look stuff up.
               | 
               | So in a way, here I am: The guy that writes "one liners"
               | in python.
        
               | abecedarius wrote:
               | Yeah, it's a different world from when I learned Awk. You
               | might enjoy the (very short) book by the creators just
               | because it's a great focused expression of the Unix way.
               | But nobody _needs_ to learn it.
        
               | lbhdc wrote:
               | I think that is a good point, that often writing a short
               | python script is usually the best solution.
               | 
               | I use awk (and python) daily at work. I work with a lot
               | of flat files, and I use awk when I am doing data quality
               | checks. One of the "sweet spots" it hits for me is when I
               | need to group data by value, or other relatively simple
               | aggregations.
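                | 
                | For example (hypothetical tab-delimited flat file with a
                | key in $1 and an amount in $3), a count and a sum per key
                | is a one-liner:
                | 
                |     awk -F'\t' '{ n[$1]++; sum[$1] += $3 }
                |          END { for (k in n) print k, n[k], sum[k] }' data.tsv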
        
             | kragen wrote:
             | Anything busybox-based. I'm not sure busybox awk is very
             | complete, either.
        
           | selfhoster11 wrote:
           | IMO, unless you're doing embedded work or building minimal
           | containers, you'll pretty much always have access to a decent
           | runtime (or several).
           | 
           | Python: almost every conventional server. Python dependencies
           | are so ubiquitous that you aren't likely to find a Linux
           | install without it.
           | 
           | Perl: every DEB and RPM machine, and anything with Git
           | installed. You can't really escape it, unless you're
           | embedded.
           | 
           | PowerShell (yeah, I know): every Windows machine from XP
           | onwards (though usable only from 7 onwards), and some Linux
           | computers if installed.
           | 
           | Java: lots and lots of places will have this available.
           | 
           | Dockerized runtime of your choice: not ubiquitous, but I
           | expect more and more developer machines and servers to gain
           | Docker or Docker-like container support.
           | 
           | There really isn't any reason to stick to AWK, unless you're
           | working directly on embedded devices or just like using it.
        
         | [deleted]
        
       | zeveb wrote:
       | > Very few people still code with the legacies of the 1970s: ML,
       | Pascal, Scheme, Smalltalk.
       | 
       | Arguably, the software world would be better off if more people
       | _did_ code with those 1970s languages, than with the ones we are
       | stuck with now.
       | 
       | And that applies to Awk, too. As the author quotes Neil Ormos
        | stating, Awk is well suited for _personal computing_, something
       | which we have gotten further and further from at the same time as
       | computers have become more distributed. At what point in history
       | have such a large fraction of the human race had the ability to
       | calculate to such an amazing order of magnitude, and at what
       | point in history have such a large fraction of the same human
       | race not bothered with calculation?
       | 
       | Awk is a great tool precisely because it puts quite a lot of
       | expressive power in the hands of an average user on a Unix
       | system. Sure, on a Lisp machine or Smalltalk machine there really
       | isn't the same need for Awk: the systems languages on such
       | machines are safe enough and expressive enough to do what Awk
       | does. But in the Unix context -- which is basically what we're
       | all living in, with even the VMS-derived Windows more-or-less
       | adhering to the Unix model -- Awk is a godsend.
       | 
       | edit: correct typo
        
         | gompertz wrote:
         | Oh man, you sound like a long lost friend. As someone who
         | struggles to adopt really anything post ~1995 in the
         | programming world, I couldn't agree more. I've worked for
         | Fortune 100s my whole career; mostly in big data problem-
         | spaces, before it ever was cool (if it even is now?), and I
          | really feel all the problems people perceive today were solved
          | as far back as the 1960s (e.g. Snobol4). I understand that for
          | modern web and mobile contexts, sure, there are new fancy tools
         | for that; but as you said, in the personal computing space, the
         | proper tools have existed for decades.
        
       | ketanmaheshwari wrote:
       | My own shameless plug:
       | https://ketancmaheshwari.github.io/posts/2020/05/24/SMC18-Da...
        
       | mukundesh wrote:
       | awk is great for data analysis - usually, I start with cut, then
       | move to awk as complexity increases and finally to python.
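        | 
        | For example, the same field extraction on /etc/passwd,
        | escalating as soon as a condition is needed:
        | 
        |     cut -d: -f1,3 /etc/passwd                           # name and uid
        |     awk -F: '$3 >= 1000 { print $1, $3 }' /etc/passwd   # same, plus a condition on the uid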
        
       | tyingq wrote:
        | Gawk's ability to be extended with C code is interesting as well,
       | and pretty straightforward.
       | 
       | Here's the source for the fork() extension that ships with
       | gawk...it's ~150 lines or so:
       | https://git.savannah.gnu.org/cgit/gawk.git/tree/extension/fo...
       | 
       | I was able to make a (terrible/joke/but-it-kinda-works) web
       | server with gawk using the extensions that ship with it:
       | https://gist.github.com/willurd/5720255#gistcomment-3143007
        
         | tgv wrote:
         | My opinion that belongs to me is as follows. This is how it
         | goes. The next thing I'm going to say is my opinion.
         | 
          | The C interop and name-spaces (also in gawk) are a bridge too
          | far for me. By the time you need one of those, it's time to
         | look for another language. Awk is just not enough of a language
         | to write serious programs in. And I really like awk. It has
         | enabled great scripting not only for log files, but also for
         | dictionaries, back in the day when it was still hard to load
         | one in memory.
         | 
         | That is my opinion, it is mine, and belongs to me and I own it,
         | and what it is too.
        
           | gompertz wrote:
            | It's good you're unapologetic. At the same time, these sorts
            | of features are what I love, as they save me from having to
            | move on to something new and start near ground zero. Living
           | by the mantra "Do 2 things 1000 times, not 1000 things 2
           | times."
        
       | melling wrote:
        | I no longer use it, but Perl was always the better solution when
       | one thought AWK was the answer.
       | 
       | Perl will do those things where AWK really shines and if the
       | problem got bigger, Perl was easier to deal with.
        
         | coliveira wrote:
         | The problem is that awk is a very simple language, which you
         | can learn in an afternoon. Perl is a very complex language, and
         | is not used anymore, so you're just spending your time on
         | something you'll rarely use.
        
           | selfhoster11 wrote:
           | It's used in Debian system tools and in Git, so it's still in
           | wide use.
        
           | chasil wrote:
           | OpenBSD's binary package system is written in perl.
        
             | throwawayboise wrote:
             | Probably as much for legacy reasons as anything else. Perl
             | was the chosen scripting language for utilities, it works,
             | they understand it, and they've kept with it. Sort of how
             | they stay with CVS for their source repository.
             | 
             | Python isn't even installed on a base OpenBSD system.
        
               | chasil wrote:
                | Marc Espie rewrote the entire package system in perl in
               | 2010, which is a bit late to be classed as legacy.
               | 
                | https://undeadly.org/cgi?action=article;sid=20100323141307
               | 
               | I'm not sure what was used for the version before this,
               | but the original BSD package system was written in C.
        
               | throwawayboise wrote:
               | But perl was already the "standard" for other
               | system/config utilities, no?
        
               | chasil wrote:
                | I don't know what we mean by "standard," but I found a
                | number of perl references with the following shell
                | fragment:
                | 
                |     $ for x in $(echo $PATH|sed 's/:/ /g'); do file $x/*|grep perl;done
                | 
                | All but two hits were in /usr/sbin and /usr/bin. I
                | isolated those files with:
                | 
                |     $ file /usr/sbin/* | awk '/perl/{sub(/:.*/,"");sub(/^.*[/]/,"");printf "%s, ", $0}';echo ''
                | 
                | The sbin results are:
                | 
                |     adduser, fw_update, pkg_add, pkg_check, pkg_create,
                |     pkg_delete, pkg_info, pkg_mklocatedb, pkg_sign, rmuser,
                | 
                | There are more in /usr/bin:
                | 
                |     $ file /usr/bin/* | awk '/perl/{sub(/:.*/,"");sub(/^.*[/]/,"");printf "%s, ", $0}';echo ''
                | 
                |     c2ph, corelist, cpan, enc2xs, encguess, h2ph, h2xs,
                |     instmodsh, libnetcfg, libtool, perl, perlbug, perldoc,
                |     perlivp, piconv, pkg-config, pl2pm, pod2html, pod2man,
                |     pod2text, pod2usage, podchecker, podselect, prove,
                |     pstruct, skeyprune, splain, streamzip, xsubpp,
               | 
               | A perl script can't pledge() or unveil(), so I am
               | guessing that anything sensitive has moved to C.
        
               | boogies wrote:
               | > A perl script can't pledge() or unveil()
               | 
               | It doesn't seem to support all of OpenBSD's privilege
               | separation, but there are OpenBSD::Unveil(3p),
                | OpenBSD::Pledge(3p), and https://github.com/rfarr/Unix-Pledge
               | 
               | https://bronevichok.ru/posts/pledge.html
        
               | chasil wrote:
               | Did not know that, thanks.
        
           | mhd wrote:
            | The part of Perl that's equivalent to what you'd use regular
            | awk for isn't very different. Sure, you can do full-scale OO
           | programs, but that doesn't have a large impact on small
           | string munging. I get that you might not learn it to fluff up
           | your CV.
           | 
           | Also, it's usually the same kind of Perl, so you don't have
           | to worry about whether awk is the "one true" one, or mawk, or
           | gawk...
        
           | sigzero wrote:
            | Perl is very much still used. lol
        
           | thesuperbigfrog wrote:
           | >> Perl is a very complex language, and is not used anymore,
           | so you're just spending your time on something you'll rarely
           | use.
           | 
           | Perl is no more complex than Python, Ruby, or Powershell. If
           | you use any of those you can be productive with Perl in a few
           | hours.
           | 
           | Perl is still used, it is just not as popular as it was in
            | the past. Do you use Git? Parts of it are still written in
            | Perl; large parts were originally written in Perl but have
            | been migrated to C over time.
        
           | forinti wrote:
           | If you work a lot with Linux, you can pretty much count on
           | Perl and awk always being there. So it comes in quite handy
           | to know them.
        
         | zeteo wrote:
         | Perl was built initially as a sed/awk killer but got distracted
         | into trying to take over the world. The interpreter for a
         | language with 100x the number of features will always be
         | slower. Also there's a very clear boundary for when I should
         | use awk by itself, as part of a pipeline, or switch to a better
         | tool. I feel like Perl has the potential to suck me
         | imperceptibly into a huge mess where I spend 80% of my time
         | refactoring everything.
        
         | tyingq wrote:
         | I found that to be the case many times as well. But awk also
         | often outperforms Perl, especially mawk.
        
         | Scarbutt wrote:
         | Yes but you can't learn perl as quickly as you can learn awk.
        
           | jfk13 wrote:
           | Though you can learn just enough perl to do awk-like things
           | fairly easily. And then grow from there as needed.
        
             | throwawayboise wrote:
             | IDK. On my OpenBSD system the awk man page is under 500
             | lines, and it pretty much covers the subject.
             | 
             | I've tried to get started in Perl a few times, and just
             | found it weird. It doesn't click. Awk is kind of weird too
             | but it's so simple it doesn't matter.
             | 
             | I'm sure I would eventually get Perl if I _had_ to use it.
             | But for me, awk and sed and shell scripting have covered my
             | needs.
        
       | linuxlizard wrote:
       | I use awk to auto-generate C header files from other header
       | files. I work with $vendor's huge complicated kernel driver
       | codebase. I need small pieces of $vendor's interconnected header
       | files in order to make kernel calls to their drivers without
       | pulling in all their code.
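        | 
        | A stripped-down sketch of the idea (the vendor macro and struct
        | names here are made up):
        | 
        |     # usage: awk -f mini_hdr.awk vendor.h > mini_vendor.h
        |     /^#define VENDOR_IOC_/             { print; next }   # keep the ioctl numbers
        |     /^struct vendor_req [{]/, /^[}];/  { print }         # keep one request struct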
        
       | cb321 wrote:
       | When you have a standardized problem setting like the implicit
        | loop in awk, an alternative to a whole new programming language
        | is a simple program generator of fewer than 100 lines of code [1].
       | 
       | This design lets you retain easy access to large sets of pre-
       | existing libraries as well as have a "compiled/statically typed"
       | situation, if you want. It also leverages familiarity with your
       | existing programming languages. I adapted a similar small program
       | like this to emit a C program, but anything else is obviously
       | pretty easy. Easy is good. Familiar is good.
       | 
       | Interactivity-wise, with a TinyC/tcc fast running compiler
       | backend my `rp` programs run sub-second from ENTER to completion
        | on small data. Even with the non-optimizing tcc, they still run
        | faster than byte-compiled/VM-interpreted mawk/gawk on a per
       | input-byte basis. If you take the time to do an optimized build
       | with gcc -O3/etc., they can run much faster.
       | 
       | And I leave the source code around if you want to just use the
       | program generator as a way to save keystrokes/get a fast start on
       | a row processing program.
       | 
       | Anyway, I'm not trying to start a language holy war, but just
       | exhibit how if you rotate the problem (or your head looking at
       | the problem) ever so slightly another answer exists in this space
       | and is quite easy. :-)
       | 
       | [1]
       | https://github.com/c-blake/cligen/blob/master/examples/rp.ni...
        
       | gompertz wrote:
       | And let's not forget about the amazing commercial offering of
       | Awk, known as Tawk (by Thompson Automation). To this day some
       | features from Tawk cannot be found in Gawk.
        
       | dugmartin wrote:
       | My first and only real use of awk was around 1995. I was working
       | at a new job doing embedded software work at GE and we had a lot
       | of documentation in SGML, written/viewed using Interleaf.
       | Interleaf was super slow on the HP-UX workstations we had and
       | iirc search was even slower. I got the idea to convert all the
       | SGML files into a single HTML file and I reached for awk as I had
       | used it for some one-liners previously. I ended up writing an awk
       | script that generated a frameset with one sidebar frame that was
       | a treeish table of contents and the other frame the mondo html
       | file with anchors for the table of contents. It loaded pretty
       | fast in the HP-UX browser and search was really fast.
        
       | torcete wrote:
        | I use awk constantly in bioinformatics. For many of the file
        | formats designed to store genomic data, awk is the easiest tool
       | you can use for processing.
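        | 
        | For example, total bases covered per chromosome in a BED file
        | (tab-separated chrom, start, end columns; overlapping intervals
        | counted twice):
        | 
        |     awk -F'\t' '{ covered[$1] += $3 - $2 }
        |         END { for (c in covered) print c, covered[c] }' regions.bed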
        
         | jhbadger wrote:
         | There's even a version of awk specifically designed for
         | bioinformatics that natively knows how to handle fasta, fastq,
         | and sam files, among other formats.
         | 
         | https://github.com/lh3/bioawk
        
         | unemphysbro wrote:
         | I did the exact same thing!
         | 
          | For quickly looking at averages/errors, a simple awk one-liner
          | will do.
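          | 
          | Something like this, for a single column of numbers (mean plus
          | the standard error, using the population variance):
          | 
          |     awk '{ s += $1; ss += $1*$1; n++ }
          |          END { m = s/n
          |                printf "mean %g  stderr %g\n", m, sqrt((ss/n - m*m)/n) }' values.txt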
        
       | shp0ngle wrote:
       | awk is fast and really useful.
       | 
       | It's also generally unreadable.
        
         | coliveira wrote:
         | I don't agree. Awk is very readable for people used to c-like
          | languages like javascript. And it is much cleaner than Perl.
        
           | gpderetta wrote:
           | It is certainly more readable than sed for example.
        
             | throwawayboise wrote:
             | Yeah I use sed not infrequently but try to keep things
             | simple. Anything more complicated than a "standard" sed
             | one-liner (google it) I will start looking for something
             | else.
        
       | forinti wrote:
       | sed is pretty ancient too. I've used it a lot with Docker to
       | alter parameters during builds.
        
       | dekhn wrote:
        | I've used Python almost my entire career, but started out with
        | the UNIX tools. I never found awk interesting, then took a peek
       | at it recently and understood: this was _the_ pre-perl! it had
       | scripting-language hash tables!
        
         | Anon84 wrote:
         | PERL was originally advertised as a replacement for "awk and
         | sed"
        
           | dekhn wrote:
           | yep- and I went straight to perl after learning sed, and
           | ignoring awk. awk looked even weirder than perl (I wasn't a
           | big fan of the pattern matching style). In retrospect, I
            | think awk is massively underappreciated (for its time and
           | context). I can't say I'd want to work with it regularly
           | (same for perl; in the long run, I prefer variants of C
           | style).
        
       | asicsp wrote:
       | HN discussion threads for some of the links mentioned in the
       | article:
       | 
       | * Using AWK and R to parse 25TB -
       | https://news.ycombinator.com/item?id=20293579
       | 
       | * Command-line Tools can be 235x Faster than a Hadoop Cluster -
       | https://news.ycombinator.com/item?id=17135841
       | 
       | * The State of the AWK -
       | https://news.ycombinator.com/item?id=23240800
       | 
       | For awk alternative implementations, I'm keeping an eye on frawk
       | [0]. Aims to be faster, supports csv, etc.
       | 
       | [0] https://github.com/ezrosent/frawk
        
         | nmz wrote:
         | CSV is a complicated format but that does not mean awk is
         | incapable of dealing with it.
         | 
         | https://www.gnu.org/software/gawk/manual/html_node/Splitting...
         | 
         | https://github.com/e36freak/awk-libs/blob/master/csv.awk
         | 
         | https://raw.githubusercontent.com/Nomarian/Awk-Batteries/mas...
        
           | boogies wrote:
           | > CSV is a complicated format
           | 
           | Surprisingly and unnecessarily so:
           | 
           | > ["DSV"] is to Unix what CSV (comma-separated value) format
           | is under Microsoft Windows and elsewhere outside the Unix
           | world. CSV (fields separated by commas, double quotes used to
           | escape commas, no continuation lines) is rarely found under
           | Unix.
           | 
           | > In fact, the Microsoft version of CSV is a textbook example
           | of how not to design a textual file format. Its problems
           | begin with the case in which the separator character (in this
           | case, a comma) is found inside a field. The Unix way would be
           | to simply escape the separator with a backslash, and have a
           | double escape represent a literal backslash. This design
           | gives us a single special case (the escape character) to
           | check for when parsing the file, and only a single action
           | when the escape is found (treat the following character as a
           | literal). The latter conveniently not only handles the
           | separator character, but gives us a way to handle the escape
           | character and newlines for free. CSV, on the other hand,
           | encloses the entire field in double quotes if it contains the
           | separator. If the field contains double quotes, it must also
           | be enclosed in double quotes, and the individual double
           | quotes in the field must themselves be repeated twice to
           | indicate that they don't end the field.
           | 
           | > The bad results of proliferating special cases are twofold.
           | First, the complexity of the parser (and its vulnerability to
           | bugs) is increased. Second, because the format rules are
           | complex and underspecified, different implementations diverge
           | in their handling of edge cases. Sometimes continuation lines
           | are supported, by starting the last field of the line with an
           | unterminated double quote -- but only in some products!
           | Microsoft has incompatible versions of CSV files between its
           | own applications, and in some cases between different
           | versions of the same application (Excel being the obvious
           | example here).
           | 
           | -- _The Art of Unix Programming_
           | http://www.catb.org/~esr/writings/taoup/html/ch05s02.html
        
       | SjorsVG wrote:
       | I find it very unpleasant to read Awk code. It looks as bad as
       | regex to me.
        
       | nesuse wrote:
       | There's a free awk course here for anyone interested
       | https://www.udemy.com/course/awk-tutorial/
        
       | justin_oaks wrote:
       | I only recently learned Awk enough to be useful. But I still
       | don't reach for it when I probably should.
       | 
       | What are the most common cases where you reach for Awk instead of
       | some other tools?
       | 
       | I recently used it to parse and recombine data from the OpenVPN
       | status file. That file has a few differently formatted tables in
       | the same file. Using Awk, I was able to change a variable as each
        | table was encountered; thus I could change the Awk program's
        | behavior based on which table it was operating on.
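        | 
        | Roughly this shape (a sketch rather than my exact script; the
        | status file layout varies a bit between OpenVPN versions, and
        | the column-header rows would also need skipping):
        | 
        |     awk -F',' '
        |         /CLIENT LIST/   { table = "clients"; next }
        |         /ROUTING TABLE/ { table = "routes";  next }
        |         /GLOBAL STATS/  { table = "";        next }
        |         table == "clients" && NF > 3 { rx[$1] += $3 }   # bytes received per common name
        |         table == "routes"  && NF > 2 { vip[$2] = $1 }   # virtual address per common name
        |         END { for (cn in rx) printf "%s\t%s\t%d\n", cn, vip[cn], rx[cn] }
        |     ' openvpn-status.log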
        
         | coliveira wrote:
         | Anything that is command line based and needs small changes to
         | text input can be done with awk. It is a very competent
         | language for scripts.
        
         | throwawayboise wrote:
         | I use it a lot to filter, slice, and dice CSV (or other
         | delimited) or fixed-format files. Sometimes I'll use q[1] if my
         | needs are more complex. Or awk piped to q. It can be used as a
         | fairly decent report generator for plain-text or HTML reports.
         | 
          | Any time I want to process a bunch of lines in a text file, awk
         | is my first consideration.
         | 
         | [1] http://harelba.github.io/q/
        
         | jedimastert wrote:
          | From what I can tell, Awk really shines in two places:
         | transformation and collation, both of which require some form
         | of structured file. You can transform one structure into
         | another and you can process record by record to some form of
         | collation or summary.
        
         | chasil wrote:
         | Here is a script that I use to send SMTP mail, via the gawk
         | networking extensions. I have a few different versions, but
          | this is the most basic:
          | 
          |     #!/bin/gawk -f
          |     BEGIN { smtp="/inet/tcp/0/smtp.yourhost.com/25";
          |       ORS="\r\n"; r=ARGV[1]; s=ARGV[2]; sbj=ARGV[3];   # /usr/local/bin/awkmail to from subj < in
          | 
          |       print "helo " ENVIRON["HOSTNAME"]       |& smtp; smtp |& getline j; print j
          |       print "mail from: " s                   |& smtp; smtp |& getline j; print j
          |       if(match(r, ",")) {
          |         split(r, z, ",")
          |         for(y in z) { print "rcpt to: " z[y]  |& smtp; smtp |& getline j; print j }
          |       }
          |       else { print "rcpt to: " r              |& smtp; smtp |& getline j; print j }
          |       print "data"                            |& smtp; smtp |& getline j; print j
          | 
          |       print "From: " s                        |& smtp; ARGV[2] = ""   # not a file
          |       print "To: " r                          |& smtp; ARGV[1] = ""   # not a file
          |       if(length(sbj)) { print "Subject: " sbj |& smtp; ARGV[3] = "" } # not a file
          |       print ""                                |& smtp
          | 
          |       while(getline > 0) print                |& smtp
          | 
          |       print "."                               |& smtp; smtp |& getline j; print j
          |       print "quit"                            |& smtp; smtp |& getline j; print j
          | 
          |       close(smtp)
          |     }
          |     # /inet/protocol/local-port/remote-host/remote-port
         | 
         | This allows me to bypass the local MTA (if present). The
         | message ID is also returned, which can be useful to log.
        
         | mellavora wrote:
         | try running this: awk '{cmd="rm " FILENAME; print cmd;
         | system(cmd) }' file*
         | 
         | best results if you do 'sudo' first
         | 
         | ymmv
        
           | [deleted]
        
           | generalizations wrote:
           | At least add a /s to your comment. I like learning from the
           | stuff people comment on here, and while there's an element of
           | "that would be an important lesson" to what you posted, it's
           | mostly just an unnecessary landmine.
        
         | exdsq wrote:
         | I had to take large CSV files like {question, right_ans,
          | wrong_ans1, wrong_ans2, wrong_ans3} and convert them into SQL
          | insert files. A few caveats: some could be duplicates, some
         | characters were not allowed, and some had formatting issues.
         | The first issue was avoided by upserting, but the other two I
         | used Awk and Sed for and put together a fairly robust script
          | far quicker than if I had reached for Python. I probably would
          | have reached for Python if I realised how many edge cases there
          | were, but I didn't know that at the start, so the script just
          | sort of grew as I went along. Now they're my go-to tools for
          | similar tasks.
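          | 
          | A stripped-down sketch of the awk half (hypothetical table name
          | and column layout, and no handling of commas inside quoted
          | fields):
          | 
          |     # quiz2sql.awk -- usage: awk -f quiz2sql.awk questions.csv > inserts.sql
          |     BEGIN { FS = "," }
          |     NF == 5 {
          |         for (i = 1; i <= NF; i++) gsub(/'/, "''", $i)   # escape single quotes for SQL
          |         printf "INSERT INTO questions VALUES ('%s','%s','%s','%s','%s');\n",
          |                $1, $2, $3, $4, $5
          |     }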
        
           | WhatIsDukkha wrote:
           | """I probably would have reached for Python if I realised how
           | many edge cases there were"""
           | 
           | This is the counter for all the "success" stories of awk
              | users who walked away with an underspecced and
              | underdeveloped 5-minute solution.
        
             | throwawayboise wrote:
             | Most people reach for what they know best. I'm not sure it
             | really proves anything about relative merits.
        
           | chasil wrote:
           | Awk is not really very good at reading complex CSVs (as
           | defined in RFC-4180), where newlines (record separators) can
           | appear within quoted strings. It can be done, but sometimes
           | it's tricky.
           | 
           | The PHP fgetcsv function has been more convenient when I have
           | had more exotic examples.
           | 
           | If the CSV is simple, awk remains a very good tool.
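            | 
            | For the quoted-comma case (still not embedded newlines),
            | gawk's FPAT is usually enough -- this pattern is the example
            | from the gawk manual:
            | 
            |     gawk 'BEGIN { FPAT = "([^,]+)|(\"[^\"]+\")" } { print $2 }' file.csv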
        
             | throwawayboise wrote:
              | CSVs with quoted fields and embedded newlines can be
             | troublesome in awk. Years ago I had found a script that
             | worked for me, I'm not sure but I think it was this:
             | 
             | http://lorance.freeshell.org/csv/
             | 
             | There's also https://github.com/dbro/csvquote which is more
             | unix-like in philosophy: it sits in a pipeline, and only
              | handles transforming the CSV data into something that awk
             | (or other utilities) can more easily deal with. I haven't
             | used it but will probably try it next time I need something
             | like that.
        
             | nmz wrote:
              | If the CSV is RFC-4180 then it can handle it [0]. The only
              | caveat is that you can't disable FS="" correctly, but a
              | gawk -i ./csv.awk -e '{print $5}' would work on most CSV
              | files I've tried.
             | 
             | https://raw.githubusercontent.com/Nomarian/Awk-
             | Batteries/mas...
        
         | cturner wrote:
         | Have found static builds of awk useful in low-dependency work.
         | I bundled it with a windows installer to do some wrangling we
         | needed at install time. Another time I was sending packages to
         | a unix cluster, but did not have access myself. Used awk as
         | part of the bootstrap for the package.
         | 
         | I used to write event-driven scripts off it - each line is a
         | message, interpreted by awk. Something I was not able to get
         | working with any of the awks I tried was where you append
         | messages to the file as you are consuming it (this is kind of
         | like code generation). I ended up doing this in python
         | (https://github.com/cratuki/interface_script_py).
        
       | jrochkind1 wrote:
       | My first job getting paid to program was in awk. Processing log
       | files.
       | 
        | In the middle of that job, my supervisor said: you know what,
        | we're doing increasingly complicated things with awk and it's
        | getting increasingly hacky... I've heard that Perl is like awk but
       | better, do you want to learn Perl and switch to that?
       | 
        | And so we did. My thought then was that there was little that was
        | easier in awk than Perl; you could use Perl very much like awk if
       | you wanted, you can even use the right command-line args to have
       | Perl have an "implied loop" like awk... but then you can do a lot
       | more with Perl too.
       | 
       | I don't use Perl anymore. Or awk.
        
         | linuxlizard wrote:
         | I think I remember reading somewhere Larry Wall was inspired to
         | create Perl in order to combine awk+sed functionality. He was
         | sick of awk+sed being almost powerful enough to do what he
         | needed. (I can't find a reference to this though.)
        
       | arendtio wrote:
       | Learning awk is actually pretty simple. For years I just used the
       | '{print $2}' version to extract fields, but after reading some
       | short book I felt pretty confident of having understood the
       | basics.
       | 
       | Sadly I don't remember which book it was, but this page looks
       | like a good start: https://ferd.ca/awk-in-20-minutes.html
        
         | abecedarius wrote:
         | Likely the one by A, W, and K.
         | https://news.ycombinator.com/item?id=13451454
        
           | arendtio wrote:
           | Yes, this looks like it. Thanks :-)
        
       | vyuh wrote:
       | "A good programmer uses the most powerful tool to do a job. A
       | great programmer uses the least powerful tool that does the job."
       | I believe this, and I always try to find the combination of
       | simple and lightweight tools which does the job at hand
       | correctly.
       | 
       | Awk sometimes proves surprisingly powerful. Just look at the
        | concision of this awk one liner doing a fairly complex job:
        | 
        |     # Breakup compressed log by syslog date and recompress. #awksome
        |     zcat large.log.gz | awk '{print $0 | "gzip -v9c > large.log-"$1"_"$2".gz"}'
       | 
       | Taken from:
       | https://mobile.twitter.com/climagic/status/61415389723039744...
        
         | zdwolfe wrote:
         | I really love that quote "..A good programmer...", do you have
         | a source?
        
         | dunefox wrote:
         | Ehh. Until the 'job' gets extended and then your simple tool
         | makes it exponentially more complex and you have to rewrite it
         | with the more powerful tool.
        
           | klyrs wrote:
           | The nice thing about a 1-liner is you only lose a few minutes
           | to throwing it out entirely and rewriting it to fit a new
           | purpose. Dwelling on what might be needed is of limited
           | utility, because of the very real possibility that what's
           | actually needed in the future is wildly different from what
           | you spent all that time planning for.
        
           | selfhoster11 wrote:
           | This is fine. I often "prototype" my automations as shell
           | scripts, to explore what I actually want the tool to handle.
           | Once it gets longer than 20 or so lines, it's time to move to
           | a better language, but I don't mind rewriting. This is a
           | chance to add error handling, config, proper arguments,
           | built-in help texts and whatever else.
        
             | teknopaul wrote:
             | I started to add error handling to my shell scripts and
             | often never rewrite them. Defo agree with the sentiment
             | that you should always be happy (and able) to rewrite a
              | shell script; don't let its scope creep. I don't mind
             | long(ish) shell scripts as long as the program flow is
             | fairly linear. Too many function calls is the smell that
             | makes me rewrite.
        
           | inanutshellus wrote:
            | Choosing a "good enough for the medium term with minimal
            | effort now" solution is a winner in my book, even if it's
            | likely to be rewritten in the long term.
        
             | selfhoster11 wrote:
             | Exactly. I end up re-implementing my scripts if they
             | outgrow the original scripting language anyway, because
             | it's a good time to add proper argument and error handling,
             | logging, etc.
        
           | Folcon wrote:
           | Surely that isn't a weakness of a simple tool?
           | 
            | A 5 min job that probably won't get extended, saving you from
            | having to spend 20 mins coding something up, is better than
            | feeling annoyed that you spent the 20 mins coding up the
            | original implementation and then had to extend it anyway.
           | 
           | Hopefully, you also get the benefit of additional knowledge
           | on that future implementation as well. Why wouldn't this just
           | be a net win?
           | 
           | Unless you're talking about writing hack after hack after
           | hack, eventually leaving yourself with some incomprehensible
           | eldritch monstrosity, in which case, don't do that?
        
         | rakoo wrote:
         | If I understand this correctly, it will gzip every line
         | separately instead of gzipping them together... it's not really
         | the most effective but it does work
        
           | aidenn0 wrote:
            | It does not. Printing to a command pipe leaves the pipe open,
            | and successive prints to an identical command string reuse the
            | same open pipe until it is explicitly closed.
           | 
           | [edit]
           | 
           | Here's the link to the gawk documentation, but most flavors
           | of AWK work similarly:
           | https://www.gnu.org/software/gawk/manual/gawk.html#Close-
           | Fil...
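            | 
            | A small illustration of the behavior (gawk, mawk, and busybox
            | awk all behave this way):
            | 
            |     awk 'BEGIN {
            |         for (i = 1; i <= 3; i++)
            |             print "line " i | "cat -n"   # same command string: one cat process
            |         close("cat -n")                  # that single cat prints lines numbered 1..3
            |         print "again" | "cat -n"         # a fresh cat process starts
            |         close("cat -n")                  # and it numbers from 1 again
            |     }'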
        
             | rakoo wrote:
             | Wow, this is amazing. It really shows how complexity should
             | be managed in the tool so that the user can do the naive
             | thing and have it be accidentally optimal
        
               | aidenn0 wrote:
               | It is surprising to people who expect them to behave like
                | shell pipelines and redirections though. I somehow never
                | got bitten by it, but have definitely corrected the awk
                | scripts of others who didn't know about this feature.
        
       ___________________________________________________________________
       (page generated 2021-09-07 23:01 UTC)