hngopher.com

       [HN Gopher] The Awk Programming Language, Second Edition
       ___________________________________________________________________
        
       The Awk Programming Language, Second Edition
        
       Author : 0x54MUR41
       Score  : 475 points
       Date   : 2023-06-29 07:34 UTC (15 hours ago)
        
 (HTM) web link (awk.dev)
 (TXT) w3m dump (awk.dev)
        
       | benhoyt wrote:
       | I was privileged to be one of the technical reviewers for this
       | book. There's a fair bit of the original content (which is still
       | great), but Kernighan's done a great job with some good
       | restructuring and some significant updates, too. The early
       | chapters are very hands-on, with something of a focus on
       | "exploratory data processing", particularly with CSV files. Big
       | data with AWK, you could say.
       | 
       | Gawk and awk will soon have a new "--csv" option that enables
       | proper CSV input mode (parsing files with quoted and multiline
       | fields per the CSV RFC). I'm really glad Arnold Robbins added a
       | robust "--csv" implementation to Gawk, too, because that's really
       | the most-heavily used version of AWK nowadays. I've already got
       | CSV support in my own GoAWK implementation, and I'll be adding
       | "--csv" to make it compatible.
       | 
       | I'm really glad this new updated version is coming out!
        
         | Simon_O_Rourke wrote:
         | > Gawk and awk will soon have a new "--csv" option that enables
         | proper CSV input mode
         | 
         | Awesome!!!! Super excited to see this!
        
         | calvinmorrison wrote:
         | It's a crying shame we never settled on a text format separated
         | by control characters. There are ASCII control characters for
         | record and field (unit) separators. A bit of user space support
         | for that would have been great.
        
           | PeterisP wrote:
           | Tab-delimited "csv" formats are quite common (e.g. the CONLL
           | format family for many natural language processing tasks) and
           | also supported by common tools such as MS Excel for decades
           | already.
        
           | hermitcrab wrote:
           | Some discussion of that here:
           | https://news.ycombinator.com/item?id=31220841
           | 
           | To be really useful as a format it would just need for text
           | editors to:
           | 
           | - display something distinct for the field separator (some
           |   editors do this)
           | - treat the record separator character like a carriage return
           |   (not aware of any editors that do this)
        
             | throw0101c wrote:
             | > _To be really useful as a format it would just need for
             | text editors to_
             | 
             | This made me think of WordPerfect's "reveal codes"
             | functionality. :)
             | 
             | (Word's "Reveal Formatting" is supposedly similar.)
        
             | coldtea wrote:
             | > _To be really useful as a format it would just need for
             | text editors to: -display something distinct for the field
             | separator_
             | 
             | Which would be trivial too.
        
               | hermitcrab wrote:
               | The programming might be straightforward. Trying to
               | persuade the product owners to do it is a different
               | matter.
        
             | calvinmorrison wrote:
             | Right. The issue is the user space support at the end of
             | the day.
        
           | galleywest200 wrote:
           | It is a shame. I have been using tab-separated sheets
           | recently as it allows me to simply not care about almost any
           | possible character in my strings...apart from tabs of course.
           | But those are far less common than commas, and putting
           | strings in quotes 100% of the time looks messy to me.
        
             | calvinmorrison wrote:
             | Way less common would be using ASCII 30 (record separator)
             | and ASCII 31 (unit separator). Add ASCII 29 (group separator)
             | and you can cram multiple datasets into one file.
        
           | lolive wrote:
           | Most important comment I have ever read on HN ever !
        
           | bachmeier wrote:
           | As I recall, you can tell Awk to use the control characters
           | as record and field separators. Not helpful if you're getting
           | your data from others, but if you're working by yourself, you
           | have the option. I've come to use control characters as a
           | default because it makes life so much easier.
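The control-character idea above can be tried outside awk as well. The following is a minimal Python sketch, not an established format: the helper names `dumps` and `loads` are hypothetical, and it assumes no field ever contains one of the three separator characters. In awk itself, the equivalent setup would be along the lines of `BEGIN { FS = "\037"; RS = "\036" }`.

```python
# ASCII separator control characters (code points 29, 30 and 31).
GS = "\x1d"  # group separator: divides whole datasets
RS = "\x1e"  # record separator: divides rows
US = "\x1f"  # unit separator: divides fields within a row


def dumps(datasets):
    """Serialize a list of datasets (each a list of rows of strings).

    Assumption: no field contains GS, RS or US, so no quoting is needed.
    """
    return GS.join(
        RS.join(US.join(row) for row in rows) for rows in datasets
    )


def loads(text):
    """Parse text produced by dumps() back into nested lists."""
    return [
        [record.split(US) for record in group.split(RS)]
        for group in text.split(GS)
    ]


# Commas, quotes, tabs and newlines inside fields need no escaping.
people = [["name", "quote"], ["Ann", 'said "hi", then left'], ["Bob", "tab\there"]]
cities = [["city"], ["Riga"]]
encoded = dumps([people, cities])
decoded = loads(encoded)
```

The trade-off raised in the thread still applies: the data round-trips cleanly, but most editors will not display the separators in a readable way.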
        
             | ufo wrote:
             | What do you recommend for viewing and editing such files?
        
               | ac29 wrote:
               | Visidata works with arbitrary separators. I just tried
               | with a CSV separated with \x1f (ASCII unit separator) and
               | it worked just fine.
        
               | lolive wrote:
               | Excel too?
        
           | JdeBP wrote:
           | Miller handles it.
           | 
           | * https://miller.readthedocs.io/en/6.8.0/file-formats/#csvtsva...
           | 
           | I have programs that handle it.
           | 
           | * https://jdebp.uk/Softwares/nosh/guide/commands/console-flat-...
        
         | nmz wrote:
         | It's nice that everyone is supporting this, I've written a
         | portable awk module that takes control of the parsing and it is
         | SLOW (and a little buggy). I'm a little bummed that nobody will
         | use it but this is truly a step in the right direction.
         | 
         | I guess for the people that are still using nawk, you can set
         | up an AWK envvar so you can
         | 
         |     { awk -f $AWKU/ucsv.awk -f <(echo '{print NR, $1}') }
         | 
         | https://github.com/Nomarian/Awk-Batteries/blob/master/Units/...
        
         | anyfactor wrote:
         | Our data product is delivered in CSV format. Even though I
         | create user documentation mainly using csvkit, grep and sed, I
         | would love to convert all those solutions to AWK. Sometimes AWK
         | is more readable than sed and csvkit requires installation.
         | 
         | It would be nice to have an awk cookbook for CSV. In terms of
         | CSV manipulation and querying there is only a limited number of
         | operations, and I think there is potential to standardize those
         | operations using AWK.
        
         | cauthon wrote:
         | This is amazing, I may never use pandas again
        
         | lost_tourist wrote:
         | Would you say the first few chapters are enough to get the
         | 75-80% usefulness for mere mortals like me who will never try
         | to master the full language? Or is the material fairly
         | sprinkled throughout the whole tome?
        
           | benhoyt wrote:
           | Yes, definitely. The first three chapters would be more than
           | enough for that: 1) An Awk Tutorial, 2) Awk in Action, and 3)
           | Exploratory Data Analysis. For most people who just want to
           | use AWK for one-liners on the command line, you can stop
           | there. The rest of the chapters are about writing larger
           | (still small! but not one-liner) programs in AWK to create
           | reports, little languages, and experiment with algorithms.
        
         | tomcam wrote:
         | Ben is not just any old technical reviewer. He wrote a version
         | of AWK in Go and has done a ton of other work in the AWK
         | ecosystem.
        
       | donatj wrote:
       | I love awk. It's everywhere and every time I am writing a shell
       | script and work myself into a corner, awk has been the way out.
       | 
       | I know exactly enough to be dangerous and have meant to deep dive
       | for almost a decade.
        
         | IggleSniggle wrote:
         | See, when I'm writing a shell script interactively and work
         | myself into a corner, I reach for awk, struggle with it for a
         | bit, and then either:
         | 
         | 1) succeed, and regret the messiness of the solution
         | 
         | or
         | 
         | 2) fail, and find a non-awk way to handle it.
         | 
         | I really tried to like awk, but its portability hasn't been
         | enough of a feature to raise it above other scripting languages
         | for me. Especially if I'm going to end up in an editor
        
         | coliveira wrote:
         | awk can be mastered by just reading the man page. The book
         | doesn't take long to read either. Once you understand the
         | simple principles, you can write an infinite number of scripts
         | for all kinds of tasks.
        
       | rochak wrote:
       | I wish I used awk all the time, but every time I use it the
       | knowledge I gain doesn't stick. Could be due to its arcane syntax
       | which is just too hard for me to remember.
        
       | dzogchen wrote:
       | Wow, hyped for this.
       | 
       | I picked up this little book from my University library once, and
       | it was a fantastic read.
        
       | jhoechtl wrote:
       | I love the csv-mode. It obviously takes some time
        
       | ahalbert wrote:
       | I love using Awk, the only thing I miss is that it can't handle
       | complex csv files. Does anyone know how to handle quoted CSV
       | strings like
       | 
       | > "foo","bar,baz"
        
         | asicsp wrote:
         | If quoted strings are the only extra thing you need to handle
         | (i.e. no escaped quotes, newlines, etc.) and you have GNU awk:
         | 
         |     $ echo '"foo","bar,baz"' | awk -v FPAT='"[^"]*"|[^,]*' '{print $1}'
         |     "foo"
         |     $ echo '"foo","bar,baz"' | awk -v FPAT='"[^"]*"|[^,]*' '{print $2}'
         |     "bar,baz"
         | 
         | For a more robust solution, see
         | https://stackoverflow.com/q/45420535 or use other tools like
         | https://github.com/BurntSushi/xsv
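For comparison with the FPAT approach, here is a short Python sketch that uses the standard library's `csv` module, which also copes with escaped quotes and newlines inside quoted fields. The wrapper name `parse_csv` is hypothetical and exists only for this illustration; it is one possible "other tool", not the method from the linked answers.

```python
import csv
import io


def parse_csv(text):
    """Return a list of rows, each a list of unquoted field strings."""
    # newline="" leaves line endings untouched so the csv module can
    # recognize newlines that sit inside quoted fields.
    return list(csv.reader(io.StringIO(text, newline="")))


# Second row has an escaped quote ("") and a newline inside a quoted field.
rows = parse_csv('"foo","bar,baz"\n"say ""hi""","two\nlines"\n')
```

Unlike the FPAT one-liner, the surrounding quotes are stripped from each field, so no separate cleanup pass is needed.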
        
           | poetaster wrote:
           | I wanted to ask why not the more simple form:
           | 
           | echo '"foo","bar,baz","boo"' | awk -F"\",\"" '{print $1}'
           | "foo
           | 
           | echo '"foo","bar,baz","boo"' | awk -F"\",\"" '{print $2}'
           | bar,baz
           | 
           | echo '"foo","bar,baz","boo"' | awk -F"\",\"" '{print $3}'
           | boo"
           | 
           | Realizing that I have to strip the quotes that remain.
           | 
           | Edit. formatting.
           | 
           | Edit, again: from your link, the following is more terse and
           | to my taste (still needs strips):
           | 
           | awk -v FPAT='("[^"]*")+'
        
         | wmwragg wrote:
         | I usually use this awk function to parse CSV in awk:
         | 
         |     # This function takes a line i.e. $0, and treats it as a line of CSV, breaking
         |     # it into individual fields, and storing them in the passed in field array. It
         |     # returns the number of fields found, 0 if none found. It takes account of CSV
         |     # quoting, and also commas within CSV quoted fields, but doesn't remove them
         |     # from the parsed field.
         |     # use in code like:
         |     #   number_of_fields = parse_csv_line($0, csv_fields)
         |     #   csv_fields[2]  # get second parsed field in $0
         |     function parse_csv_line(line, field,   _field_count) {
         |       _field_count = 0
         |       # Treat each line as a CSV line and break it up into individual fields
         |       while (match(line, /(\"([^\"]|\"\")+\")|([^,\"\n]+)/)) {
         |         field[++_field_count] = substr(line, RSTART, RLENGTH)
         |         line = substr(line, RSTART+RLENGTH+1, length(line))
         |       }
         |       return _field_count
         |     }
         | 
         | It's not perfect but gets the job done most of the time and
         | works across all awk implementations.
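To see where "not perfect" comes in, the same loop can be traced in Python. This is a rough port written for illustration, assuming Python's `re` behaves like awk's `match` for this pattern on simple inputs (awk uses leftmost-longest matching, Python leftmost-first, so unusual inputs may differ). Like the awk version, it keeps the surrounding quotes and silently skips empty fields.

```python
import re

# Same pattern as the awk function: a quoted field (with "" as an escaped
# quote), or a run of characters that are not comma, quote or newline.
FIELD = re.compile(r'("([^"]|"")+")|([^,"\n]+)')


def parse_csv_line(line):
    """Split one CSV line into fields, mirroring the awk loop above."""
    fields = []
    while True:
        m = FIELD.search(line)
        if m is None:
            break
        fields.append(m.group(0))
        # Drop everything up to the end of the match plus one separator
        # character, as substr(line, RSTART+RLENGTH+1) does in awk.
        line = line[m.end() + 1:]
    return fields


quoted = parse_csv_line('a,"b,c",d')   # quotes are kept on the middle field
sparse = parse_csv_line('a,,b')        # the empty middle field is dropped
```

The dropped empty field means column positions shift on sparse rows, which is worth knowing before relying on field numbers.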
        
         | lysium wrote:
         | They are planning built-in support for that, see that other
         | comment https://news.ycombinator.com/item?id=36518146
        
         | JdeBP wrote:
         | Convert it with Miller first:
         | 
         |     mlr --icsv --otsv cat examplefile
         | 
         | * https://miller.readthedocs.io/en/latest/10min/
        
         | dbro wrote:
         | Yes, this is what csvquote does. It does nothing else, just
         | this so that programs like awk, sed, cut, etc. can work
         | properly.
         | 
         | https://github.com/dbro/csvquote
        
         | geophile wrote:
         | I like the idea of Unix pipelines, but I hate all the
         | sublanguages, awk being one of the biggest. I scratched my itch
         | and built my own shell, marcel:
         | https://github.com/geophile/marcel.
         | 
         | I mention this specifically, here, because of the CSV point.
         | Marcel handles CSV, e.g. "read --csv foobar.csv" reads the
         | foobar.csv file, parses the input (getting quotes and commas
         | correct), and yields a stream of Python tuples, splitting each
         | line of the CSV into the elements of the output tuples.
         | 
         | Marcel also supports JSON input, translating JSON structures
         | into Python equivalents. (The "What's New" section of marcel's
         | README has more information on JSON support, which was just
         | added.)
        
       | binary_ninja wrote:
       | Awk has always been a language that I loved but I have struggled
       | to use besides quick jobs for parsing text files. I understand it
       | is meant to be used for exactly that, but the fact that it is
       | simple, fast and lightweight sometimes makes me want to do
       | something more with it, but when I start trying to do something
       | besides parsing text I find that it starts becoming awkward (pun
       | intended?).
        
         | tripflag wrote:
         | I have found a handful of unconventional applications for awk
         | -- I once needed a tiny pcm pulsewave generator, and awk was
         | surprisingly decent for the job [1].
         | 
         | Aside from that I've mostly been using it for quick statistics
         | [2], but it quickly moves into perl territory...
         | 
         | 1:
         | https://github.com/9001/asm/blob/hovudstraum/etc/bin/beeps#L...
         | 
         | 2: https://ocv.me/doc/unix/oneliners/#965bfcb8
        
         | PhilipRoman wrote:
         | I find it pretty nice for writing simple preprocessors. For
         | example I have one which takes anything between two marker
         | lines and pipes it through a command (one invocation per
         | block). Awk has an amazing pipe operator which lets you do
         | something like this:
         | 
         |     ... {
         |         print $0 | "command"
         |     }
         | 
         | "command" is executed once, and the pipe is kept open until
         | closed explicitly by close("command"), at which point the next
         | invocation will execute it again. The command string itself
         | acts as a key for the pipe file descriptor.
         | 
         | And of course, no mention of awk is complete without the "uniq"
         | implementation, which beats the coreutils uniq in every way
         | possible (by supporting arbitrary expressions as keys and not
         | requiring sorted input):
         | 
         |     !a[$0]++
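The one-liner works because `a[$0]` is zero the first time a line is seen, so the negation is true and the line is printed, while the post-increment marks it as seen. The same logic spelled out as a Python sketch (the function name `unique` and its `key` parameter are hypothetical, chosen for this example):

```python
def unique(lines, key=lambda line: line):
    """Yield each line whose key has not been seen before, in input order."""
    seen = set()
    for line in lines:
        k = key(line)
        if k not in seen:
            seen.add(k)
            yield line


# Whole-line keys, like !a[$0]++; input does not need to be sorted.
whole = list(unique(["b", "a", "b", "c", "a"]))

# Key on the first whitespace-separated field, like !a[$1]++.
by_first = list(unique(["x 1", "y 2", "x 3"], key=lambda l: l.split()[0]))
```

As with the awk version, memory use grows with the number of distinct keys, which is the price of not requiring sorted input.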
        
         | usrbinbash wrote:
         | > but the fact that is simple, fast and lightweight
         | 
         | I see awk as a DSL to be honest. Yes, it _can_ be used as a
         | general purpose language, but that quickly becomes, as you say,
         | awkward :D
         | 
         | Like many DSLs, it is simple, fast and lightweight _as long as
         | it is used for its intended purpose_. Once you start using it
         | for something else, these advantages evaporate pretty quickly,
         | because then you have to essentially work around the DSL design
         | to get it to do what you want.
        
           | snitty wrote:
           | DSL == Domain Specific Language?
        
             | Rediscover wrote:
             | Yes
        
               | coliveira wrote:
               | One simple thing I do with awk is to create a command
               | processor: read one line at a time and do things on my
               | data as a response. This is very useful because you can
               | make your command as powerful as needed and call other
               | unix tools as a result.
        
               | rsolva wrote:
               | Do you have an example of this that is available
               | somewhere?
        
         | kqr wrote:
         | This is exactly why I moved from AWK to Perl for these quick
         | jobs a couple of years ago. If you stick to an AWK-like subset,
         | Perl is also simple, fast and lightweight. If you want to grow
         | your scripts (and you have a lot of discipline) Perl - in
         | contrast to AWK - gives you enough noose to hang^W^W^W^Wthe
         | tools you need.
        
           | joeythedolphin wrote:
           | Perl? Wow. Is that better than bash, python or even nodejs?
           | Why write in Perl over these? Serious question; I was
           | propagandized to hate Perl.
        
             | gpvos wrote:
             | Absolutely. It is comparable to python in some ways, but
             | makes it much easier to write quick one-liners using
             | regexes and data manipulation, and to scale those up to
             | real programs. It fills the gap between bash scripts using
             | awk, grep and sed, and C/java/C#. Compared to bash
             | scripting, perl is a real programming language. The
             | documentation and library ecosystem are excellent,
             | backwards compatibility is legendary, yet it supports
             | modern Unicode. The syntax is weird, but try it for a bit,
             | read the man pages, it's not that hard. The OO system is
             | weirder, and I wouldn't make complex class hierarchies in
             | it, but it is usable.
        
               | marttt wrote:
               | I like how Awk is just a single executable. A single-
               | executable Perl that includes only the core library would
               | be great. There is Microperl [0, 1], but no idea how well
               | it compiles with more up-to-date Perl versions.
               | 
               | 0: https://github.com/bentxt/microperl-standalone
               | 
               | 1: Original article from 2000 by the author Simon Cozens:
               | https://www.foo.be/docs/tpj/issues/vol5_3/tpj0503-0003.html
        
             | theonemind wrote:
             | There's a limited problem domain where it's unquestionably
             | the best. Perl beats awk and bash at their own game on
             | their home turf. That's the best way to put it. It's
             | faster, has more shortcuts, fewer warts, more power, and
             | more readability when well written, and while aged and not
             | huge by modern standards, CPAN (like pypi or npm) is
             | incredible for a hyper-powered awk and bash mash-up for
             | those tasks at the edge of that limited problem domain.
             | It's installed almost everywhere, so almost always
             | available.
             | 
             | That stuff is just awkward and painful in Python by
             | comparison.
        
             | bandrami wrote:
             | Perl is super-specialized at reporting (that's in fact the
             | "r" in Perl). In particular there's a bunch of extremely
             | useful implicitly defined variables that take their context
             | from your place in a line-by-line loop through a text file.
        
             | ilc wrote:
             | Perl is a great language, but please listen to this old
             | perl programmer's advice:
             | 
             | 1. You can write totally unreadable perl. It is probably
             | the single worst language in this regard most programmers
             | will run into. Be careful to make your code readable.
             | 
             | 2. Keep your amount of perl small. 200-300 lines is a good
             | bit of it.
             | 
             | So for quick bang it out scripts that want to parse text
             | etc... perl is great. For writing a major application, not
             | so much.
        
             | sigzero wrote:
             | Better than BASH? Mostly. Better than Python, subjective as
             | you would have to use them both yourself. I lean towards
             | Perl as I like sigils to denote things. I have nothing
             | against Python though. Both are typically installed as a
             | default now. I have never used nodejs for sys admin work.
        
             | IggleSniggle wrote:
             | I write bash, python, and nodejs all day, and have no
             | professional history with Perl.
             | 
             | One day while avoiding working on something important, I
             | spent half a day learning Perl in order to implement
             | something related to a build tool that was being used in
             | the important thing I was avoiding.
             | 
             | I was blown away. It's a really delightful language. Its
             | big downfall is that it makes it feel good to do something
             | "clever."
             | 
             | Perl is a joy to write, and a devil to read. I liked it,
             | and wish I had started my career earlier so I could have
             | enjoyed Perl in its heyday.
             | 
             | I have similar feelings about Ruby.
        
               | gpvos wrote:
               | You need to make sure that you write the clever bits
               | clearly. Maybe add a comment. It takes some discipline,
               | but isn't hard.
               | 
               | In fact, Perl remains remarkably robust if you stack
               | clever tricks on top of each other.
        
             | j1elo wrote:
             | I don't write Perl code, but its CLI has been a very good
             | way to replace _sed_ with something decent. _sed_ not
             | supporting Perl regex syntax, by far the most common kind of
             | regex out there, is frankly disappointing. Even _grep_ was
             | able to put it together and add the _-P_ switch. But _sed_ is
             | still stuck in the prehistoric syntax of ERE ("Extended
             | Regular Expressions", as described in man pages), which e.g.
             | instead of _\d_ for a digit uses _[[:digit:]]_, a syntax
             | present in... zero? other tools or programming environments.
        
             | radiator wrote:
             | When discussing such languages, I would like to point out
             | that Raku is also an option.
        
             | SoftTalker wrote:
             | One other advantage is that Perl will be found in the base
             | install of almost any unix-like system. Python, nodejs,
             | even bash may not.
        
             | tyingq wrote:
             | The same shortcut syntax that people complain about does
             | make perl really handy for one-time tasks where you're
             | iterating on ideas. Lots of features there that make that
             | easy. One example:
             | 
             |     #!/usr/bin/perl
             |     while (<>) {
             |         # various processing here
             |         # $ARGV is set to either "-" for piped input, or the current filename
             |         # $_ is the data of the current line
             |     }
             | 
             | That (<>) construct accepts data from stdin, redirection or
             | file(s) named as arguments and iterates over the data.
             | There's lots of things like that throughout the language.
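For readers coming from Python, the closest standard-library analogue to the `(<>)` construct is probably the `fileinput` module. The sketch below is only an approximation of the Perl behavior: the generator name `numbered` is hypothetical, and `fileinput` reports piped input as `<stdin>` rather than `-`.

```python
import fileinput


def numbered(files=None):
    """Yield (source, line_number, text) for every input line.

    With files=None, fileinput reads the paths given in sys.argv[1:],
    or standard input when there are none, much like Perl's while (<>).
    """
    with fileinput.input(files=files) as stream:
        for line in stream:
            # filename() plays the role of $ARGV, line the role of $_.
            yield stream.filename(), stream.filelineno(), line.rstrip("\n")


# Typical use in a script (not run here, since it would read stdin):
#     for source, lineno, text in numbered():
#         print(f"{source}:{lineno}: {text}")
```

Passing an explicit list of paths instead of relying on `sys.argv` makes the same function easy to call from other code.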
        
               | jandrese wrote:
               | And you can avoid even that minor boilerplate with the -n
               | or -p flag. It even supports BEGIN and END like awk.
        
             | Woeps wrote:
             | Perl better? maybe or maybe not.
             | 
             | It can be very useful and they are pretty robust. I often
             | found Perl scripts running for years and years without
             | issues at different companies.
             | 
             | My main issue with Perl-scripts is that they often are not
             | "readable" by anybody but the original creator. Which of
             | course left the company. (not a fault of Perl itself though)
             | 
             | But your mileage may vary and any script can be made
             | (un)readable.
        
               | thesuperbigfrog wrote:
               | >> My main issue with Perl-scripts is that they often are
               | not "readable" by anybody but the original creator.
               | 
               | Anyone writing Perl scripts like this should not be
               | trusted with _any_ programming language.
               | 
               | Perl scripts are no less readable than bash scripts or
               | Awk scripts. This is because so much of Perl was written
               | to do the same work as bash, awk, sed, and the other
               | related Unix text processing command line programs, but
               | all under one roof.
               | 
               | Don't believe me? Take a look for yourself:
               | 
               | https://learn.perl.org/
               | 
               | http://blob.perl.org/books/impatient-perl/iperl.htm
        
               | ilovecurl wrote:
               | Perl can also be hilariously unreadable:
               | https://www.foo.be/docs/tpj/issues/vol4_3/tpj0403-0017.html
        
               | thesuperbigfrog wrote:
               | >> Perl can also be hilariously unreadable:
               | https://www.foo.be/docs/tpj/issues/vol4_3/tpj0403-0017.html
               | 
               | Most programming languages can be obfuscated. That does
               | not mean people write code in those programming languages
               | like that:
               | 
               | C: https://www.ioccc.org/
               | 
               | Javascript: view-source:https://www.google.com/
               | 
               | The truth is that insulting Perl is considered stylish by
               | some, so many people do despite knowing little to nothing
               | about Perl and having never used it.
               | 
               | However, if you _want_ Perl to be _hilariously
               | unreadable_ , why not write it in Latin:
               | 
               | https://metacpan.org/dist/Lingua-Romana-Perligata/view/lib/L...
               | 
               | Or Klingon:
               | 
               | https://metacpan.org/pod/Lingua::tlhInganHol::yIghun
        
               | anthk wrote:
               | [flagged]
        
               | wott wrote:
               | It's like when we, Gen-X ers, were repeating bad stuff
               | about COBOL without having seen a single line of it.
               | 
               | Then I saw a real COBOL program and... well... it was
               | even worse than what I had imagined :-)
        
               | jandrese wrote:
               | I've always found it weird that people bash on Perl
               | relentlessly for being hard to read and then turn around
               | and praise Rust's syntax when it is full of stuff like
               | this:
               | 
               |     fn print_d(t: &'static impl Display) {
        
             | throwawaaarrgh wrote:
             | Have you ever tried to dig a hole? What tool did you use?
             | 
             | - Want to cut through and move loam, compost, sandy, and
             | compacted soil? You're gonna want a rounded shovel.
             | 
             | - Want to break up rocky, clay soil? A pick mattock will
             | penetrate deep, breaking up soil, shattering smaller rocks,
             | and is used as a lever to uproot. A tiller is a faster
             | method but disturbs the soil more.
             | 
             | - Want to dig a narrow, deep hole? An auger will quickly
             | break up rocks and soil in a shaft and move them upwards.
             | 
             | What do you use the Perl tool for?
             | 
             | - Quickly and efficiently open files, read line by line,
             | analyze text, and perform any kind of operation you can
             | think of, with complex data structures, objects and modular
             | code, using very few lines of code.
             | 
             | - Executing external commands with a shell, returning their
             | output, and making complex yet short programs easily with
             | arguments to the interpreter from a command line.
        
               | anthk wrote:
               | Perl can do sh/awk/sed and a bunch more at once.
        
             | throw0101a wrote:
             | > _Perl? Wow. Is that better than bash, python or even
             | nodejs? Why write in Perl over these?_
             | 
             | It depends on scale.
             | 
             | If you have some quick parsing to do, then awk will get you
             | started quickly, but as you expand your experimentation on
             | what you want to extract/manipulate, it may not be easy to
             | add onto the awk beginnings of your "one liner".
             | 
             | But if you start with awk-like+ syntax but invoking it with
             | Perl, then if you find you have to expand, Perl has more
             | elbow room.
             | 
             | The intention is not to 'go big', which those other
             | languages may be better at, but to more easily 'start
             | small'.
             | 
             | + IIRC, Larry Wall wanted a utility that had awk/(s)ed-like
             | syntax for text manipulation, just 'with more'.
        
         | bluetomcat wrote:
         | It's a language for creating quick alternative views from line-
         | and column-oriented text streams. That means, take the output
         | of another tool and represent it in a different way.
        
         | asicsp wrote:
         | I use awk mostly for one-liners and resort to Python when I
         | need more than a few lines of code.
        
       | ducktective wrote:
       | Also watch his recent interview on Computerphile:
       | https://www.youtube.com/watch?v=GNyQxXw_oMQ
       | 
       | And: Brian Kernighan adds Unicode support to Awk
       | https://news.ycombinator.com/item?id=32534173
        
         | bardak wrote:
         | Honestly, after watching a lot of Kernighan interviews and
         | reading his original book on C, I think he is a great
         | communicator.
         | I wonder how different the software world would have been
         | without him at Bell Labs. Would Unix and C have become as
         | widely used as quickly?
        
         | throw0101a wrote:
         | With Lex Fridman from ~2 years ago:
         | 
         | * https://www.youtube.com/watch?v=O9upVbGSBFo
        
       | getpost wrote:
       | I know lots of people like awk, but I pretend it doesn't exist.
       | Why? Here's my comment on this from 6 years ago[0],
       | 
       | >I used awk until I learned Python (long ago). For me, awk was
       | yet another example of the "worse is better" approach to things
       | so common in unix. For example, if you make a syntax error, you
       | might get a message like "glob: exec error," rather than an
       | informative message. "Worse is better" is probably a good
       | strategy in business and for getting things done, but still,
       | mediocrity and the sense of entitlement that so often goes with
       | carelessness, sickens me.
       | 
       | [0] https://news.ycombinator.com/item?id=13457265
       | 
       | Long live the UNIX-HATERS Handbook! (Unix is fine, and so are
       | the criticisms herein. Some of these criticisms have been
       | eclipsed by ongoing development.)
       | https://en.wikipedia.org/wiki/The_UNIX-HATERS_Handbook
        
         | pmarreck wrote:
         | I will bet you $1000 that time spent learning Awk will lead to
         | better results much faster than time spent polluting your
         | privileged user directories with Python's excuse for
         | "dependency management"
        
         | ghshephard wrote:
         | You are missing out. As a former data engineer/current SRE, I
         | spend my entire day with VSCode/Python/Notebooks/CoPilot
         | banging out python code - but whenever I need to do a complex
         | analysis of a semistructured text file in < 60 seconds, awk is
         | my twitch reflex tool. It can trivially do state transition
         | based on patterns in the file, as well as populate hashes from
         | one file and use them in analysis of the next file in just a
         | few characters.
         | 
         | Awk's claim to fame in my world is that its cognitive
         | activation energy for anyone who has taken the 3-4 hours to
         | learn the language from start to finish (and that's the awesome
         | thing about the language - it really is about 3 hours of
         | concentrated attention) - is essentially nil. You see a bunch
         | of ugly, not really structured 500 MB text files that you can't
         | pull into pandas, or easily parse into python dicts? No problem
         | - awk will tear through them for you and get the information
         | you want in < 60 seconds, including the time you took to write
         | your (almost always single-line) code.
         | 
         | That's Awk's sweet spot.
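         | 
         | A minimal sketch of the two-file idiom mentioned above: fill an
         | array from the first file, then use it while reading the second.
         | The function name lookup_join and both file layouts ("id name"
         | and "id value" per line) are assumptions made for illustration.

```shell
# Hypothetical helper: $1 = lookup file ("id name"), $2 = data file ("id value").
lookup_join() {
  awk '
    NR == FNR  { name[$1] = $2; next }   # still in the first file: fill the array
    $1 in name { print name[$1], $2 }    # second file: print rows with a known id
  ' "$1" "$2"
}
```

         | Usage would look like `lookup_join users.txt data.txt`. The
         | NR == FNR test only holds while the first file is being read,
         | so it misfires if that file is empty.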
        
           | classichasclass wrote:
           | In general Perl fits that niche for me better, but sometimes
           | awk is what you have.
        
           | getpost wrote:
           | Point taken. I have a Python program that is an elemental
           | version of awk, and I use that for the odd task. I can modify
           | it if needed and I have the entire Python library to help me.
           | Is the text Unicode? HTML? These little details matter.
           | 
           | I'm not complaining that someone banged out awk (speaking
           | figuratively) on a Friday afternoon to do something and not
           | have to stay after work. Excellent! My complaint is that the
           | failure to address technical debt has negatively affected the
           | productivity of millions, if not tens of millions, of people,
           | often working under pressure, for DECADES.
        
             | ghshephard wrote:
             | I'm not sure what technical debt you are referring to. Awk
             | is designed to do one very simple job, and it does so using
             | a language that I can usually teach to new SREs in < 2
             | hours with 9-10 follow-up tasks that drill in their
             | understanding.
             | 
             | It's benefited from extraordinarily enlightened
             | stewardship, kept its minimalism and strengths, and will
             | finally get a key enhancement (UTF-8 support).
             | 
             | The first edition manual is probably the greatest example
             | I've ever seen of technical writing as well.
        
         | momentoftop wrote:
         | Specifically, Awk is a good solution to a problem that should
         | never have existed in the first place. Why am I having to write
         | these bespoke parsers for the random mess of output formats
         | that you get from the UNIX command line?
         | 
         | Well, the fact is that I have to write such parsers. That's
         | very sad, but has no chance of being fixed. So it's good to
         | know Awk.
         | 
         | I think Erik Naggum had this exact criticism of Perl.
        
       | kar1181 wrote:
       | One of the first utilities I had to get to grips with way back
       | was awk, and it serves me well to this day. Best bang for buck
       | investment of time in my entire career. Even today I still use
       | some variant of awk -F(x) '{print $x}'.
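       | 
       | A small sketch of that pattern, wrapped in a hypothetical shell
       | function: nth_field N SEP prints field N of each input line,
       | splitting on SEP. The function name and argument order are
       | assumptions, not anything from the book.

```shell
# Hypothetical helper: $1 = field number, $2 = field separator.
nth_field() {
  # -F sets the input field separator; -v passes the field number in as n,
  # so $n inside the awk program means "field number n".
  awk -F"$2" -v n="$1" '{ print $n }'
}
```

       | For example, `nth_field 1 : < /etc/passwd` would list the login
       | names on a typical Unix system.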
        
       | fgh wrote:
       | Who wrote the second edition?
        
         | Lyngbakr wrote:
         | I read a comment on HN the other day by someone reviewing the
         | book and I believe they said it was Brian Kernighan.
        
         | fuzztester wrote:
         | It was mentioned recently here in another HN thread that Brian
         | Kernighan is writing it.
        
           | B1FF_PSUVM wrote:
           | The lowercase 'bwk' used in the text makes me believe that
           | ...
        
       | apienx wrote:
       | Thanks for your work! Awk is a rabbit hole.
       | 
       | "Dark corners are basically fractal - no matter how much you
       | illuminate, there is always a smaller but darker one." -- Brian
       | Kernighan (quoted in the GNU Awk book)
        
       | sigzero wrote:
       | I am looking forward to this coming out.
        
       | asicsp wrote:
       | Have to wait, as "The book will be available by the end of
       | September"
       | 
       | See https://hn.algolia.com/?q=The+AWK+Programming+Language for
       | discussion on the first edition
       | 
       | Didn't know there was a list of `awk` implementations:
       | https://www.gnu.org/software/gawk/manual/html_node/Other-Ver...
        
       | technofiend wrote:
       | Seems like the best time to ask since this is an awk thread: if
       | anyone has a line on the original artwork or a source for the awk
       | t-shirt please let me know. From memory it's of a gangly bird
       | jumping / parachuting from an airplane (DC3?) and captioned with
       | awk's infamous catch-all error message: "Awk: bailing out near
       | line one".
        
       | proger wrote:
       | Find and AWK together, a match made in heaven. Thanks for the
       | link.
        
         | lkuty wrote:
         | Do you have any resources on using awk with find?
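         | 
         | One possible pairing, as a sketch only: let find produce the
         | file list and let awk summarize it, here by counting files per
         | extension. The function name count_by_ext is made up, and the
         | sketch assumes file names without embedded newlines.

```shell
# Hypothetical helper: $1 = directory to scan.
count_by_ext() {
  find "$1" -type f | awk '
    { name = $0; sub(/.*\//, "", name) }                       # drop the directory part
    match(name, /\.[^.]+$/) { n[substr(name, RSTART + 1)]++ }  # count by text after the last dot
    END { for (ext in n) print n[ext], ext }                   # array order is unspecified
  ' | sort -k2                                                 # so sort by extension afterwards
}
```

         | Files with no dot in the name are skipped, and a dotfile such as
         | .profile is counted under "profile".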
        
       | penguin_booze wrote:
       | I wish awk had support for addressing a range of fields, like
       | from $1 to $7. `cut` supports it, FWIW.
        
         | mplanchard wrote:
         | You can always loop through the fields, but it's a little
         | messy, especially for one-liners
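         | 
         | The loop could be sketched like this, wrapped in a shell
         | function so the one-liner stays short. The name field_range and
         | its arguments are hypothetical; fields are joined with OFS.

```shell
# Hypothetical helper: $1 = first field, $2 = last field (inclusive).
field_range() {
  awk -v from="$1" -v to="$2" '{
    out = ""
    # Stop at NF so short lines do not produce trailing separators.
    for (i = from + 0; i <= to + 0 && i <= NF; i++)
      out = out (i > from + 0 ? OFS : "") $i
    print out
  }'
}
```

         | With that defined, `field_range 1 7` plays roughly the role of
         | `cut -f1-7`, but splits on runs of whitespace as awk does by
         | default.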
        
           | penguin_booze wrote:
           | Yes, that's an option. The range lookup is an ergonomic
           | feature. Imagine what it would have been like, if we couldn't
           | do foo[-3:] in Python.
        
       | shaftoe444 wrote:
       | Can I preorder this?
        
       | pmarreck wrote:
       | Awk is old but great, designed to chew through lines of text
       | files with ease, and has great defaults that minimize the amount
       | of awk code you actually have to write to do anything. It's
       | underrated.
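       | 
       | A short sketch of those defaults at work: lines are split on
       | whitespace automatically, a pattern with no action prints the
       | line, and an unset variable starts out as zero. The function
       | name big_rows and the threshold of 100 are arbitrary choices.

```shell
# Hypothetical helper: print rows whose third field exceeds 100, then a total.
big_rows() {
  awk '
    $3 > 100                      # pattern only: the default action prints the line
    { total += $3 }               # action only: runs on every line; total starts at 0
    END { print "total", total }
  '
}
```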
        
       | siraben wrote:
       | Awk is awesome! Glad that they are looking to modernize the book.
       | It wasn't really necessary, all the code examples in the original
       | edition of the book still run just fine, although some are
       | somewhat dated, like printing ASCII bar graphs. They also had
       | examples of writing VMs, parsers and interpreters in the book,
       | which run on modern implementations.[0]
       | 
       | The language has some quirks. To declare temporary variables,
       | it's common practice to add extra arguments to functions that
       | won't be used. And traversal of associative arrays is
       | implementation-dependent. I'm not sure what the situation is
       | regarding locale and UTF-8 support.
       | 
       | EDIT: Looks like Brian Kernighan added Unicode support last
       | year.[1]
       | 
       | [0] https://github.com/siraben/awk-vm/blob/master/vm.awk
       | 
       | [1]
       | https://github.com/onetrueawk/awk/commit/9ebe940cf3c652b0e37...
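       | 
       | The temporary-variable quirk can be sketched as follows. Extra
       | parameters that callers never pass act as locals, and by
       | convention they are set off with extra spaces. The names
       | sum_fields, sum, i and total are illustrative only.

```shell
# Hypothetical helper: print the sum of the fields on each line.
sum_fields() {
  awk '
    # "i" and "total" are extra parameters used as locals; callers pass only "n".
    function sum(n,    i, total) {
      for (i = 1; i <= n; i++) total += $i
      return total + 0
    }
    # The global "i" is not changed by the call, which shows the local is separate.
    { i = "untouched"; print sum(NF), i }
  '
}
```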
        
         | bluetomcat wrote:
         | Is there a particular benefit in writing a VM in AWK, placed in
         | a big BEGIN block? Very similar code can be written in Perl or
         | Python. Isn't the strength of AWK in its line-matching
         | capability, being able to pattern-match a line against a block
         | of code?
        
           | ufo wrote:
           | I love telling my programming language friends about that
           | example.
           | 
           | > Hey you should read the AWK book, it even says how to write
           | a VM!
           | 
           | > Why would I ever want to use AWK for that?
           | 
           | > Well, the input is a text file with one space-delimited
           | instruction per line.
           | 
           | > Hmm... You have a point.
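           | 
           | To make the point concrete, here is a toy stack machine that
           | reads one space-delimited instruction per line. It is not the
           | VM from the book; the instruction set (push, add, mul, print)
           | and the name run_stack_vm are invented for this sketch.

```shell
# Hypothetical toy VM: reads instructions on stdin, one per line.
run_stack_vm() {
  awk '
    NF == 0       { next }                                   # ignore blank lines
    $1 == "push"  { stack[++sp] = $2; next }
    $1 == "add"   { b = stack[sp--]; stack[sp] += b; next }
    $1 == "mul"   { b = stack[sp--]; stack[sp] *= b; next }
    $1 == "print" { print stack[sp]; next }
    { print "unknown instruction: " $0 > "/dev/stderr"; exit 1 }
  '
}
```

           | Each pattern doubles as the instruction decoder, which is why
           | the line-oriented input format suits awk.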
        
           | siraben wrote:
           | > Is there a particular benefit in writing a VM in AWK
           | 
           | Not really. Later on the book just ran out of line-matching
           | examples to go through and started doing regular programming
           | instead :P. When I actually write AWK code I rely on line-
           | matching and using a variable to handle state.
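           | 
           | A minimal sketch of that style: patterns flip a flag, and the
           | flag decides which lines get printed. The START and STOP
           | marker lines and the name between_markers are assumptions.

```shell
# Hypothetical helper: print the lines strictly between START and STOP markers.
between_markers() {
  awk '
    /^STOP$/  { inside = 0 }   # leave the region before the print rule sees STOP
    inside    { print }        # print only while the flag is set
    /^START$/ { inside = 1 }   # enter the region after the print rule has skipped START
  '
}
```

           | The rule order is what keeps the marker lines themselves out
           | of the output.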
        
             | pdw wrote:
             | At the time, awk was the only scripting language (other
             | than shell) generally available on Unix systems. Perl, Tcl,
             | Python didn't exist yet. So awk was often used for general-
             | purpose programming.
        
           | chasil wrote:
           | AWK runs _everywhere_. Perl and Python do not.
           | 
           | Busybox has their own independent AWK implementation.
           | 
           | https://busybox.net/ https://frippery.org/busybox/
           | 
           | Also see the first edition of the AWK manual online here:
           | 
           | https://archive.org/details/pdfy-MgN0H1joIoDVoIC7
        
         | anthk wrote:
         | On VMs, why not a Z-machine? It's ideal for this.
        
         | kqr wrote:
         | What would you suggest as an alternative to printing ASCII bar
         | graphs? I do that all the time. Takes 20 seconds and often
         | makes distributions, modalities, and patterns over time obvious
         | right away.
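         | 
         | For reference, a bar chart of this kind can be sketched in a
         | few lines. The input layout ("label count" per line), the "#"
         | bar character and the name bar_chart are assumptions.

```shell
# Hypothetical helper: turn "label count" lines into a text bar chart.
bar_chart() {
  awk '{
    bar = ""
    for (i = 0; i < $2; i++) bar = bar "#"   # one "#" per unit of the count
    printf "%-8s %s\n", $1, bar              # left-align labels in 8 columns
  }'
}
```

         | Large counts would need scaling or binning first, which this
         | sketch leaves out.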
        
           | zimpenfish wrote:
           | `sparklines`[1] is good for an overall low-res view.
           | `termgraph`[2] is sometimes better for a higher-res, more
           | capable view (but can be finicky about the data.)
           | 
           | [1] https://github.com/deeplook/sparklines
           | 
           | [2] https://github.com/mkaz/termgraph
        
             | kqr wrote:
             | But both require depending on a third party library --
             | hardly something on a whim if ASCII bar charts do the job?
        
               | llimllib wrote:
               | gnuplot is an alternative that is available on almost as
               | many systems as awk, and can do the job as well
               | 
               | edit: this prompted me to write up a little note showing
               | how:
               | https://notes.billmill.org/visualization/graphs/gnuplot/A_ba...
        
               | dima55 wrote:
               | If you do this sort of thing more than once ever, look at
               | the feedgnuplot tool
               | (http://github.com/dkogan/feedgnuplot). It'll make your
               | life easier
        
               | llimllib wrote:
               | Neat! Once you're installing something to do terminal
               | plots though, https://github.com/red-data-tools/YouPlot
               | looks the nicest I've seen.
               | 
               | (The nice feature of feedgnuplot of course is that you
               | can _also_ render the plots to images, which youplot
               | can't)
        
               | zimpenfish wrote:
               | Sure but, e.g., sparklines can show me the shape of my 60
               | numbers[1] more effectively on a single line of 60
               | characters[2] than an ASCII bar chart which would be 60
               | lines (without binning).
               | 
               | [1] 27623 14272 22218 21267 19037 989 27116 32405 23261
               | 27104 7793 9432 7776 28832 13521 10783 29261 32193 30367
               | 20358 22611 2023 19607 9844 3516 6510 16533 8378 22986
               | 17043 14628 13392 22799 23847 29212 23690 17779 17059
               | 28211 26180 32061 22740 7911 12018 4508 9801 9578 15350
               | 9554 15517 11112 405 22054 2743 26609 7843 713 10975 2830
               | 1126
               | 
               | [2]
               | http://rjp-hosted-files.s3.amazonaws.com/sparkline-demo.png
        
       | DarkNova6 wrote:
       | I don't know about Awk, but I feel the urge to write a library
       | named "ward" for it.
        
         | schoen wrote:
         | Maybe the person who deals with security issues for an awk
         | implementation could be called the awk ward.
        
       | bsdooby wrote:
       | Currently looking @ alternatives (not that I dislike AWK, far
       | from it):
       | 
       | Tokay: https://github.com/tokay-lang/tokay
       | 
       | frawk: https://github.com/ezrosent/frawk
        
         | sgu999 wrote:
         | What do you think of them? Tokay in particular looks very
         | polished.
        
           | bsdooby wrote:
           | TBH: no conclusion yet (did not find time ATM to try it out
           | in full detail)...sorry
        
         | lambertsimnel wrote:
         | Have you considered tab?
         | 
         | https://tkatchev.bitbucket.io/tab/
        
         | geophile wrote:
         | Take a look at marcel: https://github.com/geophile/marcel
        
       | radiator wrote:
       | This is good news, because you have to pay a lot for a used copy
       | of the first edition nowadays. I hope the spirit remains the same
       | as in the first edition.
        
       | csours wrote:
       | I FINALLY started learning awk in the past couple weeks. I think
       | I was intimidated because awk can be very terse, and there are
       | some default actions that aren't clear when you first start
       | looking at awk scripts.
       | 
       | My other problem is that I want to accomplish things, not learn a
       | tool, and it generally takes me a bit longer than it should to
       | decide to actually learn something and not just hack at it.
       | 
       | Is it still worth it to be "the awk guy" at work?
        
         | simmonmt wrote:
         | yes, because you'll be done with your thing before others
         | figure out how to lay out your spreadsheet. also your solution
         | will be reusable.
         | 
         | (based on my experience where people who could've benefited
         | from awk for a one-liner dependably reach for sheets/excel
         | rather than something like python or perl)
        
       | ineedasername wrote:
       | Amazing, takes me back.
       | 
       | ~
       | 
       | One of my first big projects at my first job fresh out of college
       | was using sed & awk to semi-automate the transformation of semi-
       | unstructured data into a database.
       | 
       | IIRC I couldn't completely automate because it contained author
       | names, from global naming conventions. (parsing names correctly
       | is deceptively complex) They had somewhat arbitrary #'s of
       | initials ranging from 0-3.
       | 
       | Again, IIRC, I could easily accommodate 0 or 1 initial (followed
       | by \\.) but trying for more would make the regex I was using too
       | greedy and pull in part of the article abstract. These were
       | scientific books and journals.
       | 
       | So I scripted a sed & awk program to detect the possibility of >
       | 1 initials and when that occurred, I'd pipe the record into nano
       | for a quick review where I manually inserted the correct \\.
       | characters for the initials.
       | 
       | It was decades of back-catalogue publications for digitization so
       | I sat there for days, listening to music on an original 1st gen
       | iPod, waiting for my duct-taped kludge of a program to pipe one
       | of thousands of records into a nano session every few minutes.
       | This was on an Apple G4 workstation running OS X, where I earned
       | my real bash scripting chops. It was an awful hack by today's
       | standards, but at the time, accomplishing what was expected to be
       | a 1-year long project in ~1 month, it was seen as nearly
       | miraculous.
        
       | MikeTheGreat wrote:
       | Ok, dumb question: Is the link supposed to link to the actual
       | book (i.e., is the book free and/or open source) or is this just
       | a page of miscellaneous interesting links about the book (which
       | we can pay for, later, when it's published).
       | 
       | I was expecting the book, but the page itself says "This page is
       | a placeholder for material related to the second edition of The
       | AWK Programming Language."
       | 
       | It's fine if this is a placeholder page (and an awesome excuse to
       | read and talk about AWK here on HN :) ) but I want to be sure that
       | I'm not missing the book itself.
        
         | RGBCube wrote:
         | What I understand from the page is that the Second Edition of
         | the book will reside in the page when it is released (the
         | reason why it says it is a "placeholder").
        
       | andrewstuart wrote:
       | Awk and ChatGPT are best friends.
        
         | ketanmaheshwari wrote:
         | How so?
        
           | andrewstuart wrote:
           | Ask ChatGPT to write your awk scripts - it does a pretty
           | damn good job at a first pass.
        
       ___________________________________________________________________
       (page generated 2023-06-29 23:01 UTC)