[HN Gopher] The Awk Programming Language, Second Edition
___________________________________________________________________
The Awk Programming Language, Second Edition
Author : 0x54MUR41
Score : 475 points
Date : 2023-06-29 07:34 UTC (15 hours ago)
(HTM) web link (awk.dev)
(TXT) w3m dump (awk.dev)
| benhoyt wrote:
| I was privileged to be one of the technical reviewers for this
| book. There's a fair bit of the original content (which is still
| great), but Kernighan's done a great job with some good
| restructuring and some significant updates, too. The early
| chapters are very hands-on, with something of a focus on
| "exploratory data processing", particularly with CSV files. Big
| data with AWK, you could say.
|
| Gawk and awk will soon have a new "--csv" option that enables
| proper CSV input mode (parsing files with quoted and multiline
| fields per the CSV RFC). I'm really glad Arnold Robbins added a
| robust "--csv" implementation to Gawk, too, because that's really
| the most-heavily used version of AWK nowadays. I've already got
| CSV support in my own GoAWK implementation, and I'll be adding
| "--csv" to make it compatible.
|
| I'm really glad this new updated version is coming out!
| Simon_O_Rourke wrote:
| > Gawk and awk will soon have a new "--csv" option that enables
| proper CSV input mode
|
| Awesome!!!! Super excited to see this!
| calvinmorrison wrote:
| It's a crying shame we never settled on a control character
| separated text format. There are ASCII control characters for
| record and field (unit) separators. A bit of user space support
| for that would have been great.
| PeterisP wrote:
| Tab-delimited "csv" formats are quite common (e.g. the CONLL
| format family for many natural language processing tasks) and
| also supported by common tools such as MS Excel for decades
| already.
| hermitcrab wrote:
| Some discussion of that here:
| https://news.ycombinator.com/item?id=31220841
|
| To be really useful as a format it would just need for text
| editors to:
|
| - display something distinct for the field separator (some
|   editors do this)
| - treat the record separator character like a carriage return
|   (not aware of any editors that do this)
| throw0101c wrote:
| > _To be really useful as a format it would just need for
| text editors to_
|
| This made me think of WordPerfect's "reveal codes"
| functionality. :)
|
| (Word's "Reveal Formatting" is supposedly similar.)
| coldtea wrote:
| > _To be really useful as a format it would just need for
| text editors to: -display something distinct for the field
| separator_
|
| Which would be trivial too.
| hermitcrab wrote:
| The programming might be straightforward. Trying to
| persuade the product owners to do it is a different
| matter.
| calvinmorrison wrote:
| Right. The issue is the user space support at the end of
| the day.
| galleywest200 wrote:
| It is a shame. I have been using tab-separated sheets
| recently as it allows me to simply not care about almost any
| possible character in my strings...apart from tabs of course.
| But those are far less common than commas, and putting
| strings in quotes 100% of the time looks messy to me.
| calvinmorrison wrote:
| Way less common would be using ascii 30 and ascii 31. ascii
| 29 and you can cram multiple datasets into one file
| lolive wrote:
| Most important comment I have ever read on HN ever !
| bachmeier wrote:
| As I recall, you can tell Awk to use the control characters
| as record and field separators. Not helpful if you're getting
| your data from others, but if you're working by yourself, you
| have the option. I've come to use control characters as a
| default because it makes life so much easier.
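|
| A minimal sketch of that setup, assuming the data uses ASCII unit
| separator (octal 037) between fields and record separator (octal
| 036) between records. The function name and sample data are made
| up for illustration:

```shell
# Print the second field of each record, prefixed with the record number.
# Fields are split on US (octal 037) and records on RS (octal 036);
# awk expands these octal escapes in -v assignments.
us_second_field() {
    awk -v FS='\037' -v RS='\036' '{ print NR ": " $2 }'
}

# Sample data: the comma inside a field needs no quoting at all.
printf 'name\037note\036alice\037likes, commas' | us_second_field
```

| Since neither separator normally appears in text, fields can hold
| commas, tabs and even newlines without any quoting rules.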
| ufo wrote:
| What do you recommend for viewing and editing such files?
| ac29 wrote:
| Visidata works with arbitrary separators. I just tried
| with a CSV separated with (ASCII unit separator) and it
| worked just fine.
| lolive wrote:
| Excel too?
| JdeBP wrote:
| Miller handles it.
|
| * https://miller.readthedocs.io/en/6.8.0/file-formats/#csvtsva...
|
| I have programs that handle it.
|
| * https://jdebp.uk/Softwares/nosh/guide/commands/console-flat-...
| nmz wrote:
| It's nice that everyone is supporting this, I've written a
| portable awk module that takes control of the parsing and it is
| SLOW (and a little buggy). I'm a little bummed that nobody will
| use it but this is truly a step in the right direction.
|
| I guess for the people that are still using nawk, you can set
| up an AWK envvar so you can { awk -f $AWKU/ucsv.awk -f <(echo
| '{print NR, $1}') }
|
| https://github.com/Nomarian/Awk-Batteries/blob/master/Units/...
| anyfactor wrote:
| Our data product is delivered in CSV format. Even though I
| create user documentation mainly using csvkit, grep and sed, I
| would love to convert all those solutions to AWK. Sometimes AWK
| is more readable than sed and csvkit requires installation.
|
| It will be nice to have an awk cookbook for CSV. In terms of CSV
| manipulation and querying there is only a limited number of
| operations and I think there is potential to standardize those
| operations using AWK.
| cauthon wrote:
| This is amazing, I may never use pandas again
| lost_tourist wrote:
| Would you say the first few chapters are enough to get the
| 75-80% usefulness for mere mortals like me who will never try
| to master the full language? Or is the material fairly
| sprinkled throughout the whole tome?
| benhoyt wrote:
| Yes, definitely. The first three chapters would be more than
| enough for that: 1) An Awk Tutorial, 2) Awk in Action, and 3)
| Exploratory Data Analysis. For most people who just want to
| use AWK for one-liners on the command line, you can stop
| there. The rest of the chapters are about writing larger
| (still small! but not one-liner) programs in AWK to create
| reports, little languages, and experiment with algorithms.
| tomcam wrote:
| Ben is not just any old technical reviewer. He wrote a version
| of AWK in Go and has done a ton of other work in the AWK
| ecosystem.
| donatj wrote:
| I love awk. It's everywhere and every time I am writing a shell
| script and work myself into a corner, awk has been the way out.
|
| I know exactly enough to be dangerous and have meant to deep dive
| for almost a decade.
| IggleSniggle wrote:
| See, when I'm writing a shell script interactively and work
| myself into a corner, I reach for awk, struggle with it for a
| bit, and then either:
|
| 1) succeed, and regret the messiness of the solution
|
| or
|
| 2) fail, and find a non-awk way to handle it.
|
| I really tried to like awk, but its portability hasn't been
| enough of a feature to raise it above other scripting languages
| for me. Especially if I'm going to end up in an editor
| coliveira wrote:
| awk can be mastered by just reading the man page. The book
| doesn't take long to read either. Once you understand the
| simple principles, you can write an infinite number of scripts
| for all kinds of tasks.
| rochak wrote:
| I wish I used awk all the time, but every time I use it the
| knowledge I gain doesn't stick. Could be due to its arcane syntax
| which is just too hard for me to remember.
| dzogchen wrote:
| Wow, hyped for this.
|
| I picked up this little book from my University library once, and
| it was a fantastic read.
| jhoechtl wrote:
| I love the csv-mode. It obviously takes some time
| ahalbert wrote:
| I love using Awk, the only thing I miss is that it can't handle
| complex csv files. Does anyone know how to handle quoted CSV
| strings like
|
| > "foo","bar,baz"
| asicsp wrote:
| If quoted string is the only thing you need to handle extra
| (i.e. no escaped quotes, newlines, etc) and if you have GNU
| awk:
|
|     $ echo '"foo","bar,baz"' | awk -v FPAT='"[^"]*"|[^,]*' '{print $1}'
|     "foo"
|     $ echo '"foo","bar,baz"' | awk -v FPAT='"[^"]*"|[^,]*' '{print $2}'
|     "bar,baz"
|
| For a more robust solution, see
| https://stackoverflow.com/q/45420535 or use other tools like
| https://github.com/BurntSushi/xsv
| poetaster wrote:
| I wanted to ask why not the more simple form:
|
| echo '"foo","bar,baz","boo"' | awk -F"\",\"" '{print $1}'
| "foo
|
| echo '"foo","bar,baz","boo"' | awk -F"\",\"" '{print $2}'
| bar,baz
|
| echo '"foo","bar,baz","boo"' | awk -F"\",\"" '{print $3}'
| boo"
|
| Realizing that I have to strip the quotes that remain.
|
| Edit. formatting.
|
| Edit, again: from your link, the following is more terse and
| to my taste (still needs strips):
|
| awk -v FPAT='("[^"]*")+'
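|
| A minimal sketch of the -F approach with the leftover quotes
| stripped. It assumes every field is quoted and that there are no
| escaped quotes or embedded newlines; quoted_field is a made-up
| helper name:

```shell
# Print field N of a line where every field is double-quoted.
# Splitting on the three characters "," leaves a stray quote at the
# start of the first field and at the end of the last one.
quoted_field() {
    awk -F'","' -v n="$1" '{
        sub(/^"/, "", $1)     # strip the leading quote
        sub(/"$/, "", $NF)    # strip the trailing quote
        print $n
    }'
}

echo '"foo","bar,baz","boo"' | quoted_field 2
```

| This works in any awk, but it breaks as soon as one field is
| unquoted, which is the case the FPAT and --csv approaches handle.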
| wmwragg wrote:
| I usually use this awk function to parse CSV in awk:
|
|     # This function takes a line i.e. $0, and treats it as a line of CSV, breaking
|     # it into individual fields, and storing them in the passed in field array. It
|     # returns the number of fields found, 0 if none found. It takes account of CSV
|     # quoting, and also commas within CSV quoted fields, but doesn't remove them
|     # from the parsed field.
|     # use in code like:
|     #   number_of_fields = parse_csv_line($0, csv_fields)
|     #   csv_fields[2] # get second parsed field in $0
|     function parse_csv_line(line, field, _field_count) {
|       _field_count = 0
|       # Treat each line as a CSV line and break it up into individual fields
|       while (match(line, /(\"([^\"]|\"\")+\")|([^,\"\n]+)/)) {
|         field[++_field_count] = substr(line, RSTART, RLENGTH)
|         line = substr(line, RSTART+RLENGTH+1, length(line))
|       }
|       return _field_count
|     }
|
| It's not perfect but gets the job done most of the time and
| works across all awk implementations.
| lysium wrote:
| They are planning built-in support for that, see that other
| comment https://news.ycombinator.com/item?id=36518146
| JdeBP wrote:
| Convert it with Miller first:
|
|     mlr --icsv --otsv cat examplefile
|
| * https://miller.readthedocs.io/en/latest/10min/
| dbro wrote:
| Yes, this is what csvquote does. It does nothing else, just
| this so that programs like awk, sed, cut, etc. can work
| properly.
|
| https://github.com/dbro/csvquote
| geophile wrote:
| I like the idea of Unix pipelines, but I hate all the
| sublanguages, awk being one of the biggest. I scratched my itch
| and built my own shell, marcel:
| https://github.com/geophile/marcel.
|
| I mention this specifically, here, because of the CSV point.
| Marcel handles CSV, e.g. "read --csv foobar.csv" reads the
| foobar.csv file, parses the input (getting quotes and commas
| correct), and yields a stream of Python tuples, splitting each
| line of the CSV into the elements of the output tuples.
|
| Marcel also supports JSON input, translating JSON structures
| into Python equivalents. (The "What's New" section of marcel's
| README has more information on JSON support, which was just
| added.)
| binary_ninja wrote:
| Awk has always been a language that I loved but I have struggled
| to use besides quick jobs for parsing text files. I understand it
| is meant to be used for exactly that, but the fact that it is simple,
| fast and lightweight sometimes makes me want to do something more
| with it, but when I start trying to do something besides parsing
| text I find that it starts becoming awkward (pun intended?).
| tripflag wrote:
| I have found a handful of unconventional applications for awk
| -- I once needed a tiny pcm pulsewave generator, and awk was
| surprisingly decent for the job [1].
|
| Aside from that I've mostly been using it for quick statistics
| [2], but it quickly moves into perl territory...
|
| 1:
| https://github.com/9001/asm/blob/hovudstraum/etc/bin/beeps#L...
|
| 2: https://ocv.me/doc/unix/oneliners/#965bfcb8
| PhilipRoman wrote:
| I find it pretty nice for writing simple preprocessors. For
| example I have one which takes anything between two marker
| lines and pipes it through a command (one invocation per
| block). Awk has an amazing pipe operator which lets you do
| something like this:
|
|     ... { print $0 | "command" }
|
| "command" is executed once, and the pipe is kept open until
| closed explicitly by close("command"), at which point the next
| invocation will execute it again. The command string itself
| acts as a key for the pipe file descriptor.
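|
| A minimal sketch of such a preprocessor. The "#begin" and "#end"
| marker lines and the per_block name are assumptions for this
| example, and lines outside a block are simply dropped here so that
| all output comes from the command itself:

```shell
# Pipe each "#begin" ... "#end" block through the command in $1,
# starting a fresh invocation of the command for every block.
per_block() {
    awk -v cmd="$1" '
        $0 == "#begin" { inblock = 1; next }
        $0 == "#end"   { inblock = 0; close(cmd); next }  # wait for cmd to finish
        inblock        { print | cmd }                    # first print opens the pipe
    '
}

# Each block is sorted on its own, not merged with the other block.
printf '#begin\nb\na\n#end\nignored\n#begin\nc\na\n#end\n' | per_block sort
```

| Without the close(cmd) call both blocks would feed a single sort
| process and come out merged.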
|
| And of course, no mention of awk is complete without the "uniq"
| implementation, which beats the coreutils uniq in every way
| possible (by supporting arbitrary expressions as keys and not
| requiring sorted input):
|
|     !a[$0]++
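|
| A short sketch of that idiom wrapped in shell functions (the
| function names are made up). The pattern is true only the first
| time a key is seen, because the counter is still zero before the
| post-increment:

```shell
# Keep the first occurrence of every line, preserving input order.
dedupe() {
    awk '!seen[$0]++'
}

# Same idea with an arbitrary expression as the key:
# here the first field, compared case-insensitively.
dedupe_by_first_field() {
    awk '!seen[tolower($1)]++'
}

printf 'b\na\nb\nc\na\n' | dedupe
printf 'Ann 1\nbob 2\nann 3\n' | dedupe_by_first_field
```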
| usrbinbash wrote:
| > but the fact that is simple, fast and lightweight
|
| I see awk as a DSL to be honest. Yes, it _can_ be used as a
| general purpose language, but that quickly becomes, as you say,
| awkward :D
|
| Like many DSLs, it is simple, fast and lightweight _as long as
| it is used for its intended purpose_. Once you start using it
| for something else, these advantages evaporate pretty quickly,
| because then you have to essentially work around the DSL design
| to get it to do what you want.
| snitty wrote:
| DSL == Domain Specific Language?
| Rediscover wrote:
| Yes
| coliveira wrote:
| One simple thing I do with awk is to create a command
| processor: read one line at a time and do things on my
| data as a response. This is very useful because you can
| make your command as powerful as needed and call other
| unix tools as a result.
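|
| A minimal sketch of what such a command processor might look
| like. The command names (set, get, del, sh) are invented for this
| example and are not taken from the comment above:

```shell
# Read one command per line and act on a small in-memory key/value store.
run_commands() {
    awk '
        $1 == "set" { store[$2] = $3; next }
        $1 == "get" { print (($2 in store) ? store[$2] : "(unset)"); next }
        $1 == "del" { delete store[$2]; next }
        $1 == "sh"  {
            # Hand the rest of the line to another tool and read its output back.
            cmd = substr($0, 4)
            while ((cmd | getline line) > 0) print "> " line
            close(cmd)
            next
        }
        NF { print "unknown command: " $1 }
    '
}

printf 'set x 42\nget x\nsh echo hi\ndel x\nget x\nbogus\n' | run_commands
```

| Each pattern is one command, so adding a new one is a matter of
| adding one more pattern-action line.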
| rsolva wrote:
| Do you have an example of this that is available
| somewhere?
| kqr wrote:
| This is exactly why I moved from AWK to Perl for these quick
| jobs a couple of years ago. If you stick to an AWK-like subset,
| Perl is also simple, fast and lightweight. If you want to grow
| your scripts (and you have a lot of discipline) Perl - in
| contrast to AWK - gives you enough noose to hang^W^W^W^Wthe
| tools you need.
| joeythedolphin wrote:
| Perl? Wow. Is that better than bash, python or even nodejs?
| Why write in Perl over these? Serious question, was
| propagandized to hate Perl.
| gpvos wrote:
| Absolutely. It is comparable to python in some ways, but
| makes it much easier to write quick one-liners using
| regexes and data manipulation, and to scale those up to
| real programs. It fills the gap between bash scripts using
| awk, grep and sed, and C/java/C#. Compared to bash
| scripting, perl is a real programming language. The
| documentation and library ecosystem are excellent,
| backwards compatibility is legendary, yet it supports
| modern Unicode. The syntax is weird, but try it for a bit,
| read the man pages, it's not that hard. The OO system is
| weirder, and I wouldn't make complex class hierarchies in
| it, but it is usable.
| marttt wrote:
| I like how Awk is just a single executable. A single-
| executable Perl that includes only the core library would
| be great. There is Microperl [0, 1], but no idea how well
| it compiles with more up-to-date Perl versions.
|
| 0: https://github.com/bentxt/microperl-standalone
|
| 1: Original article from 2000 by the author Simon Cozens:
| https://www.foo.be/docs/tpj/issues/vol5_3/tpj0503-0003.html
| theonemind wrote:
| There's a limited problem domain where it's unquestionably
| the best. Perl beats awk and bash at their own game on
| their home turf. That's the best way to put it. It's
| faster, has more shortcuts, less warts, more power, and
| more readability when well written, and while aged and not
| huge by modern standards, CPAN (like pypi or npm) is
| incredible for a hyper-powered awk and bash mash-up for
| those tasks at the edge of that limited problem domain.
| It's installed almost everywhere, so almost always
| available.
|
| That stuff is just awkward and painful in Python by
| comparison.
| bandrami wrote:
| Perl is super-specialized at reporting (that's in fact the
| "r" in Perl). In particular there's a bunch of extremely
| useful implicitly defined variables that take their context
| from your place in a line-by-line loop through a text file.
| ilc wrote:
| Perl is a great language, but please listen to this old
| perl programmer's advice:
|
| 1. You can write totally unreadable perl. It is probably
| the single worst language in this regard most programmers
| will run into. Be careful to make your code readable.
|
| 2. Keep your amount of perl small. 200-300 lines is a good
| bit of it.
|
| So for quick bang it out scripts that want to parse text
| etc... perl is great. For writing a major application, not
| so much.
| sigzero wrote:
| Better than BASH? Mostly. Better than Python, subjective as
| you would have to use them both yourself. I lean towards
| Perl as I like sigils to denote things. I have nothing
| against Python though. Both are typically installed as a
| default now. I have never used nodejs for sys admin work.
| IggleSniggle wrote:
| I write bash, python and nodejs all day, and have no
| professional history with Perl.
|
| One day while avoiding working on something important, I
| spent half a day learning Perl in order to implement
| something related to a build tool that was being used in
| the important thing I was avoiding.
|
| I was blown away. It's a really delightful language. Its
| big downfall is that it makes it feel good to do something
| "clever."
|
| Perl is a joy to write, and a devil to read. I liked it,
| and wish I had started my career earlier so I could have
| enjoyed Perl in its heyday.
|
| I have similar feelings about Ruby.
| gpvos wrote:
| You need to make sure that you write the clever bits
| clearly. Maybe add a comment. It takes some discipline,
| but isn't hard.
|
| In fact, Perl remains remarkably robust if you stack
| clever tricks on top of each other.
| j1elo wrote:
| I don't write Perl code, but its CLI has been a very good
| way to replace _sed_ with something decent. _sed_ not
| supporting Perl regex syntax, by far the most common kind of
| regex out there, is frankly disappointing. Even
| _grep_ was able to put it together and add the _-P_ switch.
| But _sed_ is still stuck in the prehistoric syntax of ERE
| ("Extended Regular Expressions", as described in man pages)
| which e.g. instead of _\d_ for a digit, uses _[[:digit:]]_,
| a syntax present in... zero? other tools or programming
| environments.
| radiator wrote:
| When discussing such languages, I would like to point out
| that Raku is also an option.
| SoftTalker wrote:
| One other advantage is that Perl will be found in the base
| install of almost any unix-like system. Python, nodejs,
| even bash may not.
| tyingq wrote:
| The same shortcut syntax that people complain about does
| make perl really handy for one-time tasks where you're
| iterating on ideas. Lots of features there that make that
| easy. One example:
|
|     #!/usr/bin/perl
|     while (<>) {
|         # various processing here
|         # $ARGV is set to either "-" for piped input, or the current filename
|         # $_ is the data of the current line
|     }
|
| That (<>) construct accepts data from stdin, redirection or
| file(s) named as arguments and iterates over the data.
| There's lots of things like that throughout the language.
| jandrese wrote:
| And you can avoid even that minor boilerplate with the -n
| or -p flag. It even supports BEGIN and END like awk.
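|
| A short sketch of those flags, assuming perl is installed. The
| function names are made up; -n wraps the code in the while (<>)
| loop shown above, -p does the same and prints $_ after each line,
| -a splits each line into @F, and -l handles the newlines:

```shell
# Sum the second whitespace-separated column, awk-style.
sum_second_column() {
    perl -lane '$sum += $F[1]; END { print $sum }' "$@"
}

# Prefix every line with its number; -p prints $_ for us.
number_lines() {
    perl -pe 'BEGIN { $n = 0 } $_ = ++$n . ": $_"' "$@"
}

printf 'a 1\nb 2\nc 3\n' | sum_second_column
printf 'x\ny\n' | number_lines
```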
| Woeps wrote:
| Perl better? maybe or maybe not.
|
| It can be very useful and they are pretty robust. I often
| found Perl scripts running for years and years without
| issues at different companies.
|
| My main issue with Perl-scripts is that they often are not
| "readable" by anybody but the original creator. Which of
| course left the company. (not a fault of Perl itself though)
|
| But your mileage may vary and any script can be made
| (un)readable.
| thesuperbigfrog wrote:
| >> My main issue with Perl-scripts is that they often are
| not "readable" by anybody but the original creator.
|
| Anyone writing Perl scripts like this should not be
| trusted with _any_ programming language.
|
| Perl scripts are no less readable than bash scripts or
| Awk scripts. This is because so much of Perl was written
| to do the same work as bash, awk, sed, and the other
| related Unix text processing command line programs, but
| all under one roof.
|
| Don't believe me? Take a look for yourself:
|
| https://learn.perl.org/
|
| http://blob.perl.org/books/impatient-perl/iperl.htm
| ilovecurl wrote:
| Perl can also be hilariously unreadable:
| https://www.foo.be/docs/tpj/issues/vol4_3/tpj0403-0017.html
| thesuperbigfrog wrote:
| >> Perl can also be hilariously unreadable:
| https://www.foo.be/docs/tpj/issues/vol4_3/tpj0403-0017.html
|
| Most programming languages can be obfuscated. That does
| not mean people write code in those programming languages
| like that:
|
| C: https://www.ioccc.org/
|
| Javascript: view-source:https://www.google.com/
|
| The truth is that insulting Perl is considered stylish by
| some, so many people do despite knowing little to nothing
| about Perl and having never used it.
|
| However, if you _want_ Perl to be _hilariously
| unreadable_ , why not write it in Latin:
|
| https://metacpan.org/dist/Lingua-Romana-Perligata/view/lib/L...
|
| Or Klingon:
|
| https://metacpan.org/pod/Lingua::tlhInganHol::yIghun
| anthk wrote:
| [flagged]
| wott wrote:
| It's like when we, Gen-X ers, were repeating bad stuff
| about COBOL without having seen a single line of it.
|
| Then I saw a real COBOL program and... well... it was
| even worse than what I had imagined :-)
| jandrese wrote:
| I've always found it weird that people bash on Perl
| relentlessly for being hard to read and then turn around
| and praise Rust's syntax when it is full of stuff like
| this:
|
|     fn print_d(t: &'static impl Display) {
| throwawaaarrgh wrote:
| Have you ever tried to dig a hole? What tool did you use?
|
| - Want to cut through and move loam, compost, sandy, and
| compacted soil? You're gonna want a rounded shovel.
|
| - Want to break up rocky, clay soil? A pick mattock will
| penetrate deep, breaking up soil, shattering smaller rocks,
| and is used as a lever to uproot. A tiller is a faster
| method but disturbs the soil more.
|
| - Want to dig a narrow, deep hole? An auger will quickly
| break up rocks and soil in a shaft and move them upwards.
|
| What do you use the Perl tool for?
|
| - Quickly and efficiently open files, read line by line,
| analyze text, and perform any kind of operation you can
| think of, with complex data structures, objects and modular
| code, using very few lines of code.
|
| - Executing external commands with a shell, returning their
| output, and making complex yet short programs easily with
| arguments to the interpreter from a command line.
| anthk wrote:
| Perl can do sh/awk/sed and a bunch more at once.
| throw0101a wrote:
| > _Perl? Wow. Is that better than bash, python or even
| nodejs? Why write in Perl over these?_
|
| It depends on scale.
|
| If you have some quick parsing to do, then awk will get you
| started quickly, but as you expand your experimentation on
| what you want to extract/manipulate, it may not be easy to
| add onto the awk beginnings of your "one liner".
|
| But if you start with awk-like+ syntax but invoking it with
| Perl, then if you find you have to expand, Perl has more
| elbow room.
|
| The intention is not to 'go big', which those other
| languages may be better at, but to more easily 'start
| small'.
|
| + IIRC, Larry Wall wanted a utility that had awk/(s)ed-like
| syntax for text manipulation, just 'with more'.
| bluetomcat wrote:
| It's a language for creating quick alternative views from line-
| and column-oriented text streams. That means, take the output
| of another tool and represent it in a different way.
| asicsp wrote:
| I use awk mostly for one-liners and resort to Python when I
| need more than a few lines of code.
| ducktective wrote:
| Also watch his recent interview on Computerphile:
| https://www.youtube.com/watch?v=GNyQxXw_oMQ
|
| And: Brian Kernighan adds Unicode support to Awk
| https://news.ycombinator.com/item?id=32534173
| bardak wrote:
| Honestly after watching a lot of Kernighan interviews and
| reading his original book on C he is a very great communicator.
| I wonder how different the software world would have been
| without him at Bell Labs. Would Unix and C have become as
| widely used as quickly?
| throw0101a wrote:
| With Lex Fridman from ~2 years ago:
|
| * https://www.youtube.com/watch?v=O9upVbGSBFo
| getpost wrote:
| I know lots of people like awk, but I pretend it doesn't exist.
| Why? Here's my comment on this from 6 years ago[0],
|
| >I used awk until I learned Python (long ago). For me, awk was
| yet another example of the "worse is better" approach to things
| so common in unix. For example, if you make a syntax error, you
| might get a message like "glob: exec error," rather than an
| informative message. "Worse is better" is probably a good
| strategy in business and for getting things done, but still,
| mediocrity and the sense of entitlement that so often goes with
| carelessness, sickens me.
|
| [0] https://news.ycombinator.com/item?id=13457265
|
| Long live The UNIX-HATERS Handbook! (Unix is fine, and so are
| the criticisms herein. Some of these criticisms have been
| eclipsed by ongoing development.)
| https://en.wikipedia.org/wiki/The_UNIX-HATERS_Handbook
| pmarreck wrote:
| I will bet you $1000 that time spent learning Awk will lead to
| better results much faster than time spent polluting your
| privileged user directories with Python's excuse for
| "dependency management"
| ghshephard wrote:
| You are missing out. As a former data engineer/current SRE, I
| spend my entire day with VSCode/Python/Notebooks/CoPilot
| banging out python code - but whenever I need to do a complex
| analysis of a semistructured text file in < 60 seconds, awk is
| my twitch reflex tool. It can trivially do state transition
| based on patterns in the file, as well as populate hashes from
| one file and use them in analysis of the next file in just a
| few characters.
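|
| A minimal sketch of both idioms. The START/STOP markers, the
| file layouts and the function names are assumptions made for
| this example:

```shell
# State transition: a flag flips on marker lines, and the bare
# pattern "on" prints every line seen while the flag is set.
between_markers() {
    awk '/^START/ { on = 1; next }
         /^STOP/  { on = 0 }
         on'
}

# Two-file lookup: NR == FNR holds only while reading the first file
# (assuming it is non-empty), so it fills the hash; the second rule
# then runs against the second file.
lookup_names() {
    awk 'NR == FNR    { name[$1] = $2; next }
         ($1 in name) { print name[$1], $2 }' "$1" "$2"
}

printf 'noise\nSTART\nkeep 1\nkeep 2\nSTOP\nnoise\n' | between_markers

tmp=$(mktemp -d)
printf '1 alice\n2 bob\n' > "$tmp/users"
printf '2 login\n3 login\n1 logout\n' > "$tmp/events"
lookup_names "$tmp/users" "$tmp/events"
rm -r "$tmp"
```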
|
| Awk's claim to fame in my world is that its cognitive
| activation energy for anyone who has taken the 3-4 hours to
| learn the language from start to finish (and that's the awesome
| thing about the language - it really is about 3 hours of
| concentrated attention) - is essentially nil. You see a bunch
| of ugly not really structured text 500 MB files that you can't
| pull into pandas, or easily parse into python dicts? No problem
| - awk will tear through them for you and get the information
| you want in < 60 seconds, including the time you took to write
| your (almost always single-line) code.
|
| That's Awk's sweet spot.
| classichasclass wrote:
| In general Perl fits that niche for me better, but sometimes
| awk is what you have.
| getpost wrote:
| Point taken. I have a Python program that is an elemental
| version of awk, and I use that for the odd task. I can modify
| it if needed and I have the entire Python library to help me.
| Is the text Unicode? HTML? These little details matter.
|
| I'm not complaining that someone banged out awk (speaking
| figuratively) on a Friday afternoon to do something and not
| have to stay after work. Excellent! My complaint is that the
| failure to address technical debt has negatively affected the
| productivity of millions, if not tens of millions, of people,
| often working under pressure, for DECADES.
| ghshephard wrote:
| I'm not sure what technical debt you are referring to. Awk
| is designed to do one very simple job, and it does so using
| a language that I can usually teach to new SREs in < 2
| Hours with 9-10 follow up tasks that drill in their
| understanding.
|
| It's benefited from extraordinarily enlightened
| stewardship, kept its minimalism and strengths, and will
| finally get a key enhancement (UTF-8 support).
|
| The first edition manual is probably the greatest example
| I've ever seen of technical writing as well.
| momentoftop wrote:
| Specifically, Awk is a good solution to a problem that should
| never have existed in the first place. Why am I having to write
| these bespoke parsers for the random mess of output formats
| that you get from the UNIX command line?
|
| Well, the fact is that I have to write such parsers. That's
| very sad, but has no chance of being fixed. So it's good to
| know Awk.
|
| I think Erik Naggum had this exact criticism of Perl.
| kar1181 wrote:
| One of the first utilities I had to get to grips with way back
| was awk, and it serves me well to this day. Best bang for buck
| investment of time in my entire career. Even today I still use
| some variant of awk -F(x) '{print $x}'.
| fgh wrote:
| Who wrote the second edition?
| Lyngbakr wrote:
| I read a comment on HN the other day by someone reviewing the
| book and I believe they said it was Brian Kernighan.
| fuzztester wrote:
| It was mentioned recently here in another HN thread that Brian
| Kernighan is writing it.
| B1FF_PSUVM wrote:
| The lowercase 'bwk' used in the text makes me believe that
| ...
| apienx wrote:
| Thanks for your work! Awk is a rabbit hole.
|
| "Dark corners are basically fractal - no matter how much you
| illuminate, there is always a smaller but darker one." -- Brian
| Kernighan (quoted in the GNU Awk book)
| sigzero wrote:
| I am looking forward to this coming out.
| asicsp wrote:
| Have to wait, as "The book will be available by the end of
| September"
|
| See https://hn.algolia.com/?q=The+AWK+Programming+Language for
| discussion on the first edition
|
| Didn't know there was a list of `awk` implementations:
| https://www.gnu.org/software/gawk/manual/html_node/Other-Ver...
| technofiend wrote:
| Seems like the best time to ask since this is an awk thread: if
| anyone has a line on the original artwork or a source for the awk
| t-shirt please let me know. From memory it's of a gangly bird
| jumping / parachuting from an airplane (DC3?) and captioned with
| awk's infamous catch-all error message: "Awk: bailing out near
| line one".
| proger wrote:
| Find and AWK together, a match made in heaven. Thanks for the
| link.
| lkuty wrote:
| do you have some resources regarding the use of awk with find ?
| penguin_booze wrote:
| I wish awk had support for addressing a range of fields, like
| from $1 to $7. `cut` supports it, FWIW.
| mplanchard wrote:
| You can always loop through the fields, but it's a little
| messy, especially for one-liners
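|
| A minimal sketch of that loop as a reusable shell function
| (field_range is a made-up name), roughly what cut -f LO-HI does
| but with awk's whitespace splitting:

```shell
# Print fields LO through HI of every line, joined with OFS.
field_range() {
    awk -v lo="$1" -v hi="$2" '{
        out = ""
        for (i = lo; i <= hi && i <= NF; i++) {
            sep = (i > lo) ? OFS : ""
            out = out sep $i
        }
        print out
    }'
}

printf 'a b c d e\n' | field_range 2 4
```

| Lines with fewer than HI fields just yield whatever fields exist.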
| penguin_booze wrote:
| Yes, that's an option. The range lookup is an ergonomic
| feature. Imagine what it would have been like if we couldn't
| do foo[-3:] in Python.
| shaftoe444 wrote:
| Can I preorder this?
| pmarreck wrote:
| Awk is old but great, designed to chew through lines of text
| files with ease, and has great defaults that minimize the amount
| of awk code you actually have to write to do anything. It's
| underrated.
| siraben wrote:
| Awk is awesome! Glad that they are looking to modernize the book.
| It wasn't really necessary, all the code examples in the original
| edition of the book still run just fine, although some are
| somewhat dated, like printing ASCII bar graphs. They also had
| examples of writing VMs, parsers and interpreters in the book,
| which run on modern implementations.[0]
|
| The language has some quirks. To declare temporary variables,
| it's common practice to add extra arguments to functions that
| won't be used. And traversal of associative arrays is
| implementation-dependent. I'm not sure what the situation is
| regarding locale and UTF-8 support.
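|
| A short sketch of both quirks, with made-up function names. The
| extra parameters after the gap act as locals, and piping through
| sort is one portable way to get a stable order out of for-in:

```shell
# Extra parameters that callers never pass become local variables.
locals_demo() {
    awk '
        # i and total come after the real parameter n; the wide gap
        # is only a convention to mark them as locals.
        function sum_to(n,    i, total) {
            for (i = 1; i <= n; i++) total += i
            return total
        }
        BEGIN { i = 99; print sum_to(4); print i }  # the global i is untouched
    '
}

# for (w in count) visits keys in an unspecified order,
# so the output is sorted externally.
word_counts() {
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (w in count) print w, count[w] | "LC_ALL=C sort" }'
}

locals_demo
printf 'b a b\nc a b\n' | word_counts
```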
|
| EDIT: Looks like Brian Kernighan added Unicode support last
| year.[1]
|
| [0] https://github.com/siraben/awk-vm/blob/master/vm.awk
|
| [1]
| https://github.com/onetrueawk/awk/commit/9ebe940cf3c652b0e37...
| bluetomcat wrote:
| Is there a particular benefit in writing a VM in AWK, placed in
| a big BEGIN block? Very similar code can be written in Perl or
| Python. Isn't the strength of AWK in its line-matching
| capability, being able to pattern-match a line against a block
| of code?
| ufo wrote:
| I love telling about that example to my programming language
| friends.
|
| > Hey you should read the AWK book, it even says how to write
| a VM!
|
| > Why would I ever want to use AWK for that?
|
| > Well, the input is a text file with one space-delimited
| instruction per line.
|
| > Hmm... You have a point.
| siraben wrote:
| > Is there a particular benefit in writing a VM in AWK
|
| Not really. Later on the book just ran out of line-matching
| examples to go through and started doing regular programming
| instead :P. When I actually write AWK code I rely on line-
| matching and using a variable to handle state.
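|
| A small sketch of that style, assuming made-up START and STOP marker
| lines and a hypothetical helper name. The patterns do the line
| matching, and the variable "inside" carries state between lines.
|
```shell
between_markers() {
    awk '
    /^STOP$/  { inside = 0 }    # checked first so the STOP line is not printed
    inside    { print }         # the state variable decides whether to emit
    /^START$/ { inside = 1 }    # checked last so the START line is not printed
    '
}

printf 'x\nSTART\na\nb\nSTOP\ny\n' | between_markers
```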
| pdw wrote:
| At the time, awk was the only scripting language (other
| than shell) generally available on Unix systems. Perl, Tcl,
| Python didn't exist yet. So awk was often used for general-
| purpose programming.
| chasil wrote:
| AWK runs _everywhere_. Perl and Python do not.
|
| Busybox has their own independent AWK implementation.
|
| https://busybox.net/ https://frippery.org/busybox/
|
| Also see the first edition of the AWK manual online here:
|
| https://archive.org/details/pdfy-MgN0H1joIoDVoIC7
| anthk wrote:
| On VMs, why not a Z-machine? It's ideal for this.
| kqr wrote:
| What would you suggest as an alternative to printing ASCII bar
| graphs? I do that all the time. Takes 20 seconds and often
| makes distributions, modalities, and patterns over time obvious
| right away.
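|
| A rough sketch of such a quick bar graph, assuming "label count"
| input with small non-negative integer counts. The helper name bars is
| hypothetical, and there is no scaling, so large counts would need to
| be divided down first.
|
```shell
bars() {
    awk '{
        bar = ""
        for (i = 0; i < $2; i++)        # one "#" per unit; no scaling
            bar = bar "#"
        printf "%-4s %s\n", $1, bar
    }'
}

printf 'mon 3\ntue 7\nwed 1\n' | bars
```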
| zimpenfish wrote:
| `sparklines`[1] is good for an overall low-res view.
| `termgraph`[2] is sometimes better for a higher-res, more
| capable view (but can be finicky about the data.)
|
| [1] https://github.com/deeplook/sparklines
|
| [2] https://github.com/mkaz/termgraph
| kqr wrote:
| But both require depending on a third-party library --
| hardly something to do on a whim if ASCII bar charts do the job?
| llimllib wrote:
| gnuplot is an alternative that is available on almost as
| many systems as awk, and can do the job as well
|
| edit: this prompted me to write up a little note showing how:
| https://notes.billmill.org/visualization/graphs/gnuplot/A_ba...
| dima55 wrote:
| If you do this sort of thing more than once ever, look at
| the feedgnuplot tool
| (http://github.com/dkogan/feedgnuplot). It'll make your
| life easier
| llimllib wrote:
| Neat! Once you're installing something to do terminal
| plots though, https://github.com/red-data-tools/YouPlot
| looks the nicest I've seen.
|
| (The nice feature of feedgnuplot of course is that you
| can _also_ render the plots to images, which youplot
| can't)
| zimpenfish wrote:
| Sure but, e.g., sparklines can show me the shape of my 60
| numbers[1] more effectively on a single line of 60
| characters[2] than an ASCII bar chart which would be 60
| lines (without binning).
|
| [1] 27623 14272 22218 21267 19037 989 27116 32405 23261
| 27104 7793 9432 7776 28832 13521 10783 29261 32193 30367
| 20358 22611 2023 19607 9844 3516 6510 16533 8378 22986
| 17043 14628 13392 22799 23847 29212 23690 17779 17059
| 28211 26180 32061 22740 7911 12018 4508 9801 9578 15350
| 9554 15517 11112 405 22054 2743 26609 7843 713 10975 2830
| 1126
|
| [2] http://rjp-hosted-files.s3.amazonaws.com/sparkline-demo.png
| DarkNova6 wrote:
| I don't know about Awk, but I feel the urge to write a library
| named "ward" for it.
| schoen wrote:
| Maybe the person who deals with security issues for an awk
| implementation could be called the awk ward.
| bsdooby wrote:
| Currently looking @ alternatives (not that I dislike AWK, far
| from it):
|
| Tokay: https://github.com/tokay-lang/tokay
|
| frawk: https://github.com/ezrosent/frawk
| sgu999 wrote:
| What do you think of them? Tokay in particular looks very
| polished.
| bsdooby wrote:
| TBH: no conclusion yet (did not find time ATM to try it out
| in full detail)...sorry
| lambertsimnel wrote:
| Have you considered tab?
|
| https://tkatchev.bitbucket.io/tab/
| geophile wrote:
| Take a look at marcel: https://github.com/geophile/marcel
| radiator wrote:
| This is good news, because you have to pay a lot for a used copy
| of the first edition nowadays. I hope the spirit remains the same
| as in the first edition.
| csours wrote:
| I FINALLY started learning awk in the past couple weeks. I think
| I was intimidated because awk can be very terse, and there are
| some default actions that aren't clear when you first start
| looking at awk scripts.
|
| My other problem is that I want to accomplish things, not learn a
| tool, and it generally takes me a bit longer than it should to
| decide to actually learn something and not just hack at it.
|
| Is it still worth it to be "the awk guy" at work?
| simmonmt wrote:
| yes, because you'll be done with your thing before others
| figure out how to lay out your spreadsheet. also your solution
| will be reusable.
|
| (based on my experience where people who could've benefited
| from awk for a one-liner dependably reach for sheets/excel
| rather than something like python or perl)
| ineedasername wrote:
| Amazing, takes me back.
|
| ~
|
| One of my first big projects at my first job fresh out of college
| was using sed & awk to semi-automate the transformation of semi-
| unstructured data into a database.
|
| IIRC I couldn't completely automate because it contained author
| names, from global naming conventions. (parsing names correctly
| is deceptively complex) They had somewhat arbitrary #'s of
| initials ranging from 0-3.
|
| Again, IIRC, I could easily accommodate 0 or 1 initial (followed
| by \\.) but trying for more would make the regex I was using too
| greedy and pull in part of the article abstract. These were
| scientific books and journals.
|
| So I scripted a sed & awk program to detect the possibility of >
| 1 initials and when that occurred, I'd pipe the record into nano
| for a quick review where I manually inserted the correct \\.
| characters for the initials.
|
| It was decades of back-catalogue publications for digitization so
| I sat there for days, listening to music on an original 1st gen
| iPod, waiting for my duct-taped kludge of a program to pipe one
| of thousands of records into a nano session every few minutes.
| This was on an Apple G4 workstation running OS X, where I earned
| my real bash scripting chops. It was an awful hack by today's
| standards, but at the time, accomplishing what was expected to be
| a 1-year long project in ~1 month, it was seen as nearly
| miraculous.
| MikeTheGreat wrote:
| Ok, dumb question: Is the link supposed to link to the actual
| book (i.e., is the book free and/or open source) or is this just
| a page of miscellaneous interesting links about the book (which
| we can pay for, later, when it's published).
|
| I was expecting the book, but the page itself says "This page is
| a placeholder for material related to the second edition of The
| AWK Programming Language."
|
| It's fine if this is a placeholder page (and an awesome excuse to
| read talk about AWK here on HN :) ) but I want to be sure that
| I'm not missing the book itself.
| RGBCube wrote:
| What I understand from the page is that the Second Edition of
| the book will reside in the page when it is released (the
| reason why it says it is a "placeholder").
| andrewstuart wrote:
| Awk and ChatGPT are best friends.
| ketanmaheshwari wrote:
| How so?
| andrewstuart wrote:
| Ask ChatGPT to write your awk scripts - it does a pretty
| damn good job at a first pass.
___________________________________________________________________
(page generated 2023-06-29 23:01 UTC)