[HN Gopher] Understanding Awk
___________________________________________________________________
Understanding Awk
Author : todsacerdoti
Score : 332 points
Date : 2021-09-30 15:27 UTC (7 hours ago)
(HTM) web link (earthly.dev)
(TXT) w3m dump (earthly.dev)
| adamgordonbell wrote:
| Thanks for sharing this. I'm the author.
|
| When I wrote my introduction to JQ someone mentioned JQ was
| tricky but super-useful like AWK. I nodded along with this, but
| actually, I had no idea how Awk worked.
|
| So I learned how it worked and wrote this up. It is a bit long,
| but if you don't know Awk that well, or at all, I think it should
| get the basics across to you by going step by step through
| examining the book reviews for The Hunger Games trilogy.
|
| Let me know what you think. And also let me know if you have any
| interesting Awk one-liners to share.
| choffman wrote:
| I really appreciate you writing this guide. As a long time
| Linux user, I've always wanted to learn AWK, but it seemed too
| daunting. Three minutes into your guide and I immediately saw
| how I could use it in my day-to-day usage.
| adamgordonbell wrote:
| Thank you! It took me longer to write than I expected it
| would. I was originally just going to do some small examples
| of each idea.
|
| But once I got the idea of aggregating the book review data
| from amazon I felt I had to see it through.
| foobarian wrote:
| The funny thing is, by and large my only use case for awk is to
| print out whitespace delimited columns where the amount of
| whitespace is variable. Surprisingly hard to do with other Unix
| tools.
|
| Neat discussions around that sort of thing at least here:
| https://news.ycombinator.com/item?id=23427479
| goohle wrote:
| ls -l | tr -s ' ' | cut -d ' ' -f 5
| foobarian wrote:
| Exactly! Exactly! And now fix it to work with tabs :-)
| tyingq wrote:
| And leading whitespace. Compare:
|   $ printf " one two three" | tr -s ' ' | cut -d ' ' -f 1
|
|   $ printf " one two three" | awk '{print $1}'
|   one
| goohle wrote:
| ps ax | sed 's/^\s\+//; s/\s\+/ /g;' | cut -d ' ' -f 4
| goohle wrote:
| echo -e '1\t2\t3\t4\t5' | expand -t 1 | cut -d ' ' -f 3
| tyingq wrote:
| The syntax isn't nearly as nice, but Perl can be handy if
| you're doing something more after splitting into columns. And
| it's usually already there / installed, like awk. For just
| columns:
|   $ printf "a b c d e\n1 2 3 4 5" | perl -lanE 'say "$F[2] $F[4]"'
|   c e
|   3 5
| adamgordonbell wrote:
| It surprised me that AWK had dictionaries and no
| declaration of vars, which make it feel like a modern
| scripting language even though it was written in the 70s.
|
| It turns out though that this is because Perl and later
| Ruby were inspired by AWK and even support these line by
| line processing idioms with BEGIN and END as well.
|   ruby -n -a -e 'puts "#{$F[0]} #{$F[1]}"'
|
|   ruby -ne '
|     BEGIN { $words = Hash.new(0) }
|     $_.split(/[^a-zA-Z]+/).each { |word| $words[word.downcase] += 1 }
|     END {
|     ...
| flandish wrote:
| A long while ago I wrote up a little processor to determine
| field lengths in a given file - I forgot the original reason.
| ( https://github.com/sullivant/csvinfo )
|
| However, I feel I really should have taken the time to learn
| Awk better as it could probably be done there, and simply!
| (It was a good excuse to tinker with rust, but that's an
| aside.)
| tyingq wrote:
| For some idea, a one liner to find the (first) longest
| username and its length in /etc/passwd:
|   $ awk -F: '{len=length($1);if(len>max){max=len;user=$1}}
|       END{print user,max}' /etc/passwd
| flandish wrote:
| Thanks for that reply! It's good to work with an example.
| genewitch wrote:
| I'll mark this on my GitHub when I get back on a computer,
| I take public datasets and make graphs and transforms and
| reports. The big survey companies have weird data records
| and having to write a parser is my least favorite part. I
| think other people who ingest my content don't appreciate
| the effort, but that's a near universal feeling I think,
| heh.
| adamgordonbell wrote:
| choose from your link does look nice for simple column
| selection.
|   echo -e "foo bar baz" | choose -1 -2
|
| vs awk's
|   echo -e "foo bar baz" | awk '{ print $2, $3}'
|
| I love the effort people are putting into reinventing the
| core unix tools.
|
| I think I'll stick with Awk for now though.
| foobarian wrote:
| The problem with new tools is
|
| $ choose
|
| bash: choose: command not found...
| twic wrote:
| If i don't use awk, i throw tr -s ' ' into the pipeline, and
| then the delimiter is a single space, so you can just cut.
| kevinwang wrote:
| As someone who's never used awk before, I really enjoyed this
| write-up and I think it was very well written!
| mousepilot wrote:
| chiming in, I had a feeling that the article and the comments
| here would contain some jewels and both have exceeded
| expectations.
| nrclark wrote:
| I'm always happy when I see posts that promote AWK. It's a very
| underappreciated tool in my opinion. I was a Linux user for 20
| years before I got familiar with it. AWK is super powerful for
| text processing, and I like that it's included in Busybox for use
| on the embedded systems that I design.
|
| For any complex text processing, it's way better and more robust
| than having a super long pipeline of a bunch of sed/grep.
|
| Most recently, I used awk in a script that parses /proc/mounts to
| grab the mountpoint of a partition, or print something different
| if the partition isn't mounted. Doable with a bunch of sed/grep
| and some shell logic? Definitely. But easier and cleaner in AWK,
| and equally easy to inline in a shell-script.
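|
| A minimal sketch of the kind of lookup described above (an
| illustration, not nrclark's actual script). The helper name
| mountpoint_of and the "not mounted" message are made up, and it
| assumes the usual /proc/mounts layout where field 1 is the device
| and field 2 is the mountpoint:

```shell
# Hypothetical helper: print the mountpoint of device $1.
# Reads a /proc/mounts-style table from file $2 (default: /proc/mounts).
mountpoint_of() {
    awk -v dev="$1" '
        $1 == dev { print $2; found = 1; exit }      # first match wins
        END       { if (!found) print "not mounted" }
    ' "${2:-/proc/mounts}"
}

# Usage: mountpoint_of /dev/sda1
```

| The END block still runs after "exit", which is why the found flag
| is needed to keep the fallback message from printing.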
| throwaway894345 wrote:
| I do a lot of work with structured data--json, yaml, etc. For
| me, this is how I feel about jq. One of my favorite use-cases
| is querying Kubernetes resources. E.g., `kubectl get secret
| <secret-name> -o json | jq -r '.data | map_values(@base64d)'`
| (fetch a secret and decode all of its values).
| kbenson wrote:
| I've never bothered to learn much AWK, but that's mostly
| because Perl is my bread and butter language and has been for
| 20 years, and focusing on knowledge of that seemed a better
| investment (especially since with a few judicious flags, Perl
| is a passable AWK replacement even for very small one liners).
|
| That said, if you just want to supplement your knowledge of
| other shell tools and pull out something that can do some
| obvious text munging, AWK has always looked attractive for the
| task to me.
| chasil wrote:
| The problem is that awk is in POSIX, and perl is not.
|
| There are two common sources of awk for Windows, for example,
| that drop one exe to provide the interpreter:
|
| http://unxutils.sourceforge.net/
|
| https://frippery.org/busybox/
|
| Perl simply wasn't designed to do that.
| newaccount2021 wrote:
| But perl is available by default in almost every free *nix,
| and for most people, Windows isn't a requirement
| Lio wrote:
| Yep awk is lovely and well worth the time to learn.
|
| This is probably not important for embedded but doesn't a
| pipeline of small scripts (which could be in awk) give you
| better threading support?
|
| Xargs, GNU parallel or even make then scale that out really
| quickly.
| SavantIdiot wrote:
| Came here to say this. Glad to see /bin getting respect.
|
| To anyone processing huge quantities of text and text files,
| someone very likely had the same problem you faced back in the
| 1980s and there's a Unix/GNU tool for it already.
| dylan604 wrote:
| I was introduced to *nix by processing very large text
| files that the text editors I was familiar with choked on.
| Someone showed me sed/awk/grep, and it took seconds to
| process files that GUI editors couldn't even open. Never
| looked back.
| GekkePrutser wrote:
| Not having to parse the output at all is even better. I really
| like the way Powershell can pass structured data like this.
|
| I'm a huge Linux/Unix fan but sometimes a rethink really works
| out. I hope Linux will get something similar. I know Powershell
| is available for Linux but without an adapted userland there's
| not much benefit
| invisible wrote:
| For some purposes, awk+xargs can replace hours of work to write
| a tool to automate some process. It's my go-to for ops work
| that I don't expect to live very long and just needs to
| _happen_.
|
| Also, happy 1337 karma day :).
| 5e92cb50239222b wrote:
| > awk+xargs can replace hours of work
|
| Including machine hours of work.
|
| Wasn't there a famous story of replacing a Hadoop cluster
| with an awk script (which was a couple orders of magnitude
| faster)?
|
| Oh yes, there was:
| https://news.ycombinator.com/item?id=17135841
| dapids wrote:
| In fairness it's xargs that is providing the command
| parallelization, not awk, but I agree both combined are a
| good match.
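|
| A small sketch of that division of labour, under assumptions: the
| input format ("<status> <name>" per line) and the function name
| process_pending are invented, and "echo processing" stands in for
| a real command. awk only selects the work items; xargs runs them
| in parallel. The -P flag is a common extension (GNU and BSD xargs),
| not part of POSIX.

```shell
# Hypothetical input on stdin: one "<status> <name>" pair per line.
process_pending() {
    awk '$1 == "pending" { print $2 }' |     # awk picks the items
        xargs -n 1 -P 4 echo processing      # xargs runs up to 4 jobs at once
}

# Usage: printf 'done a\npending b\npending c\n' | process_pending
```

| Output order is not guaranteed once jobs run in parallel, so sort
| the result if order matters.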
| genewitch wrote:
| If one considers the idea of map reduce to be taking a set
| of data and ending up with a subset that is relevant, I've
| used tons of simple things to do that, and never Hadoop.
|
| I think parsing logs to find pain areas or potential
| exploit/exfil is a map reduce job, for instance, and grep
| or awk can manage that just fine.
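|
| As a rough illustration of a log-parsing job in that map/reduce
| shape: the log format ("<ip> <path> <status>" per line) and the
| name count_by_status are assumptions for the example only.

```shell
# Hypothetical log lines on stdin: "<ip> <path> <status>".
count_by_status() {
    awk '
        { count[$3]++ }                               # "map": key each record by status
        END { for (s in count) print s, count[s] }    # "reduce": emit one total per key
    ' | sort                                          # for-in order is unspecified in awk
}

# Usage: count_by_status < access.log
```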
| freedomben wrote:
| Nice article. Seems we went through a very similar progression!
| :-D
|
| If anyone is interested in learning more, I built a conference
| talk to teach awk, and a set of exercises also that has gotten
| pretty positive feedback:
|
| Presentation: https://youtu.be/43BNFcOdBlY
|
| Exercises (for you to try):
| https://github.com/FreedomBen/awk-hack-the-planet
|
| Exercises (me solving): https://youtu.be/4UGLsRYDfo8
| stevebmark wrote:
| There are things I've come to dislike and avoid when programming
| in general:
|
| - Avoid programming in strings (especially in Bash, where nested
| quotes are full of pitfalls)
|
| - Avoid magic switches that change behavior (like -F)
|
| - Avoid terse or cryptic variable names (like $NF)
|
| - Avoid terse and magical syntax (sorry Perl, happy to leave you
| behind me)
|
| - Avoid programs that are hard to read
|
| - Avoid programs that are difficult to debug while writing them
|
| - Avoid programs that ignore types
|
| For these reasons, I prefer to avoid awk for anything except the
| most trivial of tasks. I think the prevalence of scripting
| languages and the speed of execution and debugging today has made
| awk not as necessary as it may have been in the 70s. And as to
| the first point, I'm aware you can write awk scripts in files,
| and I feel like if your script has gotten complex enough that you
| need a file, you're creating something unmaintainable and
| unreadable that would be better suited in a different programming
| language.
|
| Edit: I should add this article is great and a good introduction
| to awk, regardless of my personal taste for the tool.
| throwaway38941 wrote:
| I've been doing systems work for 20 years. Here's why most of
| those things are actually good:
|
| - Strings are subtly complex, but strings are not variables.
| You can assign a string, and later handle it as a variable, and
| not deal with any of the specifics of string-iness. Likewise,
| you can take a variable, and later treat it as a string (for
| loosely or not-typed variables).
|
| - Magic switches are not magic, they are options. Virtually
| every program takes options. Sometimes they impact a lot of
| things, sometimes a little. Only the context determines how
| much is "too much".
|
| - Terse/cryptic variables allow you to write complex
| expressions in a compact form. This allows you to read more in
| a small space, making it easier to reason about or form complex
| expressions. Human languages are flush with these, as is
| mathematics. But you have to balance the terse, cryptic and
| magical with guilelessness, or it becomes a mess.
|
| - Terse and magical syntax is, again, a feature, not a bug.
| Using magical syntax I can do in a few characters what would
| take me many lines with a traditional language, and as we all
| know, increased number of lines correlates to bugs, in addition
| to simply making it harder to grok.
|
| - Types aren't ignored, but they may be very loosely enforced.
| If you want to write a quick program to get something done,
| typing is a curse. If you want to write a very thorough
| program, typing is a blessing. In many cases, loosely or
| untyped programs actually work _better_ than their typed
| cousins, because they allow for more unexpected behaviors
| without failing. Failing early and often may be a modern trend,
| but... it literally means things fail more, and this is often
| not desirable.
|
| Caveats:
|
| - Programs that are hard to read do indeed suck, and it takes
| lots of experience to make some kinds of programs easier to
| read. But that's not an indictment of the program, it's an
| indictment of the person who wrote it. We don't indict English
| when somebody writes a document that's impossible to
| comprehend.
|
| - Interestingly, some of the more popular languages are the
| worst to debug. Perl is probably one of the easiest languages
| to debug, not inconsequently because of how good the
| interpreter is at suggesting to the user what the actual
| problem was and almost exactly how to fix it.
| [deleted]
| jrumbut wrote:
| The thing that prevents awk from being a major part of my daily
| routine is that it (amazingly) has poor CSV support. Consider
| the following:
|
|   col1,col2,col3
|   1,2,3
|   4,"hello, \"world\"",6
|   "7 buckets",,9
|
| To get the usual awk experience with this very common file
| format, exactly the type of thing you want to parse with awk,
| you first need to install gawk, then use a big FPAT regex that
| needs to be adjusted for any new CSV variant.
|
| I would love to see awk with "CSV mode", where it intelligently
| handles formats like this if you just pass a flag. I think awk
| would do well to differentiate itself with excellent 2d dataset
| parsing functionality, but at least catching up to the average
| scripting language would be great.
|
| I'm half expecting someone to say "just pass -csv it does what
| you want" and if so I'll be very excited.
| nmz wrote:
| You can just use
| https://github.com/Nomarian/Awk-Batteries/blob/master/Units/...
| and use it like so:
|   awk -f ./ucsv.awk -e '{print $5}'
|
| Also this
|
| > 4,"hello, \"world\"",6
|
| Is incorrect per https://tools.ietf.org/html/rfc4180 so you
| should just fix it with a sed -i 's/\\\"/""/g' and then just
| parse as normal.
|
| https://github.com/Nomarian/Awk-Batteries/wiki/Formats
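|
| For a sense of what RFC 4180 quoting costs in plain awk, here is a
| hedged sketch that scans each record character by character
| instead of relying on FS or gawk's FPAT. It is not the linked
| ucsv.awk; csv_field and split_csv are hypothetical names. It
| handles quoted fields and doubled quotes ("") but not records that
| span several lines.

```shell
# csv_field N: print field N of each CSV record read from stdin.
csv_field() {
    awk -v want="$1" '
    # Split one record into out[1..n]; returns n.
    function split_csv(line, out,    n, i, c, inq, field) {
        n = 0; field = ""; inq = 0
        for (i = 1; i <= length(line); i++) {
            c = substr(line, i, 1)
            if (inq) {
                if (c != "\"") {
                    field = field c
                } else if (substr(line, i + 1, 1) == "\"") {
                    field = field "\""; i++      # "" inside quotes is a literal quote
                } else {
                    inq = 0                      # closing quote
                }
            } else if (c == "\"") {
                inq = 1                          # opening quote
            } else if (c == ",") {
                out[++n] = field; field = ""     # unquoted comma ends the field
            } else {
                field = field c
            }
        }
        out[++n] = field
        return n
    }
    { n = split_csv($0, f); print (want <= n ? f[want] : "") }
    '
}

# Usage: csv_field 2 < data.csv
```

| This works in any POSIX awk, at the price of being slower and
| longer than a one-liner, which is roughly the complaint above.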
| sk5t wrote:
| 'miller' and 'xsv' are pretty good tools for wrangling CSV.
| (And regexp is kind of a terrible tool for it, too many edge
| cases.)
| jrumbut wrote:
| Yeah, I don't want to have to write a CSV library each
| time, that's what I'm trying to get at.
|
| I just end up using Python/Perl but I do have a soft spot
| for awk so it would be cool if good support was built-in.
| sk5t wrote:
| Who's writing a library? Just use xsv or miller to
| extract the bits you want from the CSV, change the
| delimiter or escapes to something more convenient, etc.,
| then feed that to awk or other CSV-unaware text
| processors.
| jrumbut wrote:
| I was agreeing with your point about regexes, that it's
| good to avoid trying to deal with all the corner cases
| yourself when you're just trying to write a small script.
| sk5t wrote:
| Ah, understood! CSV is funny, it seems like a more
| trivial thing than it really is, and its human
| readability sort of invites broken approaches in a way
| that something like Parquet would not.
|
| XML is somewhere in the middle--I've seen some horrible
| abuses of CDATA sections way back when--but at least
| there are accepted ways to prove what's invalid.
| nickcw wrote:
| There is an answer to CSV mode a bit further down the page
|
| https://news.ycombinator.com/item?id=28708145
|
| ...but if your files are CSV, there is a CSV extension for
| gawk
|   @include "csv"
|   BEGIN { CSVMODE = 1 }
| jrumbut wrote:
| Well there you go, for the sake of my pride at least it's
| an extension.
|
| It's funny, searches for awk CSV seem to yield a bunch of SO
| questions where the answers are increasingly cumbersome
| regexes instead of this extension.
|
| Of course, you can't count on this extension being widely
| installed, but it's great for my own desktop.
| nmz wrote:
| That's because the extension only works in gawk. It's not
| portable anywhere else.
| [deleted]
| m463 wrote:
| I use awk for one-liners, no more.
|
| Looking at my command history, I mostly use awk to extract a
| field like this:
|   <something> | awk '{print $3}'
|
| (I know "cut" is supposed to do the same thing, but it was
| never reliable for me - maybe tabs/spaces?)
| likpok wrote:
| Consider the input
|   a  b
|
| Awk will treat it as having two columns (by default), while
| cut will treat each space as its own column.
|
| Awk is also a little nicer for whitespace; cut makes
| specifying the delimiter (with say "-d\ ") a little more
| vexing.
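|
| The difference is easy to see at a prompt. A small sketch, using
| an input with two spaces between the fields:

```shell
# Two spaces between "a" and "b".
printf 'a  b\n' | cut -d ' ' -f 2       # prints an empty line: field 2 is the gap between the two spaces
printf 'a  b\n' | awk '{ print $2 }'    # prints "b": a run of blanks counts as one separator
```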
| chasil wrote:
| Here is a GAWK program of mine that implements outgoing SMTP.
| While not a one-liner, this is much shorter and less tedious
| than trying to do it in C.
|   $ cat /bin/awkmail
|   #!/bin/gawk -f
|
|   BEGIN { smtp="/inet/tcp/0/smtp.yourco.com/25";
|   ORS="\r\n"; r=ARGV[1]; s=ARGV[2]; sbj=ARGV[3]; # /bin/awkmail to from subj < in
|
|   print "helo " ENVIRON["HOSTNAME"]       |& smtp; smtp |& getline j; print j
|   print "mail from: " s                   |& smtp; smtp |& getline j; print j
|   if(match(r, ","))
|   {
|    split(r, z, ",")
|    for(y in z) { print "rcpt to: " z[y]   |& smtp; smtp |& getline j; print j }
|   }
|   else { print "rcpt to: " r              |& smtp; smtp |& getline j; print j }
|   print "data"                            |& smtp; smtp |& getline j; print j
|
|   print "From: " s                        |& smtp; ARGV[2] = ""   # not a file
|   print "To: " r                          |& smtp; ARGV[1] = ""   # not a file
|   if(length(sbj)) { print "Subject: " sbj |& smtp; ARGV[3] = "" } # not a file
|   print ""                                |& smtp
|
|   while(getline > 0) print                |& smtp
|
|   print "."                               |& smtp; smtp |& getline j; print j
|   print "quit"                            |& smtp; smtp |& getline j; print j
|
|   close(smtp) } # /inet/protocol/local-port/remote-host/remote-port
| meltedcapacitor wrote:
| Cheap fix: the space after MAIL FROM: and RCPT TO: is not
| standard compliant.
| m463 wrote:
| IMHO, that's too big for awk, why not python?
|
| for example:
|   #!/usr/bin/python
|   import smtplib
|   from email.mime.text import MIMEText
|
|   msg = 'hi'
|   subj='read this!'
|   smtp_server='mail.example.com'
|   smtp_from='me@example.com'
|   smtp_to='you@example.com'
|
|   m = MIMEText(msg)
|   m['To'] = smtp_to
|   m['From'] = smtp_from
|   m['Subject'] = subj
|
|   s = smtplib.SMTP(smtp_server)
|   s.sendmail(smtp_from, [smtp_to], m.as_string())
|   s.quit()
|
| of course, you seem to think in gawk so if that works for
| you that's what you should continue doing!
|
| by the way, I hacked this example from another script which
| attached a logfile:
|   with open(arg.logfile) as f:
|       log_contents = f.read()
|   m = MIMEText(log_contents)
|
| you can also use:
|   from email.mime.image import MIMEImage
|   from email.mime.text import MIMEText
|   from email.mime.multipart import MIMEMultipart
|
| and then:
|   m = MIMEMultipart()
|   m.attach(MIMEText('\n\n%s\n\n'%xkcd_img_title))
|   m.attach(MIMEImage(xkcd_img))
| chousuke wrote:
| Your script doesn't even do the same thing. You are
| importing a library that implements SMTP, which is
| missing the point.
|
| The AWK script doesn't need libraries, so it can actually
| be useful in places where you have awk but not Python.
| jrumbut wrote:
| That's a beautiful use of the language, it reminds me of
| some of the awk CGI efforts out there.
|
| For example:
| https://www.gnu.org/software/gawk/manual/gawkinet/html_node/...
| ChuckMcM wrote:
| I take it you LOVE ada :-)
|
| There is a lot of wisdom in the things you avoid, however I
| would ask one question, "How often do you use it?"
|
| For me, the best systems are those that can be wordy and
| prescriptive but as you get to know them you can use more short
| hand so they "get out of the way" as it were. A good example of
| that philosophy is keyboard short cuts. When I'm learning a
| program I'm happy to pause and sling the mouse around to find
| the thing I need in the labeled menu stack with an appropriate
| name which also tells me what the keyboard short cut is for
| that thing. Then as I get better I can just use the short cut
| and my workflow gets faster. Once I've internalized the keymap
| my flow is held up by how fast I can think, not by how fast I
| can take my hand off the keyboard, move the mouse, click and
| then put it back on the keyboard.
|
| Awk is one of those things that once you internalize what it
| can do, you can use it for a lot of stuff, and you can do it
| quickly.
| ketanmaheshwari wrote:
| One tip I have to make large-ish awk programs readable is to name
| the columns in the BEGIN section. Then, you'd use $colname
| instead of $1, $2, etc. for instance:
|
| BEGIN{ item_type = 1; item_name = 2; price = 3; sale = 4; #etc }
|
| Now, in place of $1, you'd say $item_type which significantly
| improves overall readability of the code.
| jayknight wrote:
| I've also used this to address columns by name for files with
| lots of columns that I'm too lazy to count:
| https://unix.stackexchange.com/a/359699
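|
| One way this can look, as a sketch rather than the code from the
| linked answer: read the header row once, build a name-to-index
| map, then address fields through it. The function name
| col_by_name and the error message are made up for illustration,
| and the input is assumed to be whitespace-separated with a header
| line.

```shell
# col_by_name NAME: print the column whose header is NAME.
col_by_name() {
    awk -v name="$1" '
        NR == 1 {                                   # header row: map each name to its index
            for (i = 1; i <= NF; i++) idx[$i] = i
            if (!(name in idx)) {
                print "no such column: " name > "/dev/stderr"
                exit 1
            }
            next
        }
        { print $idx[name] }                        # $idx[name] is the field at that index
    '
}

# Usage: col_by_name price < items.txt
```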
| dredmorbius wrote:
| You can also put a similar code block at the start of a general
| processing entry. This applies on both flat (uniform record)
| and hierarchical (multiple record-type) data.
|
| E.g.:
|   {
|       name = $1
|       dob = $2
|       grade = $3
|       # ...
|       # Do stuff with name / dob / grade, etc.
|   }
|
| If the data are structured, so that there are multiple record
| types (typically defined by prefix or some other regex) you can
| put variable assignments within each block.
|   /^rectype1/ { var1 = $1; var2 = $2; ... }
|   /^rectype2/ { varA = $1; varB = $2; ... }
|
| I prefer to leave BEGIN blocks for defining constants or tables
| and such.
| ulucs wrote:
| Nice tip, so basically like excel with tables
| dima55 wrote:
| If you want to do that, use vnlog instead. You're 90% there
| already.
|
| https://github.com/dkogan/vnlog/
| ufo wrote:
| One thing that I would love to hear about is suggestions of how
| to make my files/output more awk-friendly.
| adamgordonbell wrote:
| This isn't your question but if your files are CSV, there is a
| CSV extension for gawk
|   @include "csv"
|   BEGIN { CSVMODE = 1 }
| tejtm wrote:
| Tab separated values all the things
| buzzwords wrote:
| Thanks for this tutorial and everyone else that posted some great
| tips and links. I find myself needing to use awk once in a blue
| moon and every time it eats a lot of my time. I hope I remember
| your tutorial next time I need it.
| jrochkind1 wrote:
| This is a great model of how to do a tutorial.
| 1vuio0pswjnm7 wrote:
| It's common, as in the OP, to see awk recommended for something as
| simple as extracting a column from tab or space-separated values.
| IMO, it's quite a bit of typing to do on the fly at a command
| prompt. Performance-wise, it could be significantly slower than
| other utilities that are as ubiquitous as awk.
|   echo one two three|awk '{print $2}'
|
| Are there other ways to do this? Are they faster?
|   cat > awc
|   #!/bin/sh
|   test $# -eq 1||exit
|   exec tr \\40 \\11|exec cut -f "$1"|exec tr \\11 \\40
|   ^D
|   echo one two three|awc 2
|
| Test it on a file to see if it is faster than awk.
|   time awk '{print $2}' file
|   time awc 2 < file
| fmakunbound wrote:
| For those kinds of tasks I use Awk to process the data into a
| SQLite database. Then I do the queries on that since it's easier
| and more advanced things (grouping, having) are much easier
| declaratively.
| mongol wrote:
| Yes! Another recent thread discussed best practices and
| whether something like that exists. I believe this is a good
| example.
| iefbr14 wrote:
| It's awksome :)
| bright_day wrote:
| kkkkkk
| calvinmorrison wrote:
| Can't recommend the gawk manual and "The awk manual" enough:
|
| https://www.gnu.org/software/gawk/manual/gawk.pdf
|
| and
|
| http://www.cs.unibo.it/~sacerdot/doc/awk/nawkA4.pdf
| chasil wrote:
| The original language specification, written by the authors, is
| now free online. Chapter 2 covers the whole language in a
| little over 40 pages.
|
| https://archive.org/download/pdfy-MgN0H1joIoDVoIC7/The_AWK_P...
| calvinmorrison wrote:
| have a copy on my bookshelf! Didn't have a pdf though nice.
|
| The gawk one is useful if you're into some of the gnuism
| specifics
| dredmorbius wrote:
| Severely underrated comment.
|
| Having relied heavily on the (unofficial, non-GNU) gawk manpage
| (it's quite good), I instantly started learning
| very useful features reading the GNU docs. (I still need to
| fully internalise those.) Yes, the full manual is very much
| better than the manpage.
|
| (Also recommend _The AWK Programming Language_ mentioned here,
| though I'd suggest the GNU manual adds to that as well.)
| corpMaverick wrote:
| I find it amusing that AWK is coming back. I used it extensively
| back in the day, but let it go when I picked up Perl 4 and then
| Perl 5. So Perl is no longer king for unix scripting. It was
| replaced by other languages; but it seems like there is a niche
| that they were not able to fill since AWK is back.
| xphos wrote:
| This was one of the best awk tutorials I've read; it's very concise
| and digestible. I sometimes use awk, but the more complex things
| get, the more I feel like I cannot use it. This tutorial made me
| feel otherwise.
| ChuckMcM wrote:
| And if you learned awk(1) first, then when you saw perl for the
| first time it immediately made sense to you as a 'super awk'.
| abzug wrote:
| That happened to me. AWK -> Perl -> Ruby.
| theophrastus wrote:
| At some point in every bioinformatics lecture i always manage
| something akin to: "Learn awk! (or perl) You'll need it. Your
| data will come from various disparate sources, and you need to
| get them into some well-defined useful format from the get go."
| cafard wrote:
| Thanks! I had been putting it off, but after looking at the
| article, I wrote a little but useful script with a line of awk in
| it.
| naikrovek wrote:
| so this isn't related to the article so much, but to something
| the article reminded me about: why do people use /usr/bin/env to
| find a program rather than setting the PATH within the script to
| a known-good value then using that to locate things?
|
| the path that /usr/bin/env returns is (essentially) a global
| variable that can change underneath you, right? I mean that just
| screams "variable that may be changed by others" to me.
|
| I've never understood why /usr/bin/env exists.
| dredmorbius wrote:
| Portability.
|
| The /usr/bin/env trick will work on a wide range of systems, in
| which even common utilities might have numerous locations:
| /bin, /sbin, /usr/bin, /usr/local/bin, /opt, or others. If
| you're writing scripts for portability and for others, this has
| value.
|
| That said, /usr/bin/env fails on Android/Termux AFAIU.
| dmux wrote:
| I've never gone further than thinking about it, but I've always
| been curious as to how simple it would be to use Awk as an
| interpreter for a really simple Tcl-like language:
|   set a 1
|   set b 2
|   define add (n,m) $n + $m
|   set result [add a b]
|
| I think it would be simple enough to come up with some Awk
| pattern/actions to parse the above and execute the commands.
| Stratoscope wrote:
| I used to love Awk! I still do, even if I don't use it much any
| more.
|
| Awk has a reputation for being hard to read (as noted in
| stevebmark's comment), but when I was using it actively, I tried
| to treat it as a serious programming language and write readable
| programs in it.
|
| Several years ago I tracked down a couple of my old Awk programs
| from around 1990 and posted them here:
|
| https://github.com/geary/awk
|
| SHANEY.AWK is an implementation of the infamous Mark V. Shaney:
|
| https://www.clear.rice.edu/comp200/09fall/textriff/sci_am_pa...
|
| This was probably the first program that made me really impressed
| with Awk. People were writing rather complicated Shaney
| implementations in C, and I thought, "this could be really simple
| in Awk." And it was!
|
| LJPII.AWK is the Awk program I'm most proud of. This was in the
| days when we had tiny screens and no multiple monitors and you
| always printed out your code to read it. In my circles we were also
| fond of inserting "separator lines" between functions, in various
| formats such as this one:
|   // - - - - - - - - - - - - - - - - - -
|
| So I wrote LJPII to print source code in "two up" format (two
| pages side by side in landscape mode) on my LaserJet II. It also
| converted the separator lines into graphical boxes, and tried to
| avoid splitting a function across multiple pages. It wasted some
| paper but made nicely readable printouts.
|
| I wish I still had some of my old printouts, but they are long
| gone. One of these days I will have to see if I can update the
| code to work with the LaserJet emulation in my Brother printer!
| (It should mostly work, but I wrote this in the old Thompson Awk
| for DOS, so there are a couple of non-standard things in it.)
|
| Looking at the code again, it's amusing to see some old Windows
| Hungarian notation which was popular/notorious back then, for
| example an "f" prefix for a boolean (flag) value, and "af" prefix
| for an array of flags.
|
| Hungarian aside, I tried to make this code as readable as I
| could.
|
| Random fun fact! Someone who used to be an avid Awk programmer is
| Will Hearst (William Randolph Hearst III). It's been many years
| since I talked with him, so no idea if he still does any Awk
| programming.
| whymarrh wrote:
| "If you like this you might also like"
| https://ferd.ca/awk-in-20-minutes.html
|
| I too am happy to see more Awk material in the world, once I
| learned a bit about it I started reaching for it more and more.
| MisterTea wrote:
| > _Awk is a record processing tool_
|
| Actually, AWK is a domain specific programming language. When you
| start treating AWK as such then you can really gain an
| appreciation for it. I too treated it as a dumb one liner
| relegated to ingesting cryptic regexp one liners in shell
| scripts. After reading the original AWK book it completely
| changed my outlook on the language. I had no idea you could
| define functions or perform basic math so one could use it for
| very basic tabular operations such as spreadsheets. AWK can even
| be used as a standalone language outside of shell scripts by
| writing a program, inserting a shebang on the first line calling
| awk, and marking the file as executable.
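|
| A brief sketch of the "functions and basic math" point, wrapped
| in a shell function so it can be pasted at a prompt. The names
| column_average and avg are invented for the example, and the
| input is assumed to be "<label> <number>" per line.

```shell
# column_average: mean of the second column of stdin.
column_average() {
    awk '
        function avg(total, n) { return n ? total / n : 0 }   # guard against empty input
        { sum += $2 }
        END { print avg(sum, NR) }
    '
}

# Usage: printf 'a 2\nb 4\n' | column_average
```

| The same awk program could live in its own file with a shebang
| line such as "#!/usr/bin/awk -f" (the path is an assumption and
| varies by system) and be marked executable with chmod +x.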
| adamgordonbell wrote:
| shebangs and more complex scripts are covered in the article.
|
| But yes, I agree that the original AWK book is really good.
| After covering some basics and the language reference, it has
| some fun projects that you can build with AWK.
| EvanKelly wrote:
| Lots of great AWK tutorials in here that are more in depth, but
| I'll share another. I always go back to Brian Kernighan's
| personal help file:
|
| https://www.cs.princeton.edu/courses/archive/spring19/cos333...
|
| Brian Kernighan has a knack for explaining languages very
| precisely and elegantly.
| cf100clunk wrote:
| And for the flash card type of learners it is good to see the
| "HANDY ONE-LINE SCRIPTS FOR AWK" page is still available. See
| the links in the Credits section at the bottom for more great
| reading:
|
| https://www.pement.org/awk/awk1line.txt
|
| That author also edited the "USEFUL ONE-LINE SCRIPTS FOR SED"
| page:
|
| https://www.pement.org/sed/sed1line.txt
| zabzonk wrote:
| Well, this is OK I guess. But if you really want to learn Awk you
| want the book "The AWK Programming Language", mostly written by
| Brian Kernighan (he's the K in AWK and in K&R), and as usual for
| all of his books, it's brilliant.
| dang wrote:
| Significant past threads. I had to leave a ton of submissions
| out! Any others that are particularly good?
|
| _Awk: The Power and Promise of a 40-Year-Old Language_ -
| https://news.ycombinator.com/item?id=28441887 - Sept 2021 (118
| comments)
|
| _Awk is the coolest tool you don't know_ -
| https://news.ycombinator.com/item?id=27039608 - May 2021 (20
| comments)
|
| _CGI with Awk on OpenBSD Httpd (2020)_ -
| https://news.ycombinator.com/item?id=27037113 - May 2021 (22
| comments)
|
| _The State of the Awk_ -
| https://news.ycombinator.com/item?id=25142867 - Nov 2020 (58
| comments)
|
| _Awk: `Begin { ` Part 1_ -
| https://news.ycombinator.com/item?id=24940661 - Oct 2020 (106
| comments)
|
| _Show HN: Awk-JVM - A toy JVM in Awk_ -
| https://news.ycombinator.com/item?id=23612910 - June 2020 (27
| comments)
|
| _Running Awk in parallel to process 256M records_ -
| https://news.ycombinator.com/item?id=23394024 - June 2020 (101
| comments)
|
| _The State of the AWK_ -
| https://news.ycombinator.com/item?id=23240800 - May 2020 (86
| comments)
|
| _Awk in 20 Minutes (2015)_ -
| https://news.ycombinator.com/item?id=23048054 - May 2020 (126
| comments)
|
| _Show HN: An eBook with hundreds of GNU Awk one-liners_ -
| https://news.ycombinator.com/item?id=22758217 - April 2020 (48
| comments)
|
| _Learn Awk by Example (2019)_ -
| https://news.ycombinator.com/item?id=22455779 - March 2020 (29
| comments)
|
| _Awk As A Major Systems Programming Language, Revisited (2018)_
| - https://news.ycombinator.com/item?id=22304017 - Feb 2020 (80
| comments)
|
| _Why Learn Awk? (2016)_ -
| https://news.ycombinator.com/item?id=22108680 - Jan 2020 (235
| comments)
|
| _Learn Just a Little Awk (2010)_ -
| https://news.ycombinator.com/item?id=21101478 - Sept 2019 (69
| comments)
|
| _Awk by Example_ - https://news.ycombinator.com/item?id=20308865
| - June 2019 (21 comments)
|
| _Removing duplicate lines from files keeping the original order
| with Awk_ - https://news.ycombinator.com/item?id=20037366 - May
| 2019 (154 comments)
|
| _GNU Awk 5.0_ - https://news.ycombinator.com/item?id=19671983 -
| April 2019 (49 comments)
|
| _Learn just a little Awk (2010)_ -
| https://news.ycombinator.com/item?id=17322412 - June 2018 (244
| comments)
|
| _The Awk Programming Language (1988) [pdf]_ -
| https://news.ycombinator.com/item?id=17140934 - May 2018 (207
| comments)
|
| _Learn to use Awk with hundreds of examples_ -
| https://news.ycombinator.com/item?id=15549318 - Oct 2017 (116
| comments)
|
| _Awk for multimedia_ -
| https://news.ycombinator.com/item?id=15410259 - Oct 2017 (24
| comments)
|
| _Awk driven IoT_ - https://news.ycombinator.com/item?id=14735752
| - July 2017 (35 comments)
|
| _Skip grep, use awk_ -
| https://news.ycombinator.com/item?id=14692233 - July 2017 (130
| comments)
|
| _Awk vs. Perl (2009)_ -
| https://news.ycombinator.com/item?id=14647022 - June 2017 (71
| comments)
|
| _The Awk Programming Language (1988) [pdf]_ -
| https://news.ycombinator.com/item?id=13451454 - Jan 2017 (103
| comments)
|
| _Show HN: 3D shooter in your terminal using raycasting in Awk_ -
| https://news.ycombinator.com/item?id=10896901 - Jan 2016 (55
| comments)
|
| _Awk in 20 Minutes_ -
| https://news.ycombinator.com/item?id=8893302 - Jan 2015 (85
| comments)
|
| _An Awk Primer_ - https://news.ycombinator.com/item?id=7961848 -
| June 2014 (28 comments)
|
| _A Crash Course In Awk_ -
| https://news.ycombinator.com/item?id=6578960 - Oct 2013 (37
| comments)
|
| _Why Awk for AI? (1997)_ -
| https://news.ycombinator.com/item?id=5725291 - May 2013 (53
| comments)
|
| _Ask HN: Do people build websites in Awk?_ -
| https://news.ycombinator.com/item?id=5041323 - Jan 2013 (12
| comments)
|
| _Why you should learn just a little Awk - A Tutorial by Example_
| - https://news.ycombinator.com/item?id=2932450 - Aug 2011 (76
| comments)
|
| _Announcing my first e-book "Awk One-Liners Explained"_ -
| https://news.ycombinator.com/item?id=2674284 - June 2011 (24
| comments)
|
| _AWK-ward Ruby_ - https://news.ycombinator.com/item?id=2486231 -
| April 2011 (31 comments)
|
| _Music with AWK_ - https://news.ycombinator.com/item?id=2294909
| - March 2011 (15 comments)
|
| _Exercise #1: Learning awk Basics_ -
| https://news.ycombinator.com/item?id=2210085 - Feb 2011 (20
| comments)
|
| _Why you should learn at least a little bit of Awk_ -
| https://news.ycombinator.com/item?id=1738688 - Sept 2010 (62
| comments)
|
| _Don't MAWK AWK - the fastest and most elegant big data munging
| language_ - https://news.ycombinator.com/item?id=815529 - Sept
| 2009 (22 comments)
___________________________________________________________________
(page generated 2021-09-30 23:00 UTC)