[HN Gopher] A brief interview with Awk creator Dr. Brian Kernigh...
___________________________________________________________________
A brief interview with Awk creator Dr. Brian Kernighan (2022)
Author : breck
Score : 109 points
Date : 2024-07-17 18:28 UTC (4 hours ago)
(HTM) web link (pldb.io)
(TXT) w3m dump (pldb.io)
| fuzzy_biscuit wrote:
| I know the title said brief, but it still took me by surprise
| that there were only three questions.
| phatfish wrote:
| No excuses not to read the whole article before commenting this
| time!
| kansai wrote:
| Describing the K of K&R as "Awk creator" is like describing
| Einstein as a "refrigerator engineer".
| overhead4075 wrote:
| It's an awk interview. "Awk creator" is relevant.
| midiguy wrote:
| I agree it was an incredibly awkward interview
| riiii wrote:
| He didn't say it was irrelevant.
| felideon wrote:
| Right, the headline gave the appropriate context for the
| interview in a short sentence. Not everyone knows Kernighan
| created awk, as opposed to a respected person that happens to
| have opinions on awk.
| Gormo wrote:
| AWK's name is actually an initialism of the last names of
| its three creators: Aho, Weinberger, and Kernighan.
| usr1106 wrote:
| Aho, one of the authors of the dragon books. When I
| studied CS 40 years ago it was given standard literature
| on compiler construction, a mandatory course.
|
| No idea whether it is still used today. Well, no idea
| whether there is anything fundamentally new in compilers
| such an old book would not cover.
| ryangs wrote:
| Indeed, still in use. Also presumably the Aho of the Aho-
| Corasick string matching algorithm.
| JohnFen wrote:
| Why do you say that? Developing Awk was no less of an
| accomplishment than C.
| alexthehurst wrote:
| This seems like a rather large claim that would benefit from
| some kind of supporting argument.
| kragen wrote:
| c-descended languages include c++, java, and c#
|
| awk-descended languages include perl, tcl, js, python and
| lua
| 1234554321a wrote:
| He's the closest man to C still alive. Yet it's not really
| "his" accomplishment. C was created by Ritchie alone, Kernighan
| only wrote the book.
| aap_ wrote:
| I would say Ken Thompson and Steve Johnson are closer.
| audleman wrote:
| I read this comment in the spirit of "you might not have known
| that K did much more than write awk, but he's a genius." Which
| I didn't know, so I appreciate the context.
| Zambyte wrote:
| It's interesting to me that he refers to associative arrays as
| "newish", when they showed up in Lisp nearly 20 years earlier.
| chasil wrote:
| This is from the free online PDF of the first edition of _The AWK
| Programming Language_:
|
| "Associative arrays were inspired by SNOBOL4 tables, although
| they are not as general. Awk was born on a slow machine with a
| small memory, and the properties of arrays were a result of
| that environment. Restricting subscripts to be strings is one
| manifestation, as is the restriction to a single dimension
| (even with syntactic sugar). A more general implementation
| would allow multi-dimensional arrays, or at least allow arrays
| to be array elements."
|
| Stable release - SNOBOL4 / 1967
|
| https://en.wikipedia.org/wiki/SNOBOL
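
A minimal Python sketch, not from the thread, of the "single dimension with syntactic sugar" point in the quote: awk stores `a[i, j]` in a one-dimensional, string-keyed array by joining the subscripts with the `SUBSEP` character (default `"\034"`). The helper names `subscript`, `contains`, and `split_key` are hypothetical, and `str()` only approximates awk's number-to-string conversion, which uses `CONVFMT` for non-integers.

```python
# Sketch of awk-style "multi-dimensional" subscripts on a flat string-keyed table.
SUBSEP = "\x1c"  # awk's default SUBSEP is the character "\034" (octal)


def subscript(*parts):
    """Join subscript parts into one string key, as awk does for a[i, j]."""
    return SUBSEP.join(str(p) for p in parts)


def contains(tbl, *parts):
    """Rough equivalent of awk's `(i, j) in a` membership test."""
    return subscript(*parts) in tbl


def split_key(key):
    """Rough equivalent of awk's split(key, parts, SUBSEP)."""
    return key.split(SUBSEP)


table = {}                      # stands in for a single awk array
table[subscript(1, 2)] = "x"    # awk: a[1, 2] = "x"
table[subscript("k", 7)] = "y"  # awk: a["k", 7] = "y"
```

Because every key is a flat string, there is no nested array anywhere: `table` has exactly two entries, and recovering the individual subscripts means splitting the key again.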
| jerf wrote:
| Two things to keep in mind: Pre-internet, it was quite easy to
| be skilled and knowledgeable and yet never have been exposed to
| something another community somewhere would consider basic, or
| for it to take many many years to percolate across. I remember
| getting into programming even in the 1990s and I can tell you a
| lot of the C world would still consider associative data
| structures like a hash table something to be a bit exotic and
| only to be pulled out if you had no other choice. It was very
| easy to not be very up to date on these things.
|
| Second, while the programming world fancies itself a very fast
| mover, it really isn't, and it especially wasn't back then. 20
| years isn't necessarily as long as you would think in 2024.
| We're still trying to convince people in 2024 that you really
| don't need a language that will happily index outside of
| arrays, and how long has _that_ been a staple in existing
| languages? (You may think that's silly, but we've had that
| fight on HN, yes, yea verily, this very year. Granted, this is
| winding down, but it's still a live argument.)
|
| There are a lot of things that showed up in some language in the
| 1960s or 1970s and weren't widely adopted for another few
| decades.
|
| (Though I suspect one problem with Lisp is that back in the
| day, its performance was pretty awful relative to the other
| things of the day. It was not always the best advertisement for
| its features. A bit too ahead of its time sometimes.)
| Zambyte wrote:
| I totally get the first point, but this comment from Brian
| was made when the internet had been available for many, many
| years. It seems more like it felt newish at the time, and
| that is how he has seen it since then. I guess that's more
| why I find it to be an interesting comment.
|
| Regarding the second point, when I hear the term "associative
| arrays" or "associative lists" I think of them in a pretty
| high level of abstraction. Like really: a C struct is an
| association abstraction. A simple array of structs is not
| often considered an "associative array", but it can easily be
| used for the same things.
|
| Maybe it would be more accurate to say that the way they were
| using associative arrays was newish, rather than the
| construct itself.
| kragen wrote:
| in a sense lisp alists are 'associative arrays', sure. but,
| although they were used in the metacircular interpreter that
| bootstrapped lisp and still inspires us today, they aren't a
| language feature, and the lisp ones aren't efficient enough for
| the things you use them for in awk
|
| you can of course build an associative array in any language,
| especially if you're satisfied with the linear search used by
| lisp alists. you can hack one together in a few minutes almost
| without thinking:
|             .intel_syntax noprefix
|     lookup: push rbx
|             push rbp
|             mov rbx, rdi        # pointer to alist node pointer
|             mov rbp, rsi        # key
|     2:      mov rdi, [rbx]      # load pointer to alist node
|             test rdi, rdi       # is it null?
|             jnz 3f              # if not null, skip not-found case
|             mov rax, rbx        # pointer to null pointer is return value
|             xor rdx, rdx        # but associated value is null (sets zero flag)
|             jmp 4f              # jump to shared epilogue
|     3:      mov rdi, [rdi]      # load pointer from key field of alist node
|             mov rsi, rbp        # reload key from callee-saved register
|             call comkey         # sets zero flag if rsi and rdi are equal keys
|             jne 1f              # skip following return-value case code if !=
|             mov rax, rbx        # pointer to node pointer is return value
|             mov rdx, [rbx]      # also let's follow that pointer to the node
|             mov rdx, [rdx + 16] # and return its value field too in rdx
|             test rax, rax       # clear zero flag (valid pointer is not null)
|     4:      pop rbp
|             pop rbx
|             ret
|     1:      mov rbx, [rbx]      # load pointer to alist node again
|             add rbx, 8          # load pointer to next-node pointer field
|             jmp 2b
|
| (untested, let me know if you find bugs)
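
For readers who do not read x86-64 assembly, here is a hedged Python sketch of the same idea: an association list searched linearly. It is not a translation of the routine above (which returns a pointer to the node pointer plus the value); the names `Node`, `lookup`, and `insert` are hypothetical and chosen only for illustration.

```python
class Node:
    """One alist cell: a key, a value, and a link to the next cell (or None)."""

    def __init__(self, key, value, next=None):
        self.key = key
        self.value = value
        self.next = next


def lookup(head, key):
    """Linear search from head; return the first matching node, or None."""
    node = head
    while node is not None:
        if node.key == key:
            return node
        node = node.next
    return None


def insert(head, key, value):
    """Prepend a new pair and return the new head.

    An older pair with the same key stays in the list but is shadowed,
    because lookup() stops at the first match.
    """
    return Node(key, value, head)
```

Each lookup walks the list, so the cost grows linearly with the number of pairs, which is the efficiency concern raised above for awk-style workloads.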
|
| but that's very different from having them built in as a
| language feature, like snobol4, mumps, awk, perl, python, tcl,
| lua, and js do. the built-in language feature lisp had for
| structuring data 20 years earlier was cons, car, and cdr, which
| is not at all the same thing
|
| other programs on unix that contained associative arrays prior
| to awk included the linker (for symbols), the c compiler, the
| assembler, the kernel (in the form of directories in the
| filesystem), and the shell, for shell variables. none of these
| had associative arrays as a programming language feature,
| though. the bourne shell came closest in that you could
| concatenate variable names and say things like
| eval myarray_$key=$val
|
| here's the implementation of binary tree traversal for setting
| shell variables in the v7 bourne shell from 01979
|     NAMPTR          lookup(nam)
|             REG STRING      nam;
|     {
|             REG NAMPTR      nscan=namep;
|             REG NAMPTR      *prev;
|             INT             LR;
|             IF !chkid(nam)
|             THEN    failed(nam,notid);
|             FI
|             WHILE nscan
|             DO      IF (LR=cf(nam,nscan->namid))==0
|                     THEN    return(nscan);
|                     ELIF LR<0
|                     THEN    prev = &(nscan->namlft);
|                     ELSE    prev = &(nscan->namrgt);
|                     FI
|                     nscan = *prev;
|             OD
|             /* add name node */
|             nscan=alloc(sizeof *nscan);
|             nscan->namlft=nscan->namrgt=NIL;
|             nscan->namid=make(nam);
|             nscan->namval=0;
|             nscan->namflg=N_DEFAULT;
|             nscan->namenv=0;
|             return(*prev = nscan);
|     }
|
| if you're not sure, that's c (now you know where the ioccc came
| from)
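
The lookup-or-insert pattern in that routine can be sketched in Python as follows. This is an illustrative approximation, not the Bourne shell's code: the field names `namid`, `namval`, `namlft`, and `namrgt` are borrowed from the listing, while the `Name` and `NameTable` classes and the use of Python string comparison in place of `cf` are assumptions.

```python
class Name:
    """Tree node loosely mirroring the fields used in the listing above."""

    def __init__(self, namid):
        self.namid = namid
        self.namval = None
        self.namlft = None
        self.namrgt = None


class NameTable:
    """Unbalanced binary search tree keyed by variable name."""

    def __init__(self):
        self.root = None

    def lookup(self, nam):
        """Return the node for nam, creating an empty one if it is absent."""
        parent, went_left = None, False
        node = self.root
        while node is not None:
            if nam == node.namid:
                return node
            parent, went_left = node, nam < node.namid
            node = node.namlft if went_left else node.namrgt
        # Not found: attach a fresh node where the search fell off the tree.
        node = Name(nam)
        if parent is None:
            self.root = node
        elif went_left:
            parent.namlft = node
        else:
            parent.namrgt = node
        return node
```

As in the original, a lookup never fails: asking for an unknown name creates an empty entry, so setting a variable is just `table.lookup("PATH").namval = "/bin"`.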
| foobarian wrote:
| I love that Bourne abused C macros to make sh source look
| like Algol. Lots of references to this elsewhere but this is
| my favorite:
|
| https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/sh...
| FelipeCortez wrote:
| There's a more comprehensive interview that also includes Aho and
| Weinberger in the book Masterminds of Programming. Highly
| recommended
| chasil wrote:
| His wiki lists several interviews, for those inclined:
|
| https://en.wikipedia.org/wiki/Brian_Kernighan
|
| http://www.linuxjournal.com/article/7035
|
| http://www-2.cs.cmu.edu/~mihaib/kernighan-interview/index.ht...
|
| https://web.archive.org/web/20090428163341/https://www.princ...
|
| https://web.archive.org/web/20131126220450/http://princetons...
| JonChesterfield wrote:
| I'm very taken with the regex to lex to yacc to awk sequence of
| developments. There's a very convincing sense of building on
| prior work to achieve more general results.
| kragen wrote:
| i've been reading _the unix programming environment_ from 01983
| this week (the old testament of kernighan and pike), about 35
| years later than i really should have. awk is really one of the
| stars of the book, the closest thing in it to currently popular
| languages like js, lua, python, perl, or tcl. (what awk calls
| 'associative arrays' js just calls 'objects'.)
|
| the 7th edition unix version of awk from 01979
| https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/aw...
| (from
| https://www.tuhs.org/Archive/Distributions/Research/Henry_Sp...)
| is only 2680 lines of source code, which is pretty astonishing.
| the executable is 46k and ran in the pdp-11's 64k address space.
| as far as i can tell, that version of awk, and also the one
| documented four years later in _tupe_, didn't even have
| user-defined functions. neither did the v7 version of the bourne
| shell, i think
|
| bc, however, did
| mananaysiempre wrote:
| > that version of awk [...] didn't even have user-defined
| functions
|
| Also known as "old awk" or (retronymically) "oawk". Heirloom
| still carries[1] what should be a descendant of it (alongside
| the more familiar nawk).
|
| Also,
|
| > the executable is 46k
|
| thus still a bit larger than Turbo Pascal 3.02[2] :) That
| didn't have a regex engine, admittedly.
|
| [1] https://heirloom.sourceforge.net/man/awk.1.html
|
| [2] https://prog21.dadgum.com/116.html
| coliveira wrote:
| Turbo Pascal was written in asm and highly optimized to
| reduce its size.
| tannhaeuser wrote:
| > _the closest thing [...] to currently popular languages like
| js_
|
| awk is more than that: having introduced the "function"
| keyword, for e in a, semicolon-less syntax, stringly-typing,
| delete x[e], etc, etc, it's quite obviously the base for
| JavaScript, and Brendan Eich said as much [1].
|
| [1]: https://brendaneich.com/2010/07/a-brief-history-of-javascrip...
| kragen wrote:
| i'd say perl is closer, but for (e in a) is from awk and not
| perl
|
| semicolonless syntax is in js, tcl, lua, and sh, but not awk
| or perl; in awk omitted semicolons are interpreted as string
| concatenation
| dakiol wrote:
| I read the same book and got the exact same conclusion. Then I
| read The UNIX-HATERS Handbook and thought that Unix tools
| (like awk) are sadly not good enough anymore.
| kragen wrote:
| yeah, there are a lot of questionable design decisions in
| there that are only defensible in the context of the pdp-11's
| 64k address space and slow cpu
| gregw2 wrote:
| Little known random Brian Kernighan facts:
|
| * He joined Princeton's CS department in 2000 but taught at least
| one class there as early as 1993 while still at Bell Labs
| Research (on sabbatical?)
|
| * One of his students regularly brought a 386sx laptop (running
| pre-1.0 Linux) to class and when Brian was asked more obscure
| questions about what awk did which he couldn't remember, the
| student would run commands in awk and feed Brian the definitive
| implementation answer. So Brian had some exposure to Linux
| moderately early on.
|
| * Here's a writeup from him on putting AT&T's toll free phone
| directory on the internet back in fall 1994:
| https://www.cs.princeton.edu/~bwk/800.html
___________________________________________________________________
(page generated 2024-07-17 23:01 UTC)