--Topography--------------------------------------------------------------------
AWK Workshop / Discussion - April 11 - 23, 2026
* what: an informal exploration of plain AWK, aka "new AWK"
* where: SDF.org - both pcom ("awk" room) and irc.sdf.org ("#awk")
* when: Saturdays, 10-11am MDT ; Tuesdays & Thursdays, 6-7pm MDT
--------------------------------------------------------------------------------
[b******s] guess it's too late to back out..
[b******s] Day 1 bullet points are posted on www page
[b******s] https://rawtext.club/~woog/misc/awk_workshop.html
[a******r] greetings to all from Georgia. my first time in pcom
[b******s] welcome
[b******s] pcom used to be a place of danger and depravity
[j******e] o/
[b******s] hey, glad you could make it
[b******s] probably this Day 1 stuff will bore you
[b******s] okay, looks like it's 10am
[b******s] the Day 1 bullet points are on the www page but I can just start pasting them in and we can discuss any of interest
[b******s] Day 1: Basic intro, pattern-action, BEGIN / END blocks, Unix OS interaction
[b******s] AWK dates back to the early days of Unix development at Bell Labs.
[b******s] developed initially as an extended grep(1) for report generation, etc
[b******s] AWK is interpreted (no compiling) with no type declarations
[b******s] Arnold Robbins (AR), the Gawk developer, calls AWK "data-driven": describe the data (via regex), then take action
[b******s] the main body often consists of a sequence of pattern { action } pairs
[b******s] hi j******h ; to review use 'r' or 'R #lines'
[b******s] 'c' clears
[b******s] anyway, by default AWK treats each line as a "record" w/ fields separated by "whitespace"
[j******h] thanks b******s! I accidentally joined 'com' at first, not pcom. Glad I found my way here eventually.
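A minimal sketch of the pattern { action } idea and the default whitespace field splitting, assuming any POSIX awk on the PATH. The helper name `pattern_demo` and the sample records are made up for illustration; the third record's extra blanks show that leading and repeated whitespace does not create empty fields.

```shell
#!/bin/sh
# Hypothetical helper: feeds three made-up records to awk.
# Each input line is a record; $1 and $2 are its whitespace-separated fields.
# Only records matching the regex /apple/ trigger the { print $2 } action.
pattern_demo() {
  printf 'apple 3\nbanana 5\n  apple   7\n' | awk '/apple/ { print $2 }'
}

pattern_demo   # prints 3, then 7
```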
ah good
[b******s] ya I picked pcom bc even pre-vals can join
[b******s] BTW, on SDF there appear to be several AWKs installed, and a few more that could be requested
[b******s] AWKs currently installed on SDF: native awk(1), mawk(1), gawk(1)
[b******s] NetBSD's pkgsrc also has original-awk (OTA) and misc. heirloom AWKs
[b******s] => can request on bboard>>REQUESTS
[b******s] A., W. & K., the original AWK developers, consider AWK a "small language", not really a full stand-alone general-purpose language
[b******s] @j******e you can use any of them but the native NetBSD awk is closest to the One True AWK you saw referenced elsewhere
[a******r] gawk and mawk just add features to awk(1)?
[b******s] it's just a few of us so if you want to stop just say so
[b******s] @a******r mawk is very similar to the native NetBSD awk but seems faster and has some other nice features
[j******e] Fair enough.
[b******s] gawk has lots of extensions; basically it's A.R.'s attempt to make AWK a general-purpose lang
[b******s] questions so far?
[b******s] onward then
[b******s] The 3 AWK code blocks: BEGIN{}, body + usr-def. funcs., END{}
[b******s] I guess you could break user-defined functions out as a 4th block
[b******s] as you might guess, BEGIN is for setup *before* the input data is read
[b******s] can use for setting vars., format strings, etc
[b******s] heh
[b******s] ..and END is for any post-processing of the read-in data
[a******r] so you can have more than one begin block, right?
[b******s] yup, you can have multiple BEGINs and ENDs which get concatenated into one block each in the order given
[j******e] I seem to recall there being tiers of BEGIN/END blocks, but maybe that was a gawk extension, and maybe I just imagined it.
[a******r] sorry, i jumped ahead
[b******s] other than BEGIN & END, order mostly doesn't matter, i.e...
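To make the "multiple BEGINs and ENDs get concatenated in the order given" point concrete, here is a small sketch, assuming a POSIX awk; `order_demo` is just an illustrative name. The END block is written first and the body sits between the two BEGINs, yet all BEGINs still run before any input is read and the ENDs run last, each group in source order.

```shell
#!/bin/sh
# Hypothetical helper: multiple BEGIN (and END) blocks are merged in the
# order they appear, regardless of where the body action sits in the source.
order_demo() {
  printf 'x\n' | awk '
    END   { print "end 1" }
    BEGIN { print "begin 1" }
          { print "body:", $0 }
    BEGIN { print "begin 2" }
    END   { print "end 2" }'
}

order_demo   # begin 1, begin 2, body: x, end 1, end 2
```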
[b******s] $ awk -f begin.awk -f body.awk -f end.awk -f funcs.awk
[b******s] you can rearrange those in any order, same within a script
[j******e] Is there something akin to an #include directive, so that you can list all the files in one primary .awk file?
[p******e] o/
[b******s] not in plain AWK, the @include thing is a gawk extension. I'm sure you could write something though
[j******e] If not, I guess you could just write everything in one file...
[b******s] ya, or use a shell wrapper script to pull everything in
[p******e] Question on order
[b******s] ?
[j******e] The shell wrapper is a good idea.
[p******e] you mean order doesn't matter when passing files containing BEGIN/END/etc? Or the order of the patterns?
[j******e] Seems the simplest solution.
[j******h] A shell wrapper, or a Makefile.
[b******s] @p******e: the order of the various blocks; the order of the pattern-action pairs matters a lot
[j******e] j******h: Would a Makefile make sense in a non-compiled language?
[b******s] ya a Makefile would be very good, just need to remember how to write Makefiles..
[p******e] ok
[p******e] @j******e a makefile always makes sense.
[j******h] j******e: Sure, if your only purpose is to glob for files matching *.awk and concatenate them into one giant script that awk will eventually run.
[b******s] make(1) can be used for lots of things actually
[b******s] anyway, this makes a good point, that at least for plain AWK it's best to think of it as part of the POSIX toolkit
[b******s] continue?
[j******e] UNIX Philosophy 101
[j******e] Please.
[a******r] please
[b******s] AWK seems to work best with semi-uniform line-based text; CSV & Unicode are mostly supported
[j******h] yes please
[b******s] data can be read via stdin (piped||typed) or files:
[j******e] I assume unless you have line breaks in your CSV cells.
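A rough sketch of the "shell wrapper" idea, assuming a POSIX sh with mktemp(1) and any POSIX awk. The file names mirror the `begin.awk`/`body.awk`/`end.awk`/`funcs.awk` example above, but their contents and the `shout` function are invented for illustration. The files are passed in scrambled order; awk reads all `-f` files as one program, so the output is the same as with the "natural" order.

```shell
#!/bin/sh
# Hypothetical wrapper: writes four tiny .awk files to a temp dir, then hands
# them to awk in scrambled order. awk joins every -f file into one program.
multi_file_demo() {
  dir=$(mktemp -d) || return 1
  printf '%s\n' 'BEGIN { print "start" }'                      > "$dir/begin.awk"
  printf '%s\n' '{ print shout($0) }'                          > "$dir/body.awk"
  printf '%s\n' 'END { print "done" }'                         > "$dir/end.awk"
  printf '%s\n' 'function shout(s) { return toupper(s) "!" }' > "$dir/funcs.awk"
  printf 'moo\n' |
    awk -f "$dir/end.awk" -f "$dir/funcs.awk" -f "$dir/body.awk" -f "$dir/begin.awk"
  rm -rf "$dir"
}

multi_file_demo   # start, MOO!, done
```

The same concatenation is what a Makefile rule globbing `*.awk` would rely on; the wrapper just avoids needing make(1).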
[b******s] ya I think it can be a bit of a challenge still working w/ CSV
[j******e] I believe I wrote a tool to convert CSV to/from a more AWK-friendly format.
[b******s] likely need to "clean" the data first
[j******h] TSV for the win.
[j******e] I can look that up later if you like.
[b******s] data input examples: $ echo 'hello' |awk '//' ; awk '//' world.txt
[b******s] BTW, the '//' is a pattern w/o an action; it just matches everything
[j******h] The nice thing about Tab-Separated Values is that you don't have to change the FS variable when running awk on them; the default "split on whitespace" is fine.
[b******s] right, and it will ignore leading/trailing whitespace
[b******s] as for output, that can be stdout or another POSIX tool
[j******e] unless it's meant to be included in the cell.
[b******s] lol, sounds traumatic..
[b******s] here's an amusing illustration of printing to a cmd:
------------------------------------------------------------------------
# output can be written to stdout, files or other POSIX tools
# ex.
# $ echo 'AWKward..' |awk -vMoo='cowsay' '{print |Moo}END{close(Moo)}'
#  ___________
# < AWKward.. >
#  -----------
#         \   ^__^
#          \  (oo)\_______
#             (__)\       )\/\
#                 ||----w |
#                 ||     ||
#
------------------------------------------------------------------------
[b******s] use 'r' to review dumps
[p******e] what's close(Moo) for?
[b******s] except for when you read/write from stdin/stdout, it's recommended to close() sinks/sources
[b******s] mostly it's when you are repeatedly reading/writing to a sink/source
[p******e] I think I'm lost. Isn't Moo a variable?
[b******s] it is! you can set vars either with '-v' or by passing them as CLI args
[j******e] Can you elaborate on the -vMoo='cowsay'? It looks like a variable, but the fact that it needs to be close()'d makes me wonder if there's something more going on there.
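A sketch of the sinks/sources discussed here, assuming a POSIX awk plus sort(1) and mktemp(1). `sort` stands in for `cowsay`, which may not be installed; the helper names and data are hypothetical. The first part pipes records into a command held in a variable and uses close() so the command finishes before the END message. The second writes a file with `>`, closes it, and reads it back with `getline < file`, which is also how "< Moo would read from Moo" works in plain AWK.

```shell
#!/bin/sh
# Part 1 (hypothetical helper): print into a command named by a variable.
# close(Cmd) flushes the pipe and waits for sort to exit, so its output
# appears before the final line printed in END.
pipe_demo() {
  printf 'pear\napple\nfig\n' |
    awk -v Cmd='sort' '{ print | Cmd } END { close(Cmd); print "-- sorted above --" }'
}

# Part 2 (hypothetical helper): write a file with ">", close() it, then read
# it back with getline < file. Without the close(), the written data could
# still be sitting in an output buffer when the read starts.
file_demo() {
  tmp=$(mktemp) || return 1
  awk -v Out="$tmp" 'BEGIN {
    print "moo" > Out
    print "baa" > Out          # ">" truncates only on first open, so this appends
    close(Out)
    while ((getline line < Out) > 0)
      print "read back:", line
    close(Out)
  }'
  rm -f "$tmp"
}

pipe_demo
file_demo
```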
[a******r] cowsay is a program
[b******s] right, it's as if we had a BEGIN{Moo="cowsay"} ; cowsay is a program so we are piping output to it in the body block
[j******e] a******r: Yeah, it's more the -vMoo= part I'm confused about.
[a******r] does the print command run the Moo, which = cowsay?
[b******s] it could be written as # awk 'BEGIN{print "AWKward" | "cowsay"}'
[j******h] I should have looked up the quote from yeti in a long-ago bboard thread. yeti sang the praises of awk in contrast to shell, with something like "variables are constant, no $xxyyzz in a string is expanded by accident,..."
[p******e] I think the print prints the input line ("AWKward..") followed by "|Moo", so "AWKward.. | cowsay"
[p******e] OH, you're directly piping to "the content" of Moo?
[b******s] ya (I think)
[p******e] no I'm lost again
[p******e] XD XD XD
[j******e] p******e: I think that's what the pipe is for, yes.
[b******s] ya, if 'Moo' was a file you'd use 'print ... > Moo'
[b******s] but it's a cmd here
[b******s] basically was just trying to illustrate an alternate sink for the output
[j******h] I think what yeti was getting at was that interpolating the value of a variable into a string is not something you can trigger by accident in awk, you have to ask for it explicitly.
[p******e] ok I'm back onboard now
[j******e] b******s: In your later example, that would write *to* Moo?
[b******s] ya, '>' or '>>' are like shell, they write
[j******e] b******s: And I presume < Moo would read from Moo?
[b******s] well ya I think you could do something like $ awk -f- no block
[p******e] Cool, cool
[b******s] ya it seems to tie back to the assumptions baked into AWK regarding how data should be processed
[b******s] continue?
[j******e] Yes, please.
[b******s] and an END example for balance (post-processing of data):
[a******r] yes
------------------------------------------------------------------------
# END mostly for wrap-up, process/print collected data from body:
# ex.
# $ seq 0 9 | awk '{Sum = Sum + $1} END{print "Sum =", Sum}'
# Sum = 45
#
------------------------------------------------------------------------
[b******s] use 'r' to display
[b******s] here the data is piped in from seq(1) and collected in the body block in the Sum variable
[b******s] after all the data is read END kicks in and prints Sum
[b******s] the body has an action w/ no pattern so it's applied to every line of data
[b******s] ya, 0-9, each on a separate line
[j******e] I'm surprised you didn't have to do BEGIN{Sum = 0}
[b******s] ya that is one of those baked-in AWK assumptions; any unassigned var has a value of 0 or ""
[b******s] we'll talk more about that another day
[j******e] Are uninitialized variables treated implicitly as 0, or only when math is done on them?
[b******s] well there's no types so I think basically they are strings unless used as numbers
[p******e] Q: variables start with capital just as an idiomatic thing?
[b******s] so if we have Moo="42" then do print Moo / 3 you'll get 14
[j******e] In other words: "" is treated as 0 for math purposes, I guess.
[j******h] they do have associative arrays, though. Some people might regard that as a distinct type.
[b******s] if we do Moo="42" then print Moo, "is the answer" you get "42 is the answer"
[b******s] BTW, all AWK math is floating point, for good or bad
[j******e] And the space is added because it's the column separator?
[b******s] the default output field separator (OFS) is " "
[b******s] you can change it of course
[b******s] one last thing, the body block can be thought of as a data gauntlet where each record will keep getting matched in the order of the pattern-action pairs unless it's interrupted along the way
[p******e] I'd say for bad?
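A small sketch of the variable and "data gauntlet" points above, assuming a POSIX awk; `vars_demo` and `gauntlet_demo` are illustrative names, and `Unset` is simply a variable that is never assigned. The second helper uses `next`, one common way a record's trip through the pattern-action pairs gets interrupted.

```shell
#!/bin/sh
# Hypothetical helper: untyped variables. "42" is a string that behaves as a
# number in arithmetic; an unset variable is "" as a string and 0 as a number;
# OFS is the separator that a comma in print inserts.
vars_demo() {
  awk 'BEGIN {
    Moo = "42"
    print Moo / 3                    # string used as a number
    print Moo, "is the answer"       # comma inserts OFS (default " ")
    print "[" Unset "]", Unset + 0   # unset: empty string, or 0 in math
    OFS = "-"
    print "a", "b"                   # now joined with "-"
  }'
}

# Hypothetical helper: each record runs through the pattern-action pairs in
# order; "next" stops the gauntlet early and moves on to the next record.
gauntlet_demo() {
  printf '1\n2\n3\n' | awk '
    $1 == 2 { print "two: skipping the rest"; next }
            { print "generic:", $1 }'
}

vars_demo       # 14, 42 is the answer, [] 0, a-b
gauntlet_demo   # generic: 1, two: skipping the rest, generic: 3
```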
[p******e] lol
[b******s] ya I think it'll depend on what you're trying to do; for serious maths AWK probably isn't the right choice, or at least not the main part of the choice
[b******s] bc(1) can do serious maths
[b******s] or use python/perl, whatever
[b******s] well, 11am here; there are some short example code bits towards the end of the Day 1 sheet you can play around with.
[b******s] any other questions?
[a******r] thanks! thanks for your time and all of the resources you listed!
[b******s] no problem! glad ppl actually showed up! I only see tftp in IRC, lol
[j******h] I found y******i's bboard quote, if anybody is interested.
[b******s] ya y******i is pretty knowledgeable on AWK and much more
[j******e] (p)com > IRC
[p******e] b******s++
[b******s] ty ty
[j******e] ...but also != IRC
[b******s] ya I may hang a bit in there to see if questions come up
[j******h] y******i: "Apart from some minor glitches, I think awk makes a better scripting language than *sh. Once gotten used to it is more readable than *sh. Strings are constants, no $xyzzy in a string is expanded by accident. Those awk pipes neatly glue together shell commands so I think it could be the main scripting language, if it gets some minor improvements."
[j******e] b******s++ Thanks for putting this all together. I've got to head out though.
[b******s] ya have a great weekend y'all
[j******h] thanks, you too!
[p******e] "BEGIN mostly for var setup; FILENAME and vars set as args *NOT* accessible.." are those two sentences linked? I mean: do filename and args become accessible only after BEGIN?
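To illustrate the "all AWK math is floating point" remark, a minimal sketch assuming an awk whose numbers are C doubles (true of the common implementations, but treat it as an assumption); `float_demo` is a made-up name. There is no integer division, and decimal values like 0.1 are not stored exactly.

```shell
#!/bin/sh
# Hypothetical helper: AWK numbers are floating point, so exact decimal
# comparisons can surprise you.
float_demo() {
  awk 'BEGIN {
    print 7 / 2                      # no integer division: prints 3.5
    same = (0.1 + 0.2 == 0.3)
    print (same ? "equal" : "not equal")
    printf "%.17g\n", 0.1 + 0.2      # show the value behind the rounding
  }'
}

float_demo   # 3.5, not equal, 0.30000000000000004
```

For exact or arbitrary-precision work, piping expressions to bc(1) is the usual way out, as mentioned above.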
[b******s] ya, kind of skipped over that
[b******s] ya, vars set as CLI args as well as vars like FILENAME aren't available in BEGIN, at least not directly
[b******s] WRT FILENAME it's because it gets reset to each data file given as an arg
[p******e] ok
[b******s] the code block just below that shows that you *could* access these indirectly via the ARGV array
------------------------------------------------------------------------
# ..HOWEVER, cmd line args *ARE* accessible via the ARGV[] array!
# ex.
# $ awk 'BEGIN{print "say =", say; \
#     for(i in ARGV)printf "ARGV[%d] = %s\n", i, ARGV[i]}' cow say=Moo 42
# say =
# ARGV[2] = say=Moo
# ARGV[3] = 42
# ARGV[0] = awk
# ARGV[1] = cow
------------------------------------------------------------------------
[p******e] makes sense
[p******e] odd that ARGV gets iterated out of order
[b******s] ya, another AWK quirk; arrays are associative and get traversed in seemingly random order
[p******e] internal storing of the array?
[b******s] but you could use the ARGC var -- the # of args -- to iterate over ARGV from 0 to ARGC-1
[a******r] so you could have a standard begin file, then a data specific begin file, and list both on the command line and they get joined?
[b******s] ya they'll all get mashed together
[b******s] BTW, plain AWK has issues w/ CLI args that are hyphenated; you have to do $ awk -f code.awk -- --help fubar derp
[b******s] unfortunately for executable scripts you can't use the "--" so you'd have to use a shell wrapper or use mawk or gawk which have '-We' or '-E' options that avoid the issue
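A sketch tying together the last few points, assuming a POSIX awk and mktemp(1); the helper names and the throwaway program file are hypothetical. It walks ARGV in index order using ARGC, shows that a `-v` assignment is visible in BEGIN while an operand assignment like `b=2` only takes effect once awk reaches it in the argument list, and passes hyphenated words through after `--`.

```shell
#!/bin/sh
# 1) Walk ARGV in index order with ARGC instead of "for (i in ARGV)", whose
#    order is unspecified. ARGV[0] is skipped because its exact value
#    varies between implementations.
argv_demo() {
  awk 'BEGIN { for (i = 1; i < ARGC; i++) printf "ARGV[%d] = %s\n", i, ARGV[i] }' cow say=Moo 42
}

# 2) -v assignments happen before BEGIN; operand assignments (b=2) happen only
#    when awk reaches them while walking the file list, so END sees b but
#    BEGIN does not.
assign_demo() {
  awk -v a=1 'BEGIN { print "BEGIN: a=" a ", b=" b } END { print "END: b=" b }' b=2 /dev/null
}

# 3) "--" ends awk's own options, so hyphenated words reach the script as
#    ordinary arguments (the program file is a throwaway temp file).
dash_demo() {
  prog=$(mktemp) || return 1
  printf '%s\n' 'BEGIN { for (i = 1; i < ARGC; i++) print ARGV[i] }' > "$prog"
  awk -f "$prog" -- --help fubar derp
  rm -f "$prog"
}

argv_demo
assign_demo
dash_demo
```

Because each program above consists only of BEGIN actions (except `assign_demo`, which reads the empty /dev/null), awk never tries to open `cow`, `42`, `fubar` or `derp` as input files.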