[HN Gopher] Awk Technical Notes
___________________________________________________________________
Awk Technical Notes
Author : todsacerdoti
Score : 94 points
Date : 2023-03-28 12:41 UTC (1 days ago)
(HTM) web link (maximullaris.com)
(TXT) w3m dump (maximullaris.com)
| meltedcapacitor wrote:
| Awk is an improvement on most of its successors.
|
| (h/t Tony Hoare)
| jrochkind1 wrote:
| My very first job getting paid to write software was writing
| scripts in Awk to parse and analyze some software log files, for
| a faculty software researcher, in, maybe, 1997? i didn't know Awk
| before, it's just what I inherited. Spent a few hours with the
| O'Reilly book, and I was like, okay, sure, let's go.
|
| As the stuff we were doing in that project got more complex, at
| some point someone suggested to teenage me "You might want to
| look at Perl for this now," and then I moved to that. (with the
| Camel O'Reilly book, of course!)
|
| Haven't touched either one in years now.
|
| Learning new things can be much more overwhelming for me now, I
| don't know how much is me vs environment. But I am nostalgic for
| those days where I'd sit down with a print book, and within hours
| have a grasp of the fundamentals, or within days feel like I had
| basic fundamental conceptual understanding of the whole dang
| thing (not of every possible feature, but of the conceptual
| framework, the big picture).
| cbazz wrote:
| [flagged]
| version_five wrote:
| This commentor is a troll, see history if the above comment
| isn't enough.
|
| Edit: what the fuck is going on?
| cbazz wrote:
| I don't get how you could conclude I'm a troll? I'm not
| spamming nor arguing with anyone, just sharing my opinions
| and experiences.
| meindnoch wrote:
| No, you're copy/pasting low-effort ChatGPT babble.
| bioemerl wrote:
| Looks like they have a history of using chatGPT to post
| comments, specifically.
| version_five wrote:
| Right, but somehow they've been the top post for 1/2 hour
| and I got modded way down for pointing out it was an
| obvious troll. I hesitate to comment because I assume
| that's what the script kiddie is looking for out of this.
| bioemerl wrote:
| I think the problem you're running into is that this
| particular comment looks human written?
| 2h wrote:
| I used AWK for many years, but one day I realized that I had
| pushed AWK beyond what it's meant for, same as the author here.
| classic red flag from the article:
|
|     function NUMBER( res) {
|         return (tryParse1("-", res) || 1) &&
|             (tryParse1("0", res) || tryParse1("123456789", res) && (tryParseDigits(res)||1)) &&
|             (tryParse1(".", res) ? tryParseDigits(res) : 1) &&
|             (tryParse1("eE", res) ? (tryParse1("-+",res)||1) && tryParseDigits(res) : 1) &&
|             asm("number") && asm(res[0])
|     }
|
| why put yourself through this, when you can just do something
| like this instead:
|
|     package parse
|
|     import "strconv"
|
|     func parse_float(s string) (float64, error) {
|         return strconv.ParseFloat(s, 64)
|     }
|
|     func parse_int(s string) (int64, error) {
|         return strconv.ParseInt(s, 10, 64)
|     }
| donio wrote:
| Not disagreeing with the overall point but that particular
| example is from an AWK JSON parser implementation so the whole
| point is to do it in AWK. If you look at the entire file it's
| not too bad considering.
|
| Funnily the actual Go JSON decoder code ends up doing something
| similar during scanning:
|
| https://github.com/golang/go/blob/master/src/encoding/json/d...
| pmarreck wrote:
| Depends on the AWK implementation, apparently.
|
|     bash> awk -v v="80.1%" 'BEGIN{print v+0.1}'
|     80.2
|
| gawk has `strtonum`. But yes, parsing in awk generally looks
| like a pain. With plain positive/negative ints though, not so
| hard:
|
|     echo "123456" | awk '{
|       if ($0 ~ /^-?[0-9]+$/) {
|         num = 0
|         sign = 1
|         start = 1
|         if (substr($0, 1, 1) == "-") {
|           sign = -1
|           start = 2
|         }
|         for (i = start; i <= length($0); i++) {
|           digit = substr($0, i, 1)
|           num = num * 10 + digit
|         }
|         num = sign * num
|         print "The integer is:", num
|       } else {
|         print "Invalid input string:", $0
|       }
|     }'
| czx4f4bd wrote:
| As mentioned, the example you quoted is from a pure-AWK JSON
| parser. I don't dispute that AWK has issues, but AWK is one of
| those languages that magically coerces strings to numbers, so
| you can just write `"1" + 2 + "3.5"` and it'll work.
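|
| A minimal sketch of that coercion, assuming a POSIX-style awk on
| the PATH. The function names `coerce_sum` and `to_number` are made
| up for illustration. It also shows the lenient side of the same
| rule: a string is converted using its leading numeric prefix, and a
| string with no numeric prefix becomes 0.

```shell
# Arithmetic on strings: awk converts each operand to a number first.
coerce_sum() {
    awk 'BEGIN { print "1" + 2 + "3.5" }'
}

# Adding 0 forces a numeric conversion of the string passed in.
to_number() {
    awk -v s="$1" 'BEGIN { print s + 0 }'
}

coerce_sum          # prints 6.5
to_number "80.1%"   # prints 80.1 (trailing "%" is ignored)
to_number "abc"     # prints 0 (no numeric prefix)
```

| The silent fallback to 0 is why stricter parsing usually starts
| with a regex check before the arithmetic.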
| elteto wrote:
| Big, big fan of AWK. It sometimes feels like ancient, alien UNIX
| technology to me. But lately I've been gravitating more and more
| towards perl. You can write the same one liners (with perl -e and
| friends), it has superb support for regexes and it's just a more
| capable language (as expected, not bashing AWK).
| pcwalton wrote:
| You can use Ruby for this task too. I used to use Perl for
| throwaway one-liners, but on advice I switched to Ruby because
| of the bigger community and I'm pretty happy with it.
|
| (Python isn't as nice for one-liner text processing, both
| because of the lack of Awk heritage--so no built-in regex
| syntax--and because of the indentation-based syntax requiring
| newlines for most things.)
| jalk wrote:
| OMG never realized that $ is an infix operator - Plenty of times
| where I needed something like $(NF-1) and instead used verbose
| stuff like NF==5 { ... } NF==6 { ... }
| jrochkind1 wrote:
| I don't think you actually mean "infix"?
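|
| For reference, a small sketch of the `$(NF-1)` idiom from the
| parent comment. In awk, `$` is a unary prefix operator that accepts
| any expression, so fields can be indexed relative to `NF`. The
| function name `penultimate` and the sample input are assumptions
| for illustration.

```shell
# Print the second-to-last field of every line, whatever the field count.
# Lines with fewer than two fields are skipped.
penultimate() {
    awk 'NF >= 2 { print $(NF-1) }'
}

# One rule covers both the 5-field and the 6-field line.
printf 'a b c d e\na b c d e f\n' | penultimate   # prints d, then e
```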
| colonwqbang wrote:
| > AWK was designed to not require a GC
|
| ...
|
| > The most substantial consequence is that it's forbidden to
| return an array from a function, you can return only a scalar
| value.
|
| This doesn't make sense to me. Does someone understand what it
| means?
|
| In e.g. C++ a function can return an array without any GC or
| refcounting, by "moving" the array into the caller's stack.
| doctor_eval wrote:
| I am not an expert but I'd say it's because Awk arrays are
| associative; they are more like maps than slices, to use Go
| terminology. And IIRC (it's been a while) the array values are
| not strongly typed. So I think you could even say:
|
|     a[1] = "hello"
|     a["world"] = 2
|
| That means that - unlike C arrays - Awk arrays are not a
| simple, addressable byte range, but a complex data structure
| with lots of pointers.
|
| I suppose you could come up with a way to serialise the array
| and pop it on the stack but that would be a lot of work, and
| for the kind of things I use Awk for, the arrays would often be
| huge.
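|
| A sketch of the usual workaround for the "no array return" rule:
| awk passes array parameters by reference, so a function can fill an
| array supplied by the caller and return only a scalar. The helper
| `parse`, the wrapper `parse_pairs`, and the `k=v,k=v` input format
| are all hypothetical, chosen only to illustrate the pattern.

```shell
# Fill a caller-supplied associative array from a "k=v,k=v" string.
parse_pairs() {
    awk -v s="$1" '
    # Extra parameters after the gap (parts, n, i, eq) act as locals.
    function parse(str, out,    parts, n, i, eq) {
        n = split(str, parts, ",")
        for (i = 1; i <= n; i++) {
            eq = index(parts[i], "=")
            out[substr(parts[i], 1, eq - 1)] = substr(parts[i], eq + 1)
        }
        return n    # only a scalar can be returned
    }
    BEGIN {
        n = parse(s, conf)    # conf is filled in place
        print n, conf["host"], conf["port"]
    }'
}

parse_pairs "host=example.org,port=6667"   # prints: 2 example.org 6667
```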
| [deleted]
| eimrine wrote:
| The author is really persuading me to learn awk, because he
| treats the very reasons I have avoided it as faulty ones, and I
| find his reasoning decent.
| version_five wrote:
| I'm already a casual awk enthusiast but I'm really hoping to find
| an opportunity to use it for a "real" software project soon. I've
| been reading the gawk user manual, and suffice to say, the power
| and features of the language are dramatically underutilized for
| most of the things people normally do with it (my most common use
| case is probably a hybrid of grep and cut)
|
| https://www.gnu.org/software/gawk/manual/gawk.html
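|
| To make the "hybrid of grep and cut" use concrete, here is a
| hedged sketch: one awk program filters lines by a pattern and
| prints a single delimited field. The function name `bash_users`
| and the passwd-style sample input are assumptions for illustration.

```shell
# Roughly equivalent to: grep 'bash$' | cut -d: -f1
bash_users() {
    awk -F: '/bash$/ { print $1 }'
}

printf 'root:x:0:0:root:/root:/bin/bash\ndaemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin\n' |
    bash_users   # prints root
```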
| dc-programmer wrote:
| I'm usually not a big side project guy, but I successfully used
| AWK to solve an IRL problem last year. It really helped solidify
| my understanding of the language.
|
| The problem was that the Garmin GPS data for a bike ride I had
| just completed had split into multiple rides. I used AWK to
| stitch together the data into one file. I also did some basic
| linear interpolation to fill in missing data points.
|
| The GPS data is formatted as XML and I was able to parse it
| fairly robustly using AWK.
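|
| A rough sketch of what gap-filling by linear interpolation can
| look like in awk. This is not the commenter's script: it assumes a
| simplified two-column input ("second value", sorted by integer
| second) rather than GPS XML, and the name `fill_gaps` is made up.

```shell
# Emit every input sample, plus one interpolated row for each missing
# second between two consecutive samples.
fill_gaps() {
    awk '
    NR > 1 {
        for (t = prev_t + 1; t < $1; t++)
            print t, prev_v + ($2 - prev_v) * (t - prev_t) / ($1 - prev_t)
    }
    { print $1, $2; prev_t = $1; prev_v = $2 }'
}

printf '0 10\n4 18\n' | fill_gaps   # prints rows for seconds 0 through 4
```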
| account-5 wrote:
| How did you parse XML with AWK? I would never think of using
| AWK for XML data. I'd even steer clear of CSV data unless I
| could guarantee no in-field commas or newlines.
| ufo wrote:
| I find that I tend to use AWK for text munging tasks that are
| too small to call a "project".
| Arnavion wrote:
| I wrote an IRC bot in it, one of those "paste a line of code
| and the bot will evaluate it and print the result" bots that
| you find in programming language channels. It's not a
| particularly big or "real" project, but it definitely fulfills
| the need of having a bot in that particular IRC channel.
|
| awk is great for it because IRC (or at least the subset that
| the bot cares about) is relatively easy to parse, and shelling
| out to the shell script that does the actual code evaluation and
| prints the result back is also fairly straightforward. Someone
| else used to have such a bot before but they had written it in
| Rust with a bajillion dependencies; if I had done that I
| would've had to update dependencies and redeploy it every other
| week. In contrast I deployed my awk version once and then
| basically haven't touched it in years.
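|
| As an illustration of why the IRC subset is easy to handle, a
| minimal sketch that splits a PRIVMSG line of the common form
| ":nick!user@host PRIVMSG target :text" into nick, target and text.
| This is not the commenter's bot; the function name `parse_privmsg`
| and the tab-separated output are assumptions.

```shell
# Print "nick<TAB>target<TAB>text" for a PRIVMSG line, nothing otherwise.
parse_privmsg() {
    printf '%s\n' "$1" | awk '
    $2 == "PRIVMSG" {
        nick = substr($1, 2)          # drop the leading ":"
        sub(/!.*/, "", nick)          # drop "!user@host"
        msg = $0
        sub(/^[^ ]+ [^ ]+ [^ ]+ :/, "", msg)   # drop prefix, command, target
        print nick "\t" $3 "\t" msg
    }'
}

parse_privmsg ':alice!a@host PRIVMSG #awk :print 1+1'
```

| A real bot would also need to strip the trailing carriage return
| and answer PING, which this sketch leaves out.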
___________________________________________________________________
(page generated 2023-03-29 23:01 UTC)