[HN Gopher] Scripting with Go (2022)
       ___________________________________________________________________
        
       Scripting with Go (2022)
        
       Author : gus_leonel
       Score  : 115 points
       Date   : 2023-08-20 10:18 UTC (12 hours ago)
        
 (HTM) web link (bitfieldconsulting.com)
 (TXT) w3m dump (bitfieldconsulting.com)
        
       | simonw wrote:
       | If you're not familiar with Go there is one detail missing from
       | this post (though it's in the script README) - what a complete
       | program looks like. Here's the example from
       | https://github.com/bitfield/script#a-realistic-use-case
       | 
       |     package main
       | 
       |     import (
       |         "github.com/bitfield/script"
       |     )
       | 
       |     func main() {
       |         script.Stdin().Column(1).Freq().First(10).Stdout()
       |     }
        
         | alexk307 wrote:
         | The whole point of using Go is to explicitly handle errors as
         | they happen. All of these steps can fail, but it's not clear
         | how they fail and if the next steps should proceed or be
         | skipped on previous failures. This is harder to reason about,
         | debug, and write than grep and bash.
        
           | mst wrote:
           | It defaults to not running the rest if a step fails, and the
           | error result is accessible via usual mechanisms.
           | 
           |     _, err := script.Foo(...).Bar(...).Stdout()
           |     if err != nil {
           |         log.Fatal(err)
           |     }
           | 
           | is sufficient for a quick scripting hack designed to be run
           | interactively.
           | 
           | I don't see it as a lot different to bash scripts with -e and
           | pipefail set, which is generally preferable anyway.
           | 
           | Plenty of go code does
           | 
           |     if err != nil {
           |         return nil, err
           |     }
           | 
           | for each step and there are plenty of cases where you only
           | care -if- it failed plus a description of some sort of the
           | failure - if you want to proceed on some errors you'd split
           | the pipe up so that it pauses at moments where you can check
           | that and compensate accordingly.
           | 
           | (and under -e plus pipefail, "error reported to stderr
           | followed by aborting" is pretty much what you get in bash as
           | well, so I'm unconvinced it's actually going to be harder to
           | debug)
        
           | coldtea wrote:
           | > _The whole point of using Go is to explicitly handle errors
           | as they happen_
           | 
           | That's hardly the whole point of using Go.
           | 
           | The friendlier syntax (and in this case DSL) is an even
           | bigger point.
           | 
           | In any case, you can trivially get at the error at the point
           | it occurred:
           | 
           | n, err := script.File("test.txt").Match("Error").CountLines()
        
           | simonw wrote:
           | I believe error handling looks like this:
           | 
           |     package main
           | 
           |     import (
           |         "log"
           | 
           |         "github.com/bitfield/script"
           |     )
           | 
           |     func main() {
           |         _, err := script.Stdin().Column(1).Freq().First(10).Stdout()
           |         if err != nil {
           |             log.Fatal(err)
           |         }
           |     }
           | 
           | Errors are "remembered" by the pipeline and can be processed
           | when you get to a sink method.
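
The "remembered error" behaviour described in this subthread can be sketched in a few lines. This is a toy model only: the `pipe` type, its lowercase methods, and the rule that an empty pattern is an error are all hypothetical choices for the illustration, not the `script` package's actual types or semantics. It shows the general technique: each stage becomes a no-op once an error is set, and the sink method returns that error.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// pipe is a hypothetical pipeline value: the data so far plus the first
// error encountered by any stage.
type pipe struct {
	lines []string
	err   error
}

// from starts a pipeline from a block of text.
func from(text string) *pipe {
	text = strings.TrimRight(text, "\n")
	if text == "" {
		return &pipe{}
	}
	return &pipe{lines: strings.Split(text, "\n")}
}

// match keeps lines containing substr. If an earlier stage failed, it
// does nothing and passes the pipe along unchanged.
func (p *pipe) match(substr string) *pipe {
	if p.err != nil {
		return p
	}
	if substr == "" {
		// Assumption for the sketch: treat an empty pattern as an error.
		p.err = errors.New("match: empty pattern")
		return p
	}
	var kept []string
	for _, l := range p.lines {
		if strings.Contains(l, substr) {
			kept = append(kept, l)
		}
	}
	p.lines = kept
	return p
}

// first keeps at most n lines, again skipping all work after an error.
func (p *pipe) first(n int) *pipe {
	if p.err != nil {
		return p
	}
	if n < 0 {
		p.err = errors.New("first: negative count")
		return p
	}
	if n < len(p.lines) {
		p.lines = p.lines[:n]
	}
	return p
}

// countLines is the sink: it is the only place the caller sees the error.
func (p *pipe) countLines() (int, error) {
	if p.err != nil {
		return 0, p.err
	}
	return len(p.lines), nil
}

func main() {
	n, err := from("Error: a\nok\nError: b\n").match("Error").first(10).countLines()
	fmt.Println(n, err)

	// The failing stage short-circuits everything after it.
	_, err = from("x\n").match("").first(10).countLines()
	fmt.Println(err)
}
```

In this model the caller cannot tell which stage failed except through the error message, which is the trade-off alexk307 is objecting to; splitting the chain, as mst suggests, is how you would check at intermediate points.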
        
           | fpoling wrote:
           | From a technical point of view nothing prevents the scripting
           | package from being just as informative with errors as bash
           | and having a helper to log and clear the error. If it is not
           | already the case, I call it a bug.
        
         | jvictor118 wrote:
         | If one were actually going to use something like this, I'd
         | think it'd be worth implementing a little shebang script that
         | can wrap a single-file script in the necessary boilerplate and
         | call go run!
        
           | jayd16 wrote:
           | Hmm, I wonder if this is Microsoft's real endgame with
           | allowing the single line C# syntax.
        
           | simonw wrote:
           | That's a really fun idea. I got that working here:
           | https://til.simonwillison.net/bash/go-script
           | 
           | Now you can run this:
           | 
           |     cat file.txt | ./goscript.sh -c 'script.Stdin().Column(1).Freq().First(10).Stdout()'
           | 
           | Or write scripts like this - call it 'topten.sh':
           | 
           |     #!/tmp/goscript.sh
           |     script.Stdin().Column(1).Freq().First(10).Stdout()
           | 
           | Then run this:
           | 
           |     chmod 755 topten.sh
           |     echo "one\none\ntwo" | ./topten.sh
        
       | ComputerGuru wrote:
       | Tangentially related: I posted a shebang for scripting in rust
       | some years ago, if anyone is interested:
       | https://neosmart.net/blog/self-compiling-rust-code/
        
       | booleandilemma wrote:
       | I like Go, but its insistence on not permitting unused imports
       | and unused variables makes it unsuitable for scripting, imo.
       | 
       | For scripting I want something that I can be fast and messy in.
       | Go is the opposite of that.
       | 
       | It's ok, a language doesn't have to be good at everything.
        
         | nprateem wrote:
         | There should totally be a compiler flag to not require those
        
       | ilyt wrote:
       | Perl was _literally_ made for that, just use it
        
       | nerdbaggy wrote:
       | I ended up using this for my cli scripting needs.
       | https://github.com/google/zx
        
       | nullwarp wrote:
       | Oh very neat, thanks for posting. I will definitely give this a
       | try.
        
       | earthboundkid wrote:
       | This post is several years old fwiw.
        
         | everybodyknows wrote:
         | There's a cute little icon telling us "Feb 21" right at the top
         | but omitting the _year_, which would have been ever so
         | helpful.
        
       | geenat wrote:
       | Would love to use more golang- amazing build system and cross
       | compiler built in. "All in one" binaries are the best thing ever.
       | I adore most of the ideas in the language.
       | 
       | .... but there are just soooo many little annoyances /
       | inconveniences which turn me off.
       | 
       | - No Optional Parameters. No Named Parameters. Throw us a bone
       | Rob Pike, it's 2023. Type inferred composite literals may be an
       | OK compromise.. if we ever see them:
       | https://github.com/golang/go/issues/12854
       | 
       | - Unused import = will not compile. Unused variable = Will not
       | compile. Give us the ability to turn off the warning.
       | 
       | - No null safe or nullish coalescing operator. (? in rust, ?? in
       | php, etc.)
       | 
       | - Verbosity of if err != nil { return err; }
       | 
       | - A ternary operator would be nice, and could bring if err != nil
       | to 1 line.
       | 
       | - No double declarations. "no new variables on left side of :="
       | .. For some odd reason "err" is OK here... Would be highly
       | convenient for pipelines, so each result doesn't need to be
       | uniquely named.
       | 
       | I'd describe Go as a "simple" language- Not an "easy" language.
       | 1-2 lines in Python is going to be 5-10 lines in golang.
       | 
       | Note: Nim has most of these..
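
On the `:=` point above: the reason `err` appears to be "OK" is that Go's short variable declaration only requires at least one new variable on the left-hand side; any others already declared in the same scope are simply assigned to. A small sketch (the function name `parseBoth` is made up for the example):

```go
package main

import (
	"fmt"
	"strconv"
)

// parseBoth (hypothetical) shows which := forms compile.
func parseBoth(a, b string) (int, int, error) {
	x, err := strconv.Atoi(a) // declares both x and err
	if err != nil {
		return 0, 0, err
	}

	// Allowed: y is new, so := is legal and err is reused (assigned).
	y, err := strconv.Atoi(b)
	if err != nil {
		return 0, 0, err
	}

	// Not allowed, because neither name would be new:
	//   x, err := strconv.Atoi(b)
	//   // compile error: no new variables on left side of :=
	// Plain assignment works instead:
	//   x, err = strconv.Atoi(b)

	return x, y, nil
}

func main() {
	fmt.Println(parseBoth("1", "2"))
	fmt.Println(parseBoth("1", "two"))
}
```

So for a pipeline of intermediate results, each new result still needs its own name (or plain `=` assignment to an existing variable of the same type), which is the inconvenience being described.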
        
         | skybrian wrote:
         | Have you tried Deno?
        
           | geenat wrote:
           | URL?
        
             | skybrian wrote:
             | https://deno.land/
             | 
             | This is Typescript, but you have language complaints, and
             | it will build binaries.
        
         | fpoling wrote:
         | The error handling verbosity in Go should be blamed partially
         | on the formatter that replaces one-liner if err != nil { return
         | err } with 3 lines.
        
         | myzie wrote:
         | Agreed!
         | 
         | Shameless plug: this is why I built Risor.
         | 
         | https://github.com/risor-io/risor
         | 
         | Keep in the Go ecosystem, retain compatibility with the Go
         | programs you already have, but have a much more concise
         | scripting capability at your disposal.
        
           | geenat wrote:
           | Looks more useful than OP.
        
         | IshKebab wrote:
         | I agree. Go has such amazing infrastructure it's a huge shame
         | the language is so stubbornly basic.
        
       | pdimitar wrote:
       | Shell scripting is quite fine up until certain complexity (say
       | 500-1000 lines), after which adding even a single small feature
       | becomes a huge drag. We're talking hours for something that would
       | take me 10 minutes in Golang and 15 in Rust.
       | 
       | Many people love to smirk and say "just learn bash properly, duh"
       | but that's missing the point that we never do big projects in
       | bash so our muscle memory of bash is always kind of shallow. And
       | by "we" I mean "a lot of programmers"; I am not stupid, but I
       | have to learn bash's intricacies every time almost from scratch
       | and that's not productive. It's very normal for things to slip up
       | from your memory when you're not using them regularly. To make
       | this even more annoying, nobody will pay me to work exclusively
       | with bash for 3 months until it gets etched deep into my memory.
       | So there's that too.
       | 
       | I view OP as a good reminder that maybe universal-ish tools to
       | get most of what we need from shell scripting exist even today
       | but we aren't giving them enough attention and energy and we
       | don't make them mainstream. Though it doesn't help that Golang
       | doesn't automatically fetch dependencies when you just do `go run
       | random_script.go`: https://github.com/golang/go/issues/36513
       | 
       | I am not fixating on Golang in particular. But IMO
       | _next_bash_or_something_ should be due Soon(tm). It's not a huge
       | problem to install a single program when provisioning a new VM or
       | container either so I am not sure why people are so averse to it.
       | 
       | So yeah, nice article. I like the direction.
       | 
       | EDIT: I know about nushell, oilshell and fish but admittedly
       | never gave them a chance.
        
       | tambourine_man wrote:
       | The pipe like code with dot notation reminds me a lot of jQuery.
       | That's a compliment.
        
         | KnobbleMcKnees wrote:
         | I agree with this, and also in a complimentary way, but it all
         | seems very non-idiomatic for Go. But I am not a Go expert by
         | any means.
        
         | myzie wrote:
         | Take a look at Risor and its pipes capability.
         | 
         | https://github.com/risor-io/risor#quick-example
         | 
         | Stay in the Go ecosystem, but gain pipes, Python-like
         | f-strings, and more.
         | 
         | (I'm the author)
        
         | coldtea wrote:
         | Try jq if you haven't already.
        
           | tambourine_man wrote:
           | Yes, but jq's syntax is impossible to memorize for me.
           | 
           | gron | rg
           | 
           | FTW
        
       | js2 wrote:
       | Previous discussion (March 11, 2022 | 243 points | 66 comments):
       | 
       | https://news.ycombinator.com/item?id=30641883
        
       | simonw wrote:
       | Inspired by comments in this thread, I threw together a Bash
       | script that lets you do this:
       | 
       |     cat file.txt | ./goscript.sh -c 'script.Stdin().Column(1).Freq().First(10).Stdout()'
       | 
       | You can also use it as a shebang line to write self-contained
       | scripts.
       | 
       | Details here: https://til.simonwillison.net/bash/go-script
        
       | sgarland wrote:
       | Every time I see things like this, I feel like the person must be
       | unaware of awk.
       | 
       |     # the original one-liner to get unique IP addresses
       |     cut -d' ' -f 1 access.log | sort | uniq -c | sort -rn | head
       |     # turns into this with GNU awk
       |     gawk '{PROCINFO["sorted_in"] = "@val_num_desc"; a[$1]++} END {c=0; for (i in a) if (c++ < 10) print a[i], i}' access.log
       | 
       | It's also far, far faster on larger files (base-spec M1 Air):
       | 
       |     $ wc -lc fake_log.txt
       |      1000000 218433264 fake_log.txt
       | 
       |     $ hyperfine "gawk '{PROCINFO[\"sorted_in\"] = \"@val_num_desc\"; a[\$1]++} END {c=0; for (i in a) if (c++ <10) print a[i], i}' fake_log.txt"
       |     Benchmark 1: gawk '{PROCINFO["sorted_in"] = "@val_num_desc"; a[$1]++} END {c=0; for (i in a) if (c++ <10) print a[i], i}' fake_log.txt
       |       Time (mean +- s):      1.250 s +-  0.003 s    [User: 1.185 s, System: 0.061 s]
       |       Range (min ... max):    1.246 s ...  1.254 s    10 runs
       | 
       |     $ hyperfine "cut -d' ' -f1 fake_log.txt | sort | uniq -c | sort -rn | head"
       |     Benchmark 1: cut -d' ' -f1 fake_log.txt | sort | uniq -c | sort -rn | head
       |       Time (mean +- s):      4.844 s +-  0.020 s    [User: 5.367 s, System: 0.087 s]
       |       Range (min ... max):    4.817 s ...  4.873 s    10 runs
       | 
       | Interestingly, GNU cut is significantly faster than BSD cut on
       | the M1:
       | 
       |     $ hyperfine "gcut -d' ' -f1 fake_log.txt | sort | uniq -c | sort -rn | head"
       |     Benchmark 1: gcut -d' ' -f1 fake_log.txt | sort | uniq -c | sort -rn | head
       |       Time (mean +- s):      3.622 s +-  0.004 s    [User: 4.149 s, System: 0.078 s]
       |       Range (min ... max):    3.616 s ...  3.629 s    10 runs
        
         | [deleted]
        
         | zer8k wrote:
         | I don't understand the downvotes. This is a fair criticism. The
         | author even points out "programs as pipelines" which is
         | _literally_ the UNIX philosophy. There are tools that already
         | exist on UNIX-likes more people should use instead of reaching
         | for a script.
         | 
         | I can sympathize with the author w.r.t wanting to use a single
         | language you like for everything. However, after decades I've
         | found this to be untenable. There are languages that are just
         | simply better for one-off scripting (Perl, Python), and
         | languages that aren't (anything compiled). Trying to bolt an
         | interpreter onto a compiled language from the outside seems
         | like a lot of work for questionable gain.
        
           | sgarland wrote:
           | > The author even points out "programs as pipelines" which is
           | literally the UNIX philosophy.
           | 
           | Yes, and if the thing I'm trying to do has a small input, it
           | will only be done once, etc. I will often just pipe `grep` to
           | `sort` or whatever, because it's less typing, it's generally
           | clearer to a wider range of people, etc.
           | 
           | But on larger inputs, or even things like doing a single
           | pattern inversion mixed with a pattern match, I like awk.
        
           | tomcam wrote:
           | One reason the author could be doing this is to reduce
           | dependencies. Maybe they deploy to Windows or to some other
           | environment not guaranteed to have those utilities. Also
           | testing probably gets simplified.
        
           | ajross wrote:
           | > There are languages that are just simply better for one-off
           | scripting (Perl, Python), and languages that aren't (anything
           | compiled). Trying to bolt an interpreter onto a compiled
           | language from the outside seems like a lot of work for
           | questionable gain.
           | 
           | One reason is deployment. Writing code in python/node/etc...
           | implies the ability of the production environment to
           | bootstrap a rather complicated installation tree for the
           | elaborate runtimes required by the code and all its
           | dependencies. And so there are elaborate tools (npm, venv,
           | Docker, etc...) that have grown up around those requirements.
           | 
           | Compiled languages (and Go in particular shines here) spit
           | out a near-dependency-free[1] binary you can drop on the
           | target without fuss.
           | 
           | I deal with this in my day job pretty routinely. Chromebooks
           | have an old python and limited ability to pull down
           | dependencies for quick test runs. Static test binaries make
           | things a lot easier.
           | 
           | [1] Though there are shared libraries and runtime frameworks
           | there too. You can't deploy a Gnome 3 app with the same
           | freedom you can a TCP daemon, obviously.
        
             | LeBit wrote:
             | I agree with you.
             | 
             | But I think for python you could also deploy a binary with
             | pyinstaller.
        
           | karmakaze wrote:
           | The 'scripting' vs 'compiled' language is a false dichotomy.
           | Awk, Perl, Python are compiled programs. What makes a
           | 'scripting' language special? Dynamic typing? Lack of compile
           | step/delay?
           | 
           | I could imagine a lifetime of collecting scripting
           | macros/libs in lisp to be as good or better.
        
             | tempusr wrote:
             | Python is not a compiled language.
             | 
             | However, the reason Bash is so prevalent amongst Sys Admins
             | such as myself is the fact that Bash scripts are portable
             | and reliable to use across Debian, Arch or RHEL based
             | distributions.
             | 
             | You don't have to import extra libraries, ensure that you
             | are running the proper python environment, or be certain
             | that pip is properly installed and configured for whatever
             | extra source code beyond what is included out of the box.
             | 
             | Bash is the most consistent code you can write to perform
             | any task you need when you have to work with Linux.
        
               | dragonwriter wrote:
               | > Python is not a compiled language.
               | 
               | Python is (at least in the CPython implementation)
               | compiled, to python byte code which runs on the python
               | virtual machine.
               | 
               | It's not compiled to native code. (Unless you use one of
               | the compilers which do compile it to native code, though
               | they tend to support only a subset of python.)
        
               | pdimitar wrote:
               | Another commenter beat me to it but still: sh / bash /
               | zsh are quite fine up until certain complexity (say 500
               | lines), after which adding even a single small feature
               | becomes a _huge_ drag. We're talking hours for something
               | that would take me 10 minutes in Golang and 15 in Rust.
        
               | mbreese wrote:
               | > _portable and reliable to use across Debian, Arch or
               | RHEL based distributions_
               | 
               | Until you try to use a newer feature or try the script on
               | a Mac or BSD or any older bash.
               | 
               | SH code is completely portable, but bash itself can have
               | quite a few novel features. Don't get me wrong - I'm
               | happy the language is dynamic and still growing. But it
               | can make things awkward when trying to use a script from
               | a newer system on an older server (and the author has
               | been "clever").
        
               | LeBit wrote:
               | Bash is fine for small scripts.
               | 
               | Once you use it to manage complex data structures and
               | flow, you are simply wasting time because you will have
               | to rewrite it in Python or Go.
        
             | heresie-dabord wrote:
             | > The 'scripting' vs 'compiled' language is a false
             | dichotomy.
             | 
             | Not false, but perhaps in need of better definition. The
             | term _script_ has often denoted a trivial set of commands
             | run by $interpreter.
             | 
             | "Scripting languages" have been seen as being in contrast
             | to C, C++, Pascal, Java, SmallTalk, &c. The scripting
             | languages remove from the user the need:
             | 
             | -a- to think about an extensive type system,
             | 
             | -b- to compile the logic, and
             | 
             | -c- to build for a specific architecture.
        
             | paulddraper wrote:
             | Static typing is the key differentiator.
             | 
             | That requires a level of bookkeeping which is helpful for
             | large programs and a nuisance for small programs.
        
             | riku_iki wrote:
             | > Dynamic typing?
             | 
             | Actually, the amount of reasoning the program has to
             | perform at run time is close to interpretation.
        
         | ajross wrote:
         | And every time I see things like _that_, I feel like the
         | person must be unaware of perl.
         | 
         | I've made this point before, but I still find it hilarious. For
         | more than a decade, _awk was dead_. Like, dead dead. There was
         | nothing you could do in awk that wasn't cleaner and simpler
         | and vastly more extensible in perl. And, yes, perl was faster
         | than gawk, just like gawk is faster than shell pipelines.
         | 
         | Then python got big, people decided that they didn't want to
         | use perl for big projects[1], and so perl went out of vogue and
         | got dropped even for the stuff it did (and continues to do)
         | really well. Then a new generation came along having never
         | learned perl, and...
         | 
         | ... have apparently rediscovered awk?
         | 
         | [1] Also the perl 5 tree stagnated[2] as all the stakeholders
         | wandered off into the weeds to think about some new language.
         | They're all still out there, AFAIK.
         | 
         | [2] Around 2000-2005, perl was The Language to be seen writing
         | your new stuff in, so e.g. bioinformatics landed there and not
         | elsewhere. But by 2015, the TensorFlow people wouldn't be
         | caught dead writing perl.
        
           | sgarland wrote:
           | That's a fair criticism. I know Perl can do pretty amazing
           | things with text, but I've never bothered to learn it.
           | 
           | EDIT: I decided to ask GPT-4 to translate the gawk script to
           | Perl. I make zero claims that this is ideal (as stated, I
           | don't know Perl at all), but it _does_ produce the same
           | output, but slightly slower than the gawk script.
           | 
           |     $ hyperfine "perl -lane '\$ips{\$F[0]}++; END {print \"\$ips{\$_} \$_\" for (sort {\$ips{\$b} <=> \$ips{\$a}} keys %ips)[0..9]}' fake_log.txt"
           |     Benchmark 1: perl -lane '$ips{$F[0]}++; END {print "$ips{$_} $_" for (sort {$ips{$b} <=> $ips{$a}} keys %ips)[0..9]}' fake_log.txt
           |       Time (mean +- s):      1.499 s +-  0.006 s    [User: 1.447 s, System: 0.050 s]
           |       Range (min ... max):    1.490 s ...  1.507 s    10 runs
        
             | ajross wrote:
             | I would have gone with an iteratively-built list, FWIW, and
             | avoided the overhead in parsing fields the script won't
             | use:
             | 
             |     perl -e 'for $i (<>) { $i =~ s/ .*//; push @list, $i; }; print(sort(@list));'
        
           | tgv wrote:
           | I learned perl around that time, and I thought it was awful.
           | And just about everything about it: the parameter passing,
           | the sigils that made BASIC look like Dijkstra's love child,
           | the funky array/scalar coercion, and the bloody fact that it
           | couldn't read from two files at once even though the docs
           | suggested it should work. They didn't say so explicitly,
           | because perl was pretty badly documented. My boss started
           | writing object oriented perl, and that made perl unreadable
           | even to perl experts.
           | 
           | AWK, on the other hand, is simplicity itself. Sure, it misses
           | a few things, but for searching through log files or db dumps
           | it's an excellent tool. And it's fast enough. If you really
           | need much more speed, there are other tools, but _I_ would
           | rather rewrite it in C than try perl again.
        
           | pclmulqdq wrote:
           | I am in the "awk > perl" camp. I think the idea of "vastly
           | more extensible" is a negative for my scripting language, and
           | "cleaner" just doesn't matter - I just want to write it the
           | one time I want to use it and then be done with it. The awk
           | language is really simple and quick to write.
           | 
           | By the way, I think this is why Perl lost to Python on larger
           | scripting and programming projects - it's just easier to
           | write (albeit harder to read, to antagonize the Python lovers
           | out there).
        
           | tomjakubowski wrote:
           | Sample of one. I came of age on Linux in the late 90s/early
           | 00s. Through other nerds on IRC channels I became familiar
           | with Perl and didn't like it. I also picked up basic awk in
           | the context of one-liners for shell pipelines and it was
           | pretty nice for that. Easier to remember than the flags for
           | cut and friends.
           | 
           | Learning awk a bit more deeply in recent years has been good
           | too. I can write one liners that do more. I shipped a full
           | awk script once, for something unimportant, but I would never
           | do that again. For serious text munging these days I'd rather
           | write a Rust program.
        
           | voidfunc wrote:
           | Perl never recovered from its "many ways to do things" label.
           | It's a tired criticism of the language but it's lodged in the
           | brains of a generation of programmers, which is unfortunate.
           | 
           | Also, the classic sysadmin role, which used to lean on Perl
           | heavily, sort of evolved with the rise of The Cloud, and
           | automation tools like Chef, Puppet, and Ansible took over in
           | that 2005-2015 time frame.
        
           | telotortium wrote:
           | I mostly use awk over perl because awk is completely
           | documented in one man page, so it's easy to see whether awk
           | will be fit for purpose or whether I should write it using a
           | real programming language. I learned Perl over a decade ago,
           | but not the really concise dialect you would use on the
           | command line for stuff I'd use awk for, and I've forgotten
           | almost all of it now. At least with awk it's easy to relearn
           | the functions I need when I need it.
        
             | ajross wrote:
             | Right, which is sort of my point. 20 years ago, "everyone"
             | knew perl, at least to the extent of knowing the standard
             | idioms for different environments that you're talking
             | about. And in that world, "everyone" would choose perl for
             | these tasks, knowing that everyone else would be expert
             | enough to read and maintain them. Perl was the natural
             | choice.
             | 
             | And in a world where perl is a natural choice for these
             | tasks, awk doesn't have a niche. Because at the end of the
             | day awk is simply an inferior language.
             | 
             | Which is the bit I find funny: we threw out and forgot
             | about a great tool, and now we think that the ancestral toy
             | it replaced is a good idea again.
        
           | tptacek wrote:
           | They taught awk to my boy in bioinformatics as part of his
           | degree. I was like Vito Corleone in the funeral home when he
           | showed me the FASTA parsing awk code they were working on.
        
         | xdsdvsv wrote:
         | way to completely miss the point and turn this into a weird
         | pissing competition (btw your "simple" awk example is super
         | complicated and opaque to someone who doesn't have the awk man
         | page open in front of them)
         | 
         | The script package looks really cool and I'll definitely try it
         | out, cause honestly even though I do a lot of bash scripting
         | it's super painful for anything but something super simple.
        
           | sgarland wrote:
           | If someone doesn't know awk, then of course it'll be
           | complicated and opaque - the same is true of practically any
           | language. One-liners in general also tend to optimize for
           | space. If you wanted it to be pretty-printed and with
           | variable names that are more obvious:
           | 
           |     {
           |       PROCINFO["sorted_in"] = "@val_num_desc"
           |       top_ips[$1]++
           |     }
           |     END {
           |       counter = 0
           |       for (i in top_ips) {
           |         if (counter++ < 10) {
           |           print top_ips[i], i
           |         }
           |       }
           |     }
           | 
           | But also, if you read further up in the thread, you'll see
           | that another user correctly identified the bottlenecks in the
           | original pipeline, and applying those optimizations made it
           | about 3x as fast as the awk one. Arguably, if you weren't
           | familiar with the tools (and their specific implementations,
           | like how GNU sort and BSD sort have wildly different default
           | buffer sizes), you'd still be facing the same problem.
           | 
           | At least half of what people complain about with shell
           | scripts can be solved by using ShellCheck [0], and
           | understanding what it's asking you to do. I disagree with the
           | common opinion of "anything beyond a few lines should be a
           | Python script instead." If you're careful with variable
           | scoping and error handling, bash is perfectly functional for
           | many uses.
           | 
           | [0]: https://www.shellcheck.net
        
             | subjectsigma wrote:
             | > If you're careful with variable scoping and error
             | handling, bash is perfectly functional for many uses.
             | 
             | "Loaded guns are perfectly functional for juggling, just be
             | careful with the trigger and you won't shoot yourself in
             | the foot!"
             | 
             | You are technically correct but why bother with being
             | careful when you could just avoid writing bash?
        
             | dharmab wrote:
             | > If someone doesn't know awk, then of course it'll be
             | complicated and opaque - the same is true of practically
             | any language
             | 
             | I don't think this is true. Before I learned Go, I could
             | follow along most Go programs pretty well, and learning Go
             | well enough to get started took less than an hour. Every
             | attempt I've made to learn more Awk, I've bounced off.
        
               | [deleted]
        
         | jeffbee wrote:
         | The overwhelming cost of the first shell pipeline, at least on
         | my machine, is caused by the default UTF-8 locale. As I have
         | found in almost every other case, `LC_ALL=C` radically speeds
         | this up.
         |
         |     Original: 3.294s
         |     w/ LC_ALL=C: 1.055s
         |     w/ larger sort buffer `-S5%`: 0.780s
         |     Your gawk: 1.772s
         |     + LC_ALL=C: 1.772s
         | 
         | By the way, these changes immediately suggested themselves
         | after running the pipeline under `perf`. Profiling is always
         | the first step in optimization.
        
           | sgarland wrote:
           | Collation aside (which is absolutely a huge boost in speed
           | that I neglected to think about), I assumed that the rest of
           | the difference was coming from the fact that the initial
           | `cut` meant the rest of the pipeline had far less to deal
           | with, whereas `awk` is processing every line. Benchmarking
           | (and testing in `perf`) showed this to not be the case. I'd
           | need to compile `awk` with debug symbols, I think, to know
           | exactly where the slowdown is, but I'm going to assume it's
           | mostly due to `sort` being extremely optimized for doing one
           | thing, and doing it well.
           | 
           | I did find one other interesting difference between BSD and
           | GNU tools - BSD sort defaults to 90% for its buffer, GNU sort
           | defaults to 1024 KiB.
           | 
           | Combining all of these (and using GNU uniq - it was also
           | faster), I was able to get down to 463 msec on the M1 Air:
           |
           |     $ hyperfine "export LC_ALL=C; gcut -d' ' -f1 fake_log.txt | gsort -S5% | guniq -c | gsort -rn -S5% | head"
           |     Benchmark 1: export LC_ALL=C; gcut -d' ' -f1 fake_log.txt | gsort -S5% | guniq -c | gsort -rn -S5% | head
           |       Time (mean ± σ):     463.4 ms ±   3.3 ms    [User: 965.5 ms, System: 93.3 ms]
           |       Range (min … max):   459.9 ms … 469.8 ms    10 runs
           | 
           | TIL, thank you.
        
             | xvector wrote:
             | Could you elaborate on how you arrived at 5% for your
             | buffer? Does specifying a buffer size really cause that
             | much of a speed up?
        
         | tejtm wrote:
         | It is always "horses for courses" and there may be times when
         | the five concurrent cores with the shell pipeline will beat the
         | single core awk script.
        
         | kermatt wrote:
         | Mawk can be even faster, although missing some features of GNU
         | Awk 5.
        
       | jerf wrote:
       | I don't do a lot of shell scripting type things in Go because
       | it's not a great language for it, but when I do, I take another
       | approach, which is just to panic. Generics offer a nice little
       |
       |     func Must[T any](x T, err error) T {
       |         if err != nil {
       |             panic(err)
       |         }
       |         return x
       |     }
       | 
       | which you can wrap around any standard "x, err :=" function to
       | just make it panic, and even prior to generics you could wrap a
       | "PanicOnErr(justReturnsErr())".
       | 
       | In the event that you want to handle errors in some other manner,
       | you trivially can, and you're not limited to just the pipeline
       | design patterns, which are cool in some ways, but limiting when
       | that's all you have. (It can also be tricky to ensure the
       | pipeline is written in a way that doesn't generate a ton of
       | memory traffic with intermediate arrays; I haven't checked to see
       | what the library they show does.) Presumably if I'm writing this
       | in Go I have some other reason for wanting to do that, like
       | having some non-trivial concurrency desire (using concurrency to
       | handle a newline-delimited JSON file was my major use case, doing
       | non-trivial though not terribly extensive work on the JSON).
       | 
       | While this may make some people freak, IMHO the real point of
       | "errors as values" is not to force you to handle the errors in
       | some very particular manner, but to make you _think_ about the
       | errors more deeply than a conventional exceptions-based program
       | typically does. As such, it is perfectly legal and moral to think
       | about your error handling and decide that what you really want is
       | the entire program to terminate on the first error. Obviously
       | this is not the correct solution for my API server blasting out
       | tens of thousands of highly heterogeneous calls per second, but
       | for a shell script it is quite often the correct answer. As
       | something I have thought about and chosen deliberately, it's
       | fine.
        
       | dang wrote:
       | Discussed at the time:
       | 
       |  _Scripting with Go_ -
       | https://news.ycombinator.com/item?id=30641883 - March 2022 (66
       | comments)
        
       | wudangmonk wrote:
       | The unix philosophy of having small programs that take in input,
       | process it and return a result has proven to be a success; I just
       | never understood why the next logical step of having this program
       | in library form never became a thing. I guess shells are a bit
       | useful but not as useful as a decent repl (common-lisp or the
       | jupyter repl) where these programs can be used as if they were a
       | function.
        
       | perfmode wrote:
       | From Sanjay Ghemawat, 9 years ago
       | 
       | https://github.com/ghemawat/stream
        
       | Hendrikto wrote:
       | This is satire, right? I think commenters are completely missing
       | the point.
       | 
       | https://en.m.wikipedia.org/wiki/A_Modest_Proposal
        
         | dang wrote:
         | The submitted title was "Scripting with Go: A Modest Proposal"
         | but the phrase "modest proposal" doesn't appear in the article,
         | so I've taken it out.
         | 
         | " _Please use the original title, unless it is misleading or
       | linkbait; don't editorialize._" -
         | https://news.ycombinator.com/newsguidelines.html
        
       | 1vuio0pswjnm7 wrote:
       |     export LC_ALL=C
       |     awk '!a[$1]++' access.log|head
       | 
       | If access.log is large enough, awk will fail.
       | 
       | When this happens, one can split access.log into pieces, process
       | separately then recombine.
       | 
       | But that's more or less what sort(1) does with large files,
       | creating temporary files in $TMPDIR or other user-specified
       | directory after -T if using GNU sort.
       | 
       | There was a way to eliminate duplicate lines from an unordered
       | list using k/q, without using temporary files but I stopped using
       | it after Kx, Inc. was sold off and I started using musl
       | exclusively. q requires glibc.
       | 
       | For example, something like
       |
       |     #!/bin/sh
       |     # usage: $0 file
       |     echo "k).Q.fs[l:0::\`:$1];l:?:l;\`:$1 0:l"|exec q >null;
       | 
       | Can this be done in ngn k?
       | 
       | The other approach I use to avoid temporary files is to just put
       | the list in an SQL database, add a UNIQUE constraint, and update
       | the database.
        
       | fsmv wrote:
       | I put together a go "sh-bang" line so you can just chmod +x your
       | .go file and run it (and it works with go fmt unlike other
       | options).                   /*usr/bin/env go run "$0" "$@"; exit
       | $? #*/
       | 
       | It's fun, try it out! Just make this the first line of the file.
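
To make the trick above concrete, a complete file might look like the sketch below. The file name `hello.go` and the `greeting` helper are hypothetical. The mechanism, as far as I understand it: to Go the first line is an ordinary block comment, while a shell that falls back to interpreting an executable without a `#!` line as a shell script expands `/*usr/bin/env` to `/usr/bin/env`, runs `go run` on the file, and exits before reaching the Go code.

```go
/*usr/bin/env go run "$0" "$@"; exit $? #*/
package main

import (
	"fmt"
	"os"
	"strings"
)

// greeting builds the message printed by the script from its arguments.
// It is a hypothetical helper, present only to give the script a body.
func greeting(args []string) string {
	if len(args) == 0 {
		return "hello, world"
	}
	return "hello, " + strings.Join(args, " ")
}

func main() {
	fmt.Println(greeting(os.Args[1:]))
}
```

Assuming the file is saved as `hello.go`, `chmod +x hello.go` followed by `./hello.go gophers` should run it from a typical interactive shell; the `.go` suffix matters because `go run` expects it.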
        
       | nicechianti wrote:
       | terrible idea
        
       | kardianos wrote:
       | Interesting. I do something similar with my task
       | https://github.com/kardianos/task package, which is in turn
       | loosely based off of another package from 10-15 years ago.
        
         | tgv wrote:
         | That sounds interesting, but the package is unfortunately
         | undocumented. I tried
         | https://pkg.go.dev/github.com/kardianos/task, but that doesn't
         | help me understand it either. It's missing a high level
         | explanation of what to use it for, its limits and some decent
         | examples.
        
       ___________________________________________________________________
       (page generated 2023-08-20 23:01 UTC)