[HN Gopher] Scripting with Go (2022)
___________________________________________________________________
Scripting with Go (2022)
Author : gus_leonel
Score : 115 points
Date : 2023-08-20 10:18 UTC (12 hours ago)
(HTM) web link (bitfieldconsulting.com)
(TXT) w3m dump (bitfieldconsulting.com)
| simonw wrote:
| If you're not familiar with Go there is one detail missing from
| this post (though it's in the script README) - what a complete
| program looks like. Here's the example from
| https://github.com/bitfield/script#a-realistic-use-case
|
|     package main
|
|     import (
|         "github.com/bitfield/script"
|     )
|
|     func main() {
|         script.Stdin().Column(1).Freq().First(10).Stdout()
|     }
| alexk307 wrote:
| The whole point of using Go is to explicitly handle errors as
| they happen. All of these steps can fail, but it's not clear
| how they fail and if the next steps should proceed or be
| skipped on previous failures. This is harder to reason about,
| debug, and write than grep and bash.
| mst wrote:
| It defaults to not running the rest if a step fails, and the
| error result is accessible via usual mechanisms.
|
|     _, err := script.Foo(...).Bar(...).Stdout()
|     if err != nil {
|         log.Fatal(err)
|     }
|
| is sufficient for a quick scripting hack designed to be run
| interactively.
|
| I don't see it as a lot different to bash scripts with -e and
| pipefail set, which is generally preferable anyway.
|
| Plenty of go code does
|
|     if err != nil {
|         return nil, err
|     }
|
| for each step and there are plenty of cases where you only
| care -if- it failed plus a description of some sort of the
| failure - if you want to proceed on some errors you'd split
| the pipe up so that it pauses at moments where you can check
| that and compensate accordingly.
|
| (and under -e plus pipefail, "error reported to stdout
| followed by aborting" is pretty much what you get in bash as
| well, so I'm unconvinced it's actually going to be harder to
| debug)
| coldtea wrote:
| > _The whole point of using Go is to explicitly handle errors
| as they happen_
|
| That's hardly the whole point of using Go.
|
| The friendlier syntax (and in this case DSL) is an even
| bigger point.
|
| In any case, you can trivially get at the error at the point
| it occurred:
|
| n, err := script.File("test.txt").Match("Error").CountLines()
| simonw wrote:
| I believe error handling looks like this:
|
|     package main
|
|     import (
|         "log"
|
|         "github.com/bitfield/script"
|     )
|
|     func main() {
|         _, err := script.Stdin().Column(1).Freq().First(10).Stdout()
|         if err != nil {
|             log.Fatal(err)
|         }
|     }
|
| Errors are "remembered" by the pipeline and can be processed
| when you get to a sink method.
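| The pattern itself is easy to sketch in plain Go. The following is
| a toy illustration of "remembered" errors, not the library's actual
| code; the `pipe` type and its methods are made up for the example:

```go
package main

import (
	"fmt"
	"strings"
)

// pipe carries the data and the first error together; every later
// step becomes a no-op once an error has been recorded.
type pipe struct {
	data string
	err  error
}

func (p pipe) upper() pipe {
	if p.err != nil {
		return p // skip the step; the error is "remembered"
	}
	return pipe{data: strings.ToUpper(p.data)}
}

func (p pipe) first(n int) pipe {
	if p.err != nil {
		return p
	}
	if n > len(p.data) {
		return pipe{err: fmt.Errorf("first: input is only %d bytes", len(p.data))}
	}
	return pipe{data: p.data[:n]}
}

// stdout is the sink: any error recorded earlier surfaces here.
func (p pipe) stdout() error {
	if p.err != nil {
		return p.err
	}
	fmt.Println(p.data)
	return nil
}

func main() {
	// Prints "HEL"; a failing step would instead surface its error here.
	if err := (pipe{data: "hello"}).upper().first(3).stdout(); err != nil {
		fmt.Println("pipeline failed:", err)
	}
}
```

| Each step checks the stored error first, so a failure short-circuits
| the rest of the chain without any per-step `if err != nil` at the
| call site.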
| fpoling wrote:
| From a technical point of view, nothing prevents the scripting
| package from being just as informative with errors as bash, and
| from having a helper to log and clear the error. If that is not
| already the case, I call it a bug.
| jvictor118 wrote:
| If one were actually going to use something like this, I'd
| think it'd be worth implementing a little shebang script that
| can wrap a single-file script in the necessary boilerplate and
| call go run!
| jayd16 wrote:
| Hmm, I wonder if this is Microsoft's real endgame with
| allowing the single line C# syntax.
| simonw wrote:
| That's a really fun idea. I got that working here:
| https://til.simonwillison.net/bash/go-script
|
| Now you can run this:
|
|     cat file.txt | ./goscript.sh -c 'script.Stdin().Column(1).Freq().First(10).Stdout()'
|
| Or write scripts like this - call it 'top10.sh':
|
|     #!/tmp/goscript.sh
|     script.Stdin().Column(1).Freq().First(10).Stdout()
|
| Then run this:
|
|     chmod 755 top10.sh
|     echo "one\none\ntwo" | ./top10.sh
| ComputerGuru wrote:
| Tangentially related: I posted a shebang for scripting in rust
| some years ago, if anyone is interested:
| https://neosmart.net/blog/self-compiling-rust-code/
| booleandilemma wrote:
| I like Go, but its insistence on not permitting unused imports
| and unused variables makes it unsuitable for scripting, imo.
|
| For scripting I want something that I can be fast and messy in.
| Go is the opposite of that.
|
| It's ok, a language doesn't have to be good at everything.
| nprateem wrote:
| There should totally be a compiler flag to not require those
| ilyt wrote:
| Perl was _literally_ made for that, just use it
| nerdbaggy wrote:
| I ended up using this for my cli scripting needs.
| https://github.com/google/zx
| nullwarp wrote:
| Oh very neat, thanks for posting I will definitely give this a
| try.
| earthboundkid wrote:
| This post is several years old fwiw.
| everybodyknows wrote:
| There's a cute little icon telling us "Feb 21" right at the top
| but omitting the _year_ , which would have been ever so
| helpful.
| geenat wrote:
| Would love to use more golang- amazing build system and cross
| compiler built in. "All in one" binaries are the best thing ever.
| I adore most of the ideas in the language.
|
| .... but there are just soooo many little annoyances /
| inconveniences which turn me off.
|
| - No Optional Parameters. No Named Parameters. Throw us a bone
| Rob Pike, it's 2023. Type inferred composite literals may be an
| OK compromise.. if we ever see them:
| https://github.com/golang/go/issues/12854
|
| - Unused import = will not compile. Unused variable = Will not
| compile. Give us the ability to turn off the warning.
|
| - No null safe or nullish coalescing operator. (? in rust, ?? in
| php, etc.)
|
| - Verbosity of if err != nil { return err; }
|
| - A ternary operator would be nice, and could bring if err != nil
| to 1 line.
|
| - No double declarations. "no new variables on left side of :="
| .. For some odd reason "err" is OK here... Would be highly
| convenient for pipelines, so each result doesn't need to be
| uniquely named.
|
| I'd describe Go as a "simple" language- Not an "easy" language.
| 1-2 lines in Python is going to be 5-10 lines in golang.
|
| Note: Nim has most of these..
| skybrian wrote:
| Have you tried Deno?
| geenat wrote:
| URL?
| skybrian wrote:
| https://deno.land/
|
| This is Typescript, but you have language complaints, and
| it will build binaries.
| fpoling wrote:
| The error handling verbosity in Go should be blamed partially
| on the formatter that replaces one-liner if err != nil { return
| err } with 3 lines.
| myzie wrote:
| Agreed!
|
| Shameless plug: this is why I built Risor.
|
| https://github.com/risor-io/risor
|
| Stay in the Go ecosystem, retain compatibility with the Go
| programs you already have, but have a much more concise
| scripting capability at your disposal.
| geenat wrote:
| Looks more useful than OP.
| IshKebab wrote:
| I agree. Go has such amazing infrastructure it's a huge shame
| the language is so stubbornly basic.
| pdimitar wrote:
| Shell scripting is quite fine up until certain complexity (say
| 500-1000 lines), after which adding even a single small feature
| becomes a huge drag. We're talking hours for something that would
| take me 10 minutes in Golang and 15 in Rust.
|
| Many people love to smirk and say "just learn bash properly, duh"
| but that's missing the point that we never do big projects in
| bash so our muscle memory of bash is always kind of shallow. And
| by "we" I mean "a lot of programmers"; I am not stupid, but I
| have to learn bash's intricacies every time almost from scratch
| and that's not productive. It's very normal for things to slip
| from your memory when you're not using them regularly. To make
| this even more annoying, nobody will pay me to work exclusively
| with bash for 3 months until it gets etched deep into my memory.
| So there's that too.
|
| I view OP as a good reminder that maybe universal-ish tools to
| get most of what we need from shell scripting exist even today
| but we aren't giving them enough attention and energy and we
| don't make them mainstream. Though it doesn't help that Golang
| doesn't automatically fetch dependencies when you just do `go run
| random_script.go`: https://github.com/golang/go/issues/36513
|
| I am not fixating on Golang in particular. But IMO
| _next_bash_or_something_ should be due Soon(tm). It's not a huge
| problem to install a single program when provisioning a new VM or
| container either, so I am not sure why people are so averse to it.
|
| So yeah, nice article. I like the direction.
|
| EDIT: I know about nushell, oilshell and fish but admittedly
| never gave them a chance.
| tambourine_man wrote:
| The pipe like code with dot notation reminds me a lot of jQuery.
| That's a compliment.
| KnobbleMcKnees wrote:
| I agree with this, and also in a complimentary way, but it all
| seems very non-idiomatic for Go. But I am not a Go expert by
| any means.
| myzie wrote:
| Take a look at Risor and its pipes capability.
|
| https://github.com/risor-io/risor#quick-example
|
| Stay in the Go ecosystem, but gain pipes, Python-like
| f-strings, and more.
|
| (I'm the author)
| coldtea wrote:
| Try jq if you haven't already.
| tambourine_man wrote:
| Yes, but jq's syntax is impossible to memorize for me.
|
| gron | rg
|
| FTW
| js2 wrote:
| Previous discussion (March 11, 2022 | 243 points | 66 comments):
|
| https://news.ycombinator.com/item?id=30641883
| simonw wrote:
| Inspired by comments in this thread, I threw together a Bash
| script that lets you do this: cat file.txt |
| ./goscript.sh -c
| 'script.Stdin().Column(1).Freq().First(10).Stdout()'
|
| You can also use it as a shebang line to write self-contained
| scripts.
|
| Details here: https://til.simonwillison.net/bash/go-script
| sgarland wrote:
| Every time I see things like this, I feel like the person must be
| unaware of awk.
|
|     # the original one-liner to get unique IP addresses
|     cut -d' ' -f 1 access.log | sort | uniq -c | sort -rn | head
|
|     # turns into this with GNU awk
|     gawk '{PROCINFO["sorted_in"] = "@val_num_desc"; a[$1]++} END {c=0; for (i in a) if (c++ < 10) print a[i], i}' access.log
|
| It's also far, far faster on larger files (base-spec M1 Air):
|
|     $ wc -lc fake_log.txt
|      1000000 218433264 fake_log.txt
|
|     $ hyperfine "gawk '{PROCINFO[\"sorted_in\"] = \"@val_num_desc\"; a[\$1]++} END {c=0; for (i in a) if (c++ <10) print a[i], i}' fake_log.txt"
|     Benchmark 1: gawk '{PROCINFO["sorted_in"] = "@val_num_desc"; a[$1]++} END {c=0; for (i in a) if (c++ <10) print a[i], i}' fake_log.txt
|       Time (mean +- s):     1.250 s +- 0.003 s    [User: 1.185 s, System: 0.061 s]
|       Range (min ... max):  1.246 s ... 1.254 s    10 runs
|
|     $ hyperfine "cut -d' ' -f1 fake_log.txt | sort | uniq -c | sort -rn | head"
|     Benchmark 1: cut -d' ' -f1 fake_log.txt | sort | uniq -c | sort -rn | head
|       Time (mean +- s):     4.844 s +- 0.020 s    [User: 5.367 s, System: 0.087 s]
|       Range (min ... max):  4.817 s ... 4.873 s    10 runs
|
| Interestingly, GNU cut is significantly faster than BSD cut on
| the M1:
|
|     $ hyperfine "gcut -d' ' -f1 fake_log.txt | sort | uniq -c | sort -rn | head"
|     Benchmark 1: gcut -d' ' -f1 fake_log.txt | sort | uniq -c | sort -rn | head
|       Time (mean +- s):     3.622 s +- 0.004 s    [User: 4.149 s, System: 0.078 s]
|       Range (min ... max):  3.616 s ... 3.629 s    10 runs
| [deleted]
| zer8k wrote:
| I don't understand the downvotes. This is a fair criticism. The
| author even points out "programs as pipelines" which is
| _literally_ the UNIX philosophy. There are tools that already
| exist on UNIX-likes more people should use instead of reaching
| for a script.
|
| I can sympathize with the author w.r.t wanting to use a single
| language you like for everything. However, after decades I've
| found this to be untenable. There are languages that are just
| simply better for one-off scripting (Perl, Python), and
| languages that aren't (anything compiled). Trying to bolt an
| interpreter onto a compiled language from the outside seems
| like a lot of work for questionable gain.
| sgarland wrote:
| > The author even points out "programs as pipelines" which is
| literally the UNIX philosophy.
|
| Yes, and if the thing I'm trying to do has a small input, it
| will only be done once, etc. I will often just pipe `grep` to
| `sort` or whatever, because it's less typing, it's generally
| clearer to a wider range of people, etc.
|
| But on larger inputs, or even things like doing a single
| pattern inversion mixed with a pattern match, I like awk.
| tomcam wrote:
| One reason the author could be doing this is to reduce
| dependencies. Maybe they deploy to Windows or to some other
| environment not guaranteed to have those utilities. Also
| testing probably gets simplified.
| ajross wrote:
| > There are languages that are just simply better for one-off
| scripting (Perl, Python), and languages that aren't (anything
| compiled). Trying to bolt an interpreter onto a compiled
| language from the outside seems like a lot of work for
| questionable gain.
|
| One reason is deployment. Writing code in python/node/etc...
| implies the ability of the production environment to
| bootstrap a rather complicated installation tree for the
| elaborate runtimes required by the code and all its
| dependencies. And so there are elaborate tools (npm, venv,
| Docker, etc...) that have grown up around those requirements.
|
| Compiled languages (and Go in particular shines here) spit
| out a near-dependency-free[1] binary you can drop on the
| target without fuss.
|
| I deal with this in my day job pretty routinely. Chromebooks
| have an old python and limited ability to pull down
| dependencies for quick test runs. Static test binaries make
| things a lot easier.
|
| [1] Though there are shared libraries and runtime frameworks
| there too. You can't deploy a Gnome 3 app with the same
| freedom you can a TCP daemon, obviously.
| LeBit wrote:
| I agree with you.
|
| But I think for python you could also deploy a binary with
| pyinstaller.
| karmakaze wrote:
| The 'scripting' vs 'compiled' language is a false dichotomy.
| Awk, Perl, Python are compiled programs. What makes a
| 'scripting' language special? Dynamic typing? Lack of compile
| step/delay?
|
| I could imagine a lifetime of collecting scripting
| macros/libs in lisp to be as good or better.
| tempusr wrote:
| Python is not a compiled language.
|
| However, the reason Bash is so prolific amongst Sys Admins
| such as myself is the fact that they are portable and
| reliable to use across Debian, Arch or RHEL based
| distributions.
|
| You don't have to import extra libraries, ensure that you
| are running the proper python environment, or be certain
| that pip is properly installed and configured for whatever
| extra source code beyond what is included out of the box.
|
| Bash is the most consistent code you can write to perform
| any task you need when you have to work with Linux.
| dragonwriter wrote:
| > Python is not a compiled language.
|
| Python is (at least in the CPython implementation)
| compiled, to python byte code which runs on the python
| virtual machine.
|
| It's not compiled to native code. (Unless you use one of
| the compilers which do compile it to native code, though
| they tend to support only a subset of python.)
| pdimitar wrote:
| Another commenter beat me to it but still: sh / bash /
| zsh are quite fine up until certain complexity (say 500
| lines), after which adding even a single small feature
| becomes a _huge_ drag. We're talking hours for something
| that would take me 10 minutes in Golang and 15 in Rust.
| mbreese wrote:
| _> portable and reliable to use across Debian, Arch or RHEL
| based distributions_
|
| Until you try to use a newer feature or try the script in
| a Mac or BSD or any older bash.
|
| SH code is completely portable, but bash itself can have
| quite a few novel features. Don't get me wrong - I'm
| happy the language is dynamic and still growing. But it
| can make things awkward when trying to use a script from
| a newer system on an older server (and the author has
| been "clever").
| LeBit wrote:
| Bash is fine for small scripts.
|
| Once you use it to manage complex data structures and
| flow, you are simply wasting time because you will have
| to rewrite it in Python or Go.
| heresie-dabord wrote:
| > The 'scripting' vs 'compiled' language is a false
| dichotomy.
|
| Not false, but perhaps in need of better definition. The
| term _script_ has often denoted a trivial set of commands
| run by $interpreter.
|
| "Scripting languages" have been seen as being in contrast
| to C, C++, Pascal, Java, SmallTalk, &c. The scripting
| languages remove from the user the need:
|
| -a- to think about an extensive type system,
|
| -b- to compile the logic, and
|
| -c- to build for a specific architecture.
| paulddraper wrote:
| Static typing is the key differentiator.
|
| That requires a level of bookkeeping which is helpful for
| large programs and a nuisance for small programs.
| riku_iki wrote:
| > Dynamic typing?
|
| actually, the amount of reasoning a program has to perform
| at run time is close to interpretation.
| ajross wrote:
| And every time I see things like _that_ , I feel like the
| person must be unaware of perl.
|
| I've made this point before, but I still find it hilarious. For
| more than a decade, _awk was dead_. Like, dead dead. There was
| nothing you could do in awk that wasn't cleaner and simpler
| and vastly more extensible in perl. And, yes, perl was faster
| than gawk, just like gawk is faster than shell pipelines.
|
| Then python got big, people decided that they didn't want to
| use perl for big projects[1], and so perl went out of vogue and
| got dropped even for the stuff it did (and continues to do)
| really well. Then a new generation came along having never
| learned perl, and...
|
| ... have apparently rediscovered awk?
|
| [1] Also the perl 5 tree stagnated[2] as all the stakeholders
| wandered off into the weeds to think about some new language.
| They're all still out there, AFAIK.
|
| [2] Around 2000-2005, perl was The Language to be seen writing
| your new stuff in, so e.g. bioinformatics landed there and not
| elsewhere. But by 2015, the TensorFlow people wouldn't be
| caught dead writing perl.
| sgarland wrote:
| That's a fair criticism. I know Perl can do pretty amazing
| things with text, but I've never bothered to learn it.
|
| EDIT: I decided to ask GPT-4 to translate the gawk script to
| Perl. I make zero claims that this is ideal (as stated, I
| don't know Perl at all), but it _does_ produce the same
| output, just slightly slower than the gawk script:
|
|     $ hyperfine "perl -lane '\$ips{\$F[0]}++; END {print \"\$ips{\$_} \$_\" for (sort {\$ips{\$b} <=> \$ips{\$a}} keys %ips)[0..9]}' fake_log.txt"
|     Benchmark 1: perl -lane '$ips{$F[0]}++; END {print "$ips{$_} $_" for (sort {$ips{$b} <=> $ips{$a}} keys %ips)[0..9]}' fake_log.txt
|       Time (mean +- s):     1.499 s +- 0.006 s    [User: 1.447 s, System: 0.050 s]
|       Range (min ... max):  1.490 s ... 1.507 s    10 runs
| ajross wrote:
| I would have gone with an iteratively-built list, FWIW, and
| avoided the overhead in parsing fields the script won't
| use:
|
|     perl -e 'for $i (<>) { $i =~ s/ .*//; push @list, $i; }; print(sort(@list));'
| tgv wrote:
| I learned perl around that time, and I thought it was awful.
| And just about everything about it: the parameter passing,
| the sigils that made BASIC look like Dijkstra's love child,
| the funky array/scalar coercion, and the bloody fact that it
| couldn't read from two files at once even though the docs
| suggested it should work. They didn't say so explicitly,
| because perl was pretty badly documented. My boss started
| writing object oriented perl, and that made perl unreadable
| even to perl experts.
|
| AWK, on the other hand, is simplicity itself. Sure, it misses
| a few things, but for searching through log files or db dumps
| it's an excellent tool. And it's fast enough. If you really
| need much more speed, there are other tools, but _I_ would
| rather rewrite it in C than try perl again.
| pclmulqdq wrote:
| I am in the "awk > perl" camp. I think the idea of "vastly
| more extensible" is a negative for my scripting language, and
| "cleaner" just doesn't matter - I just want to write it the
| one time I want to use it and then be done with it. The awk
| language is really simple and quick to write.
|
| By the way, I think this is why Perl lost to Python on larger
| scripting and programming projects - it's just easier to
| write (albeit harder to read, to antagonize the Python lovers
| out there).
| tomjakubowski wrote:
| Sample of one. I came of age on Linux in the late 90s/early
| 00s. Through other nerds on IRC channels I became familiar
| with Perl and didn't like it. I also picked up basic awk in
| the context of one-liners for shell pipelines and it was
| pretty nice for that. Easier to remember than the flags for
| cut and friends.
|
| Learning awk a bit more deeply in recent years has been good
| too. I can write one liners that do more. I shipped a full
| awk script once, for something unimportant, but I would never
| do that again. For serious text munging these days I'd rather
| write a Rust program.
| voidfunc wrote:
| Perl never recovered from its "many ways to do things" label.
| It's a tired criticism of the language, but it's lodged in the
| brains of a generation of programmers, which is unfortunate.
|
| Also the classic sysadmin role which used to lean on Perl
| heavily sort of evolved with rise of The Cloud and automation
| tools like Chef, Puppet, and Ansible took over in that
| 2005-2015 time frame.
| telotortium wrote:
| I mostly use awk over perl because awk is completely
| documented in one man page, so it's easy to see whether awk
| will be fit for purpose or whether I should write it using a
| real programming language. I learned Perl over a decade ago,
| but not the really concise dialect you would use on the
| command line for stuff I'd use awk for, and I've forgotten
| almost all of it now. At least with awk it's easy to relearn
| the functions I need when I need it.
| ajross wrote:
| Right, which is sort of my point. 20 years ago, "everyone"
| knew perl, at least to the extent of knowing the standard
| idioms for different environments that you're talking
| about. And in that world, "everyone" would choose perl for
| these tasks, knowing that everyone else would be expert
| enough to read and maintain them. Perl was the natural
| choice.
|
| And in a world where perl is a natural choice for these
| tasks, awk doesn't have a niche. Because at the end of the
| day awk is simply an inferior language.
|
| Which is the bit I find funny: we threw out and forgot
| about a great tool, and now we think that the ancestral toy
| it replaced is a good idea again.
| tptacek wrote:
| They taught awk to my boy in bioinformatics as part of his
| degree. I was like Vito Corleone in the funeral home when he
| showed me the FASTA parsing awk code they were working on.
| xdsdvsv wrote:
| way to completely miss the point and turn this into a weird
| pissing competition (btw your "simple" awk example is super
| complicated and opaque to someone who doesn't have the awk man
| page open in front of them)
|
| The script package looks really cool and I'll definitely try it
| out, cause honestly even though I do a lot of bash scripting
| it's super painful for anything but something super simple.
| sgarland wrote:
| If someone doesn't know awk, then of course it'll be
| complicated and opaque - the same is true of practically any
| language. One-liners in general also tend to optimize for
| space. If you wanted it to be pretty-printed and with
| variable names that are more obvious:
|
|     {
|         PROCINFO["sorted_in"] = "@val_num_desc"
|         top_ips[$1]++
|     }
|
|     END {
|         counter = 0
|         for (i in top_ips) {
|             if (counter++ < 10) {
|                 print top_ips[i], i
|             }
|         }
|     }
|
| But also, if you read further up in the thread, you'll see
| that another user correctly identified the bottlenecks in the
| original pipeline, and applying those optimizations made it
| about 3x as fast as the awk one. Arguably, if you weren't
| familiar with the tools (and their specific implementations,
| like how GNU sort and BSD sort have wildly different default
| buffer sizes), you'd still be facing the same problem.
|
| At least half of what people complain about with shell
| scripts can be solved by using ShellCheck [0], and
| understanding what it's asking you to do. I disagree with the
| common opinion of "anything beyond a few lines should be a
| Python script instead." If you're careful with variable
| scoping and error handling, bash is perfectly functional for
| many uses.
|
| [0]: https://www.shellcheck.net
| subjectsigma wrote:
| > If you're careful with variable scoping and error
| handling, bash is perfectly functional for many uses.
|
| "Loaded guns are perfectly functional for juggling, just be
| careful with the trigger and you won't shoot yourself in
| the foot!"
|
| You are technically correct but why bother with being
| careful when you could just avoid writing bash?
| dharmab wrote:
| > If someone doesn't know awk, then of course it'll be
| complicated and opaque - the same is true of practically
| any language
|
| I don't think this is true. Before I learned Go, I could
| follow along most Go programs pretty well, and learning Go
| well enough to get started took less than an hour. Every
| attempt I've made to learn more Awk, I've bounced off.
| [deleted]
| jeffbee wrote:
| The overwhelming cost of the first shell pipeline, at least on
| my machine, is caused by the default UTF-8 locale. As I have
| found in almost every other case, `LC_ALL=C` radically speeds
| this up:
|
|     Original:                       3.294s
|     w/ LC_ALL=C:                    1.055s
|     w/ larger sort buffer `-S5%`:   0.780s
|     Your gawk:                      1.772s
|     + LC_ALL=C:                     1.772s
|
| By the way, these changes immediately suggested themselves
| after running the pipeline under `perf`. Profiling is always
| the first step in optimization.
| sgarland wrote:
| Collation aside (which is absolutely a huge boost in speed
| that I neglected to think about), I assumed that the rest of
| the difference was coming from the fact that the initial
| `cut` meant the rest of the pipeline had far less to deal
| with, whereas `awk` is processing every line. Benchmarking
| (and testing in `perf`) showed this to not be the case. I'd
| need to compile `awk` with debug symbols, I think, to know
| exactly where the slowdown is, but I'm going to assume it's
| mostly due to `sort` being extremely optimized for doing one
| thing, and doing it well.
|
| I did find one other interesting difference between BSD and
| GNU tools - BSD sort defaults to 90% for its buffer, GNU sort
| defaults to 1024 KiB.
|
| Combining all of these (and using GNU uniq - it was also
| faster), I was able to get down to 463 msec on the M1 Air:
|
|     $ hyperfine "export LC_ALL=C; gcut -d' ' -f1 fake_log.txt | gsort -S5% | guniq -c | gsort -rn -S5% | head"
|     Benchmark 1: export LC_ALL=C; gcut -d' ' -f1 fake_log.txt | gsort -S5% | guniq -c | gsort -rn -S5% | head
|       Time (mean +- s):     463.4 ms +- 3.3 ms    [User: 965.5 ms, System: 93.3 ms]
|       Range (min ... max):  459.9 ms ... 469.8 ms    10 runs
|
| TIL, thank you.
| xvector wrote:
| Could you elaborate on how you arrived at 5% for your
| buffer? Does specifying a buffer size really cause that
| much of a speed up?
| tejtm wrote:
| It is always "horses for courses" and there may be times when
| the five concurrent cores with the shell pipeline will beat the
| single core awk script.
| kermatt wrote:
| Mawk can be even faster, although missing some features of GNU
| Awk 5.
| jerf wrote:
| I don't do a lot of shell scripting type things in Go because
| it's not a great language for it, but when I do, I take another
| approach, which is just to panic. Generics offer a nice little
|
|     func Must[T any](x T, err error) T {
|         if err != nil {
|             panic(err)
|         }
|         return x
|     }
|
| which you can wrap around any standard "x, err :=" function to
| just make it panic, and even prior to generics you could wrap a
| "PanicOnErr(justReturnsErr())".
|
| In the event that you want to handle errors in some other manner,
| you trivially can, and you're not limited to just the pipeline
| design patterns, which are cool in some ways, but limiting when
| that's all you have. (It can also be tricky to ensure the
| pipeline is written in a way that doesn't generate a ton of
| memory traffic with intermediate arrays; I haven't checked to see
| what the library they show does.) Presumably if I'm writing this
| in Go I have some other reason for wanting to do that, like
| having some non-trivial concurrency desire (using concurrency to
| handle a newline-delimited JSON file was my major use case, doing
| non-trivial though not terribly extensive work on the JSON).
|
| While this may make some people freak, IMHO the real point of
| "errors as values" is not to force you to handle the errors in
| some very particular manner, but to make you _think_ about the
| errors more deeply than a conventional exceptions-based program
| typically does. As such, it is perfectly legal and moral to think
| about your error handling and decide that what you really want is
| the entire program to terminate on the first error. Obviously
| this is not the correct solution for my API server blasting out
| tens of thousands of highly heterogeneous calls per second, but
| for a shell script it is quite often the correct answer. As
| something I have thought about and chosen deliberately, it's
| fine.
| dang wrote:
| Discussed at the time:
|
| _Scripting with Go_ -
| https://news.ycombinator.com/item?id=30641883 - March 2022 (66
| comments)
| wudangmonk wrote:
| The unix philosophy of having small programs that take in input,
| process it, and return a result has proven to be a success. I
| just never understood why the next logical step of having these
| programs in library form never became a thing. I guess shells
| are a bit useful, but not as useful as a decent repl
| (common-lisp or the jupyter repl) where these programs could be
| used as if they were functions.
| perfmode wrote:
| From Sanjay Ghemawat, 9 years ago
|
| https://github.com/ghemawat/stream
| Hendrikto wrote:
| This is satire, right? I think commenters are completely missing
| the point.
|
| https://en.m.wikipedia.org/wiki/A_Modest_Proposal
| dang wrote:
| The submitted title was "Scripting with Go: A Modest Proposal"
| but the phrase "modest proposal" doesn't appear in the article,
| so I've taken it out.
|
| " _Please use the original title, unless it is misleading or
| linkbait; don 't editorialize._" -
| https://news.ycombinator.com/newsguidelines.html
| 1vuio0pswjnm7 wrote:
|     export LC_ALL=C
|     awk '!a[$1]++' access.log | head
|
| If access.log is large enough, awk will fail.
|
| When this happens, one can split access.log into pieces, process
| separately then recombine.
|
| But that's more or less what sort(1) does with large files,
| creating temporary files in $TMPDIR or other user-specified
| directory after -T if using GNU sort.
|
| There was a way to eliminate duplicate lines from an unordered
| list using k/q, without using temporary files but I stopped using
| it after Kx, Inc. was sold off and I started using musl
| exclusively. q requires glibc.
|
| For example, something like:
|
|     #!/bin/sh
|     # usage: $0 file
|     echo "k).Q.fs[l:0::\`:$1];l:?:l;\`:$1 0:l" | exec q >null;
|
| Can this be done in ngn k?
|
| The other approach I use to avoid temporary files is to just put
| the list in an SQL database, add a UNIQUE constraint, and update
| the database.
| fsmv wrote:
| I put together a go "sh-bang" line so you can just chmod +x your
| .go file and run it (and it works with go fmt unlike other
| options):
|
|     /*usr/bin/env go run "$0" "$@"; exit $? #*/
|
| It's fun, try it out! Just make this the first line of the file.
| nicechianti wrote:
| terrible idea
| kardianos wrote:
| Interesting. I do something similar with my task package
| (https://github.com/kardianos/task), which is in turn
| loosely based off of another package from 10-15 years ago.
| tgv wrote:
| That sounds interesting, but the package is unfortunately
| undocumented. I tried
| https://pkg.go.dev/github.com/kardianos/task, but that doesn't
| help me understand it either. It's missing a high level
| explanation of what to use it for, its limits and some decent
| examples.
___________________________________________________________________
(page generated 2023-08-20 23:01 UTC)