Scripting with Go

John Arundel

The Unix shell is pure wizardry. With the right incantation of shell spells, you can organise files, process text, compute data, and feed the output of any program to the input of any other. We might even say, paraphrasing Clarke's Third Law:

    Any sufficiently clever shell one-liner is indistinguishable from magic.

Systems programs

It's not that the shell itself is so clever. As a programming language, at least for non-trivial tasks, it's distinctly clumsy. But its elegant design makes it the perfect tool for scripts: short, focused programs that operate on files, processes, or text, in the service of managing computer systems. In other words, systems programs.

Traditionally, much of this kind of software has been written as scripts for various shells. A shell such as bash is a sort of hybrid between a job control language and a text-based interactive user interface. There are lots of different and mutually incompatible shells, which is part of the problem, but let's just refer to all of these as "the shell" in this discussion.

Why shouldn't it be as easy to write systems programs in Go as it is in the shell?

Why Go?

If the shell is the traditional way of writing systems software, then what would be the point of using a language like Go instead? It has many advantages: Go programs are fast, scalable, can be written quickly, and can also be maintained by large teams over a long time. Go does much to support us in writing correct programs, being a compiled language with a strong type system. It also surfaces errors in a way that makes them hard to ignore, encouraging robust programs.

However, whereas shells are optimised for the specific tasks of scripting and job control, Go is a general-purpose language, used for all sorts of different applications. That doesn't mean we can't use it for systems programming, though. It just means that Go doesn't have a lot of built-in facilities for making such programs easy to write. At least, perhaps not as easy as the shell makes it.

For example, consider a typical devops task such as counting the lines in a log file that match a certain string (error, let's say). Most experienced Unix users would write some kind of shell one-liner to do this. For example:

    grep error log.txt | wc -l

The overall effect of this pipeline is to print the number of lines in log.txt that match the string error. The shell makes it easy to compose individual commands like grep and wc to achieve the goal.

A typical task

But shell wizards (and witches) can do much more. For example, suppose we have a web server access log to analyse. Here's a typical line from such a log: it contains the client's IP address, a timestamp, and various information about the request.

    203.0.113.17 - - [30/Jun/2019:17:06:15 +0000] "GET / HTTP/1.1" 200 2028 "https://example.com/" "Mozilla/5.0..."

And suppose we want to find the top ten most frequent visitors to our website, by IP address. How could we do that? Each line represents one request, so we'll need to count the lines for each IP address, sort them in descending order of frequency, and take the first ten.
A shell one-liner like this would do the job:

    cut -d' ' -f 1 access.log | sort | uniq -c | sort -rn | head

This extracts the IP address from the first column of each line (cut), sorts the addresses so that duplicates are grouped together (sort), counts the occurrences of each unique value (uniq -c), sorts the counts in descending numerical order (sort -rn), and shows the first ten results (head).

Since virtually all Unix commands can accept data on standard input, and write results to standard output, some very complex pipelines can be constructed this way, using only the shell's simple pipe operator. This is a very powerful and flexible programming paradigm, which goes a long way to explaining the dominance of the Unix model today. It also suggests it's worth investing a little time in learning how to get the best out of your shell.

So can we do the same sort of magic in Go? Let's try to write a similar program and see how easy it is (or not).

The wrong answer

While the task is well defined, this program is by no means easy to write in Go. It's certainly hard to make it as concise as the shell version. Something like this might do the job:

    package main

    import (
        "bufio"
        "fmt"
        "log"
        "os"
        "sort"
        "strings"
    )

    func main() {
        f, err := os.Open("log")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()
        scanner := bufio.NewScanner(f)
        uniques := map[string]int{}
        for scanner.Scan() {
            fields := strings.Fields(scanner.Text())
            if len(fields) > 0 {
                uniques[fields[0]]++
            }
        }
        type freq struct {
            addr  string
            count int
        }
        freqs := make([]freq, 0, len(uniques))
        for addr, count := range uniques {
            freqs = append(freqs, freq{addr, count})
        }
        sort.Slice(freqs, func(i, j int) bool {
            return freqs[i].count > freqs[j].count
        })
        fmt.Printf("%-16s%s\n", "Address", "Requests")
        for i, f := range freqs {
            if i > 9 {
                break
            }
            fmt.Printf("%-16s%d\n", f.addr, f.count)
        }
    }

There are several problems with this program, not least that it's pretty complicated. You might like to test your code-reading skills by figuring out how it works, but I'm by no means recommending it as a model of Go style. It's just a quick, untested hack, and that's partly the point. In devops work we're often required to solve problems quickly, rather than elegantly. The server could be on fire right now, and we need to figure out which IP address is burning it down.

So this isn't a very satisfactory result. If Go is so great, why isn't it a good fit for this problem? What kind of code would we like to write instead?

Programs as pipelines

Given the nature of the problem, we'd prefer to express the solution as a pipeline, just like the shell program. How could we express that in Go? What about something like this?

    File("log").Column(1).Freq().First(10).Stdout()

In other words: read the file log, take its first column, sort by frequency, get the first ten results, and print them to standard output. Not only is this extremely concise, it's arguably even clearer than the shell pipeline. A beginner wouldn't necessarily know what cut -d' ' -f 1 does, for example. But if they saw Column(1), I think they'd understand that.

And this program achieves the same task as the awkward 30+ line monster we just saw... in one line. Not bad. In fact, it's getting to the point where even seasoned shell wizards might start to think it's worth writing systems programs in Go.
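Here's that one-liner fleshed out into a complete, runnable program. This is a minimal sketch, using the script library introduced in the next section; note that in current versions of the library, sink methods such as Stdout return a byte count along with any error from the pipeline:

    package main

    import (
        "log"

        "github.com/bitfield/script"
    )

    func main() {
        // Top ten visitors by IP: take the first column of each line,
        // count the frequency of each value, and print the first ten.
        _, err := script.File("log").Column(1).Freq().First(10).Stdout()
        if err != nil {
            log.Fatal(err)
        }
    }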
The script library

But this example doesn't even really look like Go code! How is it possible? The answer is a library package called script:

    import "github.com/bitfield/script"

script is a Go library for doing the kind of tasks that shell scripts are good at: reading files, executing subprocesses, counting lines, matching strings, and so on. The author liked the elegant and concise nature of well-crafted shell pipelines, but liked Go even more. Now you can construct nifty one-liners in Go without all that tedious scanning, sorting, slicing, and looping. Let's take a look at a few examples.

Suppose you want to read the contents of a file as a string. Here's what that looks like in script:

    data, err := script.File("test.txt").String()

That looks straightforward enough, but suppose you now want to count the lines in that file:

    n, err := script.File("test.txt").CountLines()

For something a bit more challenging, let's try counting the number of lines in the file that match the string "Error":

    n, err := script.File("test.txt").Match("Error").CountLines()

But what if, instead of reading a specific file, we want to simply pipe input into this program, and have it output only matching lines (like grep)?

    script.Stdin().Match("Error").Stdout()

That was almost too easy! So let's pass in a list of files on the command line, and have our program read them all in sequence and output the matching lines:

    script.Args().Concat().Match("Error").Stdout()

Maybe we're only interested in the first 10 matches. No problem:

    script.Args().Concat().Match("Error").First(10).Stdout()

What's that? You want to append that output to a file instead of printing it to the terminal? You've got some attitude, mister. But okay:

    script.Args().Concat().Match("Error").First(10).AppendFile("/var/log/errors.txt")

Userland tools

One of the things that makes shell scripts powerful is not just the shell language itself, which is pretty basic. It's the availability of a rich set of userland tools, like grep, awk, cat, find, head, and so on. But we can replicate most of the functionality of those tools using script.

Here's a program that just echoes its input to its output, like cat:

    script.Stdin().Stdout()

And here's one that concatenates all the files it's given as arguments and writes them to the output, again like cat:

    script.Args().Concat().Stdout()

What about echo? Can we do that? Yup:

    script.Args().Join().Stdout()

It's fairly easy to reproduce at least the simple behaviour of most of the familiar Unix tools in this way. And for anything that's not already provided in script, well, we can just use the tool itself:

    script.Exec("open info.pdf")

Since we can run any external program, we can use the shell's facilities too. Now we really have the best of all worlds!

    script.Exec("bash -c 'echo hello from bash'").Stdout()

One common operation in shell scripts is to use the find tool to generate a recursive directory listing. We can do that too:

    script.FindFiles("/backup").Stdout()

But supposing we then wanted to do some operation on each of the files discovered in this way. What would that look like?

    script.FindFiles("*.go").ExecForEach("gofmt -w {{.}}")

You might recognise the argument to ExecForEach as a Go template; every filename produced by FindFiles will be substituted into this command in turn.
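To see the template substitution at work in a full program, here's a sketch of a gofmt runner. I've used FindFiles on a directory and filtered with MatchRegexp (another filter the library provides), which avoids relying on glob support; the directory and pattern here are my own illustrative choices:

    package main

    import (
        "log"
        "regexp"

        "github.com/bitfield/script"
    )

    func main() {
        // FindFiles produces one file path per line. ExecForEach renders
        // the template for each line and runs the resulting command: the
        // line "pkg/main.go" becomes `gofmt -w pkg/main.go`, and so on.
        _, err := script.FindFiles(".").
            MatchRegexp(regexp.MustCompile(`\.go$`)).
            ExecForEach("gofmt -w {{.}}").
            Stdout()
        if err != nil {
            log.Fatal(err)
        }
    }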
How does it work?

Those chained function calls look a bit weird. What's going on there? One of the neat things about the Unix shell, and its many imitators, is the way you can compose operations into a pipeline:

    cat test.txt | grep Error | wc -l

The output from each stage of the pipeline feeds into the next, and you can think of each stage as a filter that passes on only certain parts of its input to its output.

By comparison, writing shell-like scripts in raw Go is much less convenient, because everything you do returns a different data type, and you must (or at least should) check errors following every operation. In scripts for system administration we often want to compose different operations like this in a quick and convenient way. If an error occurs somewhere along the pipeline, we would like to check for it just once at the end, rather than after every operation.

Everything is a pipe

The script library allows us to do this because everything is a pipe (specifically, a script.Pipe). To create a pipe, start with a source like File:

    p := script.File("test.txt")

You might expect File to return an error if there is a problem opening the file, but it doesn't. We will want to call a chain of methods on the result of File, and it's inconvenient to do that if it also returns an error. So File returns a pipe instead.

Since File returns a pipe, you can call any method on it you like. For example, Match:

    p.Match("what I'm looking for")

The result of this is another pipe (containing only the matching lines from test.txt), and so on. You don't have to chain all your methods onto a single line, but it's pretty neat that you can if you want to.

Handling errors

Woah, woah! Just a minute! We haven't done anything about errors. We know that good Go programmers always check errors. That's because almost everything we do in a program can go wrong, and the robustness of a program is almost all about how it behaves in error situations.

What if there was an error opening the file at the source end of this pipe? Won't Match blow up if it tries to read from a non-existent file?

No, it won't. That's because if File encounters an error, it sets a flag on the resulting pipe to say "hey, something's wrong". Normally when Go functions run into an error, they return something like nil along with an error value indicating the problem. Instead, File returns a valid pipe, just one with its error flag set.

All pipe operations check this error flag before doing anything. If it's set, then they don't have valid data, so they short-circuit, returning immediately without doing any work (a "no-op", short for "no operation"). As soon as any pipeline stage hits an error, the error flag is set on the pipe, and all subsequent operations on the pipe become no-ops. That means you don't need to check for an error at each stage: instead, you can check it at the end, or whenever you like.

You can always ask a pipe for its error status by calling its Error method:

    if err := p.Error(); err != nil {
        return fmt.Errorf("oh no: %w", err)
    }

Seasoned Gophers will recognise this as the errWriter pattern described by Rob Pike in the blog post Errors are values. It's ideal when you're carrying out a series of operations, any of which can fail, but you only care whether the sequence as a whole encountered an error. This eliminates a lot of the if err != nil boilerplate which seems to infuriate people so much about Go. Those people aren't crazy: that kind of thing can be annoying, if you don't know this useful pattern.
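Here's the error flag in miniature (a sketch; the filename is deliberately one that shouldn't exist, and the sink reports the stored error, as in the CountLines examples earlier):

    package main

    import (
        "fmt"

        "github.com/bitfield/script"
    )

    func main() {
        p := script.File("no-such-file.txt") // opening fails, so the error flag is set
        p = p.Match("Error")                 // a no-op: the flag is already set
        n, err := p.CountLines()             // n is 0; err is the original file error
        fmt.Println(n, err)
    }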
Closing pipes

If you've dealt with files in Go before, you'll know that you need to close a file once you've finished with it. Otherwise, the program will retain what's called a file handle (the kernel data structure that represents an open file). There is a limit to the total number of open file handles, both for a given program and for the system as a whole, so a program that leaks file handles will eventually crash, and will waste resources in the meantime. Files aren't the only things that need to be closed after reading: so do network connections, HTTP response bodies, and so on.

How does script handle this? Simple. The data source associated with a pipe is automatically closed once it has been read completely. Therefore, calling any sink method that reads the pipe to completion (such as String) will close its data source. The only case in which you need to call Close explicitly is when you don't read from the pipe at all, or don't read it to completion for some reason. If the pipe was created from something that doesn't need to be closed, such as a string, then calling Close simply does nothing, as you'd expect.
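For example (a sketch; the path is illustrative, and I'm assuming Close reports any error from closing the underlying file, in the usual io.Closer style):

    package main

    import (
        "log"

        "github.com/bitfield/script"
    )

    func main() {
        p := script.File("/var/log/huge.log") // an illustrative path
        // Suppose we decide not to read from p after all. No sink ever
        // reads it to completion, so its file won't be closed
        // automatically; release the file handle ourselves:
        if err := p.Close(); err != nil {
            log.Fatal(err)
        }
    }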
Why not shell?

This is cool and all, but is it really needed? After all, haven't we been running the world on shell scripts for several decades? And aren't we writing more of them every day? So what's the problem with shell scripts, and why is Go the solution?

It's a fair question. Shell scripts and one-liners are perfectly adequate for one-off tasks, initialization scripts, and the kind of 'glue code' that holds the internet together. I speak as someone who's spent at least thirty years doing this for a living. But in many ways they're not ideal for important, non-trivial programs:

* Trying to build portable shell scripts is a nightmare. The exact syntax and options of Unix commands vary from one distribution to another. Although in theory POSIX is a workable common subset of functionality, in practice it's usually precisely the non-POSIX behaviour that you need.

* Shell scripts are hard to test (though test frameworks have been written, and if you're seriously putting mission-critical shell scripts into production, you should be using them, or reconsidering your technology choices).

* Shell scripts don't scale. Because there are very limited facilities for logic and abstraction, and because any successful program tends to grow remorselessly over time, shell scripts can become an unreadable mess of special cases and spaghetti code. We've all seen it, if not, indeed, done it.

* Shell syntax is awkward: quoting, whitespace, and brackets can require a lot of fiddling to get right, and so many characters are magic to the shell (*, ?, > and so on) that this can lead to subtle bugs. Scripts can work fine for years until you suddenly encounter a file whose name contains whitespace, and then everything breaks horribly.

* Deploying shell scripts obviously requires at least a (sizable) shell binary in addition to the source code, but it usually also requires an unknown and variable number of extra userland programs (cut, grep, head, and friends). If you're building container images, for example, you effectively need to include a whole Unix distribution with your program, which runs to hundreds of megabytes, and is not at all in the spirit of containers.

To be fair to the shell, this kind of thing is not what it was ever intended for. Shell is an interactive job control tool for launching programs, connecting programs together, and, to a limited extent, manipulating text. It's not for building portable, scalable, reliable, and elegant programs. That's what Go is for.

Go has a great testing framework built right into the standard library. It has a superb standard library, and thousands of high-quality third-party packages for just about any functionality you can imagine. It is compiled, so it's fast, and statically typed, so it's reliable. It's efficient and memory-safe. Go programs can be distributed as a single binary. Go scales to enormous projects (Kubernetes, for example).

The script library is implemented entirely in Go, and does not require any external userland programs. Thus you can build your script program as a single (very small) binary that is quick to build, quick to upload, quick to deploy (in a container, for example), quick to run, and economical with resources.

I'm not saying shell scripting is obsolete. I still use a lot of shell scripts myself. There is a large problem domain where shell is absolutely the right answer. But small programs tend to grow into large ones, and when that happens, it's nice to be able to use the facilities of a language that was designed for programming at scale. With a package like script, we don't have to lose all the elegance and flexibility of the classic Unix environment we've come to love. Instead, we can keep just a little bit of that old shell magic.

You can read more about the implementation of the script library, and other interesting programs, in my book, The Power of Go: Tools.