Scripting with Go

John Arundel

The Unix shell is pure wizardry. With the right incantation of shell spells, you can organise files, process text, compute data, and feed the output of any program to the input of any other. We might even say, paraphrasing Clarke's Third Law:

    Any sufficiently clever shell one-liner is indistinguishable from magic.

Systems programs

It's not that the shell itself is so clever. As a programming language, at least for non-trivial tasks, it's distinctly clumsy. But its elegant design makes it the perfect tool for scripts: short, focused programs that operate on files, processes, or text, in the service of managing computer systems. In other words, systems programs.

Traditionally, much of this kind of software has been written as scripts for various shells. A shell such as bash is a sort of hybrid between a job control language and a text-based interactive user interface. There are lots of different and mutually incompatible shells, which is part of the problem, but let's just refer to all of these as "the shell" in this discussion.

Why shouldn't it be as easy to write systems programs in Go as it is in the shell?

Why Go?

If the shell is the traditional way of writing systems software, then what would be the point of using a language like Go instead? It has many advantages: Go programs are fast, scalable, can be written quickly, and can also be maintained by large teams over a long time. Go does much to support us in writing correct programs, being a compiled language with a strong type system. It also surfaces errors in a way that makes them hard to ignore, encouraging robust programs.

However, whereas shells are optimised for the specific tasks of scripting and job control, Go is a general-purpose language, used for all sorts of different applications. That doesn't mean we can't use it for systems programming, though. It just means that Go doesn't have a lot of built-in facilities for making such programs easy to write. At least, perhaps not as easy as the shell makes it.

For example, consider a typical devops task such as counting the lines in a log file that match a certain string (error, let's say). Most experienced Unix users would write some kind of shell one-liner to do this. For example:

    grep error log.txt | wc -l

The overall effect of this pipeline is to print the number of lines in log.txt that match the string error. The shell makes it easy to compose individual commands like grep and wc to achieve the goal.

A typical task

But shell wizards (and witches) can do much more. For example, suppose we have a web server access log to analyse. Here's a typical line from such a log: it contains the client's IP address, a timestamp, and various information about the request.

    203.0.113.17 - - [30/Jun/2019:17:06:15 +0000] "GET / HTTP/1.1" 200 2028 "https://example.com/" "Mozilla/5.0..."

And suppose we want to find the top ten most frequent visitors to our website, by IP address. How could we do that? Each line represents one request, so we'll need to count the lines for each IP address, sort them in descending order of frequency, and take the first ten.
A shell one-liner like this would do the job:

    cut -d' ' -f 1 access.log | sort | uniq -c | sort -rn | head

This extracts the IP address from the first column of each line (cut), sorts the addresses so that duplicates are grouped together (sort), counts the occurrences of each unique value (uniq -c), sorts the counts in descending numerical order (sort -rn), and shows the first ten results (head).

Since virtually all Unix commands can accept data on standard input, and write results to standard output, some very complex pipelines can be constructed this way, using only the shell's simple pipe operator. This is a very powerful and flexible programming paradigm, which goes a long way to explaining the dominance of the Unix model today. It also suggests it's worth investing a little time in learning how to get the best out of your shell.

So can we do the same sort of magic in Go? Let's try to write a similar program and see how easy it is (or not).

The wrong answer

While the task is well defined, this program is by no means easy to write in Go. It's certainly hard to make it as concise as the shell version. Something like this might do the job:

    package main

    import (
        "bufio"
        "fmt"
        "log"
        "os"
        "sort"
        "strings"
    )

    func main() {
        f, err := os.Open("log")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()
        scanner := bufio.NewScanner(f)
        uniques := map[string]int{}
        for scanner.Scan() {
            fields := strings.Fields(scanner.Text())
            if len(fields) > 0 {
                uniques[fields[0]]++
            }
        }
        type freq struct {
            addr  string
            count int
        }
        freqs := make([]freq, 0, len(uniques))
        for addr, count := range uniques {
            freqs = append(freqs, freq{addr, count})
        }
        sort.Slice(freqs, func(i, j int) bool {
            return freqs[i].count > freqs[j].count
        })
        fmt.Printf("%-16s%s\n", "Address", "Requests")
        for i, f := range freqs {
            if i > 9 {
                break
            }
            fmt.Printf("%-16s%d\n", f.addr, f.count)
        }
    }

There are several problems with this program, not least that it's pretty complicated. You might like to test your code-reading skills by figuring out how it works, but I'm by no means recommending it as a model of Go style. It's just a quick, untested hack, and that's partly the point. In devops work we're often required to solve problems quickly, rather than elegantly. The server could be on fire right now, and we need to figure out which IP address is burning it down.

So this isn't a very satisfactory result. If Go is so great, why isn't it a good fit for this problem? What kind of code would we like to write instead?

Programs as pipelines

Given the nature of the problem, we'd prefer to express the solution as a pipeline, just like the shell program. How could we express that in Go? What about something like this?

    File("log").Column(1).Freq().First(10).Stdout()

In other words: read the file log, take its first column, sort by frequency, get the first ten results, and print them to standard output. Not only is this extremely concise, it's arguably even clearer than the shell pipeline. A beginner wouldn't necessarily know what cut -d' ' -f 1 does, for example. But if they saw Column(1), I think they'd understand that.

And this program achieves the same task as the awkward 30+ line monster we just saw... in one line. Not bad. In fact, it's getting to the point where even seasoned shell wizards might start to think it's worth writing systems programs in Go.
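Here's that one-liner fleshed out into a complete, runnable program. This is a minimal sketch, using the script library introduced in the next section; note that in current versions of the library, sink methods such as Stdout return a byte count along with any error from the pipeline:

    package main

    import (
        "log"

        "github.com/bitfield/script"
    )

    func main() {
        // Top ten visitors by IP: take the first column of each line,
        // count the frequency of each value, and print the first ten.
        _, err := script.File("log").Column(1).Freq().First(10).Stdout()
        if err != nil {
            log.Fatal(err)
        }
    }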
The script library

But this example doesn't even really look like Go code! How is it possible? The answer is a library package called script:

    import "github.com/bitfield/script"

script is a Go library for doing the kind of tasks that shell scripts are good at: reading files, executing subprocesses, counting lines, matching strings, and so on. The author liked the elegant and concise nature of well-crafted shell pipelines, but liked Go even more. Now you can construct nifty one-liners in Go without all that tedious scanning, sorting, slicing, and looping. Let's take a look at a few examples.

Suppose you want to read the contents of a file as a string. Here's what that looks like in script:

    data, err := script.File("test.txt").String()

That looks straightforward enough, but suppose you now want to count the lines in that file:

    n, err := script.File("test.txt").CountLines()

For something a bit more challenging, let's try counting the number of lines in the file that match the string "Error":

    n, err := script.File("test.txt").Match("Error").CountLines()

But what if, instead of reading a specific file, we want to simply pipe input into this program, and have it output only matching lines (like grep)?

    script.Stdin().Match("Error").Stdout()

That was almost too easy! So let's pass in a list of files on the command line, and have our program read them all in sequence and output the matching lines:

    script.Args().Concat().Match("Error").Stdout()

Maybe we're only interested in the first 10 matches. No problem:

    script.Args().Concat().Match("Error").First(10).Stdout()

What's that? You want to append that output to a file instead of printing it to the terminal? You've got some attitude, mister. But okay:

    script.Args().Concat().Match("Error").First(10).AppendFile("/var/log/errors.txt")

Userland tools

One of the things that makes shell scripts powerful is not just the shell language itself, which is pretty basic. It's the availability of a rich set of userland tools, like grep, awk, cat, find, head, and so on. But we can replicate most of the functionality of those tools using script.

Here's a program that just echoes its input to its output, like cat:

    script.Stdin().Stdout()

And here's one that concatenates all the files it's given as arguments and writes them to the output, again like cat:

    script.Args().Concat().Stdout()

What about echo? Can we do that? Yup:

    script.Args().Join().Stdout()

It's fairly easy to reproduce at least the simple behaviour of most of the familiar Unix tools in this way. And for anything that's not already provided in script, well, we can just use the tool itself:

    script.Exec("open info.pdf")

Since we can run any external program, we can use the shell's facilities too. Now we really have the best of all worlds!

    script.Exec("bash -c 'echo hello from bash'").Stdout()

One common operation in shell scripts is to use the find tool to generate a recursive directory listing. We can do that too:

    script.FindFiles("/backup").Stdout()

But supposing we then wanted to do some operation on each of the files discovered in this way. What would that look like?

    script.FindFiles("*.go").ExecForEach("gofmt -w {{.}}")

You might recognise the argument to ExecForEach as a Go template; every filename produced by FindFiles will be substituted into this command in turn.
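To see the template substitution at work in a full program, here's a sketch of a gofmt runner. I've used FindFiles on a directory and filtered with MatchRegexp (another filter the library provides), which avoids relying on glob support; the directory and pattern here are my own illustrative choices:

    package main

    import (
        "log"
        "regexp"

        "github.com/bitfield/script"
    )

    func main() {
        // FindFiles produces one file path per line. ExecForEach renders
        // the template for each line and runs the resulting command: the
        // line "pkg/main.go" becomes `gofmt -w pkg/main.go`, and so on.
        _, err := script.FindFiles(".").
            MatchRegexp(regexp.MustCompile(`\.go$`)).
            ExecForEach("gofmt -w {{.}}").
            Stdout()
        if err != nil {
            log.Fatal(err)
        }
    }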
How does it work?

Those chained function calls look a bit weird. What's going on there? One of the neat things about the Unix shell, and its many imitators, is the way you can compose operations into a pipeline:

    cat test.txt | grep Error | wc -l

The output from each stage of the pipeline feeds into the next, and you can think of each stage as a filter that passes on only certain parts of its input to its output.

By comparison, writing shell-like scripts in raw Go is much less convenient, because everything you do returns a different data type, and you must (or at least should) check errors following every operation. In scripts for system administration we often want to compose different operations like this in a quick and convenient way. If an error occurs somewhere along the pipeline, we would like to check for it just once at the end, rather than after every operation.

Everything is a pipe

The script library allows us to do this because everything is a pipe (specifically, a script.Pipe). To create a pipe, start with a source like File:

    p := script.File("test.txt")

You might expect File to return an error if there is a problem opening the file, but it doesn't. We will want to call a chain of methods on the result of File, and it's inconvenient to do that if it also returns an error. So File returns a pipe instead.

Since File returns a pipe, you can call any method on it you like. For example, Match:

    p.Match("what I'm looking for")

The result of this is another pipe (containing only the matching lines from test.txt), and so on. You don't have to chain all your methods onto a single line, but it's pretty neat that you can if you want to.

Handling errors

Woah, woah! Just a minute! We haven't done anything about errors. We know that good Go programmers always check errors. That's because almost everything we do in a program can go wrong, and the robustness of a program is almost all about how it behaves in error situations.

What if there was an error opening the file at the source end of this pipe? Won't Match blow up if it tries to read from a non-existent file?

No, it won't. That's because if File encounters an error, it sets a flag on the resulting pipe to say "hey, something's wrong". Normally when Go functions run into an error, they return something like nil along with an error value indicating the problem. Instead, File returns a valid pipe, just one with its error flag set.

All pipe operations check this error flag before doing anything. If it's set, then they don't have valid data, so they short-circuit, returning immediately without doing any work (a "no-op", short for "no operation"). As soon as any pipeline stage hits an error, the error flag is set on the pipe, and all subsequent operations on the pipe become no-ops. That means you don't need to check for an error at each stage: instead, you can check it at the end, or whenever you like.

You can always ask a pipe for its error status by calling its Error method:

    if err := p.Error(); err != nil {
        return fmt.Errorf("oh no: %w", err)
    }

Seasoned Gophers will recognise this as the errWriter pattern described by Rob Pike in the blog post Errors are values. It's ideal when you're carrying out a series of operations, any of which can fail, but you only care whether the sequence as a whole encountered an error. This eliminates a lot of the if err != nil boilerplate which seems to infuriate people so much about Go. Those people aren't crazy: that kind of thing can be annoying, if you don't know this useful pattern.
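Here's the error flag in miniature (a sketch; the filename is deliberately one that shouldn't exist, and the sink reports the stored error, as in the CountLines examples earlier):

    package main

    import (
        "fmt"

        "github.com/bitfield/script"
    )

    func main() {
        p := script.File("no-such-file.txt") // opening fails, so the error flag is set
        p = p.Match("Error")                 // a no-op: the flag is already set
        n, err := p.CountLines()             // n is 0; err is the original file error
        fmt.Println(n, err)
    }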
Closing pipes

If you've dealt with files in Go before, you'll know that you need to close a file once you've finished with it. Otherwise, the program will retain what's called a file handle (the kernel data structure that represents an open file). There is a limit to the total number of open file handles, both for a given program and for the system as a whole, so a program that leaks file handles will eventually crash, and will waste resources in the meantime. Files aren't the only things that need to be closed after reading: so do network connections, HTTP response bodies, and so on.

How does script handle this? Simple. The data source associated with a pipe is automatically closed once it has been read completely. Therefore, calling any sink method that reads the pipe to completion (such as String) will close its data source. The only case in which you need to call Close explicitly is when you don't read from the pipe at all, or don't read it to completion for some reason. If the pipe was created from something that doesn't need to be closed, such as a string, then calling Close simply does nothing, as you'd expect.
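For example (a sketch; the path is illustrative, and I'm assuming Close reports any error from closing the underlying file, in the usual io.Closer style):

    package main

    import (
        "log"

        "github.com/bitfield/script"
    )

    func main() {
        p := script.File("/var/log/huge.log") // an illustrative path
        // Suppose we decide not to read from p after all. No sink ever
        // reads it to completion, so its file won't be closed
        // automatically; release the file handle ourselves:
        if err := p.Close(); err != nil {
            log.Fatal(err)
        }
    }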
Why not shell?

This is cool and all, but is it really needed? After all, haven't we been running the world on shell scripts for several decades? And aren't we writing more of them every day? So what's the problem with shell scripts, and why is Go the solution?

It's a fair question. Shell scripts and one-liners are perfectly adequate for one-off tasks, initialization scripts, and the kind of 'glue code' that holds the internet together. I speak as someone who's spent at least thirty years doing this for a living. But in many ways they're not ideal for important, non-trivial programs:

* Trying to build portable shell scripts is a nightmare. The exact syntax and options of Unix commands vary from one distribution to another. Although in theory POSIX is a workable common subset of functionality, in practice it's usually precisely the non-POSIX behaviour that you need.

* Shell scripts are hard to test (though test frameworks have been written, and if you're seriously putting mission-critical shell scripts into production, you should be using them, or reconsidering your technology choices).

* Shell scripts don't scale. Because there are very limited facilities for logic and abstraction, and because any successful program tends to grow remorselessly over time, shell scripts can become an unreadable mess of special cases and spaghetti code. We've all seen it, if not, indeed, done it.

* Shell syntax is awkward: quoting, whitespace, and brackets can require a lot of fiddling to get right, and so many characters are magic to the shell (*, ?, > and so on) that this can lead to subtle bugs. Scripts can work fine for years until you suddenly encounter a file whose name contains whitespace, and then everything breaks horribly.

* Deploying shell scripts obviously requires at least a (sizable) shell binary in addition to the source code, but it usually also requires an unknown and variable number of extra userland programs (cut, grep, head, and friends). If you're building container images, for example, you effectively need to include a whole Unix distribution with your program, which runs to hundreds of megabytes, and is not at all in the spirit of containers.

To be fair to the shell, this kind of thing is not what it was ever intended for. Shell is an interactive job control tool for launching programs, connecting programs together, and, to a limited extent, manipulating text. It's not for building portable, scalable, reliable, and elegant programs. That's what Go is for.

Go has a great testing framework built right into the standard library. It has a superb standard library, and thousands of high-quality third-party packages for just about any functionality you can imagine. It is compiled, so it's fast, and statically typed, so it's reliable. It's efficient and memory-safe. Go programs can be distributed as a single binary. Go scales to enormous projects (Kubernetes, for example).

The script library is implemented entirely in Go, and does not require any external userland programs. Thus you can build your script program as a single (very small) binary that is quick to build, quick to upload, quick to deploy (in a container, for example), quick to run, and economical with resources.

I'm not saying shell scripting is obsolete. I still use a lot of shell scripts myself. There is a large problem domain where shell is absolutely the right answer. But small programs tend to grow into large ones, and when that happens, it's nice to be able to use the facilities of a language that was designed for programming at scale. With a package like script, we don't have to lose all the elegance and flexibility of the classic Unix environment we've come to love. Instead, we can keep just a little bit of that old shell magic.

You can read more about the implementation of the script library, and other interesting programs, in my book, The Power of Go: Tools.