[HN Gopher] Show HN: Maelstrom - A Hermetic, Clustered Test Runn...
___________________________________________________________________
Show HN: Maelstrom - A Hermetic, Clustered Test Runner for Python
and Rust
Hi everyone, Maelstrom is a suite of tools for running tests in
hermetic micro-containers locally on your machine or distributed
across arbitrarily large clusters. Maelstrom currently has test
runners for Rust and Python, with more on the way. You might use
Maelstrom to run your tests because:
* It's easy. Maelstrom functions as a drop-in replacement for cargo
  test and pytest. In most cases, it just works with your existing
  tests with minimal configuration.
* It's reliable. Maelstrom runs every test hermetically in its own
  lightweight container, eliminating confusing errors caused by
  inter-test or implicit test-environment dependencies.
* It's scalable. Maelstrom can be run as a cluster. You can add more
  worker machines to linearly increase test throughput.
* It's clean. Maelstrom has built a rootless container
  implementation (not relying on Docker or RunC) from scratch, in
  Rust, optimized to be low-overhead and start quickly.
* It's fast. In most cases, Maelstrom is faster than cargo test,
  even without using clustering. Maelstrom's test-per-process model
  is inherently slower than pytest's shared-process model, but
  Maelstrom provides test isolation at a low performance cost.
While our focus thus far has been on running tests, Maelstrom's
underlying job execution system is general-purpose. We provide a
command line utility to run arbitrary commands, as well as a
gRPC-based API and Rust bindings for programmatic access and control.
Feedback and questions are welcome! Thanks for giving it a whirl.
Author : nfachan
Score : 70 points
Date : 2024-07-09 20:23 UTC (1 days ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| electric_mayhem wrote:
| Came here for the 90s shareware game from Ambrosia for the Mac.
|
| I guess this is cool too though.
| esafak wrote:
| There was also a development company called Maelstrom Games,
| known for https://en.wikipedia.org/wiki/Midwinter_(video_game)
| jitl wrote:
| Why did you build this? Any particular motivating experience? At
| $WORK, we use Jest to run each test suite in its own process. The
| places we struggle with isolation are for integration tests that
| read/write from the DB, so I'm curious what kinds of issues
| motivated container-per-test.
|
| Are you going to sell it somehow?
|
| Does it work nested inside Docker? I already have a CI setup that
| stripes batches of tests across worker containers.
| nfachan wrote:
| At my previous company, we had a lot of tests. They were a mix
| of C and Python. Running all of them on a single machine took
| on the order of an hour or more. Even just limiting the tests
| run to those that could theoretically be affected by your
| change could take minutes or even tens of minutes.
|
| We ended up building a shared cluster of ~1000 cores that was
| available to all developers, and that was used by CI. This
| changed our developers' workflows quite a bit. It was now
| possible to run large amounts of tests regularly: like every
| few minutes instead of once or twice a day. This in turn
| encouraged developers to write more tests and do more test-
| driven development.
|
| On top of that, having the cluster available provided other
| benefits. If a test was flakey, it was easy to run it tens or
| even hundreds of thousands of times, making it easy to
| reproduce and identify the bug. We also occasionally did Monte
| Carlo simulations, and it was really handy to have a lot of
| cores available for general developer use.
|
| I got used to working that way and I've missed it since I left
| that company. So this project is an attempt to make a more
| general-purpose implementation of that system. I hope others
| will find similar workflows that make them more productive
| using this system or something like it.
|
| Regarding the container-per-test idea: it really comes about
| because it's the obvious way to package up jobs to submit them
| to a cluster. Plus, it makes tests reproducible for all
| developers in a project, and between developer machines and CI.
| Using Linux namespaces, the overhead of running tests in
| individual containers isn't much more than running tests in
| individual processes.
| nfachan wrote:
| I forgot to answer your two other questions.
|
| Maelstrom is open source and we plan to keep it that way. We
| may look at ways of selling access to a hosted cluster as a
| service. Test running is very elastic, and could benefit from
| having an elastic service to support it.
|
| Maelstrom is completely root-less, so it'll work inside of
| Docker just fine. We regularly test Maelstrom within Maelstrom.
| nine_k wrote:
| A container implementation that depends neither on Docker _nor on
| runc_ is at the very least interesting, by itself.
| amluto wrote:
| It's very easy to write one. I've done it in half an hour _in
| bash_. (Most of the half hour was spent cursing at various
| versions of util-linux that were broken in creative ways.)
|
| Doing it _well_ is a different story.
| nine_k wrote:
| Well, yes, chroot, cgroups, mount --bind, and some ipfw /
| iptables stuff is enough to create a makeshift container.
|
| I _hope_ these guys are into doing it _well_, else runc
| would be more than adequate for low-level stuff.
| amluto wrote:
| If anyone is doing it from scratch, in a real programming
| language (which, for better or for worse, seems to
| currently mean C or Go or futzing with the FFI raw
| syscalls), one shouldn't use chroot or the mount syscall.
| The new mount API is _much_ better.
|
| Cgroups are nice and add some fun features, but they're
| just icing on the cake and are also not necessary, even for
| a very functional and nicely secure container, unless the
| stuff inside the container needs cgroup delegation.
|
| Using iptables to make a container is IMO pathetic, and I'm
| hoping to find time at some point to work out something
| better.
| Joker_vD wrote:
| > The new mount API
|
| Could you please tell what exactly this API is? I'd like
| to try and use it.
| nfachan wrote:
| If you're interested, the main part of the container
| implementation is here:
| https://github.com/maelstrom-software/maelstrom/blob/main/cr...
|
| For each test we run, we clone the worker process, then make a
| bunch of Linux syscalls to set everything up for the container,
| then exec the test. We use the trick of having the child
| process share the virtual memory of the parent until the test
| is exec'ed.
|
| We also use a technique where we build up a "program" of simple
| operations (each operation more or less maps to a syscall) in
| the parent before cloning, then evaluate the program in the
| child. This gives us the same performance benefits as using
| posix_spawn or vfork, but lets us configure all of the
| namespace stuff while we're spawning.
|
| The code that's run in the child can be found here:
| https://github.com/maelstrom-software/maelstrom/blob/main/cr...
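
The "build a program in the parent, evaluate it in the child" idea described above can be illustrated with a minimal Python sketch. This is not Maelstrom's code: the operation names (`chdir`, `setenv`, `exec`) and the helpers `build_program`, `evaluate`, and `spawn` are hypothetical, `os.fork` copies the address space rather than sharing it the way the clone-based approach in the comment does, and all namespace setup is omitted.

```python
import os
import sys

def build_program(workdir, env, argv):
    """Build, in the parent, the list of simple steps the child will run.

    Each step is a plain tuple (hypothetical format), so the child only
    has to dispatch on a tag and make roughly one syscall per step.
    """
    program = [("chdir", workdir)]
    program += [("setenv", key, value) for key, value in env.items()]
    program.append(("exec", argv))
    return program

def evaluate(program):
    """Run each operation in order; meant to be called in the child only."""
    for op in program:
        if op[0] == "chdir":
            os.chdir(op[1])
        elif op[0] == "setenv":
            os.environ[op[1]] = op[2]
        elif op[0] == "exec":
            os.execvp(op[1][0], op[1])  # does not return on success
        else:
            raise ValueError(f"unknown operation: {op[0]}")

def spawn(program):
    """Fork, evaluate the program in the child, return the exit code."""
    pid = os.fork()
    if pid == 0:
        try:
            evaluate(program)
        finally:
            # Reached only if a step failed or the program had no exec step.
            os._exit(127)
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)

if __name__ == "__main__":
    demo = build_program("/", {"DEMO_VAR": "1"},
                         [sys.executable, "-c", "import os; print(os.getcwd())"])
    print("exit code:", spawn(demo))
```

All decisions are made before the fork, so the child does nothing but walk a list. That is what keeps the window between fork and exec small and predictable in this sketch.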
| nextaccountic wrote:
| For Rust tests, can Maelstrom be combined with nextest [0]?
| (Maelstrom provides the cargo maelstrom command, and nextest
| provides the cargo nextest command, with no obvious way to
| compose them)
|
| I guess an env variable to specify which test command to run
| (very low level) or something like cargo maelstrom --nextest
| would work (but then how to compose with other test runners?)
|
| Now,
|
| > It's fast. In most cases, Maelstrom is faster than cargo test,
| even without using clustering.
|
| That's surprising. Why is this the case?
|
| Will Maelstrom without clustering (running on a single machine)
| be faster than nextest as well? (Nextest is also faster than bare
| cargo test [1])
|
| Would combining nextest with Maelstrom bring further performance
| benefits, or is Maelstrom already doing whatever improvements
| nextest does?
|
| [0] https://nexte.st/
|
| [1] https://nexte.st/docs/design/how-it-works/
| nfachan wrote:
| Our desire is to provide a general test-running and job-running
| framework, not to build the world's next test runner. We think
| nextest is great, and we were inspired by some of the things
| they did.
|
| We've designed Maelstrom to be usable as a library. So you can
| build your own test runner or job runner. We've been in contact
| with Rain, the primary developer of nextest, regarding how we
| can make it so that nextest can use Maelstrom. We'd love
| nothing more than to have nextest be Maelstrom-ized (Maelstrom-
| ified?).
|
| We definitely have a little bit of work to do, but we plan to
| make big steps with the API for the next release. Currently,
| the client library doesn't give per-test updates until the test
| finishes. This means that you don't know how long a test is
| taking to run until it's completed (though we do provide a
| timeout feature). This is fine for our currently limited UI,
| but is probably insufficient for nextest.
|
| Maelstrom in standalone mode running on a single machine is
| usually a bit slower than nextest. Maelstrom and nextest are
| similar in that they both run each test in its own process,
| and they both do a good job of running enough test processes in
| parallel to keep the machine busy. Maelstrom has to do a little
| bit more work each time it starts a new process to set up the
| namespaces, so it's always going to be a bit slower than
| nextest, but not by much.
|
| One thing that Maelstrom does that I don't think nextest does
| is to use Longest Processing Time First (LPT) scheduling
| (https://en.wikipedia.org/wiki/Longest-processing-time-first_...).
| When the runtimes of tests vary a lot within a
| project, using LPT can result in big wins and more predictable
| runtimes. Maelstrom itself actually has some pretty long-
| running integration tests, and once we added LPT, running
| Maelstrom tests on Maelstrom is usually faster than running
| them on nextest. But again, we're not talking about huge
| differences in single-machine cases.
|
| I think cargo test is usually slower than both Maelstrom and
| nextest for the reasons described in the nextest documentation:
| cargo test doesn't always keep enough test threads running to
| keep the machine busy. However, if you have a lot of really
| small tests all in a single crate, then cargo test can and does
| outperform both Maelstrom and nextest. The clap project
| (https://github.com/clap-rs/clap) is a good example of this.
|
| I think Maelstrom does most of the performance things that
| nextest does. However, nextest obviously has a lot more
| features and integrations than Maelstrom.
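
The Longest Processing Time First scheduling mentioned above can be sketched in a few lines of Python. This is a generic textbook version of the greedy algorithm, not Maelstrom's scheduler: the function name `lpt_schedule`, the test names, and the assumption that expected runtimes are known up front (for example, from a previous run) are all illustrative.

```python
import heapq

def lpt_schedule(durations, workers):
    """Greedy LPT: take jobs longest first, give each to the least-loaded worker.

    durations: dict mapping job name -> expected runtime.
    Returns (assignments, makespan), where assignments[i] lists the jobs
    given to worker i and makespan is the load of the busiest worker.
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    heap = [(0.0, w) for w in range(workers)]  # (current load, worker index)
    assignments = [[] for _ in range(workers)]
    longest_first = sorted(durations.items(), key=lambda kv: kv[1], reverse=True)
    for name, cost in longest_first:
        load, w = heapq.heappop(heap)  # least-loaded worker so far
        assignments[w].append(name)
        heapq.heappush(heap, (load + cost, w))
    makespan = max(load for load, _ in heap)
    return assignments, makespan

if __name__ == "__main__":
    # Hypothetical runtimes in seconds: one long integration test, four short ones.
    tests = {"slow_integration": 10, "a": 3, "b": 3, "c": 2, "d": 2}
    plan, total = lpt_schedule(tests, workers=2)
    print(plan, total)
```

With these made-up numbers, the long test gets a worker to itself and the four short tests share the other, so both workers finish after 10 seconds. Handing out the same tests with the long one last would leave it starting after 5 seconds of short tests and finishing at 15.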
| geekodour wrote:
| at first i thought it was about
| https://github.com/jepsen-io/maelstrom/tree/main which shares the
| same name
___________________________________________________________________
(page generated 2024-07-10 23:02 UTC)