[HN Gopher] Why Rust nextest is process-per-test
       ___________________________________________________________________
        
       Why Rust nextest is process-per-test
        
       Author : jicea
       Score  : 88 points
       Date   : 2025-01-09 19:40 UTC (3 days ago)
        
 (HTM) web link (sunshowers.io)
 (TXT) w3m dump (sunshowers.io)
        
       | marky1991 wrote:
       | I don't understand why he jumps straight from 'one test per
       | process' to 'one test per thread' as the alternative.
       | 
       | I'm not actually clear what he means by 'test' to be honest, but
       | I assume he means 'a single test function that can either pass or
       | fail'
       | 
       | Eg in python (nose)
       | 
        |     class TestSomething:
        |         def test_A(): ...
        |         def test_B(): ...
       | 
       | I'm assuming he means test_A. But why not run all of
       | TestSomething in a process?
       | 
       | Honestly, I think the idea of having tests have shared state is
       | bad to begin with (for things that truly matter, eg if the
       | outcome of your test depends on the state of sys.modules,
       | something else is horribly wrong), so I would never make this
       | tradeoff to benefit a scenario that I never think should be done.
       | 
        | Even if we were being absolute purists, this still wouldn't
        | solve the problem the second your process communicates with any
        | other process (or server). And that problem seems largely
        | unsolvable, short of mocking.
       | 
       | Basically, I'm not convinced this is a good tradeoff, because the
       | idea of creating thousands and thousands of processes to run a
       | test suite, even on linux, sounds like a bad idea. (And at work,
       | would definitely be a bad idea, for performance reasons)
        
         | cbarrick wrote:
         | > I'm not actually clear what he means by 'test' to be honest,
         | but I assume he means 'a single test function that can either
         | pass or fail'
         | 
         | I assume so as well.
         | 
         | Unit testing in Rust is based around functions annotated with
         | #[test], so it's safe to assume that when the author says
         | "test" they are referring to one such function.
         | 
         | It's up to the user to decide what they do in each function.
         | For example, you could do a Go-style table-driven test, but the
         | entire function would be a single "test", _not_ one "test" per
         | table entry.
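          | 
          | A minimal sketch of that pattern, with a made-up `add`
          | function (the whole loop is one "test" to the harness):
          | 
          |     fn add(a: i32, b: i32) -> i32 {
          |         a + b
          |     }
          | 
          |     #[test]
          |     fn test_add() {
          |         // Go-style table: every case runs inside a single
          |         // #[test] function, so the harness reports exactly
          |         // one pass/fail result for the lot.
          |         let cases = [(1, 2, 3), (0, 0, 0), (-1, 1, 0)];
          |         for (a, b, want) in cases {
          |             assert_eq!(add(a, b), want, "add({a}, {b})");
          |         }
          |     }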
        
         | saghm wrote:
         | I think most of the context that might explain your confusion
         | is the way that tests work out of the box in Rust. The default
         | test harness when invoking `cargo test` runs one test per
         | thread (and by default parallelizes based on the number of
         | cores available, although this is configurable with a flag). In
         | Rust, there isn't any equivalent to the `TestSomething` class
         | you give; each test is always a top-level function. Since
         | `cargo nextest` is a mostly drop-in replacement for `cargo
         | test`, I imagine the author is using one test per thread as an
         | alternative because that's the paradigm that users will be
         | switching from if they start using cargo nextest.
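          | 
          | For reference, the flag that controls that default is part of
          | libtest's standard interface; forcing fully serial execution
          | looks like this:
          | 
          |     cargo test -- --test-threads=1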
         | 
         | While enforcing no shared state in tests might be useful, that
         | wouldn't be feasible in Rust without adding quite a lot of
         | constraints that would be tough if not impossible to enforce in
         | a drop-in replacement for cargo test. There's certainly room
         | for alternatives in the testing ecosystem in Rust that don't
         | try to maintain compatibility with the built-in test harness,
         | but I don't think the intention of cargo nextest is to try to
         | do that.
         | 
         | One other point that might not be obvious is that right now,
         | there's no stable way to hook into Rust's libtest. The only
         | options to provide an alternative testing harness in Rust are
         | to either only support nightly rather than stable, break
         | compatibility with tests written for the built-in test harness,
         | or provide a separate harness that still supports existing
         | tests. I'm sure there are arguments to be made for each of the
         | other alternatives, but personally, I don't think there's any
         | super realistic chance for adoption of anything that picks the
          | first two options, so the approach cargo nextest is taking is
          | the most viable one available (at least until libtest
         | stabilizes, but it's not obvious when that will happen).
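          | 
          | Concretely, "break compatibility" means opting out of libtest
          | in Cargo.toml and owning the whole entry point yourself (the
          | target name here is just an example):
          | 
          |     [[test]]
          |     name = "my_integration_tests"
          |     harness = false
          | 
          | after which the test file's main() is responsible for
          | argument parsing, running tests, and reporting.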
        
         | jclulow wrote:
         | NB: the author's pronouns are they/she, not he.
        
         | sunshowers wrote:
         | FYI I use they/she pronouns (thanks jclulow!)
         | 
         | > Honestly, I think the idea of having tests have shared state
         | is bad to begin with (for things that truly matter, eg if the
         | outcome of your test depends on the state of sys.modules,
         | something else is horribly wrong), so I would never make this
         | tradeoff to benefit a scenario that I never think should be
         | done.
         | 
         | I don't disagree as a matter of principle, but the reality
         | really is different. Some of the first nextest users outside of
         | myself and my workplace were graphical libraries and engines.
         | 
         | > Basically, I'm not convinced this is a good tradeoff, because
         | the idea of creating thousands and thousands of processes to
         | run a test suite, even on linux, sounds like a bad idea. (And
         | at work, would definitely be a bad idea, for performance
         | reasons)
         | 
         | With Rust or other native languages it really is quite fast.
          | With Python, I agree, not as much, so this tradeoff wouldn't
          | make as much sense there.
         | 
         | But note that things like test cancellation are a little easier
         | to do in an interpreted model.
        
         | hinkley wrote:
         | > Honestly, I think the idea of having tests have shared state
         | is bad to begin with
         | 
         | I blame this partially on our notions of code reuse. We
         | conflate it with several other things, and in the case of tests
         | we conflate it with state reuse.
         | 
         | And the availability of state reuse leads to people writing
         | fakes when they should be using mocks, and people not being
         | able to tell the difference between mocks and fakes and thus
          | being incapable of having a rational discussion about them.
         | 
         | To my thinking, and the thinking of pretty much all of the test
         | experts I've called Mentor (or even Brother), beforeEaches
         | should be repeatable. Add a test, it repeats one more time.
         | Delete a test, one less. And if they're repeatable, they don't
         | have to share the same heap. One heap is as good as another.
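          | 
          | In Rust terms the discipline looks something like this (a
          | sketch; `Fixture` and its contents are invented):
          | 
          |     struct Fixture {
          |         items: Vec<String>,
          |     }
          | 
          |     // The "beforeEach": each test builds its own fixture on
          |     // its own heap. Add a test and this runs one more time;
          |     // delete one and it runs one less.
          |     fn setup() -> Fixture {
          |         Fixture { items: vec!["seed".to_string()] }
          |     }
          | 
          |     #[test]
          |     fn test_one() {
          |         let fx = setup();
          |         assert_eq!(fx.items.len(), 1);
          |     }
          | 
          |     #[test]
          |     fn test_two() {
          |         let fx = setup(); // no shared heap with test_one
          |         assert!(fx.items.contains(&"seed".to_string()));
          |     }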
         | 
         | Lots of languages can only do that segregation at the process
         | level. In NodeJS it would be isolates (workers). If you're very
         | careful about global state you could do it per thread. But that
         | doesn't happen very often because "you" includes library
         | writers, language designers, and your coworker Steve who is
         | addicted to in-memory caching. I can say, "don't be Steve"
         | until I'm blue in the face but nearly every team hires at least
         | one Steve, and some are rotten with them.
        
       | grayhatter wrote:
       | Restating the exact same thing 4 different times in the first few
        | paragraphs is an LLM feature, right?
        
         | diggan wrote:
         | Sounds more like someone is writing "Inverted pyramid" style,
         | where you repeat information somewhat but go deeper and deeper
         | every time.
        
         | sunshowers wrote:
         | I had Claude do a mild review but this was entirely written by
         | me. The sibling is right: the goal is to present information in
         | increasing levels of detail. Repetition is a powerful tool to
         | make ideas stick.
         | 
         | I do try and present a decent level of detail in the post.
        
       | OptionOfT wrote:
       | I prefer per process over the alternatives.
       | 
       | When you write code you have the choice to do per process, per
       | thread, or sequential.
       | 
       | The problem is that doing multiple tests in a shared space
       | doesn't necessarily match the world in which this code is run.
       | 
       | Per process testing allows you to design a test that matches the
       | usage of your codebase. Per thread already constrains you.
       | 
       | For example: we might elect to write a job as a process that runs
       | on demand, and the library we use has a memory leak, but it can't
        | be fixed in a reasonable time. Since we write it as a process
        | that gets restarted, we manage to constrain the behavior.
       | 
       | Doing multiple tests in multiple threads might not work here as
       | there is a shared space that is retained and isn't representative
       | of real world usage.
       | 
       | Concurrency is a feature of your software that you need to code
       | for. So if you have multiple things happening, then that should
       | be part of your test harness.
       | 
       | The test harness forcing you to think of it isn't always a
       | desirable trait.
       | 
       | That said, I have worked on a codebase where we discovered bugs
       | because the tests were run in parallel, in a shared space.
        
         | sunshowers wrote:
         | There is definitely some value in shaking out bugs by running
         | code in parallel in the same process space -- someone on
         | Lobsters brought this up too. I've been wondering if there's
         | some kind of optional feature that can be enabled here.
        
           | NewJazz wrote:
           | Can you do it deliberately in the test?
        
             | sunshowers wrote:
             | Yes, you can write your own meta-test that runs all of the
             | actual tests in separate threads. But it's a bit
             | inconvenient and you won't necessarily get separate
             | reporting.
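              | 
              | A rough sketch of that meta-test pattern (the inner
              | "tests" here are placeholders):
              | 
              |     #[test]
              |     fn run_all_in_threads() {
              |         let tests: Vec<(&str, fn())> = vec![
              |             ("test_a", || assert_eq!(1 + 1, 2)),
              |             ("test_b", || assert!(!"ok".is_empty())),
              |         ];
              |         let handles: Vec<_> = tests
              |             .into_iter()
              |             .map(|(n, f)| (n, std::thread::spawn(f)))
              |             .collect();
              |         for (name, h) in handles {
              |             // A panic in any thread fails this single
              |             // meta-test; there is no separate reporting
              |             // per inner test.
              |             h.join()
              |                 .unwrap_or_else(|_| panic!("{name} failed"));
              |         }
              |     }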
        
       | amelius wrote:
        | According to some of these reasons, every library should run in
       | its own process too.
        
         | sujayakar wrote:
         | that's roughly what the wasm component model is aiming for!
         | 
         | https://hacks.mozilla.org/2019/11/announcing-the-bytecode-al...
        
         | sunshowers wrote:
         | I think the difference is that misbehavior isn't as expected of
         | libraries as it is of tests. But yes if your libraries are
         | prone to misbehavior, it's common to compartmentalize them in
         | separate processes.
        
       | sedatk wrote:
       | > Memory corruption in one test doesn't cause others to behave
       | erratically. One test segfaulting does not take down a bunch of
       | other tests.
       | 
       | Is "memory corruption" an issue with Rust? Also, if one test
       | segfaults, isn't it a reason to halt the run because something
       | got seriously broken?
        
         | codetrotter wrote:
         | > Is "memory corruption" an issue with Rust?
         | 
         | You can cause memory corruption if you opt out of memory safety
         | guarantees by using Unsafe Rust.
         | 
         | https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html
         | 
         | Sometimes unsafe is necessary and the idea then is that the
         | "dangerous" parts of the code remain isolated in explicitly
         | marked "unsafe" blocks, where it can be closely reviewed.
         | 
          | Also, even if your own Rust code is doing nothing unsafe, you
          | might be using external libraries written in other languages,
          | and things might go wrong.
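          | 
          | A contrived sketch of the failure mode (this is deliberately
          | undefined behavior; don't do this):
          | 
          |     fn main() {
          |         let mut v = vec![0u8; 4];
          |         let p = v.as_mut_ptr();
          |         unsafe {
          |             // Out-of-bounds write: UB, i.e. exactly the kind
          |             // of corruption that can silently poison every
          |             // other test sharing the same process.
          |             *p.add(1_000_000) = 42;
          |         }
          |     }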
         | 
         | > if one test segfaults, isn't it a reason to halt the run
         | because something got seriously broken?
         | 
         | Sometimes it's still interesting and helpful to continue
          | running other tests even if one fails. If several of them
          | fail, it might even help you pinpoint what's going wrong
          | better than a single failure would. (Although having a bunch of
         | failing tests can also be more noise.)
        
         | sunshowers wrote:
         | In local runs it makes sense to halt the run, but in CI not as
         | much -- nextest makes this entirely configurable.
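          | 
          | Something like the following in .config/nextest.toml (see the
          | nextest configuration docs for the exact keys):
          | 
          |     # Local runs: stop scheduling new tests after a failure.
          |     [profile.default]
          |     fail-fast = true
          | 
          |     # CI, selected with `cargo nextest run --profile ci`:
          |     # run everything to completion.
          |     [profile.ci]
          |     fail-fast = false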
        
       | Ericson2314 wrote:
       | This is good for an entirely different reason, which is running
       | cross-compiled tests in an emulator.
       | 
        | That is especially good for bare metal. If you don't have a
        | global allocator, have limited RAM, etc., you might not be able
        | to write the test harness as part of the guest program at all!
        | So you want to move as much logic to the host program as
        | possible, and then run as little as a few instructions (!) in
        | the guest program.
       | 
       | See https://github.com/gz/rust-x86 for an example of doing some
       | of this.
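        | 
        | One concrete mechanism: Cargo lets you configure a runner per
        | target in .cargo/config.toml, and test binaries get launched
        | through it, so pointing it at an emulator gives you emulated
        | process-per-test runs. A sketch (the target triple and QEMU
        | binary are just examples):
        | 
        |     [target.aarch64-unknown-linux-gnu]
        |     runner = "qemu-aarch64"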
        
         | sunshowers wrote:
         | This is why Miri also works well with nextest, yeah!
        
       | pjc50 wrote:
       | This will be _horrendously_ slow on Windows.
        
         | sunshowers wrote:
         | It's certainly slower than on Linux, but horrendous is a little
         | bit of an overstatement. For many real-world test suites,
         | process creation is a small part of overall time, and nextest
         | still ends up being overall faster due to better handling of
         | long-pole tests. The issues around thread termination and stuff
         | are quite relevant on Windows. (And maybe posts like mine are
         | going to get Microsoft and AV vendors to pay more attention to
         | process creation performance! It really doesn't have to be
         | slow, it just is in practice.)
         | 
         | Measured by weekly downloads (around 120k a week total last I
         | checked), Windows is actually the number two platform nextest
         | is used on, ahead of macOS. It's mostly CI, but clearly a lot
         | of people are getting value out of nextest on Windows.
        
       | cortesi wrote:
       | Nextest is one of the very small handful of tools I use dozens or
       | hundreds of times a day. Parallelism can reduce test suite
       | execution time significantly, depending on your project, and has
       | saved me cumulative days of my life. The output is nicer, test
       | filtering is nicer, leak detection is great, and the developer is
       | friendly and responsive. Thanks sunshowers!
       | 
       | The one thing we've had to be aware of is that the execution
       | model means there can sometimes be differences in behaviour
       | between nextest and cargo test. Very occasionally there are tests
       | that fail in cargo test but succeed in nextest due to better
       | isolation. In practice this just means that we run cargo test in
       | CI.
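        | 
        | A minimal sketch of that arrangement in GitHub Actions (step
        | names and the nextest install step are illustrative):
        | 
        |     jobs:
        |       test:
        |         runs-on: ubuntu-latest
        |         steps:
        |           - uses: actions/checkout@v4
        |           - uses: taiki-e/install-action@nextest
        |           - run: cargo nextest run   # fast, isolated
        |           - run: cargo test          # catches isolation diffs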
        
       | zbentley wrote:
       | This article is a good primer on why process isolation is more
        | robust/separated than threads/coroutines _in general_, though
       | ironically I don't think it fully justifies why process isolation
        | is better _for tests_ as a specific use case benefiting from that
       | isolation.
       | 
       | For tests specifically, some considerations I found to be
       | missing:
       | 
       | - Given speed requirements for tests, and representativeness
       | requirements, it's often beneficial to refrain from too much
        | isolation so that multiple tests can use/exercise paths that use
        | pre-primed in-memory state (caches, open sockets, etc.). It's odd
        | that the article calls out global-ish state mutation as a
       | specific benefit of process isolation, given that it's often
       | substantially faster and more representative of real production
       | environments to run tests in the presence of already-primed
       | global state. Other commenters have pointed this out.
       | 
       | - I wish the article were clearer about threads as an alternative
       | isolation mechanism _for sequential tests_ versus threads as a
       | _means of parallelizing tests_. If tests really do need to be run
       | in parallel, processes are indeed the way to go in many cases,
       | since thread-parallel tests often test a more stringent
        | requirement than production would. Consider, for example, a
        | global connection pool which is primed sequentially on webserver
        | start, before the webserver begins (maybe parallel) request
        | servicing. That setup code doesn't need to be thread-safe, so
        | using threads to test it _in parallel_ may surface concurrency
        | issues that are not realistic (see the sketch at the end of
        | this comment).
       | 
       | - On the other hand, enough benefits are lost when running clean-
       | slate test-per-process that it's sometimes more appropriate to
       | have the test harness orchestrate a series of parallel executors
       | and schedule multiple tests to each one. Many testing frameworks
       | support this on other platforms; I'm not as sure about Rust--my
       | testing needs tend to be very simple (and, shamefully, my
       | coverage of fragile code lower than it should be), so take this
       | with a grain of salt.
       | 
       | - Many testing scenarios want to abort testing on the first
       | failure, in which case processes vs. threads is largely moot. If
       | you run your tests with a thread or otherwise-backgrounded
       | routine that can observe a timeout, it doesn't matter whether
       | your test harness can reliably kill the test and keep going;
       | aborting the entire test harness (including all processes/threads
       | involved) is sufficient in those cases.
       | 
       | - Debugging tools are often friendlier to in-process test code.
       | It's usually possible to get debuggers to understand process-
       | based test harnesses, but this isn't usually set up by default.
       | If you want to breakpoint/debug during testing, running your
       | tests in-process and on the main thread (with a background thread
       | aborting the harness or auto-starting a debugger on timeout) is
       | generally the most debugger-friendly practice. This is true on
       | most platforms, not just Rust.
       | 
        | - fork() is a middle ground here as well: it can be slow (though
        | mitigations exist), but it can also speed things up considerably
        | by sharing e.g. primed in-memory caches and socket state with
        | tests when they run. Given fork()'s sharp edges re:
       | filehandle sharing, this, too, works best with sequential rather
       | than parallel test execution. Depending on the libraries in use
       | in code-under-test, though, this is often more trouble than it's
       | worth. Dealing with a mixture of fork-aware and fork-unaware code
       | is miserable; better to do as the article suggests if you find
       | yourself in that situation. How to set up library/reusable code
       | to hit the right balance between fork-awareness/fork-safety and
       | environment-agnosticism is a big and complicated question with no
       | easy answers (and also excludes the easy rejoinder of "fork is
       | obsolete/bad/harmful; don't bother supporting it and don't use
        | it, just read Baumann et al.!").
       | 
       | - In many ways, this article makes a good case for something it
       | doesn't explicitly mention: a means of annotating/interrogating
       | in-memory global state, like caches/lazy_static/connections, used
       | by code under test. With such an annotation, it's relatively easy
       | to let invocations of the test harness choose how they want to
       | work: reuse a process for testing and re-set global state before
       | each test, have _the harness itself_ (rather than tests by side-
        | effect) set up the global state, run each test with and/or
       | without pre-primed global state and see if behavior differs, etc.
       | Annotating such global state interactions isn't trivial, though,
       | if third-party code is in the mix. A robust combination of
       | annotations in first-party code and a clear place to manually
       | observe/prime/reset-if-possible state that isn't annotated is a
       | good harness feature to strive for. Even if you don't get 100% of
       | the way there, incremental progress in this direction yields
       | considerable rewards.
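        | 
        | Returning to the connection-pool bullet above: a sketch (all
        | names invented) of setup code that is primed sequentially and
        | never needs to tolerate concurrent initialization in
        | production:
        | 
        |     use std::sync::OnceLock;
        | 
        |     struct Pool {
        |         conns: Vec<String>, // stand-in for real connections
        |     }
        | 
        |     static POOL: OnceLock<Pool> = OnceLock::new();
        | 
        |     // Called once at "webserver start", before any (maybe
        |     // parallel) request servicing begins.
        |     fn prime_pool() -> &'static Pool {
        |         POOL.get_or_init(|| Pool {
        |             conns: (0..4).map(|i| format!("conn-{i}")).collect(),
        |         })
        |     }
        | 
        |     #[test]
        |     fn handles_request() {
        |         // Thread-parallel tests would all race to initialize
        |         // POOL, exercising a pattern production never sees.
        |         let pool = prime_pool();
        |         assert_eq!(pool.conns.len(), 4);
        |     }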
        
         | sunshowers wrote:
          | But that's exactly it, right -- all the things you've listed
          | are valid points in the design space, but they require a lot of
         | coordination between various actors in the system. Process per
         | test solves a whole swath of coordination issues.
        
       ___________________________________________________________________
       (page generated 2025-01-12 23:01 UTC)