[HN Gopher] Stateright: A model checker for implementing distrib...
       ___________________________________________________________________
        
       Stateright: A model checker for implementing distributed systems
        
       Author : yagizdegirmenci
       Score  : 110 points
       Date   : 2021-06-06 13:42 UTC (9 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | jonnadal wrote:
       | Author here. I'm also working on a related book covering
       | distributed algorithms: https://www.stateright.rs/
       | 
       | It includes a comparison w/ TLA+/TLC for those familiar with
       | those technologies.
        
         | SkyMarshal wrote:
         | This is pretty cool. Just want to copy-paste a paragraph from
         | your link that sums up how it's different from other tools:
         | 
         |  _> Cloud service providers like AWS and Azure leverage
         | verification software such as the TLA+ model checker to achieve
         | the same goal, but whereas those solutions typically verify a
         | high level system design, Stateright is able to verify the
         | underlying system implementation in addition to the design
         | (along with providing other unique benefits explained in the
         | "Comparison with TLA+" chapter). On the other end of the
         | spectrum are tools such as Jepsen which can validate a final
         | implementation by testing a random subset of the system's
         | behavior, whereas Stateright can systematically enumerate all
         | possible behaviors within a specified model of the system._
        
           | jonnadal wrote:
           | Thank you!
        
       | hwayne wrote:
       | This is really cool! Also, `JobMarket` is a pun I'm surprised I
       | haven't seen before.
       | 
       | One recommendation for the guide: you know the tool much better
       | than the audience, so you're going to underestimate how hard it
       | is for readers to follow. It's something all experts do all the
       | time. Have you considered smaller, sorter step examples?
        
         | jonnadal wrote:
         | I'm thrilled to hear that as I'm a big fan of your work!
         | 
         | Thank you for the suggestion as well. Do you have thoughts on
         | specific places where I should either split a chapter and/or
         | associated project to make the ramp up more incremental?
        
           | hwayne wrote:
           | Hm. So the easiest, short term thing might be more
           | annotations. For example, is `on_start` and `on_msg` Rust-
           | isms, or are they particular to this project? If the latter,
           | I'd suggest putting a comment describing what they do in the
           | code itself, and then a separate page describing how they
           | work (with a small, easily modifiable example). Things like
           | that would go a long way to getting people to the expertise-
           | level. Past that, I get the impression there are three ideas
           | here:
           | 
           | 1. Actor systems; writing code with Stateright
           | 
           | 2. Model checking; model checking with Stateright
           | 
           | 3. The actual problems we want to apply Stateright too
           | 
           | Right now you're teaching all three of those simultaneously;
           | I think it would be better if you could _scaffold_ , or
           | introduce the ideas one at a time. For example, showcasing
           | setup and modelchecking first on a property simpler than
           | linearizability. I've been really fond of race-free critical
           | sections for concurrent properties recently, just because it
           | has relatively fewer moving parts.
           | 
           | Sorry these thoughts are a little disjointed, writing guides
           | is really really hard
           | 
           | EDIT: probably not something that would be directly helpful
           | to people, but one thing I'd personally be very interested in
           | reading is how this all works under the hood. If I
           | understanding correctly, you're making each Actor inspectable
           | by `ActorModel`, which then acts as an orchestrator and
           | records history. Is that correct?
        
       | faeyanpiraat wrote:
       | These thing are so exciting, but as a regular web dev I always
       | wonder what are some real use cases?
       | 
       | Most stuff can be done without any of the fancy containerization
       | and such solutions.
        
         | parhamn wrote:
         | Wrong post?
        
         | politician wrote:
         | Imagine that you are designing an electronic bank vault door.
         | The product requirements state that the door should be secure
         | when it's closed and should only be able to be opened when
         | authorized. Oh, in an emergency, the vault should fail closed
         | unless there are people inside.
         | 
         | How would you prove to the client that your solution implements
         | these requirements? Modeling the system using a language like
         | TLA+ is one way of doing that.
        
           | gradschool wrote:
           | How would I prove to the client that the implementation
           | conforms to the model? Wouldn't it end up being written in
           | some crappy programming language for which no one can agree
           | on a formal semantics anyway, running on some grotesquely
           | complicated processor family full of defects and bugs?
        
             | nivertech wrote:
             | The model is to prove your design, not the implementation.
             | 
             | There is no reason to start implementing things, if your
             | design is wrong.
             | 
             | Formal Specification proves your design, while Formal
             | Verification proves your implementation.
             | 
             | As far as I understand this project - Stateright does both
             | (i.e. 2-in-1). Kind of executable model?
        
               | jonnadal wrote:
               | Regarding the last point -- correct, Stateright aims to
               | verify both.
               | 
               | It's important to clarify that this doesn't provide a
               | proof of correctness, but it can dramatically improve
               | confidence in both the design and implementation compared
               | with fuzz testing, for example. This is done by
               | exhaustively enumerating possible nondeterministic
               | outcomes (e.g. due to message reordering) within
               | specified constraints (e.g. up to S servers and C clients
               | performing X operations...).
               | 
               | Examples:
               | 
               | SD Paxos: https://github.com/stateright/stateright/blob/m
               | aster/example...
               | 
               | ABD (linearizable register algorithm): https://github.com
               | /stateright/stateright/blob/master/example...
        
             | hwayne wrote:
             | We've been working on some techniques for this. To my
             | understanding, Stateright (the OP's repo) directly connects
             | the spec with Rust code, so you can make some
             | implementation guarantees.
             | 
             | If you don't have that (say you're using TLA+), one really
             | promising technique is to generate behavioral traces and
             | then refine those into code tests. I know a few companies
             | have done this successfully, and I've been working on some
             | examples of how it's done, but to my knowledge there's no
             | battle-tested tooling that do this automatically. It's all
             | bespoke for now.
        
               | faeyanpiraat wrote:
               | Wow, these formal specification tools is something I
               | always wanted to look into but the learning curve seems
               | steep.
               | 
               | Having a specification which is guaranteed to be correct
               | before implementation begins is a programmer's wet dream.
        
               | hwayne wrote:
               | > Wow, these formal specification tools is something I
               | always wanted to look into but the learning curve seems
               | steep.
               | 
               | One of my dreams is to flatten that learning curve. I
               | don't think we'll ever make these tools effortless, but I
               | think it's possible to reach a point where "these tools
               | are too hard to learn" isn't a barrier anymore.
               | 
               | (Even then, I imagine a lot of people will learn some
               | formal specification, and think "this isn't useful for
               | me", and never touch them again. And that's fine. But I
               | want anybody to be able to reach the point where they
               | _can make that decision_ , as opposed to feeling like
               | they're locked out by default. We have a long way to go,
               | but I think we're considerably further along than we were
               | five years ago.)
               | 
               | > Having a specification which is guaranteed to be
               | correct before implementation begins is a programmer's
               | wet dream.
               | 
               | I wouldn't say _guaranteed_ to be correct, but at least
               | it 's, like, closer to correct? Less likely to have
               | horrific bugs, or bizarre edge cases in the requirements
               | that nobody notices until neck deep into the
               | implementation. Stuff that increases founded confidence.
        
         | mullr wrote:
         | As a regular web dev, you are constantly staring a distributed
         | system in the face: it's made of the program running on the
         | server (back end), and the program running in the web browser
         | (front end). How do those two programs agree on pieces of
         | common information? How does the server deal with
         | communications from other clients? At what cadence? How are the
         | caches invalidated? Regular web dev is anything but simple, if
         | you seriously consider the questions it poses. Tools like this
         | help you think such questions.
        
         | vajrabum wrote:
         | Imagine that you are doing this at web scale. Instead of one
         | web server you have 2000, and another 1000 middleware servers
         | all talking to a distributed database cluster with another 500
         | machines. You can probably run your web servers close to
         | stateless sharing sessions across redis. And your middleware
         | worker machines can all run pub/sub so as a web dev you won't
         | have this problem much. But the developers of your distributed
         | key/value store, your distributed pub/sub queueing system and
         | your database are going to be very different animals. Take a
         | look at the last 10 years of Kyle Kingsbury's work with Jepsen
         | and you'll see what I mean (https://aphyr.com/tags/jepsen).
         | Opportunities for corruption, unavailability and lossage abound
         | with distributed systems.
        
           | jonnadal wrote:
           | Yep, Kyle has done an amazing job demonstrating how pervasive
           | bugs are within real world distributed systems.
        
         | yagizdegirmenci wrote:
         | Imagine you are implementing a database engine and you want to
         | make your database be able to run in run multiple nodes in
         | order to make it fault-tolerant, therefore you need to
         | implement some kind of consensus algorithm (Paxos, Raft) to
         | select a leader and a followers, in case if some node fails,
         | your entire cluster can keep working.
         | 
         | That said, implementing a consensus algorithm is quite hard.
         | Modeling and testing is much harder, there is even a language
         | just for this purpose; TLA+ [0].
         | 
         | [0]: https://lamport.azurewebsites.net/tla/tla.html
        
         | hwayne wrote:
         | Imagine you're making a theatre booking system, where groups
         | can reserve tickets in blocks. Lots of people are reserving
         | tickets both individually and in groups at a time. How do you
         | enforce that groups are mostly booked together? What about if
         | there's a contiguous block when they _start_ purchasing but
         | then some other individuals want one of those seats before the
         | group _finishes_ purchasing? What if one person in the group
         | needs to cancel: do you pick a seat in the middle and let an
         | individual  "split" an existing group block, or do you pick one
         | off the edge and shuffle the seats? Does that changed if the
         | group spans two rows? Do any of these decisions change if the
         | tickets are "family" tickets instead of "group" tickets? What
         | if some of the individual tickets are for "floating" tickets
         | and not strictly assigned seats? What about wheelchair
         | accessible seats?
         | 
         | Whatever things you end up deciding, does your system, where
         | many people are reserving, cancelling, and re-reserving over
         | long periods of time, guarantee all of the client's
         | expectations?
        
       | [deleted]
        
       ___________________________________________________________________
       (page generated 2021-06-06 23:01 UTC)