[HN Gopher] Stateright: A model checker for implementing distrib...
___________________________________________________________________
Stateright: A model checker for implementing distributed systems
Author : yagizdegirmenci
Score : 110 points
Date : 2021-06-06 13:42 UTC (9 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| jonnadal wrote:
| Author here. I'm also working on a related book covering
| distributed algorithms: https://www.stateright.rs/
|
| It includes a comparison w/ TLA+/TLC for those familiar with
| those technologies.
| SkyMarshal wrote:
| This is pretty cool. Just want to copy-paste a paragraph from
| your link that sums up how it's different from other tools:
|
| _> Cloud service providers like AWS and Azure leverage
| verification software such as the TLA+ model checker to achieve
| the same goal, but whereas those solutions typically verify a
| high level system design, Stateright is able to verify the
| underlying system implementation in addition to the design
| (along with providing other unique benefits explained in the
| "Comparison with TLA+" chapter). On the other end of the
| spectrum are tools such as Jepsen which can validate a final
| implementation by testing a random subset of the system's
| behavior, whereas Stateright can systematically enumerate all
| possible behaviors within a specified model of the system._
| jonnadal wrote:
| Thank you!
| hwayne wrote:
| This is really cool! Also, `JobMarket` is a pun I'm surprised I
| haven't seen before.
|
| One recommendation for the guide: you know the tool much better
| than the audience, so you're going to underestimate how hard it
| is for readers to follow. It's something all experts do all the
| time. Have you considered smaller, sorter step examples?
| jonnadal wrote:
| I'm thrilled to hear that as I'm a big fan of your work!
|
| Thank you for the suggestion as well. Do you have thoughts on
| specific places where I should either split a chapter and/or
| associated project to make the ramp up more incremental?
| hwayne wrote:
| Hm. So the easiest, short term thing might be more
| annotations. For example, is `on_start` and `on_msg` Rust-
| isms, or are they particular to this project? If the latter,
| I'd suggest putting a comment describing what they do in the
| code itself, and then a separate page describing how they
| work (with a small, easily modifiable example). Things like
| that would go a long way to getting people to the expertise-
| level. Past that, I get the impression there are three ideas
| here:
|
| 1. Actor systems; writing code with Stateright
|
| 2. Model checking; model checking with Stateright
|
| 3. The actual problems we want to apply Stateright too
|
| Right now you're teaching all three of those simultaneously;
| I think it would be better if you could _scaffold_ , or
| introduce the ideas one at a time. For example, showcasing
| setup and modelchecking first on a property simpler than
| linearizability. I've been really fond of race-free critical
| sections for concurrent properties recently, just because it
| has relatively fewer moving parts.
|
| Sorry these thoughts are a little disjointed, writing guides
| is really really hard
|
| EDIT: probably not something that would be directly helpful
| to people, but one thing I'd personally be very interested in
| reading is how this all works under the hood. If I
| understanding correctly, you're making each Actor inspectable
| by `ActorModel`, which then acts as an orchestrator and
| records history. Is that correct?
| faeyanpiraat wrote:
| These thing are so exciting, but as a regular web dev I always
| wonder what are some real use cases?
|
| Most stuff can be done without any of the fancy containerization
| and such solutions.
| parhamn wrote:
| Wrong post?
| politician wrote:
| Imagine that you are designing an electronic bank vault door.
| The product requirements state that the door should be secure
| when it's closed and should only be able to be opened when
| authorized. Oh, in an emergency, the vault should fail closed
| unless there are people inside.
|
| How would you prove to the client that your solution implements
| these requirements? Modeling the system using a language like
| TLA+ is one way of doing that.
| gradschool wrote:
| How would I prove to the client that the implementation
| conforms to the model? Wouldn't it end up being written in
| some crappy programming language for which no one can agree
| on a formal semantics anyway, running on some grotesquely
| complicated processor family full of defects and bugs?
| nivertech wrote:
| The model is to prove your design, not the implementation.
|
| There is no reason to start implementing things, if your
| design is wrong.
|
| Formal Specification proves your design, while Formal
| Verification proves your implementation.
|
| As far as I understand this project - Stateright does both
| (i.e. 2-in-1). Kind of executable model?
| jonnadal wrote:
| Regarding the last point -- correct, Stateright aims to
| verify both.
|
| It's important to clarify that this doesn't provide a
| proof of correctness, but it can dramatically improve
| confidence in both the design and implementation compared
| with fuzz testing, for example. This is done by
| exhaustively enumerating possible nondeterministic
| outcomes (e.g. due to message reordering) within
| specified constraints (e.g. up to S servers and C clients
| performing X operations...).
|
| Examples:
|
| SD Paxos: https://github.com/stateright/stateright/blob/m
| aster/example...
|
| ABD (linearizable register algorithm): https://github.com
| /stateright/stateright/blob/master/example...
| hwayne wrote:
| We've been working on some techniques for this. To my
| understanding, Stateright (the OP's repo) directly connects
| the spec with Rust code, so you can make some
| implementation guarantees.
|
| If you don't have that (say you're using TLA+), one really
| promising technique is to generate behavioral traces and
| then refine those into code tests. I know a few companies
| have done this successfully, and I've been working on some
| examples of how it's done, but to my knowledge there's no
| battle-tested tooling that do this automatically. It's all
| bespoke for now.
| faeyanpiraat wrote:
| Wow, these formal specification tools is something I
| always wanted to look into but the learning curve seems
| steep.
|
| Having a specification which is guaranteed to be correct
| before implementation begins is a programmer's wet dream.
| hwayne wrote:
| > Wow, these formal specification tools is something I
| always wanted to look into but the learning curve seems
| steep.
|
| One of my dreams is to flatten that learning curve. I
| don't think we'll ever make these tools effortless, but I
| think it's possible to reach a point where "these tools
| are too hard to learn" isn't a barrier anymore.
|
| (Even then, I imagine a lot of people will learn some
| formal specification, and think "this isn't useful for
| me", and never touch them again. And that's fine. But I
| want anybody to be able to reach the point where they
| _can make that decision_ , as opposed to feeling like
| they're locked out by default. We have a long way to go,
| but I think we're considerably further along than we were
| five years ago.)
|
| > Having a specification which is guaranteed to be
| correct before implementation begins is a programmer's
| wet dream.
|
| I wouldn't say _guaranteed_ to be correct, but at least
| it 's, like, closer to correct? Less likely to have
| horrific bugs, or bizarre edge cases in the requirements
| that nobody notices until neck deep into the
| implementation. Stuff that increases founded confidence.
| mullr wrote:
| As a regular web dev, you are constantly staring a distributed
| system in the face: it's made of the program running on the
| server (back end), and the program running in the web browser
| (front end). How do those two programs agree on pieces of
| common information? How does the server deal with
| communications from other clients? At what cadence? How are the
| caches invalidated? Regular web dev is anything but simple, if
| you seriously consider the questions it poses. Tools like this
| help you think such questions.
| vajrabum wrote:
| Imagine that you are doing this at web scale. Instead of one
| web server you have 2000, and another 1000 middleware servers
| all talking to a distributed database cluster with another 500
| machines. You can probably run your web servers close to
| stateless sharing sessions across redis. And your middleware
| worker machines can all run pub/sub so as a web dev you won't
| have this problem much. But the developers of your distributed
| key/value store, your distributed pub/sub queueing system and
| your database are going to be very different animals. Take a
| look at the last 10 years of Kyle Kingsbury's work with Jepsen
| and you'll see what I mean (https://aphyr.com/tags/jepsen).
| Opportunities for corruption, unavailability and lossage abound
| with distributed systems.
| jonnadal wrote:
| Yep, Kyle has done an amazing job demonstrating how pervasive
| bugs are within real world distributed systems.
| yagizdegirmenci wrote:
| Imagine you are implementing a database engine and you want to
| make your database be able to run in run multiple nodes in
| order to make it fault-tolerant, therefore you need to
| implement some kind of consensus algorithm (Paxos, Raft) to
| select a leader and a followers, in case if some node fails,
| your entire cluster can keep working.
|
| That said, implementing a consensus algorithm is quite hard.
| Modeling and testing is much harder, there is even a language
| just for this purpose; TLA+ [0].
|
| [0]: https://lamport.azurewebsites.net/tla/tla.html
| hwayne wrote:
| Imagine you're making a theatre booking system, where groups
| can reserve tickets in blocks. Lots of people are reserving
| tickets both individually and in groups at a time. How do you
| enforce that groups are mostly booked together? What about if
| there's a contiguous block when they _start_ purchasing but
| then some other individuals want one of those seats before the
| group _finishes_ purchasing? What if one person in the group
| needs to cancel: do you pick a seat in the middle and let an
| individual "split" an existing group block, or do you pick one
| off the edge and shuffle the seats? Does that changed if the
| group spans two rows? Do any of these decisions change if the
| tickets are "family" tickets instead of "group" tickets? What
| if some of the individual tickets are for "floating" tickets
| and not strictly assigned seats? What about wheelchair
| accessible seats?
|
| Whatever things you end up deciding, does your system, where
| many people are reserving, cancelling, and re-reserving over
| long periods of time, guarantee all of the client's
| expectations?
| [deleted]
___________________________________________________________________
(page generated 2021-06-06 23:01 UTC)