[HN Gopher] Verifying distributed systems with Isabelle/HOL
___________________________________________________________________
Verifying distributed systems with Isabelle/HOL
Author : eatonphil
Score : 117 points
Date : 2022-10-12 15:02 UTC (7 hours ago)
(HTM) web link (lawrencecpaulson.github.io)
(TXT) w3m dump (lawrencecpaulson.github.io)
| whispersnow wrote:
| Martin Kleppmann is the master of diagramming any complicated
| system!
| spinningslate wrote:
| Enjoyed this. The underlying formalism seems to be based (I
| think?) on concurrent, independently-executing processes that are
| stimulated by messages. That has some similarities with the Actor
| model [0], although Actors support concurrent, intra-process
| message handling, whereas this seems sequential at the individual
| process level. So maybe it's more like Communicating Sequential
| Processes [1].
|
| For anyone more knowledgeable on the topic: is that right? And is
| that model common across other proof systems like TLA?
|
| Second (bonus) question: why separate messages from user input?
|
| Thanks.
|
| [0] https://en.wikipedia.org/wiki/Actor_model [1]
| https://en.wikipedia.org/wiki/Communicating_sequential_proce...
| mjb wrote:
| > And is that model common across other proof systems like TLA?
|
| Yes, it's pretty common, and is a really nice abstraction.
|
| TLA+ doesn't use that model, although a lot of TLA+ code is
| implemented that way. TLA+'s model is basically "shared memory"
| where any step can access or change any state in any way it
| likes. That's super flexible, but can feel a little too
| flexible.
|
| P, on the other hand, is based on a model of state machines
| that communicate with messages. See this example for a clear
| illustration of how that works:
| https://p-org.github.io/P/tutorial/twophasecommit/
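|
| To make the contrast concrete, here's a rough sketch in Python
| rather than real TLA+ or P syntax (the account names and message
| fields are invented for illustration): the first style lets a
| step read or write any part of the global state, while the
| second confines each machine to its own state plus the one
| message being delivered.
|
|       # "Shared memory" style: a step is any function that may
|       # read or mutate the whole global state however it likes.
|       def transfer_step(state):
|           state["alice"] -= 10   # touches two accounts at once;
|           state["bob"] += 10     # nothing constrains what it reads
|
|       # "Communicating state machines" style: a machine sees only
|       # its own state and one incoming message, and influences
|       # others only by sending messages.
|       class Account:
|           def __init__(self, balance):
|               self.balance = balance
|
|           def handle(self, msg, send):
|               if msg["type"] == "credit":
|                   self.balance += msg["amount"]
|               elif msg["type"] == "debit":
|                   self.balance -= msg["amount"]
|                   send(msg["to"], {"type": "credit",
|                                    "amount": msg["amount"]})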
| mcguire wrote:
| " _Even though in reality processes may run in parallel, we do
| not need to model this parallelism since the only communication
| between processes is by sending and receiving messages, and we
| can assume that a process finishes processing one event before
| starting to process the next event. Every parallel execution is
| therefore equivalent to some linear sequence of execution steps.
| Other formalisations of distributed systems, such as the TLA+
| language, also use such a linear sequence of steps._ "
|
| Ha-hah! Proving that you can treat "processing one event" as
| globally atomic was the funnest thing in grad school.
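|
| A toy illustration of why that matters (plain Python with made-up
| event names, not anything from the paper): because each handler
| runs to completion before the next event is processed, a checker
| only has to enumerate orders of whole events rather than
| instruction-level interleavings.
|
|       from itertools import permutations
|
|       def deliver(order):
|           state = {"x": 0, "log": []}
|           def handle(event):
|               state["x"] += 1             # runs to completion
|               state["log"].append(event)  # before the next event
|           for event in order:
|               handle(event)
|           return state
|
|       # Explore every linear order of three events and check an
|       # invariant that should hold regardless of delivery order.
|       for order in permutations(["e1", "e2", "e3"]):
|           assert deliver(order)["x"] == 3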
| mjb wrote:
| > For this reason, formal proofs of correctness are valuable for
| distributed algorithms.
|
| I have a massive amount of respect for Martin and his work, but I
| think that this emphasis is the wrong one (if our goal is to
| increase the correctness of deployed distributed algorithms).
|
| Instead, I like to think (building off the work of folks like
| Ankush Desai, an AWS colleague and creator of P) that the process
| of specification and its products are more valuable than the
| creation of a proof. Model checking - either exhaustive or
| stochastic - is valuable for checking the properties of
| specifications, but pushing the last step into a full proof is
| often not worth the additional effort. Making proof the focus and
| goal tends to turn most programmers off the whole topic.
|
| I think we should be saying to folks "If you pick up these tools
| you'll end up with a crisp specification that'll help you move
| faster at development time, extremely clear documentation of your
| protocol, and a deeper understanding of what your protocol does"
| instead of "if you pick up these tools you'll end up with the
| mathematical artifact of a proof". This has been a big shift in
| my own thinking. When I first picked up Promela, Alloy, and
| TLA+/TLC/TLAPS, my own focus was on proof (and remained so until
| 2014-ish, when I spent a lot of time talking to Chris Newcombe
| and Leslie Lamport as we were writing the "Formal Methods at AWS"
| CACM paper). Over time I've found the journey much more valuable
| than the artifact, and tools like PlusCal and P, which make
| specification more approachable, more valuable than tools with
| less familiar syntax and semantics.
|
| We shouldn't be surprised that few people can make progress when
| we ask them to understand both Paxos and Isabelle at the same
| time! Maybe if that's your focus as a PhD student, then it's OK.
|
| > The real challenge in verifying distributed algorithms is to
| come up with the right invariant that is both true and also
| implies the properties you want your algorithm to have.
| Unfortunately, designing this invariant has to be done manually.
|
| This is true. Invariants are hard. But maybe I'm more optimistic:
| I think that there are a lot of very real systems with
| straightforward invariants. Lamport's three invariants for
| consensus (section 2.1 of Paxos Made Simple), for example:
| https://lamport.azurewebsites.net/pubs/paxos-simple.pdf
| The same goes for invariants like linearizability, and even for
| formal definitions of isolation.
|
| The research I would _love_ to see is work to synthesize
| invariants from formalisms like Adya's or Crooks' work on
| database isolation levels. I think that's a very tractable
| project, and would be super useful to practitioners.
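|
| As a flavor of how crisply such an invariant can be stated, here
| is a hypothetical Python rendering of the three requirements from
| section 2.1 as a predicate over one snapshot of model state (the
| variable names are invented, not taken from any real tool):
|
|       # proposed / chosen / learned are sets of values observed in
|       # one reachable state of a consensus model.
|       def consensus_safety(proposed, chosen, learned):
|           return (chosen <= proposed     # only proposed values chosen
|                   and len(chosen) <= 1   # only a single value chosen
|                   and learned <= chosen) # nothing learned unless chosen
|
|       assert consensus_safety(proposed={"v1", "v2"},
|                               chosen={"v1"},
|                               learned={"v1"})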
| comfypotato wrote:
| Are your focus and Dr. Kleppmann's mutually exclusive? I ask
| because it seems like the Isabelle formulation emerges from the
| protocol itself. The invariant is the only added work, yes?
|
| I'm effectively asking if the proof generation could replace
| the model checking as a way of verifying that you have
| appropriately specified your protocol. Big information exchange
| protocols can be confusing!
|
| (Maybe I'm misunderstanding what an invariant is... I'm
| thinking it's the property proved true by induction (and would
| go something along the lines of "if and only if the good has
| been sold, then the merchant has received payment for the
| good"))
| Nokurn wrote:
| This is great stuff. I did something similar in a 2020 paper[1]
| using Coq. I did not find Coq particularly suited for this type
| of work. Isabelle/HOL looks promising.
|
| One thing to note is that these types of verifications can become
| difficult when the systems are layered. Even the simplest real-
| world systems are layered once you factor in the underlying
| transport for the messages. The TLC paper provided a way to
| compose distributed systems that reduces the complexity of the
| proofs for each distributed component.
|
| [1] https://dl.acm.org/doi/10.1145/3409005
| allanrbo wrote:
| For context, Kleppmann is the author of the book Designing Data-
| Intensive Applications (DDIA). https://dataintensive.net/
| oa335 wrote:
| I'm interested in learning how to formally express distributed
| systems but have limited time. Does anyone have any insight into
| which would be better to learn - TLA+/PlusCal (e.g. Hillel
| Wayne's courses) or Isabelle/HOL?
| ocschwar wrote:
| TLA+ definitely. The lowest rung on the ladder is actually
| reachable.
| javcasas wrote:
| Martin Kleppmann verifying distributed systems? That's as good as
| it can get.
| eatonphil wrote:
| With a proof assistant written in Standard ML. :)
|
| https://isabelle.in.tum.de/
| javcasas wrote:
| Yeah, a distributed system with proofs that it works under
| some specific assumptions. This is the future. It's a future
| that's hard and complicated to reach because the tools and the
| expertise aren't here yet, but it's definitely the future.
| preseinger wrote:
| My perspective from industry is that we're not really suffering
| for a lack of formal verification, or anything at the
| specification level. Basically all of the meaningful problems
| that need help and attention are with the implementations. I've
| been on both sides, and I can say it's much harder to implement a
| bug-free distributed system for large-scale production use than
| to specify one.
___________________________________________________________________
(page generated 2022-10-12 23:00 UTC)