[HN Gopher] CreuSAT: Formally verified SAT solver written in Rus...
___________________________________________________________________
CreuSAT: Formally verified SAT solver written in Rust and verified
with Creusot
Author : ingve
Score : 220 points
Date : 2022-06-17 15:57 UTC (7 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| joshlk wrote:
| What are some practical applications for a SAT solver?
| Jweb_Guru wrote:
| Verifying programs, for one thing--CreuSAT actually calls out
| to no less than _three_ different SAT solvers in the course of
| automatically discharging its proof obligations :)
|
| More generally, SAT solving is useful whenever you have
| something that can be represented as a circuit, which is an
| extremely large class of problems. Indeed, SAT can be proven to
| be polynomial-time equivalent to _any_ problem in NP (which you
| can read as "any decision problem whose yes solution is
| efficiently verifiable"), which includes many useful problems:
| common examples include the traveling salesman problem,
| detecting hamiltonian cycles, subgraph isomorphism
| (interestingly, graph _isomorphism_ is no longer believed to be
| NP-complete), graph coloring, and many others (which in turn
| often have easy reductions from other extremely useful
| problems). Since SAT solvers have received so much optimization
| work, in practical cases these problems are very often solved
| by reducing them to some variant of a satisfiability problem,
| rather than trying to solve them directly.
|
| It is even more useful when combined with theories that can
| compute special cases (like arithmetic on natural numbers) much
| more efficiently, which is known as an SMT solver
| (satisfiability modulo theory). Sometimes, the theories are
| undecidable, which means the SMT solver is allowed to report
| "don't know" or run forever in some cases (in practice, people
| then massage the inputs to try to get them within the solvable
| cases). This is the form in which SAT solvers are used most
| often. There are a ton of applications, but an immediate one is
| determining whether version constraints in a package manager
| (which may include equalities, inequalities, etc.) can all be
| simultaneously satisfied (and producing a satisfying
| assignment). There are a _lot_ of good examples, though, which
| is why SMT solvers are so ubiquitous in the literature when it
| comes to "how do we do this really hard thing?" (e.g. they are
| often critical components of program synthesis).
|
| Sorry if this wasn't that helpful--it's a very broad topic and
| I'm sure many people here can give you a much better answer!
| k0k0r0 wrote:
| Another SAT developer here. Most prominently, SAT Solvers are
| used in the design and construction of CPUs. See the EDA
| Challenge of the SAT Competition 2021.
| bjornsing wrote:
| So can CreuSAT be used as a backend for Creusot?
|
| A "self-hosted" SAT-solver, in the sense that it can prove its
| own correctness, that would certainly give you a speaking slot at
| some conference. But also a philosophical dilemma I guess. :)
| Sarek wrote:
| This was asked in the Rust Zulip as well, and is actually the
| original idea which this thesis evolved from (Xavier said
| something along the lines of "My supervisor and I have talked
| about how cool it would be to have a solver which verifies
| itself").
|
| The short answer is that it should be possible to write a Why3
| driver for CreuSAT which allows it to handle basic
| propositional reasoning, and that one would probably need to
| add some theories.
|
| The a bit longer answer:
|
| Getting it to work as a backend should probably not be all that
| hard, and it should probably be possible to make it able to
| prove some things as well. Getting it to verify itself would
| probably be very hard. As it stands, the proof needs Alt-Ergo,
| CVC4 and Z3 (all of which are really good) to pass, and some of
| the obligations take a long time. A solver capable of verifying
| itself would (I imagine) have to put much more emphasis on
| having only simple proofs which were solvable using a minimal
| amount of theories, and would probably have to run for a very
| long time. I dunno, would be a fun project, and I guess one
| could decide after having tried to prove for instance Friday
| (the super naive solver) if it is at all worth pursuing.
| netr0ute wrote:
| Will there be a version of this for the College Board type of
| SAT?
| jerf wrote:
| That's not possible; it's well known that the College Board SAT
| is inconsistent:
|
| https://www.youtube.com/watch?v=POMX80Q11O0
|
| https://www.youtube.com/watch?v=Xh5coE5wJ2I
| bergenty wrote:
| Those examples are mistakes the test makers made. It doesn't
| make the test "inconsistent".
| eutectic wrote:
| I thought proof checking tools mostly negated the need for
| verified sat solvers.
| k0k0r0 wrote:
| A former coworker of mine was in the process of publishing a
| really nice paper about a framework that allows concurrent
| resp. simultaneous verification of proofs generated by most
| state-of-the-art SAT Solvers, which drastically reduced the
| time needed to construct and then resp. simultaneously verify
| UNSAT proofs. Unfortunately, he left before publishing the
| paper.
|
| (Among other reasons he was unhappy with the academic world,
| which I do understand perfectly.) Unfortunately, I have trouble
| reaching him. I believe his basically finished paper would be
| of so much use to some people. It's really sad. I believe his
| paper really should be published in a journal or at least put
| online somewhere. It's not totally groundbreaking or something,
| but like hard work nobody had the determination to, yet. At
| least not publicly. I feel so bad that I can not share his work
| without his permission.
| Jweb_Guru wrote:
| Producing the proofs (especially for UNSAT) is often very time
| consuming, and verifying them can be subtle (and if an
| incorrect proof is produced, you still have to deal with that
| case somehow...). It's not so much that they negate the need
| for verified SAT solvers as that creating and maintaining a
| verified SAT solver was considered much too difficult to
| justify the proof effort required. If the goal can be
| accomplished with an order of magnitude less effort, that
| calculus changes somewhat.
| zozbot234 wrote:
| SAT solving is the subtle and time consuming part. A valid
| SAT proof certificate can be verified efficiently.
| Jweb_Guru wrote:
| I'm referring to producing the certificate as the time-
| consuming part, not the verification time (I disagree that
| verification is not subtle, but I suppose that doesn't
| matter as there are verified verifiers). For example, many
| SAT solvers no longer attempt to provide "pure" resolution
| proofs because they are too expensive to produce, which
| restricts the kinds of techniques you see used in
| competition. In fact, producing certificates efficiently is
| an active research area, which wouldn't be the case if
| certificate production had negligible cost, e.g. see
| https://lmcs.episciences.org/9357/pdf.
| xavxav wrote:
| SAT comp gives you 5x the solving time to check your proof,
| which is indicative that checking isn't so simple (though
| it is on paper)
| Sarek wrote:
| Generating a proof and checking it is definitely a step up, but
| it does have a few issues.
|
| - SAT is in NP, and UNSAT is in CoNP. This means that proofs
| take a lot of space to store and a lot of time to check. Proofs
| are usually run with 5x the time limit of the initial solve,
| and still end up timing out.
|
| - Proofs are (currently) based on resolution, which means that
| research on efficient solving techniques which can not be
| efficiently modelled as resolution is not being prioritised.
| This is touched on briefly in the first 2 talks of Randy Bryant
| on: https://fm.csl.sri.com/SSFT22/#splogic. I think the talks
| in general are among the better for understanding a SAT solver
| like CreuSAT.
| fjdh wrote:
| >Proofs are (currently) based on resolution
|
| Proofs are actually based on a practical version of extended
| resolution [1] called DRAT. Extended resolution is an
| extremely strong proof system, and research on techniques
| that are not efficiently modelled by resolution are in fact
| being prioritised (see for instance [2]). This is not to say
| that CreuSAT is not very impressive, but just a correction on
| the current state of the things.
|
| [1]: https://en.wikipedia.org/wiki/Frege_system#Extended_Freg
| e_sy...
|
| [2]: https://link.springer.com/chapter/10.1007/978-3-642-3136
| 5-3_...
| thesz wrote:
| This project uses one-way translation of Rust code into WhyML
| through Creusot. This translation is not formally guaranteed to
| be correct or semantics-preserving.
|
| Ideally, the code produced by Creusot should be translated back
| to Rust by a tool from Creusot or someone else. This way it is
| possible to use fixed point iteration, just like AST pretty-
| printer and parser for programming language syntax.
|
| This is more or less an approach taken by verification team of
| Buran (USSR's space shuttle) software. They took Fortran code and
| compiled it into executable code. The executable code was then
| manually translated back into Fortran and discrepancies analyzed.
| This way they found several errors in Fortran compiler. Buran
| made fully automatic landing in late 80-s.
|
| Contemporary verifiers should do the same automatically.
| xavxav wrote:
| Indeed, Creusot itself is not formally verified, but we have
| verified the metatheory in our paper RustHornBelt:
| https://people.mpi-
| sws.org/~dreyer/papers/rusthornbelt/paper.... Actually
| verifying the translation that Creusot produces would
| definitely be interesting future work, but I'm not sure it's
| where the best 'bang for the buck' is found right now, though I
| try to ensure the underlying translation would remain simple to
| formalize.
| jasinjames wrote:
| Fascinating anecdote. Where can I read more about this?
| thesz wrote:
| I don't know where you can read more. There are resources in
| Russian, but buran.ru is not quite functioning right now.
|
| I remember that story from the time when I worked with guys
| from Russian CQG team, who, in turn, worked in CQG with the
| member of the Buran's software engineering team. It was 15
| years ago.
| xavxav wrote:
| Hi! I'm the author of Creusot, Sarek did some amazing work on
| CreuSAT, he also had to teach himself formal verification and SAT
| solving along the way.
|
| If you have any questions about CreuSAT or Creusot, I'd be happy
| to answer
| rch wrote:
| At first glance I didn't see a license in the repo. Am I
| missing it?
| Sarek wrote:
| Ah, I knew I forgot something!
|
| Thanks, I licensed it under the MIT license now:)
| OscarCunningham wrote:
| How does it compare to kissat in terms of speed?
| newca12 wrote:
| Unsurprisingly, we can see a growing interest in the Rust
| ecosystem regarding formal verification. I try to keep
| https://github.com/newca12/awesome-rust-formalized-reasoning up
| to date. I will add CreuSAT shortly. The goal is to have an
| overview of what is actually Rust powered. Long term goal is to
| identify best practices and provide usefull common libraries.
| hinkley wrote:
| There's a whole undiscovered country of writing software that is
| both mathematically sound and intuitively sound. I had a long
| list of hopes and dreams in the late 90's of what I'd hoped
| software would be by now. I got about half of my wishes, and some
| of them were weird for me because I went from bitter struggles to
| get even lip service to some of those things, to two jobs later
| having them taken for granted. It's weird to brace for push back
| and then have people nod and change the subject.
|
| This is one of the wishes I have for 2040, that we continue to
| prove the ergonomics of type theory and escape analysis, and that
| having succeeded there, we start normalizing other domains of
| provably correct code. If you can do that for SAT solvers, maybe
| we'll start writing other kinds of AI in such a manner.
| danuker wrote:
| There is TLA+ for verifying specifications. Here is a talk by
| its creator:
|
| https://www.youtube.com/watch?v=-4Yp3j_jk8Q
| asimpletune wrote:
| I have wondered at times if the reason why we don't have those
| things is because, in such an environment, 20th century
| management might not work. Basically, "the curse of lisp",
| although I don't mean lisp specifically in this case, but
| rather languages and tooling that unleashes the power of formal
| verification and, in a larger way, the power of math.
|
| It's conceivable that in an alternate universe an individual
| engineer's productivity could be so high, because of math, that
| one developer would replace a whole team from our universe, and
| bugs would be extremely rare and mostly restricted to UI. From
| that point of view, the theory says, it's harder to build teams
| and these super charged developers are natural information
| silos. Or, in other words, it feels like the tech choices made
| have demonstrated a slight preference of less efficient
| developers that all understand a shared problem (e.g. this team
| is responsible for testing, or this team builds integrations
| with APIs, or another team parses unstructured data into
| structured data), over technology that would aid or formalize
| many aspects of those kinds of problems and others.
|
| Also, in such a world, I feel that software engineering would
| be much more abstract, and it probably wouldn't be approachable
| to those who aren't legitimately fascinated and interested in
| this stuff. Sort of like how I've never met anyone in my life
| who studied Math just to get rich, even though you can become
| phenomenally wealthy if you do.
|
| More likely though the answer is complicated and it's a mix of
| historical accident, what I mentioned, and a slew of other
| factors. Nonetheless, I too think it would be so cool if I
| could spend my time writing programs that write programs that
| write programs, all guaranteed to match how I stated the
| problem's invariants. I could spend the majority of my time
| really understanding the requirements and assumptions, as
| opposed to actually implementing the features.
| nemothekid wrote:
| > _It 's conceivable that in an alternate universe an
| individual engineer's productivity could be so high, because
| of math, that one developer would replace a whole team from
| our universe, and bugs would be extremely rare and mostly
| restricted to UI._
|
| I think this only applies for strictly theoretical problems
| and code golf, but not in real world programming. It's likely
| useful for low-level servers like compilers and parsers, but
| most of the world isn't working on those problems.
|
| For any engineering task (not just software engineering) half
| of the problem is _modeling the problem_. Most bugs I see in
| systems by non-junior engineers are in this camp; the
| engineer thought one thing but the world was actually
| different (and this abstracts up to, where _management_
| thought one thing, but the world was different).
|
| For a parser, that problem is easily defined. For other
| problems, the difficulty of programming just moves into
| verifier. Even if your code was bug-free, it isn't the case
| that your verifier is. For example lets say you are building
| a search engine; how do you embed the fact that queries for
| "Justin Beiber" should rank higher than "Justin Timberlake"
| except for when the term "super bowl" in included? Or how
| does the verifier embed the fact that local government
| regulation says that queries for "Winnie the pooh" should
| only return results for the kids cartoon character? Finally,
| Godel's shows us every system will have useful results that
| cannot be formally verified.
|
| On a more human level, I think verifiers have great use in
| building more stable systems but I don't think we will see a
| day where they make us more efficient. It's like believing
| cars will lead to people staying home all day just because it
| will only take 5 minutes to get to the grocery store. No, the
| human mind will invent evermore complex systems on top of the
| tools we have requiring even more engineers to manage them.
| Jweb_Guru wrote:
| In addition to the project itself being very cool and useful (you
| can count the number of verified SAT solvers that can solve most
| problems in modern SAT competitions on two fingers!), it's also
| an impressive demonstration of its toolchain, which essentially
| exploits the fact that the purely safe fragment of Rust
| (augmented with a handful of types that use unsafe code like Vec,
| which I believe have been proven compatible with the rest of
| Creusot in another recent paper) behaves like a functional
| language. This allows borrow-checked Rust code to be treated
| purely functionally in safe contexts, which allows exploiting the
| large amount of existing software tailored to proving purely
| functional specifications, without having to worry about memory-
| related side conditions, separation logic, etc.
|
| This has always been the hope for Rust verification tools (that
| Rust would scale to these kinds of large proofs better than other
| industrial imperative languages due to the restrictions of the
| borrow checker), and I think this is the first real practical
| indication that that is indeed the case. Both in terms of
| verification time and additional verification code, it weighs in
| at a small fraction of that of the only other verified solver
| with comparable performance on SAT benchmarks (IsaSAT), which is
| extracted using code generation from Isabelle (which is a full
| proof assistant), while other attempts to produce verified SAT
| solvers directly from imperative code fall far short of its
| performance because they do not implement any of the difficult
| optimizations (presumably due to the amount of extra proof effort
| required).
|
| Incidentally, another paper has come out at basically the same
| time making a similar observation, Aeneas:
| https://arxiv.org/abs/2206.07185, which uses a resizable hash map
| as its case study. Since it just came out, it does not have any
| larger proof developments yet that could be used as comparison,
| and it has somewhat different goals, but overall I am also
| excited about it and am very hopeful for the future of
| verification tools that integrate directly with Rust code like
| these two, as I think they are one of the more promising avenues
| towards the regular integration of verified code with more
| "ordinary" projects.
| asimpletune wrote:
| It would be cool if the unsafe aspects of Rust that are
| necessary (like Vec) could themselves be formally verified,
| like when each edition of the language gets compiled or
| something. I know some languages include proof tools but
| they're not super popular, and Rust's compiler helps a lot.
| Still, it would be really cool to _know_ , or at least know
| that you can't know. LinkedList is another example. There's no
| way to really write a linked list in the rust friendly way, so
| you have to use some boxing. All of that is fine, it probably
| works, but what if I forgot later, and changed something about
| the implementation, introducing bugs around edge cases or what
| not.
| Jweb_Guru wrote:
| This process is actually already underway, notably with the
| RustBelt project: https://plv.mpi-sws.org/rustbelt/. It has
| not yet achieved the kind of tight integration with the
| source language that stuff like Creusot has, but everyone
| would like it to move in that direction :)
|
| For the specific case of Vec (and a few other types--e.g.
| Cell, Mutex, etc.), as I mentioned, a recent paper by xavxav
| actually proves that the important parts of the unsafe
| implementation of Vec _do_ actually fulfill the signatures
| that are used by the CreuSAT proof. This is part of
| https://people.mpi-
| sws.org/~dreyer/papers/rusthornbelt/paper..., which I believe
| won the distinguished paper award at this year's PLDI!
| avgcorrection wrote:
| Oh wow. So efforts like this can actually eventually trickle
| down to more regular programming?
| Jweb_Guru wrote:
| Well, that's the research question! So far signs are
| promising, IMO. I think this is at least the best chance
| formal methods have ever had at being relatively easy to
| integrate into larger codebases--Rust's type system is strong
| enough to support nontrivial verification conditions, and
| effectively encapsulate them, without needing to directly
| deal with "exotic" logic, heavyweight encodings, pointer
| obligations that need to be manually discharged, or linking
| to code generated by some external language (a feature it
| shares with functional programming languages, as well as
| "hybrid" linear/dependently typed lanugages, like Dafny or
| ATS). Unlike functional or dependently typed programming
| languages, however, Rust has a lot of adoption
| _independently_ of its suitability for formal methods,
| meaning that ease of integration has a good chance of
| benefiting projects that don 't have correctness as a
| principle requirement. For example, assuming that a useful
| crate can be extracted from this project, it would be quite
| easy to integrate into a video game like Veloren (e.g. to
| help improve some of its AI heuristics), which is about as
| far from traditional high-assurance software as you can get!
|
| If we can make using high-assurance software as easy (or
| almost as easy) as non-high-assurance software, without
| sacrificing huge amounts of performance to do so, it will be
| a major step in making software more reliable for everyday
| programmers, rather than restricting these kinds of
| techniques to critical libraries like cryptographic
| primitives where correctness and performance justify pretty
| much any number of proof engineering hours. This doesn't mean
| everything would be verified, but with more and more verified
| "building blocks," less and less engineering time would be
| spent debugging third party code (which at least IME, is
| usually much more painful than verifying code at the top
| level). We have a long way to go to get there, but projects
| like this are an excellent start.
| jerf wrote:
| Does the proof encompass some sort of proof of amount of RAM
| consumption in it, presumably bounded by some characteristic of
| the problem presented?
|
| Regardless, this is very impressive, and one of the first
| things to raise my eyebrows w.r.t. formal proofs being possibly
| useful in the real world in a long time. If safe Rust can
| function as the bridge between the "practical" world and the
| world of proofs, that alone would be enough to justify Rust's
| existence.
| Sarek wrote:
| It does assume enough RAM, as it has no way of knowing how
| much RAM the target machine will have. It is thus correct
| with regards to the model specified by Rust, but a program
| which is proven correct in Creusot may get an OOM-error. A
| note here is that it will not provide the wrong answer due to
| insufficient memory, but it will crash.
|
| Proving some upper bound on the amount of memory used (and
| potentially gracefully exiting) should be possible, but I
| have not prioritised it, as OOM has not been an issue.
| jerf wrote:
| "has no way of knowing how much RAM the target machine will
| have"
|
| That's why I phrased it in terms of a bound based on some
| characteristic of the problem.
|
| But I was just asking for information. I know it's a well-
| known problem. To be honest the fact that we're even
| seriously discussing this as something that may be doable
| is a huge step forward. AIUI, previous state of the art on
| real languages would have this as hopelessly pie-in-the-
| sky.
|
| I think in many cases memory bound is uninteresting; one
| could examine a vector implementation and be satisfied that
| it won't explode unnecessarily. After all, we do exactly
| that on non-proved implementations without too much fear
| all the time. SAT just came to mind as a problem where that
| could be an interesting thing to be able to prove.
| Jweb_Guru wrote:
| No, it does not. This is mentioned as a limitation in the
| paper (it does, however, handle things like integer overflows
| correctly). I believe with some suitable extensions, it would
| be possible to prove invariants like this as well, but IMO
| functional correctness is a pretty good start!
| riedel wrote:
| This discussion is also interesting wrt the rust verification:
| https://news.ycombinator.com/item?id=30174827 . Creusot author
| did comment on it.
|
| Seems like there finally will be alternatives to Ada SPARK.
| ithkuil wrote:
| > on two fingers
|
| In which base? Can I use my knuckles?
| Jweb_Guru wrote:
| Base 1, no knuckles required :)
| andi999 wrote:
| Why are verified SAT solvers interesting?
| Jweb_Guru wrote:
| SAT solvers often form (part of) the backbone of other proof
| assistants and verified software and hardware, which makes
| them producing correct output quite critical. At the same
| time, production SAT solvers (the ones people actually use on
| the critical path of these proof assistants) are very
| performance-sensitive, which means tons of optimizations that
| are nonobvious, subtle, and in some cases not obviously
| memory safe. This makes them ideal candidates for formal
| verification.
|
| Unfortunately, in the past, the proof effort required to
| produce a competitive SAT solver was so high that this was
| considered infeasible, so instead the industry has been
| focused on non-verified SAT solvers that produce efficiently
| verifiable _certificates_ for their solution that can be
| checked by an independent verifier. This was an important
| improvement that has allowed SAT solvers to continue to
| evolve, but (as mentioned downthread) they have disadvantages
| compared to a solver that can be fully trusted without
| producing a certificate, including in "forcing" people to
| use techniques that produce proofs which we know how to
| efficiently verify.
|
| What this project promises is an order of magnitude reduction
| in proof effort to achieve and maintain an extensible,
| competitive, verified solver, which may help change the above
| calculus to favor verified solvers. IMO, it is quite an
| exciting development, even if it does not pan out!
| jhgb wrote:
| Maybe I'm missing something blindingly obvious, but why is
| "trusting" a SAT solver necessary in the first place? The
| solution itself either satisfies the constraints or it does
| not. Are there situations where we can't even tell whether
| the solution is correct by looking at the problem instance?
| Jweb_Guru wrote:
| For positive solutions, you are correct--verifying the
| solution is incredibly easy. For negative solutions
| (UNSAT), however, this is not the case--UNSAT is in Co-
| NP, not NP. Finding an efficiently verifiable
| representation of the UNSAT proof can be very
| challenging, which is part of why solvers are given much
| more time to produce and verify certificates than they
| are to solve the problem in the first place.
| jhgb wrote:
| Ahh, I see now what you mean. Thanks, that makes much
| more sense to me now.
| dwheeler wrote:
| Right, unsat is the interesting case where you especially
| want a proof of the SAT solver.
|
| Unsat can be used, in turn, to prove other things. If you
| want to prove that some condition C is always true in a
| set of expressions, just ask the SAT solver to solve for
| "not C". If the answer is UNSAT, then C is always true...
| if the answer UNSAT is actually correct.
| andi999 wrote:
| Thanks. Do you have a link or reference to more material on
| the certificates. I dont understand a bit, but I want to
| learn more.
| Jweb_Guru wrote:
| I will just steal from the paper's author (Sarek) and
| point you to the first two talks and lecture notes by
| Randy Bryant at this link:
| https://fm.csl.sri.com/SSFT22/#splogic.
| Animats wrote:
| Nice.
___________________________________________________________________
(page generated 2022-06-17 23:00 UTC)