[HN Gopher] What I've Learned About Formal Methods in Half a Year
___________________________________________________________________
What I've Learned About Formal Methods in Half a Year
Author : todsacerdoti
Score : 131 points
Date : 2023-04-10 11:31 UTC (11 hours ago)
(HTM) web link (jakob.space)
(TXT) w3m dump (jakob.space)
| prisonguard wrote:
| I've always thought theorem proving to be about proving equality
| of the function and the result.
|
| Lean to me is not only obscure but unapproachable by someone from
| a non-mathematical background.
|
| Take this snippet for instance theorem
| nat.add_comm : [?] (n m : N), n + m = m + n := l (n m :
| N), nat.brec_on m (l (m : N) (_F :
| nat.below (l (m : N), [?] (n : N), n + m = m + n) m) (n : N),
| nat.cases_on m (l (_F : nat.below (l (m : N),
| [?] (n : N), n + m = m + n) 0), id_rhs (n +
| 0 = 0 + n) (eq.symm (nat.zero_add n))) (l (m :
| N) (_F : nat.below (l (m : N), [?] (n : N), n + m = m + n)
| (nat.succ m)), id_rhs (n + (m + 1) =
| nat.succ m + n) (eq.symm (nat.succ_add m n) > congr_arg nat.succ
| (_F.fst.fst n))) _F) n
|
| If we want to democratise formal methods and theorem proving, we
| should lean towards natural language expressions of proof and not
| mathematical expressions and greek characters.
|
| This is the path that languages such as Idris https://www.idris-
| lang.org/ have taken
| voxl wrote:
| Something else important to note: this proof of commutativity
| of addition on natural numbers is definitely _not_ how you
| normally write these proofs in Lean.
|
| Here is a closer approximation to how someone would actually
| write the proof in Lean: lemma add_comm (a b :
| N) : a + b = b + a := begin induction a with a
| IH, rw (add_zero b), rw (zero_add b),
| trivial, rw (add_succ b a), rw (succ_add a b),
| rw IH, trivial end
| prisonguard wrote:
| thank you for explaining, but do i need a special keyboard
| just to type out the natural number type?
| nodogoto wrote:
| [dead]
| deadbeef57 wrote:
| No, you can just type `\nat` and the Lean extension in
| VScode will turn it into `N`. Similarly, you can type many
| LaTeX macros, and the will render in unicode. Examples:
| `\times` becomes `x` and `\to` becomes `-`, etc...
| mcguire wrote:
| Brute force string search: /*@ //
| There is a partial match of the needle at location loc in the
| // haystack, of length len. predicate
| partial_match_at(char *needle, char *haystack, int loc, int
| len) = \forall int i; 0 <= i < len ==> needle[i] ==
| haystack[loc + i]; */ /*@ //
| There is a complete match of the needle at location loc in the
| // haystack. predicate match_at(char *needle, int n,
| char *haystack, int h, int loc) = \valid(needle +
| (0 .. n-1)) && 0 <= n < INT_MAX && \valid(haystack
| + (0 .. h-1)) && 0 <= h < INT_MAX && n <= h && loc
| <= h - n && partial_match_at(needle, haystack, loc,
| n); */ /*@ requires
| \valid(needle + (0 .. n-1)) && 0 <= n < INT_MAX;
| requires \valid(haystack + (0 .. h-1)) && 0 <= h < INT_MAX;
| requires n <= h; assigns \nothing; ensures -1
| <= \result <= (h-n); behavior success:
| ensures \result >= 0 ==> match_at(needle, n, haystack, h,
| \result); behavior failure: ensures
| \result == -1 ==> \forall int i; 0 <= i < h ==>
| !match_at(needle, n, haystack, h, i); */ int
| brute_force (char *needle, int n, char *haystack, int h)
| { int i, j; /*@ loop
| assigns i, j; loop invariant 0 <= i <= (h-n) + 1;
| loop invariant \forall int k; 0 <= k < i ==>
| !match_at(needle, n, haystack, h, k); */
| for (i = 0; i <= h - n; ++i) { /*@
| loop assigns j; loop invariant 0 <= j <= n;
| loop invariant partial_match_at(needle, haystack, i, j);
| */ for (j = 0; j < n && needle[j] == haystack[j +
| i]; ++j); if (j >= n) { return i;
| } } return -1; }
| ratmice wrote:
| One thing which perhaps is not captured well in the article, is
| that rather unlike when reading/writing source code, where
| stepping through lines of code is a relatively uncommon task
| only done debugging. For interactive theorem provers it is a
| given. He discusses the bottom up & top-down approach to the
| solving goals with tactics & otherwise, but typically in both
| cases you will be stepping through expressions to get a better
| understanding. They generally aren't written to be read from
| just the source text alone.
| nodogoto wrote:
| [dead]
| tsimionescu wrote:
| I don't think you have any chance of approaching formal methods
| if the notation is a stumbling block.
|
| The difficulty and verbosity of formal proofs is huge even with
| the very concise mathematical notation. And learning the
| notation itself is a very small stumbling block compared to
| learning the actual theorems and logic that you need to use.
|
| So while I am not a fan of mathematical notation for general
| programming (which is often quite un-mathematical in fact)
| formal proofs are a place where this notation absolutely
| shines. I can't even imagine how much uglier the proof above
| would become if you had to replace `[?] (n : N)` with something
| like `for any natural number n`.
| sirwhinesalot wrote:
| Forall (n: Nat)
|
| Is perfectly reasonable.
|
| If conciseness was that important theorem provers would have
| syntax like APL.
| tsimionescu wrote:
| Why is forall any less formal than [?]? Or Nat less formal
| than N? Especially when these are symbols that most will
| have learned in high school?
| sirwhinesalot wrote:
| They're not on my keyboard.
| prisonguard wrote:
| there has to be a better way for a beginner to learn Lean
| even with the faintest high school mathematics recollection.
|
| I can still do theorem proving in Idris excluding the
| mathematical notation and still learn concepts such as
| totality, covering, equality etc
| ykonstant wrote:
| A better way than _what_? The snippet you posted has
| nothing to do with how a beginner (or intermediate) learns
| Lean.
| Kranar wrote:
| I sympathize with your view, I genuinely do, but to write out
| formal methods using natural language expressions results in
| quite a few problems. For one, it results in incredibly long
| proofs, like this simple proof written out in natural language
| would take up pages.
|
| Second, it makes it harder to identify patterns. Writing things
| out in a formalism lets you identify syntactic patterns and
| employ techniques like factoring, cancelling, seeing symmetries
| and other purely syntactic properties. When you write things
| out in natural language you lose the ability to see these
| things clearly. Being able to see purely formal syntactic
| properties helps reinforce your confidence in a proof or can
| help you identify potential flaws in it. It's like with
| programming styles and conventions, one thing I tell developers
| is an important reason for choosing one style over another is
| if it makes wrong code look wrong visually.
|
| Third, and this is just a personal matter... it results in a
| type of culture of writing that I find is often pretentious and
| full of unnecessary baggage. When I read an academic paper I
| cringe at how bad most writing is and try my best to skip to
| the formal part of the paper.
| miloignis wrote:
| That snippet is the result of printing out the internal
| structure of the proof, not how the proof was actually written.
| You'll notice that what a human actually wrote and would
| normally read is the code block mentioned directly above in
| TFA, which is much more sensible: lemma
| nat.add_comm : [?] n m : N, n + m = m + n | n 0 :=
| eq.symm (nat.zero_add n) | n (m+1) := suffices
| succ (n + m) = succ (m + n), from eq.symm (succ_add m
| n) > this, congr_arg succ (add_comm n m)
|
| Also, this example proof is explicitly about math (the
| commutativity of addition on natural numbers), so it's
| appropriate and not surprising that it uses mathematical
| notation.
| nh23423fefe wrote:
| > if we
|
| Who is this we? The people who understand mashed with the
| ignorant who put in no work?
|
| > to democratise
|
| So that those in the know can be further diluted with dabblers
| who contribute nothing and complain about their ignorance?
| flir wrote:
| _raises hand_ I 'd like more tools that involve me putting in
| no work, please. They help me achieve my real-world goal
| faster.
| prisonguard wrote:
| making thing obscure and less obvious is anti-intellectual,
| its getting harder and harder for new developers to gain a
| footing in software development
|
| also why tools like chatGPT have gained so much popularity
| nh23423fefe wrote:
| 3 assertions with no implications
| UncleEntity wrote:
| Yep, like the olden days where the only people "in the know"
| knew Latin. Or, before that, Greek.
|
| Then someone had to come along and invent the printing press.
| Joker_vD wrote:
| > Or, before that, Greek.
|
| Greek (or koine, for that matter) was pretty much street
| vernacular in the Roman/Byzantine empire during that period
| you're likely talking about.
| mcguire wrote:
| I'm not sure if the author is here, or if my comment attempt was
| successful. So, can I suggest you take a look at a third leg of
| the formal methods stool?
|
| If you are familiar with C, check out Frama-C
| (https://frama-c.com/) and the WP and RTE plugins. The approach
| is based on Tony Hoare and EWD's axiomatic semantics
| (https://en.wikipedia.org/wiki/Hoare_logic). It does not have a
| good memory management story, as far as I know, but is very good
| for demonstrating value correctness (RTE automatically generates
| assertions for numeric runtime errors, for example) and many
| memory errors.
|
| If you are familiar with Ada, check out SPARK
| (https://www.adacore.com/about-spark), which is similar to
| Frama-C but has a much better interface in the AdaCore GNAT
| toolkit and IDE.
|
| Both work similarly: Assertions in normal Ada or C code as well
| as the code itself are translated into SMT statements and fed to
| a SMT solver to find counterexamples---errors.
|
| I have some blog posts from several years ago about Frama-C:https
| ://maniagnosis.crsr.net/tags/applied%20formal%20logic.h... (And I
| really should get back into it; it's a lot of fun.)
|
| If you are not familiar with Ada or C, Dafny (https://dafny.org/)
| is another option based on .NET and devoleped at Microsoft. It
| seems nigh-on perfect for this approach. (The language uses a
| garbage collector.) At the time I was looking, there was little
| documentation on Dafny, but that seems to have improved.
| tsarfox wrote:
| Sorry to hear you had issues with my comment system. Comments
| are invisible until I manually approve them, but I'm not seeing
| anything new in the database right now. Did you get any sort of
| error code? The system should let you know if e.g., the answer
| to the captcha was wrong.
|
| I do mention SPARK at the end of the article. I'm not familiar
| with Frama-C so I'll check that and your blog posts out. Thanks
| for sharing!
| mcguire wrote:
| Got a blank page.
|
| Sorry I missed the mention of SPARK. TLA+, although I haven't
| fiddled with it as much, is more similar to Alloy than
| anything else. SPARK actually works on Ada code.
| jgeada wrote:
| If I had complete enough specifications for any interesting
| program that formal methods would be interesting, I think I would
| just be easier to hit compile on that spec.
|
| Yes, there are a subset of interesting bugs related to edge
| cases, but the real problematic bugs are when what the software
| does does not match what was expected, even if nobody really
| could articulate what was expected in the first place. The art of
| software engineering isn't the coding, it is asking enough
| questions to know what the actual problem to be solved is.
| commandlinefan wrote:
| > I think I would just be easier to hit compile on that spec
|
| I have a hard time not being similarly skeptical about formal
| methods. They promise the ultimate holy grail of software
| development - release software with zero defects - but demand
| what appears to be years of study before you actually see
| results. On the one hand, there's lots of examples of
| techniques and tactics that promise amazing results if you "put
| in the work", even outside of software (like fitness and music)
| - but in over 30 years of software development practice, I've
| never met anybody who was applying formal methods at all, much
| less to achieve defect-free software. If it could deliver what
| it promised even with a "mere" five years of dedicated study,
| I'd expect it to be more widespread by now.
| Animats wrote:
| As someone who was working on this decades ago [1][2], here's
| a useful recap.
|
| - Back then, CPU power for the prover was a big problem.
| That's been fixed. We were too early in the days of a 1 MIPS
| VAX.
|
| - Prover theory is much better. We were using the Oppen-
| Nelson prover, the original SAT system. Now that's routine
| technology.
|
| - Loose languages are a problem. You have to nail down all
| "undefined behavior", and either detect it and forbid it, or
| give it precise semantics.
|
| - Integrate the verification annotation into the programming
| language. _Not as comments_. It should be at least syntax and
| type checked when the program is compiled, and it should use
| essentially the same syntax as the language to which it is
| attached. Otherwise the annotations get out of date and
| nobody uses them.
|
| - Break big assertions into lots of little assertions. Don't
| AND them. This improves the diagnostics.
|
| - Debug the verification in the source language, by adding
| asserts. The verification system usually treats asserts as a
| proof goal, and assumes they are true for the code that
| follows. You narrow down the problem into a small area in
| that way. If you need to prove something hard, get to the
| point where you have "assert(a); assert(b);", and need to
| prove that A implies B. Then you use an offline prover. Don't
| work on the code problems in a prover directly.
|
| - You're not done until you get 100%. If you write
| "assert(false);", there is only one error but the
| verification is totally meaningless.
|
| - Don't get carried away with the formalism. Verification
| systems tend to be built by people who think formalism is
| cool. That is a negative for getting work done.
|
| - Undecidability and the halting problem are not issues. If
| your program is anywhere near undecidable, it's broken.
| Microsoft took the position with their Static Driver Verifier
| that if the verifier can't decide termination easily, it
| doesn't get to be a signed kernel driver.
|
| - Some things are hard to specify, and some things aren't. A
| database is an example of a complicated system that's not too
| hard to specify. The specification is a full table search of
| giant arrays. The implementation doesn't do it that way, but
| it's supposed to behave as if it does. The other extreme
| would be a GUI program.
|
| [1] http://www.animats.com/papers/verifier/verifiermanual.pdf
|
| [2] https://github.com/John-Nagle/pasv
| cscurmudgeon wrote:
| I have met tons of folks who have used this in practice. It
| tends more applied in domains where errors can be costly.
| E.g., this won't be that much useful in the JS/NodeJS.
|
| https://www.amazon.science/publications/how-amazon-web-
| servi...
|
| https://link.springer.com/chapter/10.1007/978-3-642-34281-3_.
| ..
|
| > They promise the ultimate holy grail of software
| development - release software with zero defects
|
| Thats a straw man. It is just another tool.
| noisenotsignal wrote:
| Years of study are definitely not required to get up and
| running! I attended a 3 day workshop for TLA+ and spent ~2
| weeks modeling a new project that my team was working on. I
| found lots of potential edge cases that would result in data
| inconsistencies without even having to write tests or think
| about how to articulate the issue in the first place, as the
| model checker is able to output the exact program flow that
| results in the error.
|
| You still need to make sure your implementation matches the
| spec, but that's an easier task than squashing concurrency
| bugs, let alone figuring out how to repro in the first place!
| The hope is that by writing the spec you avoid a good chunk
| of errors from the start instead of encountering them one by
| one.
| mcguire wrote:
| One advantage of formal methods is in determining "what was
| expected" (including all the goofy edge cases) without having
| to burrow into the details of code.
|
| Take a look at Alloy (http://alloytools.org/) and TLA+
| (https://lamport.azurewebsites.net/tla/tla.html) for example.
| (Or even the ancient Z ("Zed") notation
| (https://www.cs.cmu.edu/~15819/zedbook.pdf)).
| miloignis wrote:
| You don't need to prove everything against a complete spec, you
| can prove that things have particular properties you care
| about, from higher level behaviors like user data can only be
| accessed by the user down to there are no out of bounds array
| accesses.
|
| You could prove some complex data-structure to be a correct
| implementation of some logical data-structure and then prove
| that your service won't deadlock.
|
| There are a ton of interesting, impactful properties to prove
| that aren't "create a complete specification and prove the
| implementation refines it", and even that usage I think is more
| impactful for removing bugs and vulnerabilities than you're
| giving it credit for - even something as simple as sorting,
| Java had a bug in their Timsort implementation for a decade
| which was discovered when trying to verify it with formal
| methods: http://envisage-project.eu/wp-
| content/uploads/2015/02/sortin... ). Note too that if you
| compiled the simplest/easiest-to-understand spec for something
| like sorting you'd get something like bubble sort, not Timsort,
| though you are much more likely to want the latter.
| drewcoo wrote:
| I think the interesting part isn't in a single component's edge
| cases at all, but in how complex webs of many, many things
| interact.
|
| I've yet to see a hard debugging problem that didn't boil down
| to something like, in the best of possible worlds:
|
| "component A expected something to meet promise FOO when it
| calls component B, because that's obvious, right?" combined
| with
|
| "component B just passes requests and routes results, dude,
| component C is messed up" combined with
|
| "component C never promised that behavior and if you look at
| our spec it even says so!"
|
| That's an example of your "does not match what was expected"
| case.
|
| I think most of formal methods is an academic waste of time.
| Whether they take a more formal or a more practical approach, I
| see high value in contract tests and data flow analysis because
| they're more likely to catch the bugs that cross Conway's
| code/org boundaries.
| rsrsrs86 wrote:
| Formal methods have their applications and specific use
| cases. It all depends on how much is at stake, economically.
|
| I suggest checking into Alloy, which is far easier to work
| with. It feels like writing a SQL schema, but you get bounded
| model checking and much more.
| guai888 wrote:
| formal methods has value in safety related software. When you
| are developing software for airplane, rail, automobile and
| nuclear power station, any tools that might improve safety
| merit consideration.
| mcguire wrote:
| That's actually what many formal tools excel at. Of course,
| you have to do have to define "never promised that behavior"
| which is extra work that nobody wants to do.
| Animats wrote:
| Yes. That's "design by contract", which is extremely useful
| for determining who's at fault.
|
| This came out of aerospace, where it's taken seriously. If
| A won't interoperate with B, you check the interface spec.
| If A is out of compliance, A has to be fixed. If B is out
| of compliance, B has to be fixed. If you can't tell, the
| interface spec has to be fixed. This is why you can unplug
| Pratt and Whitney engines from an airplane and plug in
| Rolls-Royce engines.
|
| This worked in the era when the buyer had more clout than
| the seller. Now, if A won't interoperate with B, it's the
| problem of whomever is smaller.
| tomcam wrote:
| > This is why you can unplug Pratt and Whitney engines
| from an airplane and plug in Rolls-Royce engines.
|
| Wait wut? Are you talking about their embedded software
| or the hardware itself? Because if you're talking about
| the hardware, I'm about to be dumbfounded that there are
| formal verifications of hardware, my mind is about to be
| blown.
| mcguire wrote:
| You could think of many of the formal tools outside of
| dependently typed programming as design by contract plus
| a thingy that complains if the contract doesn't work.
| im_down_w_otp wrote:
| This kind of, "Just what the heck in this pile of parts is
| impacting the other stuff and how is the other stuff
| impacting this part?" was precisely the angle we took with
| our V&V tools.
| jgeada wrote:
| Yes, this!
|
| What would be helpful is if formal methods were powerful
| enough to be able to prove contracts and assertions about
| expectations and invariants; however, all formal systems I've
| experienced are not powerful enough to do this at scale in
| any interesting program, not without spending way more time
| reconstructing the entire system in the formal world, with
| the attendant likelihood of introducing spec bugs in that
| reconstruction.
|
| It is entirely different in the world of hardware, where the
| specification in many cases is the RTL and the thing being
| proved is the, supposedly identical, physical implementation.
| [deleted]
| rqtwteye wrote:
| I wonder if AI systems like ChatGPT will be helpful in this area.
| Something that's able to track all requirements of a system and
| understands the code enough to validate it against the
| requirements.
| markusde wrote:
| Yeah, I'm cautiously excited about how AI and FM might work
| together. I don't think LLM's can ever be trusted to verify
| programs itself, but anything which can reduce the annotation
| overhead for programmers is a super useful thing!
| rqtwteye wrote:
| I don't think a pure LLM approach will work (or maybe it
| will, who knows?). I am more thinking of a hybrid that
| combines LLM or other AI with some hardcoded knowledge for
| this domain
| Taikonerd wrote:
| I've had the same thought. Formally proven code can give
| powerful security assurances (cf. Project Everest[1]), but it's
| also very, very labor-intensive. I've heard rules of thumb
| like, "100x as much time to prove the software correct as to
| write it in the first place."
|
| If LLM systems are going to give us a virtual army of
| programmers, I think formally proving more systems software
| (device drivers, browser engines, etc.) would be a great use of
| their time.
|
| [1]: https://www.microsoft.com/en-us/research/blog/project-
| everes...
| AlotOfReading wrote:
| I've had extremely poor results attempting to get current
| LLMs to generate proofs, for code or otherwise.
| rqtwteye wrote:
| I think it probably needs to be a specialized combination
| of LLM and hardcoded knowledge.
| ogogmad wrote:
| Combining it with AutoGPT (or the ideas it's based on) and
| a formal prover like Lean might be the answer. Have you
| tried?
|
| Basically, if you don't allow GPT to iteratively write,
| execute, criticise and then correct its code, you won't get
| good results.
| cartoonfoxes wrote:
| Most definitely. I've been playing with using ChatGPT to
| generate proof texts in Isabelle/HOL, since it lets me verify
| the correctness of the output before code generation.
| stfutechbros wrote:
| [dead]
| SamvitJ wrote:
| When writing about a new topic (e.g., formal methods), it's
| generally better to keep other aspects of the presentation (e.g.,
| code samples) simple and familiar to the reader.
|
| Would have preferred the code samples to not be in Lisp
| (ostensibly also new to the reader, if formal methods are), and
| also more sparingly used.
| whatever1 wrote:
| I like there is us who like formal methods and guarantees of
| correctness for our code.
|
| And then there are the others who now generate approximately code
| with large language models that mostly runs.
|
| They are winning.
| cgio wrote:
| It's a repeat of search & hope/trust (Google) vs define and
| query (semantic web) and yes they are winning and will win.
| Fine tuning your scope for extreme accuracy might be less
| efficient a strategy than fine tuning your loader for extreme
| speed
| mcguire wrote:
| In most situations, a half-assed answer quickly will a beat a
| good answer slowly.
|
| Most.
|
| " _" Software should be reliable."_"
|
| Reliability is not usually worth the effort. Most software is
| not used enough, or not important enough, to make any effort
| worthwhile. In almost all systems, errors are created at
| every step and propagate through the system routinely without
| causing problems.
| jderick wrote:
| We use formal at Apple to verify hardware. We are always looking
| for good people in this area.
| rsrsrs86 wrote:
| Do you have more to share? I use Alloy at work and research and
| would love to learn about how Apple uses FM.
| jderick wrote:
| Apple is pretty tight lipped about revealing the details of
| our work. Formal is a bit of a niche area, but you can find
| some info online.
|
| Here is a course taught at UT with some slides from guest
| lecturers from industry that looks like a decent overview:
|
| https://www.cerc.utexas.edu/~jaa/verification/
|
| Also, the main conference in the area is FMCAD, you can find
| a lot of related work there. More recently the conference has
| moved more towards software, but if you look at some of the
| older proceedings you can find a lot of hardware related
| stuff.
|
| https://www.fmcad.org/
|
| There is decent amount of overlap between formal for hardware
| and software, so if you study one you will likely have enough
| background to get started in the other.
| ajb wrote:
| Not at apple: The following is about 10 years out of date,
| would be interesting to know what's changed:
|
| 10 years ago formal had just started to be viable in run of
| the mill semiconductor verification (as opposed to throwing a
| bunch of specialist PhD grads at the problem); Jasper Gold
| was the first tool I heard of that worked well enough. pre-
| formal, the approach was to write a bunch of tests with input
| stimulators and validity checkers and/or assertions, and then
| work on the stimulators until you reached 100%coverage
| (signing off the cases that couldn't happen). That's not just
| line coverage, but each branch of an if, each component of a
| boolean expression must toggle, each bit of a signal must
| toggle, etc.
|
| Reaching 100% coverage was a tedious exercise so the first
| use of formal was as a sumplement, to find the last,most
| difficult test cases that completed your coverage (or prove
| that the cases were impossible). The next use was to try and
| prove directly that assertions or test failures could not
| occur, or provide a counterexample. this is all based on the
| tool being able to read low level verilog or systemC code.
|
| What it couldn't do at the time, which would be interesting
| to know if it can now, is as follows: Often, you actually
| have a higher level model of the behavior of your system. The
| RTL code is written to have the same behavior (manually,
| because it takes expertise to choose RTL code that will work
| efficiently). Ideally, provers would simply prove that your
| RTL has the same behavior.
| ta1748 wrote:
| This is the easiest way to get an idea for what is happening
| and what experience is needed:
|
| https://jobs.apple.com/en-
| us/search?search=formal&sort=relev...
|
| There are several different niches, and HW formal is likely
| using different tools and methods than SW formal, though
| there is overlap. Anyways there is room for a wide range of
| experience/skills/background in a variety of different areas.
___________________________________________________________________
(page generated 2023-04-10 23:01 UTC)