[HN Gopher] Formal Methods: Just Good Engineering Practice? (2024)
___________________________________________________________________
Formal Methods: Just Good Engineering Practice? (2024)
Author : aiono
Score : 141 points
Date : 2025-01-10 15:25 UTC (7 hours ago)
(HTM) web link (brooker.co.za)
(TXT) w3m dump (brooker.co.za)
| ot wrote:
| Previous discussion (Jun 2024):
| https://news.ycombinator.com/item?id=40753989
| aiono wrote:
| Thanks, I couldn't find it via Google search on this website
| for some reason.
| onionisafruit wrote:
| I'm glad you couldn't if that would have kept you from
| posting it today.
| aiono wrote:
| Yeah I wasn't going to post it if it was shared before. But
| I am happy that some are happy that it's reposted.
| ot wrote:
| Nothing wrong with reposts, it's just useful to link to
| previous discussions for context :)
| ChrisArchitect wrote:
| Google search? Use the HN search on site here: https://hn.alg
| olia.com/?q=https%3A%2F%2Fbrooker.co.za%2Fblog...
| franktankbank wrote:
| Too slow, planning==ossification any documentation can and will
| be used against you in the Agile court of law.
| intelVISA wrote:
| Arguably, if 'real' Agile is ever found Formal Methods would be
| the antithesis of it: something provable and reproducible is
| blasphemy to True Believers.
| AnimalMuppet wrote:
| Not at all. Something _that you can 't easily change_ is
| terrifying.
|
| Reproducible? Sure, that's what unit tests are for. Make a
| change, prove that you didn't break any behavior that anybody
| relied on.
|
| But if you have to do a three-month-long formal proof run
| because the specification had a one-line change, then you're
| not agile, under any meaningful definition.
|
| (Where did three months come from? Thin air. I don't know how
| long a true formal proof would take. Depends on how many
| things you're proving, how long your spec is, and how much
| CPU power you have. I would think, though, to formally prove
| significant properties of a large code base would take a
| significant time.)
| franktankbank wrote:
| Sorry, we are discussing Agile not agile.
| constantcrying wrote:
| Formal verification of software, as the article acknowledges,
| relies heavily on the type of software and the development
| process.
|
| To use formal verification you need to have _formal requirements_
| of the behavior of your software. Most software projects and
| design philosophies are simply incompatible with this.
|
| In software development and design can often fall together, but
| that means that it is uniquely ill suited for formal methods. If
| you are developing, before you are sure what you even want, then
| formal methods do not apply.
|
| But I agree that there are certain areas, mostly smaller, safety
| critical, systems, which rely on upfront specifications, which
| can heavily benefit from formal verification. E.g. Aerospace
| software relies heavily on verification.
| aiono wrote:
| I think it's more formalisability than requiring upfront
| design. For example you may have an insurance claim automation
| system which you can't design upfront because most insurance
| providers have unspecified behaviour. But that doesn't mean you
| can refine your automation system as you get more information
| from interactions. You would still get the benefit that you
| ensure not leaving out any cases or not having any
| contradiction in your system
| constantcrying wrote:
| But that still relies on having a prior notion of the formal
| requirements of your system. I know little about insurance
| systems, but deriving a formal specification seems like a
| nightmare task. Although, as you mentioned, if you had a
| partial one you certainly would get some benefits from it.
| aiono wrote:
| It's been a while since I worked on those systems, but you
| usually decide on some rules that you refine over time. So
| they are purely logical decisions that you can formalize. I
| don't see why it would be hard to specify. I don't mean to
| specify all up front but one can specify the exact decision
| procedure that is implemented right now very easily in my
| experience. Generally you have a state machine representing
| the process.
|
| This state machine is usually embedded in the code, but
| code has a lot of noise around the state machine that makes
| it harder to see the underlying state machine.
| bassp wrote:
| It's not all or nothing!
|
| I work on a very "product-y" back end that isn't fully
| specified, but I have formally specified _parts_ of it.
|
| For instance, I property-based-tested a particularly nasty
| state machine I wrote to ensure that, no matter what kind of
| crazy input I called an endpoint with, the underlying state
| machine never made any invalid transitions. None of the code
| around the state machine has a formal spec, but because the
| state machine does, I was able to specify it.
|
| In the process, I found some very subtle bugs I'd have never
| caught via traditional unit testing.
| constantcrying wrote:
| Completely agree, but obviously you relied on the state
| machine being sufficiently separate and having a formal
| specification for it.
|
| State machines are quite common, in aerospace software, where
| the requirements even specify the states and transitions. If
| you can formally verify them, I believe, you absolutely
| should, as often there can be a lot of subtlety going on
| there, especially if you have a distributed system of state
| machines sending messages to one another.
| senderista wrote:
| Property-based testing is also effective in finding bugs even
| in the absence of any formal model. Basically you just need
| to identify informal invariants ("properties"), encode them
| as asserts, and then run tests with enough coverage to likely
| find any invariant violations.
| trending486 wrote:
| You are going to have requirements whether you like it or not.
| It doesn't matter if you discover them during requirements
| engineering and validate and deconflict them on a simple text
| document, or if you discover them as you go during coding
| (possibly after having coded the wrong thing), or if the client
| discovers them during the "sprint review" of some allegedly
| agile cult. The only difference is how much money and time are
| you willing to trade for being called "agile". Ironically the
| traditional way of doing a requirements stage is the less
| expensive of all three options. It is also the most aligned
| with the original agile spirit, since it converges with the
| client as soon as posible, at the point where changes are the
| most cheap (changing a line on a text document).
| agentultra wrote:
| It hasn't been my experience that it is as niche as this. I
| believe the "costs," people refer to in these discussions have
| come way down over the last couple of decades. I've taught
| developers how to use tools like TLA+ and Alloy in week.
|
| It's not a skill that requires a PhD and years of research to
| acquire these days.
|
| Nor does writing a basic, high level specification.
|
| If anything you will learn something about the system you are
| modelling by using a model checker. And that can be useful even
| if it is used for documentation or teaching.
|
| The fundamental power of formal methods is that they force you
| to think things through.
|
| All too often I find software developers eager to believe that
| they can implement concurrent algorithms armed with their own
| wit, a type checker, and a smattering of unit tests. It can be
| humbling to find errors in your design and assumptions after
| using a model checker. And perhaps it's hubris that keeps
| programmers from using such tools in more "mundane" and "real
| world" contexts.
|
| There are a lot more "small" distributed systems than you'd
| expect and state spaces are generally larger than you'd
| anticipate if you weren't open to formalizing your work.
| constantcrying wrote:
| I would say that the "nicheness" depends on how you treat
| software. Your development process and software architecture
| are engineering choices you make and these engineering
| choices affect how well formal specification applies.
|
| I didn't talk about "costs" or about "how hard" it is, but
| that common practices in software development make using
| formal methods infeasible. If you want to use formal
| verification you likely have to change how you develop
| software. In an environment where there is general
| uncertainty about architecture and behavior, as is common in
| agile environments, formal verification is difficult to
| implement. By its nature, formal verification discourages
| fast iteration.
| staunton wrote:
| > By its nature, formal verification discourages fast
| iteration.
|
| Not necessarily. You can develop the specification and
| implementation in parallel. That way, at every point in
| time, you could have a much more thorough and deep
| understanding of what you're currently building, as well as
| how exactly you decide to change it.
| aleph_minus_one wrote:
| > I've taught developers how to use tools like TLA+ and Alloy
| in week.
|
| TLA+:
|
| - https://en.wikipedia.org/wiki/TLA%2B
|
| - https://lamport.azurewebsites.net/tla/tla.html
|
| Alloy:
|
| - https://alloytools.org/
|
| -
| https://en.wikipedia.org/wiki/Alloy_(specification_language)
| gte525u wrote:
| Do you have any resources you could link to - for those that
| are curious?
| bassp wrote:
| Not the OP, but Hillel Wayne's course/tutorial
| (https://www.learntla.com/) is fantastic. It's focused on
| building practical skills, and helped me build enough
| competence to write a few (simple, but useful!) specs for
| some of the systems I work on.
| billfruit wrote:
| Alloy is a brute force model checker (for rather small
| models).
|
| Is that the state of the art for formal methods? How do you
| think of formally verifying systems with floating points
| calculations, with randomness, with databases and external
| environments?
| agentultra wrote:
| > Is that the state of the art for formal methods?
|
| I think the state of the art is quite broad as there are
| many tools.
|
| Model checking is, in my estimation, the most useful for
| industry programmers today. It doesn't require as much
| training to use as automated theorem proving, it doesn't
| provide the guarantees that mathematical induction and
| proof do; but you get far more for your buck than from
| testing alone.
|
| However model checkers are very flexible because the
| language they use, math, is also very flexible. TLA+ has
| been used to find errors in the Xbox 360 memory controller
| before it went to launch as well as in Amazon's S3 [0] --
| errors so complex that it would have been highly improbable
| that a human would detect them and the result of finding
| those errors and solving them saved real businesses a lot
| of money.
|
| It comes down to your ability to pick good levels of
| abstraction. You can start with a very high level
| specification that admits no details about how the file
| system works or the kernel syscalls involved and still get
| value out of it. If those details _do_ matter there's a
| strategy called _refinement_ where you can create another
| specification that is more specialized and write an
| expression that implies if the refined specification holds
| then the original specification does as well.
|
| Tools for randomness and numerical methods exist and
| depending on your needs you may want to look at other tools
| than _just_ a model checker. TLA+, for example, isn 't good
| at modelling "amounts" as it creates a rather large state
| space quickly; so users tend to create abstractions to
| represent these things with a finite number of states.
|
| [0] https://lamport.azurewebsites.net/tla/formal-methods-
| amazon....
| umvi wrote:
| Seems like it would really slow you down though if you adopt
| it too early. Sometimes you don't even know if the thing you
| want to do is possible.
|
| My development style is:
|
| - prototype ideas in a quick and dirty dynamic typed language
| just to gain confidence it's going to work (a lot of ideas
| die here)
|
| - rewrite in a more performant and safe language with a "type
| checker and a smattering of unit tests" (usually here I'm
| "done" and have moved onto the next idea/task. If there's an
| issue I fix it and add another test case)
|
| I'm trying to imagine where formal verification comes in. I'm
| imagining something like:
|
| - prototype ideas in a quick and dirty dynamic typed language
| just to gain confidence it's going to work (a lot of ideas
| die here)
|
| - Formally model the requirements
|
| - rewrite in a language that can be formally verified and
| which is hopefully performant and lets me do things like simd
| and/or cuda if needed
|
| - Never have to fix a bug unless there was a bug in the
| requirements (?)
|
| To me, it just seems like it would take an order of magnitude
| longer to develop things this way for not much benefit (I've
| traded development time and potentially runtime performance
| in exchange for correctness)
| adsharma wrote:
| https://adsharma.github.io/agentic-transpilers/
|
| Long way to go. But there is a design.
|
| Use z3/SMT instead of TLA+
| pnathan wrote:
| What's your recommendation on how to rapidly learn TLA+? I
| spent some time staring at references and the UI a few months
| ago and came away very defeated. But I'd like to actually
| level up here.
| ajmurmann wrote:
| For distributed systems this makes sense. Most people aren't
| writing distributed system components though but things where
| the risk usually isn't technical like business software. I
| worked on a project where I got into arguments with the PMs
| because I pushed back on optimizing performance on the main
| page of our pre-launch product. I argued that we don't have
| any real feedback that the page is what is needed and the PM
| thought I was insane for doubting the main page was needed.
| We completely redesigned that page twice, deleted it entirely
| and then had to bring it back because the sales team liked it
| for initial demos.
|
| Every process is a tradeoff and it anyways depends on your
| specific circumstances which choice is best for your team and
| project.
| agentultra wrote:
| It does depend a lot on circumstance and context.
|
| Is it absolutely important that your system is correct? ...
| begs the question, _correct with respect to what?_
| Generally: a specification.
|
| There are lots of situations where you don't know what your
| system is supposed to do, where testing a few examples is
| sufficient, or it's not terribly important that you know
| that it does what you say it ought to. Generating a batch
| report or storing some customer responses in a database?
| Trust the framework, write a few tests if you need to,
| nobody is going to find a formal specification valuable
| here.
|
| However, if you need to deploy configuration to a cluster
| and need to ensure there is at least two nodes with the a
| version of the configuration that matches the database in
| the load balancer group at all times during the migration?
| Need to make sure the migration always completes and never
| leaves the cluster in a bad state?
|
| Even smaller in scale: need to make sure that references in
| your allocator don't contain addresses outside of your
| memory pool? Need to make sure that all locks are
| eventually released?
|
| It's definitely much faster to iterate on a formal
| specification first. A model checker executes your model
| against the entire state space. If you're used to test-
| driven development or working in a statically typed
| language, this is useful feedback to get early on in the
| design process.
|
| What the scope is that is appropriate for using tools like
| this is quite large and not as niche as some folks imply. I
| don't do aerospace engineering but I've used TLA+ to model
| deployment scripts and find bugs in OpenStack deployments,
| as well as to simply learn that the design of certain async
| libraries are sound.
|
| _Update_ : more examples.
| aidenn0 wrote:
| Everything you said about formal verification is also true
| about tests; do you think software is also uniquely ill suited
| for TDD?
| constantcrying wrote:
| I did not say that software is uniquely ill suited for formal
| verification. That also would be total nonsense.
|
| I said that certain philosophies of software design are
| uniquely unsuited for formal verification.
|
| Besides, tests and formal verification are different. A test
| is essentially a formal specification for a single point or,
| depending on how you test, multiple, potentially random,
| points. Writing or changing a test is less time intensive
| than writing or changing a full formal specification for an
| entire subsystem, therefore tests are more suited to volatile
| software development practices.
| staunton wrote:
| > therefore tests are more suited to volatile software
| development practices.
|
| And having _no_ tests is even more suited...
| hinkley wrote:
| I worked on a project where a couple people formally proved our
| design sound while I was still fixing bugs in the spec. We
| don't have spherical cows. Theoretical models aren't reality.
| The map is not the territory.
| billfruit wrote:
| Dijkstra for example didn't think so, I think. He had a din
| view of testing as compared with proofs for ensuring software
| correctness.
| pron wrote:
| "Formal" means "written in a language that can be interpreted
| by a computer" and that is the very thing programmers do.
| Writing a code _is_ writing a formal specification of the
| program 's behaviour, and by definition, every piece of
| software must do that.
|
| But you're right that to benefit from any formal methods, you
| need to compare the program's behaviour to something else that
| isn't the program itself, and that other thing also needs to be
| written in a formal language. To be able to write that other
| thing, you do need to have a precise understanding of what the
| wanted behaviour should be so that you could express it in a
| formal language, but it doesn't have to cover the full
| behaviour of the software.
|
| As an example of what that means, take automated (unit) tests.
| Automated tests _are_ a formal specification, and running them
| _is_ a formal verification method. Tests are a relatively weak
| specification, and executing them is a relatively weak
| verification, compared to what we normally mean by "formal
| methods", but there is no clear qualitative difference --
| conceptual or even practical -- between tests and more powerful
| formal specification and verification. You can think of these
| practices as more powerful tests or as tests that work
| differently, but tests _are_ formal specifications, and if they
| 're applicable to a piece of software, it's likely that richer
| formal specification methods are, too (learning the cost/benfit
| of other methods is similar to learning the cost/benefit of
| tests -- you don't get it right at first, but you learn along
| the way).
| akoboldfrying wrote:
| >Automated tests are a formal specification, and running them
| is a formal verification method.
|
| This is a great way to describe it, and an important concept
| to grasp.
|
| Another kind of formal verification that exists between
| "standard" unit tests and model checking/automated theorem
| proving is property-based testing a la quickcheck: You have a
| function you want to test (e.g., a sort), you describe the
| properties of inputs ("a list of integers") and state the
| properties you want to hold of the output ("every number
| should appear the same number of times in the output as in
| the input, and every number should be >= the number to its
| left") and ask the system to generate and test lots of random
| inputs for you. These properties can often be used more or
| less directly in model checkers, which makes the checking
| exhaustive (on some small, finite domain).
| PhilipRoman wrote:
| You can also use proof assistants as "unit tests" for entire
| classes of behaviors rather than specific values. This lets you
| add proofs incrementally even when a formal spec is too
| difficult.
| senderista wrote:
| I'm not sure that "aerospace software relies heavily on
| verification" is very commonly the case at all, even if one
| would like it to be...
| kmoser wrote:
| What makes you say that? I'm pretty sure it's one of the more
| heavily verified industries. If it wasn't, air travel would
| be much more dangerous.
| nickpsecurity wrote:
| That's true. Back when they were inventing security, the
| requirement for high-security systems (A1 under TCSEC) required
| formal verification. Those who did it said its cost ranged from
| not too burdensome to very high. What's the reason?
|
| While many requirements existed, the design assurance required
| specifying what problem was being solved, how it's being
| solved, and the safety/security properties. Then, proving that
| the security properties were embodied in the artifacts in all
| states. That problem is difficult enough that most people who
| succeeded were also geniuses good at programming and math.
| Geniuses are in short supply.
|
| The lessons learned papers also pointed out a recurring
| problem. Each requirement, security policy, algorithm, etc
| needed a mathematical specification. Then, proof techniques
| were necessary that could handle each. Instead of developing
| software features, you are now both developing software and
| doing R&D on applied mathematics.
|
| Steve Lipner's The Ethics of Perfection pointed out it took
| several quarters to make a major change on a product. The
| markets value development velocity over quality, even demand
| side. So, that's already a no go for most businesses. Likewise,
| the motivations of OSS developers work against high assurance
| practices as well.
|
| If you want formal verification, then you are forced to make
| certain decisions. For one, you need to build simple systems
| that are easy to verify, from design to language. Then, you use
| the easiest, most-automated tools for the job. Most drop full
| verification to do automated, partial checking: TLA+, Alloy,
| SPARK Ada, Meta's Infer, etc.. If doing full verification, it's
| better to make or build on reusable components like we see done
| with CompCert and seL4. GEMSOS and VAX VMM advocated that back
| in the day, too.
|
| So, most software development isn't fit for use of formal
| verification. I think we can default on a combo of contracts
| (specs), automated testing, static analysis, and safe
| languages. If distributed, stronger consistency when possible.
| Then, use CI to run the verification on each commit. Some tools
| can generate patches, too.
| zozbot234 wrote:
| Static type checking is a kind of formal verification of
| software - there are formal requirements (the program doesn't
| go "wrong" in a number of rigorously defined ways) that are
| automatically checked. And you can certainly do "design and
| development" together in type-safe languages.
| commandlinefan wrote:
| I see this line of reasoning about formal methods a lot: software
| is big and complicated and hard to get right... therefore formal
| methods.
|
| On the one hand, I _want_ this to be true both for selfish and
| practical reasons: selfishly because I'm very very good at
| learning things that require an academic learning approach (read
| a book, work some exercises, practice) and if something I'm good
| at is important, that means more money for me. Practically
| because they're right, software _is_ big and complicated and hard
| to get right and as a practitioner, it's really frustrating when
| it does fail and I'm scrambling to figure out why.
|
| On the other hand, though, nobody ever seems to make a compelling
| case for how formal methods proposes to solve that problem. This
| author actually does better than most by pointing out how most
| modern "design" is a waste of time but doesn't really go into why
| TLA, say, is better than (demonstrably mostly useless) UML.
| There's sort of an implied suggestion that if you dedicate a few
| months (or years?) to learning TLA, you'll reach enlightenment
| and understand how it's helpful in a way that's impossible to
| articulate to the unenlightened. And that's not impossible to
| imagine! Calculus and Bayesian statistics are kind of like that;
| you need to really understand them before you can really see why
| they're useful.
|
| I always find myself left applying what I call "project manager"
| logic - I need to make a snap decision right now about whether or
| not I should invest time in this "formal method" stuff or not so
| I have to be observational: if it was really that helpful, more
| people would be applying it and the benefits would speak for
| themselves. They've been around a long, long time, though, and
| never caught on - it's hard not to conclude that there's probably
| a reason.
| hitchstory wrote:
| >if it was really that helpful, more people would be applying
| it and the benefits would speak for themselves. They've been
| around a long, long time, though, and never caught on - it's
| hard not to conclude that there's probably a reason.
|
| My impression is that there are actually not that many business
| domains where a large investment in time and money to get
| domain logic correctness from 98% to 99.99% correct is actually
| called for.
|
| Formal methods _are_ a large investment, too. No two ways
| around it.
|
| Also, while they havent really caught on in general, some of
| their ideas have made it into modern type systems.
| t43562 wrote:
| IMO UML is useless because whatever diagram gets produced means
| different things to different people and it's very complex but
| not check-able so people can make UML diagrams that are self
| contradictory or nonsense.
|
| I find myself using a "method" of some kind when faced with a
| problem that's hard to think about. A communications protocol -
| nice to have a state machine to describe it for example. TLA
| obviously fits that niche even better. I've been lucky enough
| not to have too many problems recently that felt like they
| justified that effort but it's of incredible value when one
| does. Domain Specific Languages - so much better to use a
| parser framework than hand-code if you want to avoid all sorts
| of undesirable outcomes.
|
| Currently most of my rework comes from changes to the
| requirements and our "customers" not really knowing what they
| want till we give them something and they say "not that."
|
| This is partly because the people asking for things don't fully
| think out all the implications of what they're asking. It's
| mostly about not having enough knowledge in one place to make
| good decisions on.
| bee_rider wrote:
| I only am familiar with formal verification in the context of a
| hardware class, which is like programming but the cost:benefit
| is wildly different (can't fix a physical chip after it has
| been fabricated very easily) and the types of designs are very
| different.
|
| But, the impression I got was that the rigors of the formal
| verifier would sort of impose a limit on the complexity of the
| design just based on... completing in a reasonable timeframe
| and in a reasonable amount of memory space. Maybe the true
| victory of demanding formal verification would be fixing:
|
| > software is big and complicated and hard to get right
|
| By making big complex programs a pain to work with, haha.
| hinkley wrote:
| If we're going to boil this frog, we need to steal wisdom from
| TLA, not teach it. Type systems have borrowed a lot from
| Hinley-Milner, and are themselves a formal, partial proof.
|
| I think I'd like to see a descendent of Property Based Testing
| that uses SAT or TLA techniques to rapidly condense the input
| space in a repeatable fashion. We should be able to deduce
| through parsing and code coverage that passing 12 to a function
| can never follow different branches than 11, but that -1 or
| 2^17 < n < 2^32 might.
| sesm wrote:
| For Calculus you can easily explain to the unenlightened that
| many physical and engineering laws work in terms of speed and
| acceleration of measurable values, and Calculus is a
| mathematical framework for working with this.
| thefaux wrote:
| Most of the articles I've read about formal methods feel like
| lead gen for consultants. That's fine but feels obnoxious when
| they implicitly act as though they have reached formal methods
| induced enlightenment that you can too if you buy a pack of
| trainings for your employees/coworkers or hire me as an employee
| to fix your bad (irresponsibly dangerous even!) programming
| habits.
|
| Get back to me when the formal methods actually generate high
| quality code that cannot deviate from the spec.
| erpellan wrote:
| How about
| https://en.wikipedia.org/wiki/SPARK_(programming_language)
| lolinder wrote:
| Ada is terminally underappreciated.
| aiono wrote:
| > Get back to me when the formal methods actually generate high
| quality code that cannot deviate from the spec
|
| That would be useful but there is a fundamental issue: code is
| too specific. In formal specification you usually don't specify
| that detailed but you specify the general behaviour of the
| system. So usually the specification corresponds to a lot of
| programs with subtle differences. That's why code a
| documentation falls short: you don't know what is intentional
| and what is just a random choice. It's simply too specific to
| describe high level requirements.
|
| The other way around (verification of program with respect to
| the specification) can be more feasible to implement.
| noelwelsh wrote:
| I would guess a majority of developers use formal methods these
| days. We just tend to call them type systems, and for some reason
| consider them a distinct category. If simulations count as formal
| methods, then tests, and particularly property-based testing,
| also make the cut.
| jcgrillo wrote:
| The lightweight formal methods callout is a good one. Maintaining
| a suite of proptest[1] strategies alongside the codebase is not a
| very much larger investment than writing unit tests by hand, but
| the insights they provide due to extensive coverage and compact
| understandable failure cases are way better. And crucially this
| approach _does_ align with normal software development practices.
|
| [1] https://crates.io/crates/proptest
| andrepd wrote:
| Yeah but proptest / qcheck is not formal methods at all. It's
| randomised testing.
| jcgrillo wrote:
| The assertion is that they're _lightweight_ formal methods.
| Or is the article (and the proceedings of SOSP '21 it links
| to) wrong?
|
| EDIT: ah I see where there might be confusion--obviously a
| library for generating random test data and making assertions
| about code under test itself doesn't constitute anything like
| "formal methods" but the idea of _using_ that library in the
| way described in the paper linked from the article does. But
| that 's kinda always the thing about software libraries..
| pfdietz wrote:
| Property tests are informal tests of formal properties. They
| don't guarantee the properties hold, or that the formal
| properties are complete, but they exploit the existence of
| these formal properties.
|
| Once you have formal properties, you can write property-based
| tests using them, and I wonder how much of the benefit of
| formal methods could be obtained just by doing this. It's
| another example of using increased computing power (testing)
| to substitute for expensive hand labor (proving theorems).
|
| I'll also observe that even theorem proving systems benefit
| from a kind of property based testing. If there's a goal to
| prove the existence of a value satisfying some property, this
| is essentially a property based testing problem. Similarly,
| find a counterexample to a universally quantified formula
| (also an existential problem) can be used to prune off
| unproductive branches of a search tree.
| jcgrillo wrote:
| There's something also in the UX dynamics. As a developer
| writing property based tests I'm encouraged to think much
| more carefully about system invariants, otherwise there's
| not much value added over unit tests. For anything
| nontrivial this entails building a model of the system and
| checking it against the system under test, like they did at
| AWS. So the decision to use this tool shapes how you think
| about the system--it makes you reason more formally about
| it rather than just winging it and writing tests to
| exercise the code.
| Tryk wrote:
| With an infinite domain (e.g.numbers) randomised testing is
| necessary, no?
| jillesvangurp wrote:
| > Maintaining a suite of proptest[1] strategies alongside the
| codebase is not a very much larger investment than writing unit
| tests by hand,
|
| I actually generate a unit tests with LLMs a lot lately. They
| do a decent job and you can just ask it to be a little more
| exhaustive, test any edge cases it can think of, or instruct it
| to deal with specific ones. I know a fair bit about how to
| write good tests and the effort you can put in that. But LLMs
| can generate better tests than me way faster.
|
| If anything, they are less likely to do a half-assed job of it
| than me because I tend to run out of patience doing repetitive
| tedious shit. This is a healthy trait to have for a software
| engineer: we are supposed to make stuff easier by automating
| it. If it feels repetitive, your reflex should be to write code
| to make it less repetitive. Documentation is the same. I
| generate that too these days and since it is so easy, I do it
| more often and sooner.
|
| LLMs might trigger a minor revolution in the adoption of formal
| method verification. Generating a correct specification is a
| combination of tedious and probably relatively easy for an LLM
| given enough context like working code, documentation, and
| other hints as to what the thing should do.
|
| I'd be a lot more likely to bother with that stuff if I can
| just let it generate specification and then briefly glance
| through them than if I have to spell everything out manually.
|
| I think using Rust kind of signals that you care about
| correctness. It's compiler is probably the closest to proving
| your system is probably correct that you can get without
| resorting to formal methods. And probably a lot easier than
| trying to bolt on formal methods to languages that don't even
| use a compiler or explicitly specified types.
| trending486 wrote:
| Modern formal methods like TLA+ and Alloy are as easy to pick up
| as Python, and other than having to write a spec (an ultra-
| simplified model of part of a system) they are completely
| automatic (based on model checkers). There is no reason for a
| modern software engineer not to have them on her radar. As a
| matter of fact most of the cloud systems you are using everyday
| have been verified with modern formal tools: Azure Cosmos DB,
| Dynamo DB, MongoDB, CockroachDB, ... and many others.
| tombert wrote:
| I haven't done enough with Alloy to speak with any degree of
| competence with it, so I'll only speak on TLA+.
|
| While I do think TLA+ is relatively easy to pick up (especially
| compared to Isabelle or Coq), and I think it's pretty awesome,
| I'm hesitant to say it's as "easy to pick up as Python". You
| need a basic understanding of set theory and state machines to
| even get started, and more advanced concepts like algorithm
| refinement to get into the really useful juicy stuff.
|
| When I've tried to push TLA+ at work, the mathey syntax is
| usually a non-starter for most of my engineering colleagues. I
| don't think my coworkers are stupid by any stretch, but they
| are decidedly uninterested in re-learning any discrete math (if
| they ever learned it in the first place, which isn't a
| guarantee). For an engineer LARPing as an academic like me,
| TLA+'s notation isn't really hard at all, but I will often
| forget that most engineers _only_ think in code.
| Taikonerd wrote:
| _> When I 've tried to push TLA+ at work, the mathey syntax
| is usually a non-starter_
|
| Would they be interested in Quint?[0] If I understand
| correctly, Quint is very similar to TLA+, but with a more
| "code-looking" notation. And it has some nice dev tooling,
| like an LSP, which also lowers the barrier to entry.
|
| [0]: https://github.com/informalsystems/quint
| tombert wrote:
| Possible, I haven't tried it.
|
| It's still an uphill battle, because the comparison to unit
| tests always comes up, and I have to explain how they're
| not equivalent.
| adsharma wrote:
| Why not use python itself then? The design by contract PEP is
| old. Not sure if anyone is working on it.
|
| https://adsharma.github.io/agentic-transpilers/
|
| https://adsharma.github.io/pysmt
| dragonwriter wrote:
| A 21-year-old deferred PEP is probably safely described as
| dead with only the most distant hope that it might one day be
| revived, or, more likely replaced with an entirely different
| PEP addressing a similar domain with a new approach.
| AnimalMuppet wrote:
| Good engineering practice is to use the _appropriate_ level of
| rigor. It depends on what the cost of failure is, and what the
| cost of the rigor is.
| brap wrote:
| I'm not super experienced with formal verification, but I did dip
| my toes in it a few times.
|
| My impression is that it's far from a magic bullet. Writing
| formal specs is basically like writing the code/tests just
| differently. And the more it covers the more it becomes the same
| thing. And it suffers from the same problems.
|
| My conclusion every time was that the code itself _is_ the formal
| spec and the formal spec _is_ the code.
|
| By analogy with construction, the code is both the building and
| the blueprint.
| davidmurdoch wrote:
| This. In my very limited experience (i didn't write the code or
| specs), I've seen the runtime code find more bugs in the formal
| specs than the formal specs finding bugs in the runtime code.
| IshKebab wrote:
| > My conclusion every time was that the code itself is the
| formal spec and the formal spec is the code.
|
| Yes you can end up with tautological specs, where it's more or
| less a copy of the code. E.g. you aren't going to benefit much
| from formally verifying `max()`.
|
| But there are _many many_ cases where the formal specification
| is _much much_ simpler and more obviously correct than the
| actual code. The classic examples are:
|
| * Any kind of encoding / decoding transformation has the
| property `decode(encode(x)) == x`.
|
| * Sorting: any sorting algorithm should result in a sorted
| array (forall i, j: i < j --> array[i] <= array[j])
| nimish wrote:
| Formal methods work great when the price of failure is absolute.
| Mostly pointless otherwise but can be a good exercise I guess.
| jillesvangurp wrote:
| With most testing and verification, there's a law of
| diminishing returns. It helps you find stuff that you need to
| fix and there is always stuff to find. But at some point you've
| found enough of the stuff that needed fixing that you can use
| the software and it starts making money for you. Most people
| stop there. It's not going to make much more money if you
| continue your efforts and the risk of a lot of financial damage
| is usually not that high. A good software license will ensure
| that. You might be better off paying a decent lawyer than
| wasting time on formal methods. Lawyers aren't cheap. But
| neither is having your software engineering team faff about
| with a lot of complex tools for weeks on end.
|
| And with software you can just do an update if something is
| found later. Not a big deal usually. There are exceptions of
| course. With hardware things get more expensive. But still,
| judging from the state of e.g. most bluetooth and other
| hardware I've ever owned, the barrier of good enough is pretty
| low there too. Mostly things work and you can usually work
| around minor issues when they don't.
|
| Some, software justifies/requires going above and beyond doing
| testing. Especially if it controls critical hardware. I've
| never worked on such stuff. And even there the notion of
| releasing often and breaking stuff by testing it seems to be
| catching on. For example SpaceX is doing agile rocket
| development. They launch starship every few months until they
| get it reliable enough to launch things into orbit.
| aiono wrote:
| I doubt that although I agree that it's much more useful when
| cost of a failure is higher. For example I work in a lab that
| formalizes requirements and we have real customers that pay us
| for formalization because they find it useful. Some products
| are things that a failure could cause injury or even possibly
| death. But not all systems have that high costs and they still
| see benefits.
| synchronousq wrote:
| I just want to note, there exist two main flavors of formal
| methods: extrinsic techniques, which are disjoint from the code
| itself and generally reason about the specifications of code, and
| intrinsic techniques, which are inline with the code itself and
| reason about the code more directly. Historically, intrinsic
| techniques (such as type systems) reason about code at a
| functional level, while extrinsic techniques (such as decidable
| model checkers like Spin/P) reason about a model of the code,
| ascribed to formalism like an automata. But imo we're currently
| in a complete golden age of formal methods research, and
| extrinsic techniques are falling out of flavor in comparison to
| intrinsic methods as pushed by type system advancements and
| projects like Verus [1].
|
| [1] https://github.com/verus-lang/verus
| hashxyz wrote:
| I don't see how the distinction makes any sense when the Verus
| project you linked requires you to write correctness specs. It
| sounded like intrinsic techniques were preferred because they
| would not require you to write and maintain a separate spec,
| but this is not the case.
| lolinder wrote:
| I prefer intrinsic techniques because it prevents the model
| from being out of sync with the implementation.
|
| The thing that's never made any sense to me about using a
| marble checker for anything but concurrency issues (which are
| hard enough to warrant it) is that once you've validated your
| model you have to go actually implement it, and that's
| usually the most error prone part of the process.
|
| If the correctness spec has to be written manually but
| prevents you from diverging from the spec in your
| implementation, that's a huge step up from extrinsic model
| checkers.
| adsharma wrote:
| TLA+ and the like work because they target a very small
| specification language.
|
| With the large footprint of Rust, I've seen questions raised
| about how this will work. But haven't seen good answers.
|
| Would love to read more.
| begueradj wrote:
| For those interested in the information exchange about this same
| article: https://news.ycombinator.com/item?id=40753989
| markusde wrote:
| I'd recommend anyone with a passing interest in the role formal
| techniques can play in software development watch this [1] talk.
| Mike Dodds is a principal scientist at Galois (a company which
| has a lot of experience with applying formal methods in industry
| and government) and the talk does a good job at explaining where
| they've seen value-added from formal methods, and the right kind
| of formal methods for different applications.
|
| [1]: https://www.youtube.com/watch?v=gfvvowAc130
| Animats wrote:
| The article is thin on specifics.
|
| Some problems specify well. A database is an example. A spec for
| a database can be written in terms of exhaustive search, running
| the query against everything. Now show that an efficient database
| yields the same output.
| IshKebab wrote:
| IMO formal software verification is still waaay too difficult to
| be worth it in all but the most extreme cases. That's really
| different to formal hardware verification where it is a no-
| brainer.
|
| I keep trying to learn it, but you need to be a real expert. Like
| "I wrote the compiler" level expert for most systems.
|
| For example I tried to prove a varint encoder/decoder. It worked
| for one or two bytes, but not more. Asking for help the reason
| was that the compiler internally only unrolls loops 5 times, or
| some random internal detail like that that you could never really
| hope to know.
|
| I've been learning Lean recently, and ... I mean I like it, but
| if you learn it you're going to encounter documentation like
| this:
|
| > Definitional equality includes e-equivalence of functions and
| single-constructor inductive types. That is, fun x => f x is
| definitionally equal to f, and S.mk x.f1 x.f2 is definitionally
| equal to x, if S is a structure with fields f1 and f2. It also
| features proof irrelevance, so any two proofs of the same
| proposition are definitionally equal. It is reflexive, symmetric,
| and a congruence.
|
| And that's not really a knock on Lean - it seems to have some of
| the better documentation out of the alternatives.
___________________________________________________________________
(page generated 2025-01-10 23:00 UTC)