[HN Gopher] GenAI-Accelerated TLA+ Challenge
___________________________________________________________________
GenAI-Accelerated TLA+ Challenge
Author : lemmster
Score : 30 points
Date : 2025-05-06 17:54 UTC (5 hours ago)
(HTM) web link (foundation.tlapl.us)
(TXT) w3m dump (foundation.tlapl.us)
| Taikonerd wrote:
| Using LLMs for formal specs / formal modeling makes a lot of
| sense to me. If an LLM can do the work of going from informal
| English-language specs to TLA+ / Dafny / etc, then it can hook
| into a very mature ecosystem of automated proof tools.
|
| I'm picturing it something like this:
|
| 1. Human developer says, "if a user isn't authenticated, they
| shouldn't be able to place an order."
|
| 2. LLM takes this, and its knowledge of the codebase, and turns
| it into a formal spec -- like, "there is no code path where
| User.is_authenticated is false and Orders.place() is called."
|
| 3. Existing code analysis tools can confirm or find a
| counterexample.
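|
| Concretely, step 2's output might look something like this
| minimal sketch (the module, variable, and action names here are
| made up, not taken from any real codebase):
|
|     ---- MODULE OrderAuth ----
|     VARIABLES authenticated, orders
|
|     Init == /\ authenticated = FALSE
|             /\ orders = {}
|
|     LogIn == /\ authenticated' = TRUE
|              /\ UNCHANGED orders
|
|     PlaceOrder == /\ authenticated = TRUE   \* the guard under test
|                   /\ orders' = orders \cup {"order"}
|                   /\ UNCHANGED authenticated
|
|     Next == LogIn \/ PlaceOrder
|
|     Spec == Init /\ [][Next]_<<authenticated, orders>>
|
|     \* Step 1's requirement as an invariant: no orders unless
|     \* the user has authenticated.
|     NoUnauthOrder == (orders # {}) => authenticated
|     ====
|
| TLC would then check NoUnauthOrder as an invariant for step 3;
| if the model of the code let PlaceOrder fire without the
| authenticated guard, it would return a counterexample trace.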
| omneity wrote:
| A fascinating thought. But then who verifies that the TLA+
| specification does indeed match the human specification?
|
| I'm guessing using an LLM as a translator narrows the gap, and
| better LLMs will make it narrower eventually, but is there a
| way to quantify this? For example how would it compare to a
| human translating the spec into TLA+?
| justanotheratom wrote:
| maybe run it through few other LLMs depending on how much
| confidence you need - o3 pro, gemini 2.5 pro, claude 3.7,
| grok 3, etc..
| svieira wrote:
| Then you need to be able to formally prove the equivalence
| of various TLA+ programs (maybe that's a solved problem?)
| omneity wrote:
| No idea about SOTA but naively it doesn't seem like a
| very difficult problem:
|
| - Ensure all TLA+ specs produced have the same
| inputs/outputs (domains, mostly a prompting problem and
| can be solved with retries)
|
| - Ensure all TLA+ specs produce the same outputs for the
| same inputs (making them functionally equivalent in
| practice, might be computationally intensive)
|
| Of course that assumes your input domains are countable
| but it's probably okay to sample from large ranges for a
| certain "level" of equivalence.
|
| EDIT: Not sure how that will work with non-determinism
| though.
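|
| For specs over the same state variables, one checkable notion
| of equivalence is mutual refinement: SpecA <=> SpecB means the
| two specs allow exactly the same set of behaviours, which also
| copes with non-determinism. A toy sketch, with made-up
| definitions:
|
|     ---- MODULE CounterEquiv ----
|     EXTENDS Naturals
|     VARIABLE x
|
|     \* Two candidate specs of the same informal requirement:
|     \* "x starts at 0 and may increase by 1, up to 3".
|     InitA == x = 0
|     NextA == x < 3 /\ x' = x + 1
|     SpecA == InitA /\ [][NextA]_x
|
|     InitB == x = 0
|     NextB == x' = x + 1 /\ x' <= 3
|     SpecB == InitB /\ [][NextB]_x
|
|     THEOREM SpecA <=> SpecB
|     ====
|
| TLC can't check the THEOREM directly, but it can check each
| direction by model checking one spec with the other stated as
| a temporal property, then swapping the two.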
| justanotheratom wrote:
| I didn't mean generating separate TLA+ specs. Rather,
| other LLMs would review and comment on whether this TLA+
| spec satisfies the user's specification.
| Taikonerd wrote:
| A fair question! I'd say it's not that different from using
| an LLM to write regular code: who verifies that the code the
| LLM wrote is indeed what you meant?
| fmap wrote:
| The usual way to check whether a definition is correct is to
| prove properties about it that you think should hold. TLA+
| has good support for this, both with model checking as well
| as simple proofs.
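|
| As a rough sketch of that workflow (all names here are
| hypothetical):
|
|     ---- MODULE SimpleCounter ----
|     EXTENDS Naturals
|     CONSTANT Max
|     ASSUME MaxIsNat == Max \in Nat
|
|     VARIABLE counter
|     vars == <<counter>>
|
|     Init == counter = 0
|     Next == counter < Max /\ counter' = counter + 1
|     Spec == Init /\ [][Next]_vars
|
|     \* A property we believe should hold of the definition:
|     TypeOK == counter \in 0..Max
|
|     \* TLC checks TypeOK as an INVARIANT; TLAPS can prove it
|     \* with the standard induction over Init and Next.
|     THEOREM Spec => []TypeOK
|       <1>1. Init => TypeOK
|         BY MaxIsNat DEF Init, TypeOK
|       <1>2. TypeOK /\ [Next]_vars => TypeOK'
|         BY MaxIsNat DEF TypeOK, Next, vars
|       <1>3. QED
|         BY <1>1, <1>2, PTL DEF Spec
|     ====
|
| Whether those exact BY steps go through depends on the TLAPS
| backends, but that is the usual shape of such a proof.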
| frogmeister57 wrote:
| It makes a lot of sense only to graphics-card salespeople.
| For everyone else with a working neuron, the very idea is
| utter nonsense.
| max_ wrote:
| Leslie Lamport said that he invented TLA+ so people could "think
| above the code".
|
| It was meant as a tool for people to improve their thinking and
| description of systems.
|
| LLM generation of TLA+ code is just intellectual masturbation.
|
| It may get the work done for your boss. But your intellect will
| still remain bald -- in which case you are better off not writing
| TLA+ at all.
| warkdarrior wrote:
| > [TLA+] was meant as a tool for people to improve their
| thinking and description of systems.
|
| Why the speciesism? Why couldn't LLMs use TLA+ by translating a
| natural-language request into a TLA+ model and then checking it
| in TLA+?
| jjmarr wrote:
| Not the OP, but I would rather give a formal specification of
| my system to an AI and have it generate the code.
|
| I believe the point is that it's easier for a human to verify
| a system's correctness as expressed in TLA+, and then verify
| that the code correctly matches that model, than it is to
| verify the entire codebase as a single system all at once.
|
| Then, if my model of the system is flawed, TLA+ will tell me.
|
| I'm an AI bull so if I give the LLM a natural language
| description, I'd like the LLM to explain the model instead of
| just writing the TLA+ code.
| frogmeister57 wrote:
| Using generative chatbots to write a formal spec is the most
| stupid idea ever. Specs are all about reasoning. You need to do
| the thinking to model the system in a very simplified manner.
| Formal methods and the generative BS are at the antipodes of
| reliability. This is an insult to reason. Please keep this
| nonsense away from the serious parts of CS.
___________________________________________________________________
(page generated 2025-05-06 23:01 UTC)