[HN Gopher] GenAI-Accelerated TLA+ Challenge
       ___________________________________________________________________
        
       GenAI-Accelerated TLA+ Challenge
        
       Author : lemmster
       Score  : 30 points
       Date   : 2025-05-06 17:54 UTC (5 hours ago)
        
 (HTM) web link (foundation.tlapl.us)
 (TXT) w3m dump (foundation.tlapl.us)
        
       | Taikonerd wrote:
       | Using LLMs for formal specs / formal modeling makes a lot of
       | sense to me. If an LLM can do the work of going from informal
       | English-language specs to TLA+ / Dafny / etc, then it can hook
       | into a very mature ecosystem of automated proof tools.
       | 
       | I'm picturing it something like this:
       | 
       | 1. Human developer says, "if a user isn't authenticated, they
       | shouldn't be able to place an order."
       | 
       | 2. LLM takes this, and its knowledge of the codebase, and turns
       | it into a formal spec -- like, "there is no code path where
       | User.is_authenticated is false and Orders.place() is called."
       | 
       | 3. Existing code analysis tools can confirm or find a
       | counterexample.
        
         | omneity wrote:
         | A fascinating thought. But then who verifies that the TLA+
         | specification does indeed match the human specification?
         | 
         | I'm guessing using an LLM as a translator narrows the gap, and
         | better LLMs will make it narrower eventually, but is there a
         | way to quantify this? For example how would it compare to a
         | human translating the spec into TLA+?
        
           | justanotheratom wrote:
            | Maybe run it through a few other LLMs, depending on how
            | much confidence you need: o3 pro, gemini 2.5 pro, claude
            | 3.7, grok 3, etc.
        
             | svieira wrote:
             | Then you need to be able to formally prove the equivalence
             | of various TLA+ programs (maybe that's a solved problem?)
        
               | omneity wrote:
               | No idea about SOTA but naively it doesn't seem like a
               | very difficult problem:
               | 
                | - Ensure all the TLA+ specs produced have the same
                | input/output domains (mostly a prompting problem,
                | and one that can be solved with retries)
                | 
                | - Check that all the TLA+ specs produce the same
                | outputs for the same inputs (making them
                | functionally equivalent in practice; this might be
                | computationally intensive)
               | 
               | Of course that assumes your input domains are countable
               | but it's probably okay to sample from large ranges for a
               | certain "level" of equivalence.
               | 
               | EDIT: Not sure how that will work with non-determinism
               | though.
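                | 
                | I believe the exhaustive (non-sampled) version of
                | this comparison is refinement checking in TLA+
                | terms: instantiate one spec inside the other's
                | module and let TLC check the implication, in both
                | directions. A very rough sketch, with made-up
                | module names:
                | 
                |     ---- MODULE BRefinesA ----
                |     EXTENDS SpecB        \* candidate spec
                |     A == INSTANCE SpecA  \* reference spec; the
                |                          \* variables are mapped
                |                          \* by name
                | 
                |     \* TLC checks this as a temporal property of
                |     \* SpecB; doing the same in the reverse
                |     \* direction gives mutual refinement, i.e.
                |     \* equivalence.
                |     Refinement == A!Spec
                |     ====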
        
               | justanotheratom wrote:
                | I didn't mean generating separate TLA+ programs.
                | Rather, other LLMs would review and comment on
                | whether this TLA+ program satisfies the user's
                | specification.
        
           | Taikonerd wrote:
           | A fair question! I'd say it's not that different from using
           | an LLM to write regular code: who verifies that the code the
           | LLM wrote is indeed what you meant?
        
           | fmap wrote:
           | The usual way to check whether a definition is correct is to
           | prove properties about it that you think should hold. TLA+
           | has good support for this, both with model checking as well
           | as simple proofs.
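            | 
            | Assuming a toy spec Spec with an invariant NoUnauthOrder
            | (both names made up), the two routes look roughly like:
            | 
            |     \* Model checking: two lines in the TLC model's
            |     \* .cfg file.
            |     \*   SPECIFICATION Spec
            |     \*   INVARIANT NoUnauthOrder
            | 
            |     \* Proof: state the property as a theorem and
            |     \* discharge it with the TLAPS proof system.
            |     THEOREM Safety == Spec => []NoUnauthOrder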
        
         | frogmeister57 wrote:
          | It makes a lot of sense only to graphics-card salespeople.
          | For everyone else with a working neuron, the very idea is
          | utter nonsense.
        
       | max_ wrote:
       | Leslie Lamport said that he invented TLA+ so people could "think
       | above the code".
       | 
       | It was meant as a tool for people to improve their thinking and
       | description of systems.
       | 
        | LLM generation of TLA+ code is just intellectual masturbation.
        | 
        | It may get the work done for your boss, but your intellect will
        | still remain bald, in which case you are better off not writing
        | TLA+ at all.
        
         | warkdarrior wrote:
         | > [TLA+] was meant as a tool for people to improve their
         | thinking and description of systems.
         | 
         | Why the speciesism? Why couldn't LLMs use TLA+ by translating a
         | natural-language request into a TLA+ model and then checking it
         | in TLA+?
        
           | jjmarr wrote:
           | Not the OP, but I would rather give a formal specification of
           | my system to an AI and have it generate the code.
           | 
            | I believe the point is that it's easier for a human to
            | verify a system's correctness as expressed in TLA+, and
            | then to check that the code matches that system, than it
            | is to verify the entire codebase as a system all at once.
           | 
           | Then, if my model of the system is flawed, TLA+ will tell me.
           | 
            | I'm an AI bull, so if I give the LLM a natural-language
            | description, I'd like it to explain the model instead of
            | just writing the TLA+ code.
        
       | frogmeister57 wrote:
       | Using generative chatbots to write a formal spec is the most
       | stupid idea ever. Specs are all about reasoning. You need to do
       | the thinking to model the system in a very simplified manner.
       | Formal methods and the generative BS are at the antipodes of
       | reliability. This is an insult to reason. Please keep this
       | nonsense away from the serious parts of CS.
        
       ___________________________________________________________________
       (page generated 2025-05-06 23:01 UTC)