[HN Gopher] Use Prolog to improve LLM's reasoning
       ___________________________________________________________________
        
       Use Prolog to improve LLM's reasoning
        
       Author : shchegrikovich
       Score  : 134 points
       Date   : 2024-10-13 21:22 UTC (4 days ago)
        
 (HTM) web link (shchegrikovich.substack.com)
 (TXT) w3m dump (shchegrikovich.substack.com)
        
       | pjmlp wrote:
        | So we are back to the Japanese Fifth Generation plan from the
        | 1980s. :)
        
         | thelastparadise wrote:
         | Watson did it too, a while back.
        
         | tokinonagare wrote:
         | Missing some LISP but yeah it's funny how old things are new
         | again (same story with wasm, RISC archs, etc.)
        
           | nxobject wrote:
           | Lots of GOFAI being implemented again - decision trees, goal
           | searching and planning, agent-based strategies... just not
           | symbolic representations, and that might be the key. I figure
           | you might get an interesting contribution out of skimming old
           | AI laboratory publications and seeing whether you could find
           | a way of implementing it through a single LLM, multiple LLM
           | agents, methods of training, etc.
        
             | anthk wrote:
             | https://en.m.wikipedia.org/wiki/Constraint_satisfaction_pro
             | b...
        
         | linguae wrote:
         | This time around we have all sorts of parallel processing
         | capabilities in the form of GPUs. If I recall correctly, the
         | Fifth Generation project envisioned highly parallel machines
         | performing symbolic AI. From a hardware standpoint, those
         | researchers were way ahead of their time.
        
           | nxobject wrote:
           | And they had a self-sustaining video game industry too... if
           | only someone had had the wild thought of implementing
           | perceptrons and tensor arithmetic on the same hardware!
        
         | postepowanieadm wrote:
         | and winter is coming.
        
         | metadat wrote:
         | For the uninitiated (like me):
         | 
         |  _The Japanese Fifth Generation Project_
         | 
         | https://www.sjsu.edu/faculty/watkins/5thgen.htm
        
       | sgt101 wrote:
        | Building on this idea, people have grounded LLM-generated
        | reasoning logic with perceptual information from other networks:
        | https://web.stanford.edu/~joycj/projects/left_neurips_2023
        
       | a1j9o94 wrote:
       | I tried an experiment with this using a Prolog interpreter with
       | GPT-4 to try to answer complex logic questions. I found that it
       | was really difficult because the model didn't seem to know Prolog
       | well enough to write a description of any complexity.
       | 
        | It seems like you used an interpreter in the loop, which is
        | likely to help. I'd also be interested to see how o1 would do on
        | a task like this, or whether it even makes sense to use something
        | like Prolog if the models can backtrack during the "thinking"
        | phase.
        
         | lukasb wrote:
          | I bet one person could probably build a pretty good synthetic
          | NL->Prolog dataset. ROI for paying that person would be high if
          | you were building a foundation model (i.e., benefits beyond
          | being able to output Prolog).
        
         | hendler wrote:
          | I also wrote an LLM-to-Prolog interpreter for a hackathon
          | called "Logical". With a few hours' effort I'm sure it could be
          | improved.
         | 
         | https://github.com/Hendler/logical
         | 
         | I think while LLMs may approach completeness here, it's good to
         | have an interpretable system to audit/verify and reproduce
         | results.
        
       | baq wrote:
       | Patiently waiting for z3-guided generation, but this is a
       | welcome, if obvious, development. Results are a bit surprising
       | and sound too optimistic, though.
        
       | nonamepcbrand1 wrote:
        | Is this why GitHub CodeQL and Copilot assistance work better for
        | everyone? Basically, CodeQL uses a variant of Prolog (Datalog) to
        | query source code, which generates better results.
        
       | z5h wrote:
        | I've come to appreciate, over the past 2 years of heavy Prolog
        | use, that all coding should (eventually) be done in Prolog.
       | 
       | It's one of few languages that is simultaneously a standalone
       | logical formalism, and a standalone representation of
       | computation. (With caveats and exceptions, I know). So a Prolog
       | program can stand in as a document of all facts, rules and
       | relations that a person/organization understands/declares to be
       | true. Even if AI writes code for us, we should expect to have it
       | presented and manipulated as a logical formalism.
       | 
       | Now if someone cares to argue that some other language/compiler
       | is better at generating more performant code on certain
       | architectures, then that person can declare their arguments in a
       | logical formalism (Prolog) and we can use Prolog to translate
       | between language representations, compile, optimize, etc.
        
         | dmead wrote:
         | It's taken ages for anything from functional programming to
         | penetrate general use. Do you think uptake of logic stuff will
         | be any faster?
        
           | johnnyjeans wrote:
           | Prolog (and logic programming in general) is much older than
           | you think. In fact, if we take modern functional programming
           | to have been born with John Backus' Turing Award
           | presentation[1], then it even predates it.
           | 
           | Many advancements to functional programming were implemented
            | on top of Prolog! Erlang's early versions were built on top
            | of a Prolog-derived language whose name escapes me. That's
            | the source of Erlang's syntax, which is unfamiliar to many
            | programmers. It's very much like writing Prolog if you had
            | return values and no cuts or complex terms.
           | 
           | As for penetrating general use, probably not without a major
           | shift in the industry. But it's a very popular language just
           | on the periphery, even to this day.
           | 
           | [1] - https://dl.acm.org/doi/10.1145/359576.359579
        
         | cmrdporcupine wrote:
         | So why Prolog in particular and not another logic language like
         | Mercury or Oz/Mozart etc?
        
           | infradig wrote:
            | It's not meant to be taken literally; it refers to any
            | "language of logic programming". Apologies to Monty Python.
        
         | tomcam wrote:
         | Is it your thought that for the average programmer Prolog is
         | easier to read and maintain than say Go, C#, or Java?
        
       | anthk wrote:
        | Use Constraint Satisfaction Problem solvers. They come together
        | with Common Lisp with ease.
        
       | mise_en_place wrote:
       | I really enjoyed tinkering with languages like Prolog and Coq.
       | Interactive theorem proving with LLMs would be awesome to try
       | out, if possible.
        
       | fsndz wrote:
       | This is basically the LLM modulo approach recommended by Prof.
        | Subbarao Kambhampati. Interesting, but it mostly works only for
        | problems that have some math/first-order logic puzzle at their
        | heart. It will fail to improve performance on ARC-AGI, for
        | example. It's difficult to mimic reasoning by basic trial and
        | error and then hoping for the best:
        | https://www.lycee.ai/blog/why-sam-altman-is-wrong
        
       | YeGoblynQueenne wrote:
       | That's not going to work. Garbage in - Garbage out is success-set
       | equivalent to Garbage in - Prolog out.
       | 
       | Garbage is garbage and failure to reason is failure to reason no
       | matter the language. If your LLM can't translate your problem to
       | a Prolog program that solves your problem- Prolog can't solve
       | your problem.
        
         | Philpax wrote:
         | This is a shallow critique that does not engage with the core
         | idea. Specifying the problem is not the same as solving the
         | problem.
        
       | arjun_khamkar wrote:
        | Would creating a Prolog dataset be beneficial, so that future
        | LLMs can be trained on it and would then be able to output Prolog
        | code?
        
       | DeborahWrites wrote:
       | You're telling me the seemingly arbitrary 6 weeks of Prolog on my
       | comp sci course 11yrs ago is suddenly about to be relevant? I did
       | not see this one coming . . .
        
         | fullstackwife wrote:
         | Is there any need to look at this generated Prolog code?
        
       | UniverseHacker wrote:
       | I think this general idea is going to be the key to really making
       | LLMs widely useful for solving real problems.
       | 
       | I've been playing with using GPT-4 together with the Wolfram
       | Alpha plugin, and the combo of the two can reliably solve
       | difficult quantitative problems that neither can individually by
       | working together, much like a human using a calculator.
        
       | de6u99er wrote:
       | I always thought that Prolog is great for reasoning in the
       | semantic web. It doesn't surprise me that LLM people stumble on
       | it.
        
       | bytebach wrote:
       | An application I am developing for a customer needed to read
       | constraints around clinical trials and essentially build a query
       | from them. Constraints involve prior treatments, biomarkers, type
       | of disease (cancers) etc.
       | 
       | Using just an LLM did not produce reliable queries, despite
       | trying many many prompts, so being an old Prolog hacker I
       | wondered if using it might impose more 'logic' on the LLM. So we
       | precede the textual description of the constraints with the
       | following prompt:
       | 
       | -------------
       | 
       | Now consider the following Prolog predicates:
       | 
       | biomarker(Name, Status) where Status will be one of the following
       | integers -
       | 
        | Wildtype = 0, Mutated = 1, Methylated = 2, Unmethylated = 3,
        | Amplified = 4, Deleted = 5, Positive = 6, Negative = 7
       | 
        | tumor(Name, Status) where Status will be one of the following
        | integers if known, else left unbound -
       | 
        | Newly diagnosed = 1, Recurrence = 2, Metastasized = 3,
        | Progression = 4
       | 
       | chemo(Name)
       | 
       | surgery(Name) Where Name may be an unbound variable
       | 
       | other_treatment(Name)
       | 
       | radiation(Name) Where Name may be an unbound variable
       | 
        | Assume you are given a predicate atMost(T, N) where T is a
        | compound term and N is an integer. It will return true if the
        | number of 'occurrences' of T is less than or equal to N, else it
        | will fail.
       | 
       | Assume you are given a predicate atLeastOneOf(L) where L is a
       | list of compound terms. It will succeed if at least one of the
       | compound terms, when executed as a predicate returns true.
       | 
       | Assume you are given a predicate age(Min, Max) which will return
       | true if the patient's age is in between Min and Max.
       | 
        | Assume you have a predicate not(T) which returns true if
        | predicate T evaluates false and vice versa. i.e. rather than
        | '\+ A' use not(A).
       | 
       | Do not implement the above helper functions.
       | 
       | VERY IMPORTANT: Use 'atLeastOneOf()' whenever you would otherwise
       | use ';' to represent 'OR'. i.e. rather than 'A ; B' use
       | atLeastOneOf([A, B]).
       | 
       | EXAMPLE INPUT: Patient must have recurrent GBM, methylated MGMT
       | and wildtype EGFR. Patient must not have mutated KRAS.
       | 
       | EXAMPLE OUTPUT: tumor('gbm', 2), biomarker('MGMT', 2),
       | biomarker('EGFR', 0), not(biomarker('KRAS', 1))
       | 
       | ------------------
       | 
       | The Prolog predicates, when evaluated generate the required
       | underlying query (of course the Prolog is itself a form of
       | query).
       | 
       | Anyway - the upshot was a vast improvement in the accuracy of the
       | generated query (I've yet to see a bad one). Somewhere in its
       | bowels, being told to generate Prolog 'focused' the LLM. Perhaps
       | LLMs are happier with declarative languages rather than
       | imperative ones (I know I am :) ).
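As a rough illustration of how such generated conjunctions could be checked against patient data, here is a minimal Python sketch. The predicate names (`tumor`, `biomarker`, `not`, `atLeastOneOf`) follow the prompt above, but the term encoding and the patient record are invented for the example; the real system evaluates the terms in Prolog to build a query.

```python
# Minimal sketch: evaluate a conjunction of generated terms against a
# patient record. A term is a (name, args) pair; the record is a set of
# ground facts. `not` and `atLeastOneOf` mirror the helper predicates
# described in the prompt above, but this is not the real system.

def holds(term, facts):
    name, args = term
    if name == "not":
        return not holds(args[0], facts)          # negation of a single term
    if name == "atLeastOneOf":
        return any(holds(t, facts) for t in args)  # OR over a list of terms
    return (name, tuple(args)) in facts            # plain fact lookup

def satisfies(terms, facts):
    """True if every term in the generated conjunction holds."""
    return all(holds(t, facts) for t in terms)

# Example input from the prompt: recurrent GBM, methylated MGMT,
# wildtype EGFR, and no mutated KRAS.
patient = {("tumor", ("gbm", 2)), ("biomarker", ("MGMT", 2)),
           ("biomarker", ("EGFR", 0))}
query = [("tumor", ["gbm", 2]), ("biomarker", ["MGMT", 2]),
         ("biomarker", ["EGFR", 0]), ("not", [("biomarker", ["KRAS", 1])])]
```

Under this toy encoding, `satisfies(query, patient)` succeeds for the example patient and fails once a mutated-KRAS fact is added, which is the behavior the `not(...)` helper is meant to enforce.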
        
       | gorkempacaci wrote:
       | The generated programs are only technically Prolog programs. They
       | use CLPFD, which makes these constraint programs. Prolog programs
       | are quite a bit more tricky with termination issues. I wouldn't
       | have nitpicked if it wasn't in the title.
       | 
       | Also, the experiment method has some flaws. Problems are hand-
       | picked out of a random subset of the full set. Why not run the
       | full set?
        
       | ianbicking wrote:
       | I made a pipeline using Z3 (another prover language) to get LLMs
       | to solve very specific puzzle problems:
       | https://youtu.be/UjSf0rA1blc (and a presentation:
       | https://youtu.be/TUAmfi8Ws1g)
       | 
       | Some thoughts:
       | 
       | 1. Getting an LLM to model a problem accurately is a significant
       | prompting exercise. Bridging casual logical statements and formal
       | logic is difficult. E.g., "or" statements in English usually mean
       | "xor" in logic.
       | 
       | 2. Domains usually have their own language expectations. I was
       | doing Zebra puzzles (https://en.wikipedia.org/wiki/Zebra_Puzzle)
        | and they have a very specific pattern and language. I don't
        | think it's fair to call it intuitive or even entirely
        | unambiguous; it's something you have to learn. The LLM has to
       | learn it too. They have seen this kind of puzzle (and I think
       | most can reproduce the original Zebra puzzle from memory), but
       | they lack a really firm familiarity.
       | 
       | 3. Arguably some of the familiarity is about contextualizing the
       | problem, which is itself a prompting task. People don't naturally
       | solve Zebra puzzles that we find organically, it's something we
       | encounter in specific contexts (like a puzzle book) which is not
       | so dissimilar from prompting.
       | 
       | 4. Incidentally Claude Sonnet 3.5 has a substantial lead. And GPT
       | o1 is not much better than GPT 4o. In some sense I think o1 is a
       | kind of self-prompting, an attempt to create its own context; so
       | if you already have a well-worded prompt with instructions then
       | o1 isn't that good at improving performance over 4o.
       | 
       | 5. A lot of the prompting is really intended to slow down the
       | LLM, to keep it from jumping to conclusions or solving a task too
       | quickly (and incorrectly). Which again is a case of the prompt
       | doing what o1 tries to do generally.
       | 
       | 6. I'm not sure what tasks call for this kind of logical
       | reasoning. Not that I don't think they exist, I just don't know
       | how to recognize them. Planning tasks? Highly formalized and
       | artificially constructed problems don't seem all that
       | interesting... and the whole point of adding an LLM to the
       | process is to formalize the informal.
       | 
       | 7. Perhaps it's hard to see because real-world problems seldom
       | have conveniently exact solutions. But that's not a blocker...
       | Prolog (and Z3) can take constraints as a form of elimination,
       | providing lists of possible answers, and maybe just reducing the
       | search space is enough to move forward on some kinds of problems.
       | 
       | 8. For instance when I give my pipeline really hard Zebra
       | problems it usually doesn't succeed; one bug in one rule will
       | kill the whole thing. Also I think the LLMs have a hard time
       | keeping track of large problems; a context size problem, even
       | though the problems don't approach their formal context limits.
       | But I can imagine building the pipeline so it also tries to mark
       | low-confidence rules. Given that I can imagine removing those
       | rules, sampling the resulting (non-unique, sometimes incorrect)
       | answers and using that to revisit and perhaps correct some of
       | those rules.
       | 
       | Really I'd be most interested to hear thoughts on where this
       | logic programming might actually be applied... artificial puzzles
       | are an interesting exercise, but I can't really motivate myself
       | to go too deep.
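For a concrete sense of what these puzzle constraints look like in executable form, here is a tiny brute-force sketch in Python. A real pipeline would have the LLM emit the clues as Z3 or Prolog constraints instead; this three-house instance and its clues are invented for illustration.

```python
# Toy zebra-style puzzle solved by brute force over permutations.
# A real pipeline would emit these clues as Z3 assertions or Prolog
# clauses; this invented 3-house instance just shows the shape.
from itertools import permutations

people = ("alice", "bob", "carol")

def owner(mapping, value):
    # Return the person mapped to `value` (e.g. who owns the fish).
    return next(p for p in people if mapping[p] == value)

def solve():
    house = dict(zip(people, range(3)))  # fixed left-to-right positions
    for colors in permutations(("red", "green", "blue")):
        color = dict(zip(people, colors))
        for pets in permutations(("cat", "dog", "fish")):
            pet = dict(zip(people, pets))
            if (color["alice"] == "red"                 # clue 1
                    and pet["bob"] == "dog"             # clue 2
                    and house[owner(pet, "fish")]       # clue 3: fish owner
                        == house[owner(color, "green")] - 1):  # left of green
                return color, pet
    return None
```

Even at this scale the shape of the problem is visible: the clues only eliminate candidates, and the solver's job is to search the remaining space, which is exactly what Prolog backtracking or a Z3 model search does for the full-size puzzles.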
        
       ___________________________________________________________________
       (page generated 2024-10-17 23:00 UTC)