[HN Gopher] Use Prolog to improve LLM's reasoning
___________________________________________________________________
Use Prolog to improve LLM's reasoning
Author : shchegrikovich
Score : 134 points
Date : 2024-10-13 21:22 UTC (4 days ago)
(HTM) web link (shchegrikovich.substack.com)
(TXT) w3m dump (shchegrikovich.substack.com)
| pjmlp wrote:
| So we are back to the Japanese Fifth Generation plan from the
| 1980s. :)
| thelastparadise wrote:
| Watson did it too, a while back.
| tokinonagare wrote:
| Missing some LISP but yeah it's funny how old things are new
| again (same story with wasm, RISC archs, etc.)
| nxobject wrote:
| Lots of GOFAI being implemented again - decision trees, goal
| searching and planning, agent-based strategies... just not
| symbolic representations, and that might be the key. I figure
| you might get an interesting contribution out of skimming old
| AI laboratory publications and seeing whether you could find
| a way of implementing it through a single LLM, multiple LLM
| agents, methods of training, etc.
| anthk wrote:
| https://en.m.wikipedia.org/wiki/Constraint_satisfaction_problem
| linguae wrote:
| This time around we have all sorts of parallel processing
| capabilities in the form of GPUs. If I recall correctly, the
| Fifth Generation project envisioned highly parallel machines
| performing symbolic AI. From a hardware standpoint, those
| researchers were way ahead of their time.
| nxobject wrote:
| And they had a self-sustaining video game industry too... if
| only someone had had the wild thought of implementing
| perceptrons and tensor arithmetic on the same hardware!
| postepowanieadm wrote:
| and winter is coming.
| metadat wrote:
| For the uninitiated (like me):
|
| _The Japanese Fifth Generation Project_
|
| https://www.sjsu.edu/faculty/watkins/5thgen.htm
| sgt101 wrote:
| Building on this idea, people have grounded LLM-generated
| reasoning logic in perceptual information from other networks:
| https://web.stanford.edu/~joycj/projects/left_neurips_2023
| a1j9o94 wrote:
| I tried an experiment with this using a Prolog interpreter with
| GPT-4 to try to answer complex logic questions. I found that it
| was really difficult because the model didn't seem to know Prolog
| well enough to write a description of any complexity.
|
| It seems like you used an interpreter in the loop, which is
| likely to help. I'd also be interested to see how o1 would do
| on a task like this, or whether it even makes sense to use
| something like Prolog if the models can backtrack during the
| "thinking" phase.
| lukasb wrote:
| I bet one person could build a pretty good synthetic
| NL->Prolog dataset. The ROI for paying that person would be
| high if you were building a foundation model (i.e., benefits
| beyond being able to output Prolog).
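|
| (A single entry in such a dataset might pair a sentence or two
| of English with the Prolog that encodes it; purely an invented
| illustration:)
|
|     % NL: "Every cat is an animal. Tom is a cat.
|     %      Is Tom an animal?"
|     animal(X) :- cat(X).
|     cat(tom).
|     % query: ?- animal(tom).  succeeds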
| hendler wrote:
| I also wrote an LLM-to-Prolog interpreter for a hackathon
| called "Logical". With a few hours' effort I'm sure it could
| be improved.
|
| https://github.com/Hendler/logical
|
| I think while LLMs may approach completeness here, it's good to
| have an interpretable system to audit/verify and reproduce
| results.
| baq wrote:
| Patiently waiting for z3-guided generation, but this is a
| welcome, if obvious, development. Results are a bit surprising
| and sound too optimistic, though.
| nonamepcbrand1 wrote:
| Is this why GitHub CodeQL and Copilot assistance works better
| for everyone? Basically, CodeQL uses a variant of Prolog
| (Datalog) to query source code, which generates better
| results.
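|
| (For a flavor of the idea, here is a Datalog-style query over
| toy source-code facts, sketched as plain Prolog rather than
| actual CodeQL syntax:)
|
|     % facts extracted from a codebase: caller/callee pairs
|     calls(main, parse_args).
|     calls(main, run).
|     calls(run, read_file).
|
|     % transitive reachability over the call graph
|     reachable(F, G) :- calls(F, G).
|     reachable(F, G) :- calls(F, H), reachable(H, G).
|
|     % ?- reachable(main, read_file).  succeeds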
| z5h wrote:
| I've come to appreciate, over the past 2 years of heavy Prolog
| use, that all coding should (eventually) be done in Prolog.
|
| It's one of few languages that is simultaneously a standalone
| logical formalism, and a standalone representation of
| computation. (With caveats and exceptions, I know). So a Prolog
| program can stand in as a document of all facts, rules and
| relations that a person/organization understands/declares to be
| true. Even if AI writes code for us, we should expect to have it
| presented and manipulated as a logical formalism.
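|
| To give a toy flavor of what I mean (a sketch, not from any
| real system):
|
|     % facts an organization declares to be true
|     employee(alice).
|     employee(bob).
|     manages(alice, bob).
|
|     % a rule: a manager may approve the expenses of
|     % anyone they manage
|     may_approve(Manager, Emp) :-
|         manages(Manager, Emp),
|         employee(Emp).
|
|     % ?- may_approve(alice, Who).  gives Who = bob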
|
| Now if someone cares to argue that some other language/compiler
| is better at generating more performant code on certain
| architectures, then that person can declare their arguments in a
| logical formalism (Prolog) and we can use Prolog to translate
| between language representations, compile, optimize, etc.
| dmead wrote:
| It's taken ages for anything from functional programming to
| penetrate general use. Do you think uptake of logic stuff will
| be any faster?
| johnnyjeans wrote:
| Prolog (and logic programming in general) is much older than
| you think. In fact, if we take modern functional programming
| to have been born with John Backus' Turing Award
| presentation[1], then it even predates it.
|
| Many advancements to functional programming were implemented
| on top of Prolog! Erlang's early versions were built on top
| of a Prolog-derived language whose name escapes me. It's the
| source of Erlang's syntax, which feels unfamiliar to
| programmers who haven't seen Prolog. It's very much like
| writing Prolog if you had return values and no cuts or
| complex terms.
|
| As for penetrating general use, probably not without a major
| shift in the industry. But it's a very popular language just
| on the periphery, even to this day.
|
| [1] - https://dl.acm.org/doi/10.1145/359576.359579
| cmrdporcupine wrote:
| So why Prolog in particular and not another logic language like
| Mercury or Oz/Mozart etc?
| infradig wrote:
| It's not meant to be taken literally; it refers to any
| "language of logic programming". Apologies to Monty Python.
| tomcam wrote:
| Is it your thought that for the average programmer Prolog is
| easier to read and maintain than say Go, C#, or Java?
| anthk wrote:
| Use Constraint Satisfaction Problem solvers. That comes
| together with Common Lisp with ease.
| mise_en_place wrote:
| I really enjoyed tinkering with languages like Prolog and Coq.
| Interactive theorem proving with LLMs would be awesome to try
| out, if possible.
| fsndz wrote:
| This is basically the LLM-modulo approach recommended by Prof.
| Subbarao Kambhampati. Interesting, but it mostly works only
| for problems that have some math or first-order-logic puzzle
| at their heart. It will fail to improve performance on
| ARC-AGI, for example... It's difficult to mimic reasoning by
| basic trial and error and then hoping for the best:
| https://www.lycee.ai/blog/why-sam-altman-is-wrong
| YeGoblynQueenne wrote:
| That's not going to work. Garbage in - Garbage out is success-set
| equivalent to Garbage in - Prolog out.
|
| Garbage is garbage and failure to reason is failure to reason no
| matter the language. If your LLM can't translate your problem to
| a Prolog program that solves your problem- Prolog can't solve
| your problem.
| Philpax wrote:
| This is a shallow critique that does not engage with the core
| idea. Specifying the problem is not the same as solving the
| problem.
| arjun_khamkar wrote:
| Would creating a Prolog dataset be beneficial, so that future
| LLMs can be trained on it and would then be able to output
| Prolog code?
| DeborahWrites wrote:
| You're telling me the seemingly arbitrary 6 weeks of Prolog on my
| comp sci course 11yrs ago is suddenly about to be relevant? I did
| not see this one coming . . .
| fullstackwife wrote:
| Is there any need to look at this generated Prolog code?
| UniverseHacker wrote:
| I think this general idea is going to be the key to really making
| LLMs widely useful for solving real problems.
|
| I've been playing with using GPT-4 together with the Wolfram
| Alpha plugin, and the combo of the two can reliably solve
| difficult quantitative problems that neither can individually by
| working together, much like a human using a calculator.
| de6u99er wrote:
| I always thought that Prolog is great for reasoning in the
| semantic web. It doesn't surprise me that LLM people stumble on
| it.
| bytebach wrote:
| An application I am developing for a customer needed to read
| constraints around clinical trials and essentially build a query
| from them. Constraints involve prior treatments, biomarkers, type
| of disease (cancers) etc.
|
| Using just an LLM did not produce reliable queries, despite
| trying many, many prompts. So, being an old Prolog hacker, I
| wondered whether Prolog might impose more 'logic' on the LLM.
| We now precede the textual description of the constraints with
| the following prompt:
|
| -------------
|
| Now consider the following Prolog predicates:
|
| biomarker(Name, Status) where Status will be one of the
| following integers -
|
|     Wildtype = 0, Mutated = 1, Methylated = 2,
|     Unmethylated = 3, Amplified = 4, Deleted = 5,
|     Positive = 6, Negative = 7
|
| tumor(Name, Status) where Status will be one of the following
| integers if known, else left unbound -
|
|     Newly diagnosed = 1, Recurrence = 2, Metastasized = 3,
|     Progression = 4
|
| chemo(Name)
|
| surgery(Name) Where Name may be an unbound variable
|
| other_treatment(Name)
|
| radiation(Name) Where Name may be an unbound variable
|
| Assume you are given a predicate atMost(T, N) where T is a
| compound term and N is an integer. It will return true if the
| number of 'occurrences' of T is less than or equal to N, else
| it will fail.
|
| Assume you are given a predicate atLeastOneOf(L) where L is a
| list of compound terms. It will succeed if at least one of the
| compound terms, when executed as a predicate, returns true.
|
| Assume you are given a predicate age(Min, Max) which will return
| true if the patient's age is in between Min and Max.
|
| Assume you have a predicate not(T) which returns true if
| predicate T evaluates false and vice versa, i.e. rather than
| '\+ A' use not(A).
|
| Do not implement the above helper functions.
|
| VERY IMPORTANT: Use 'atLeastOneOf()' whenever you would otherwise
| use ';' to represent 'OR'. i.e. rather than 'A ; B' use
| atLeastOneOf([A, B]).
|
| EXAMPLE INPUT: Patient must have recurrent GBM, methylated MGMT
| and wildtype EGFR. Patient must not have mutated KRAS.
|
| EXAMPLE OUTPUT: tumor('gbm', 2), biomarker('MGMT', 2),
| biomarker('EGFR', 0), not(biomarker('KRAS', 1))
|
| ------------------
|
| The Prolog predicates, when evaluated, generate the required
| underlying query (of course, the Prolog is itself a form of
| query).
|
| Anyway - the upshot was a vast improvement in the accuracy of the
| generated query (I've yet to see a bad one). Somewhere in its
| bowels, being told to generate Prolog 'focused' the LLM. Perhaps
| LLMs are happier with declarative languages rather than
| imperative ones (I know I am :) ).
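|
| (For the curious: the helper predicates on the application
| side are small. A rough sketch, assuming SWI-Prolog, not the
| production code:)
|
|     % succeed if at least one goal in the list succeeds
|     atLeastOneOf([G|Gs]) :-
|         ( call(G) -> true ; atLeastOneOf(Gs) ).
|
|     % atMost(T, N): goal T has at most N solutions
|     atMost(T, N) :-
|         aggregate_all(count, call(T), Count),
|         Count =< N.
|
|     % not/1 is plain negation as failure; SWI-Prolog already
|     % provides it, equivalent to not(G) :- \+ call(G).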
| gorkempacaci wrote:
| The generated programs are only technically Prolog programs.
| They use CLP(FD), which makes them constraint programs. Plain
| Prolog programs are quite a bit trickier, with termination
| issues. I wouldn't have nitpicked if it weren't in the title.
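|
| (To see the difference, assuming SWI-Prolog's library(clpfd):)
|
|     ?- X > 3.    % plain Prolog: instantiation error
|     ?- X #> 3.   % CLP(FD): posts a constraint and succeeds
|     ?- X #> 3, X #< 6, label([X]).
|     X = 4 ;
|     X = 5.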
|
| Also, the experiment method has some flaws. Problems are hand-
| picked out of a random subset of the full set. Why not run the
| full set?
| ianbicking wrote:
| I made a pipeline using Z3 (another prover language) to get LLMs
| to solve very specific puzzle problems:
| https://youtu.be/UjSf0rA1blc (and a presentation:
| https://youtu.be/TUAmfi8Ws1g)
|
| Some thoughts:
|
| 1. Getting an LLM to model a problem accurately is a significant
| prompting exercise. Bridging casual logical statements and formal
| logic is difficult. E.g., "or" statements in English usually mean
| "xor" in logic.
|
| 2. Domains usually have their own language expectations. I was
| doing Zebra puzzles (https://en.wikipedia.org/wiki/Zebra_Puzzle)
| and they have a very specific pattern and language. I don't think
| it's fair to really call it intuitive or even entirely
| unambiguous, it's something you have to learn. The LLM has to
| learn it too. They have seen this kind of puzzle (and I think
| most can reproduce the original Zebra puzzle from memory), but
| they lack a really firm familiarity.
|
| 3. Arguably some of the familiarity is about contextualizing the
| problem, which is itself a prompting task. People don't naturally
| solve Zebra puzzles that we find organically, it's something we
| encounter in specific contexts (like a puzzle book) which is not
| so dissimilar from prompting.
|
| 4. Incidentally Claude Sonnet 3.5 has a substantial lead. And GPT
| o1 is not much better than GPT 4o. In some sense I think o1 is a
| kind of self-prompting, an attempt to create its own context; so
| if you already have a well-worded prompt with instructions then
| o1 isn't that good at improving performance over 4o.
|
| 5. A lot of the prompting is really intended to slow down the
| LLM, to keep it from jumping to conclusions or solving a task too
| quickly (and incorrectly). Which again is a case of the prompt
| doing what o1 tries to do generally.
|
| 6. I'm not sure what tasks call for this kind of logical
| reasoning. Not that I don't think they exist, I just don't know
| how to recognize them. Planning tasks? Highly formalized and
| artificially constructed problems don't seem all that
| interesting... and the whole point of adding an LLM to the
| process is to formalize the informal.
|
| 7. Perhaps it's hard to see because real-world problems seldom
| have conveniently exact solutions. But that's not a blocker...
| Prolog (and Z3) can take constraints as a form of elimination,
| providing lists of possible answers, and maybe just reducing
| the search space is enough to move forward on some kinds of
| problems (see the sketch at the end of this comment).
|
| 8. For instance when I give my pipeline really hard Zebra
| problems it usually doesn't succeed; one bug in one rule will
| kill the whole thing. Also I think the LLMs have a hard time
| keeping track of large problems; a context size problem, even
| though the problems don't approach their formal context limits.
| But I can imagine building the pipeline so it also tries to mark
| low-confidence rules. Given that I can imagine removing those
| rules, sampling the resulting (non-unique, sometimes incorrect)
| answers and using that to revisit and perhaps correct some of
| those rules.
|
| Really I'd be most interested to hear thoughts on where this
| logic programming might actually be applied... artificial puzzles
| are an interesting exercise, but I can't really motivate myself
| to go too deep.
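|
| (As a concrete version of point 7, here is a toy zebra-style
| model in SWI-Prolog's library(clpfd); the clues are invented
| for illustration. Constraints prune the space, and label/1
| enumerates whatever candidates remain:)
|
|     :- use_module(library(clpfd)).
|
|     mini(Colors, Pets) :-
|         Colors = [Red, _Green, Blue],  % house index per color
|         Pets   = [Dog, _Cat, Fish],    % house index per pet
|         Colors ins 1..3,
|         Pets ins 1..3,
|         all_different(Colors),
|         all_different(Pets),
|         Dog #= Red,      % the dog lives in the red house
|         Fish #\= Blue,   % the fish is not in the blue house
|         label(Colors),
|         label(Pets).
|
|     % ?- findall(Cs-Ps, mini(Cs, Ps), Solutions).
|     % enumerates every assignment consistent with the clues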
___________________________________________________________________
(page generated 2024-10-17 23:00 UTC)