[HN Gopher] Recursively Summarizing Enables Long-Term Dialogue M...
___________________________________________________________________
Recursively Summarizing Enables Long-Term Dialogue Memory in LLMs
Author : PaulHoule
Score : 122 points
Date : 2023-09-02 17:08 UTC (5 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| yieldcrv wrote:
| 7 people got together and wrote this paper
|
| Can someone summarize why this is different/better/more important
| than what we've seen 4 months ago with AutoGPT, or even longer ago
| with the guy who got ChatGPT to make a compression algorithm that
| other ChatGPT sessions could read and resume conversations from?
| [deleted]
| keskival wrote:
| It seems to be exactly the same as what everyone has been doing
| for a long time already, and they fail to cite existing
| implementations.
| codefreakxff wrote:
| I haven't read it yet, so I can't be certain, but my impression
| is that rather than using lossless compression of entire chats,
| this uses lossy summarization to get the gist of them. There
| will be tradeoffs between the two methods; hopefully the paper
| covers them.
| Nevermark wrote:
| This is a little sideways to the article/discussion.
|
| The short memory is a real limitation, but I have noticed most
| critiques of GPT-4's abilities apply as much, or more, to humans.
|
| I don't think anyone alive could convince me they were GPT-4 in a
| Reverse Turing Test situation. GPT-4's fast, organized responses
| alone hammer human abilities.
|
| But even a team of humans, with 60 minutes to answer each
| question, could find it difficult to match GPT-4's responses to
| interesting queries. It would be a fun competition.
| wintermutestwin wrote:
| Probably my ignorance talking, but I don't understand why there
| isn't a layer on top of an LLM that uses good old computer logic
| to keep track of and validate data.
|
| I am baffled by the fact that when I ask ChatGPT to look up some
| stock data (for example) it spits out errors. How does it not
| have a built in ability to check its own output?
| a-r-t wrote:
| Because if you had a logic layer capable of validating LLM's
| output, you wouldn't need the LLM.
| ben_w wrote:
| I think we still need an LLM to enable the system as a whole
| to understand vague and half-baked human input.
|
| I can easily ask an LLM to write me a function in a random
| programming language, then feed the output to a compiler, and
| pipe errors from the compiler back to the LLM.
|
| What doesn't work so well is typing "pong in java" into a
| bash shell.
|
| This isn't a perfect solution (not even for small projects),
| but it does demonstrate that automated validation can improve
| the output.
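|
| Roughly, such a loop might look like this (just a sketch; assumes
| the pre-1.0 openai Python client, javac on the PATH, and that the
| model returns bare code rather than markdown):
|
|     import subprocess
|
|     import openai  # assumes an API key is set in the environment
|
|     def complete(prompt):
|         resp = openai.ChatCompletion.create(
|             model="gpt-4",
|             messages=[{"role": "user", "content": prompt}])
|         return resp["choices"][0]["message"]["content"]
|
|     def generate_until_it_compiles(task, attempts=3):
|         prompt = "Write one Java class named Pong that " + task
|         for _ in range(attempts):
|             code = complete(prompt)
|             with open("Pong.java", "w") as f:
|                 f.write(code)
|             result = subprocess.run(["javac", "Pong.java"],
|                                     capture_output=True, text=True)
|             if result.returncode == 0:
|                 return code  # the compiler accepted it
|             # pipe the compiler's complaints back in and retry
|             prompt = ("This Java code failed to compile:\n" + code +
|                       "\nCompiler errors:\n" + result.stderr +
|                       "\nReturn a corrected version, code only.")
|         raise RuntimeError("still not compiling after retries")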
| kendalf89 wrote:
| Even something as simple as censoring swear words would be in
| line with what OpenAI is trying to accomplish, but they keep
| lobotomizing the model instead.
| tomr75 wrote:
| People have built their own solutions to validate data like
| this.
|
| It won't be one-size-fits-all due to the various types of data,
| and if you could validate all types, why would you need the LLM?
| jamilton wrote:
| What do you mean by "check its own output"?
| m3kw9 wrote:
| This wouldn't work for ChatGPT right? All it cares about is the
| input context
| hansvm wrote:
| ChatGPT is used in the paper. The problem they're solving is
| how to throw a loop around the LLM (like ChatGPT) to generate
| an appropriate input context.
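|
| A rough sketch of the general shape of that loop (not the paper's
| exact prompts; `complete` stands in for any text-in/text-out LLM
| call):
|
|     class SummaryMemoryChat:
|         """Rolling summary plus a short window of verbatim turns."""
|
|         def __init__(self, complete, window=6):
|             self.complete, self.window = complete, window
|             self.memory, self.recent = "", []
|
|         def _fold_into_memory(self, old_turns):
|             # Lossily fold turns that fell out of the window into
|             # the running summary (the "recursive" part).
|             self.memory = self.complete(
|                 "Current memory:\n" + self.memory +
|                 "\nOlder turns:\n" + "\n".join(old_turns) +
|                 "\nRewrite the memory to also cover these turns. "
|                 "Be brief.")
|
|         def chat(self, user_msg):
|             context = ("Conversation summary:\n" + self.memory +
|                        "\nRecent turns:\n" + "\n".join(self.recent) +
|                        "\nUser: " + user_msg + "\nAssistant:")
|             reply = self.complete(context)
|             self.recent += ["User: " + user_msg,
|                             "Assistant: " + reply]
|             if len(self.recent) > self.window:
|                 self._fold_into_memory(self.recent[:-self.window])
|                 self.recent = self.recent[-self.window:]
|             return reply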
| ipnon wrote:
| Very interesting academies these researchers are based at ...
| makes you wonder what kind of new applications this research
| enables.
| Philpax wrote:
| What are you saying?
| MadsRC wrote:
| This seems very close to Langchain's "summary" memory
| functionality (which seems to have existed since March 2023?) -
| granted I haven't read the paper just yet
| kgeist wrote:
| I tried to build memory using recursive summarization months ago
| with open-source models, and what I found is that with a naive
| implementation it would often get stuck on a certain topic
| forever, because certain bits would survive every summarization
| round.
| washadjeffmad wrote:
| Yeah, unless this substantially mitigates amplification, even
| when using manual chunk sizing on known materials, the context
| still hangs onto its "dying thoughts" in a way that remarkably
| resembles Alzheimer's.
| cushpush wrote:
| plaque buildup?
| keskival wrote:
| You can just advise it to forget (that is, skip during
| summarization) things which seem irrelevant.
| BoorishBears wrote:
| Or you could use techniques from the 1970s that let you skip
| the kind of highly repetitive text that causes them to get
| stuck without losing important concepts (IDF)
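|
| e.g. something as blunt as scoring chunks by TF-IDF and dropping
| the lowest-scoring (most repetitive) ones before each
| summarization round; a sketch with scikit-learn, cutoff chosen
| arbitrarily:
|
|     import numpy as np
|     from sklearn.feature_extraction.text import TfidfVectorizer
|
|     def drop_repetitive(chunks, keep_ratio=0.8):
|         # Terms that repeat across many chunks get low IDF, so
|         # chunks made mostly of boilerplate score low and are
|         # dropped before they can dominate the next summary.
|         tfidf = TfidfVectorizer().fit_transform(chunks)
|         scores = np.asarray(tfidf.mean(axis=1)).ravel()
|         cutoff = np.quantile(scores, 1.0 - keep_ratio)
|         return [c for c, s in zip(chunks, scores) if s >= cutoff]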
| arketyp wrote:
| Reminds me of a "bad trip" or OCD patterns. I sometimes think
| about how little it takes to derail the human mind, by trauma
| or ontogenically, and how wishful the idea of a human-like AI
| then seems.
| TuringNYC wrote:
| A bit of a personal anecdote -- at work, we have thousands of
| "Briefings", which are hour-long (sometimes day-long) in-person
| panels. We've successfully summarized each and every briefing.
| The messy transcripts are well summarized into five paragraphs
| of text.
|
| More topically, we also categorized each briefing (1:many) into
| topics and sub-topics, with topics ending up with several dozen
| briefings each and sub-topics with about a dozen. For this _we
| summarized the subset of associated summaries_, tested it
| comprehensively, and had great results with LLMs.
|
| I was originally skeptical this would work, but it worked
| beautifully. With a sufficiently large context window we would
| not have needed to do this, but thankfully the limit turned out
| not to be a problem.
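|
| For the curious, the topic step is essentially a second
| summarization pass over the per-briefing summaries, something
| like this sketch (`complete` stands in for whichever LLM call
| you use; `summaries_by_topic` is a made-up name for a dict of
| topic -> list of briefing summaries):
|
|     def summarize_topic(complete, topic, briefing_summaries):
|         # Second-level pass: summarize the summaries that were
|         # associated with a single topic.
|         return complete(
|             "Summaries of separate briefings about '" + topic +
|             "':\n\n" + "\n\n".join(briefing_summaries) +
|             "\n\nWrite a short overview based only on them.")
|
|     # overviews = {t: summarize_topic(complete, t, s)
|     #              for t, s in summaries_by_topic.items()}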
| BoorishBears wrote:
| I mentioned this in a comment a few weeks ago, but people are
| oversimplifying the summarization part:
| https://news.ycombinator.com/item?id=37117515
|
| For a given use-case, long term memory has a subtly different
| value proposition.
|
| If I'm building a home assistant, I should be using NER to
| identify names and build an understanding of how those people
| like to be spoken to in messages, or places and how the user
| tends to get to them.
|
| If I'm building a CS bot, I should be identifying queries that
| resulted in extended exchanges or led to a suddenly abandoned
| cart.
|
| Generic summarization is enough for flashy demos, but to build
| truly useful products right now you need to go a step further.
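|
| For the home-assistant case, a cheap version of the NER idea is
| to index every message by the entities it mentions (a sketch
| assuming spaCy and its small English model; the example names
| are made up):
|
|     from collections import defaultdict
|
|     import spacy
|
|     nlp = spacy.load("en_core_web_sm")  # assumes model installed
|     notes_by_entity = defaultdict(list)  # "Alice" -> messages
|
|     def remember(message):
|         # File the message under every person/place/org mentioned.
|         for ent in nlp(message).ents:
|             if ent.label_ in ("PERSON", "GPE", "LOC", "ORG"):
|                 notes_by_entity[ent.text].append(message)
|
|     def recall(entity):
|         # Everything seen about one entity, ready for a prompt.
|         return notes_by_entity[entity]
|
|     remember("Remind Alice her train to Boston leaves at 7.")
|     print(recall("Alice"))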
| caprock wrote:
| This sounds a lot like the chunking concept from learning theory.
| whimsicalism wrote:
| To my intuition, all of these ways of building memory in "text
| space" seem super hacky.
|
| It seems intuitive to me that memory would be best stored in
| dense embedding space that can preserve full semantic meaning for
| the model rather than as some hacked on process of continually
| regenerating summaries.
|
| And similarly, the model needs to be trained in a setting where
| it is _aware_ of the memory and how to use it. Preferably that
| would be from the very beginning (i.e., when it trains on text).
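|
| The nearest approximation today is probably keeping the memory as
| embedding vectors and retrieving by similarity instead of
| re-summarizing; rough sketch below, with a generic `embed`
| callable. (It still pastes retrieved _text_ back into the prompt,
| so it only gets partway to what I mean.)
|
|     import numpy as np
|
|     class VectorMemory:
|         # Store (embedding, text) pairs; recall the most similar.
|         def __init__(self, embed):
|             self.embed = embed  # any callable: str -> 1-D vector
|             self.vecs, self.texts = [], []
|
|         def add(self, text):
|             v = np.asarray(self.embed(text), dtype=float)
|             self.vecs.append(v / np.linalg.norm(v))
|             self.texts.append(text)
|
|         def recall(self, query, k=3):
|             q = np.asarray(self.embed(query), dtype=float)
|             q = q / np.linalg.norm(q)
|             sims = np.stack(self.vecs) @ q  # cosine similarities
|             top = np.argsort(sims)[::-1][:k]
|             return [self.texts[i] for i in top]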
| nottheengineer wrote:
| It does seem hacky, but then again the whole concept of
| conversational LLMs is. You're just asking it to add an extra
| word to a given conversation and after a bit, it spits out an
| end token that tells your application to hand control back to
| the user.
|
| I think latent space and text space aren't as far apart as you
| think. LLMs are pretty stupid, but very good at speech. They
| are good at writing code because that's very similar, but fall
| apart in things that need some actual abstract thinking, like
| math.
|
| Those text space hacks do tend to work and stuff like "think
| step by step" has become common because of that.
|
| LoRAs are closer to what you mean and they're great at packing
| a lot of understanding into very little data. But adjusting
| weights for a single conversation just isn't feasible yet, so
| we're exploring text space for that purpose. Maybe someone will
| transfer the methods we discover in text space to embedding
| space to make them more efficient, but that's for the future.
| westurner wrote:
| FWIU recently there are:
|
| - Increase the input prompt token limit (2023-09: 32K tokens
| in: OpenAI GPT-4 Enterprise, Giraffe (Llama 2))
|
| - Fine tune [a "LoRA" atop a foundation model]
|
| - TODO: ~Checkpoint w/ Copy-on-Write
| og_kalu wrote:
| >They are good at writing code because that's very similar,
| but fall apart in things that need some actual abstract
| thinking, like math.
|
| Pretty odd assertion. LLMs are not "good at speech, bad at
| abstract thinking".
|
| What do these have to do with speech?
|
| https://general-pattern-machines.github.io/
|
| https://arxiv.org/abs/2212.09196
|
| It doesn't even hold with your example because GPT-4 is
| pretty good at Math, nowhere near "falling apart".
| riku_iki wrote:
| > GPT-4 is pretty good at Math, nowhere near "falling
| apart".
|
| It's good at tasks which were included in the training
| dataset in some variation.
| og_kalu wrote:
| No it's just pretty good in general lol.
| cerved wrote:
| my experience is that it's pretty subpar
| PaulHoule wrote:
| Kinda able to do some math tasks some of the time, whereas
| you can use techniques from the arithmetic textbook to get
| the right answer all of the time with millions of times less
| CPU, even including the overhead of round-tripping to ASCII
| numerals, which is shockingly large compared to what a
| multiply costs.
|
| Kinda "the problem" with LLMs is that they successfully
| seduce people by seeming to get the right answer to anything
| 80% of the time.
| og_kalu wrote:
| Math is a lot more than just Arithmetic.
| PaulHoule wrote:
| Yeah but if you can only do arithmetic right X% of the
| time you aren't going to get other answers right as often
| as would really be useful.
|
| That said, LLMs have a magic ability to "short circuit" and
| get the right answer despite not being able to get the steps
| right. I remember scoping out designs for NLP systems about
| 5 years ago and frequently concluding "that won't work"
| because information was lost at an early stage, but in
| retrospect, by short-circuiting, a system like that can
| outperform its parts. It still faces a ceiling on how
| accurate the answers are, though, because the reasoning _is
| not_ sound.
| btilly wrote:
| Human reasoning is amazingly not sound.
|
| When you add in various patterns, double-checks, and
| memorized previous results, what human reasoning can do
| is astounding. But it is very, very far from sound.
| sdenton4 wrote:
| The arithmetic issues are well documented and understood;
| it's a problem of sub-token manipulation, which has
| nothing to do with reasoning. (Similar to calling blind
| people unintelligent because they can't read the IQ
| test.)
|
| And the better LLMs can easily write code to do the
| arithmetic that they suck at...
| Nevermark wrote:
| I have played around with GPT-4 and some fairly simple
| but completely new math ideas. It was fabulous at
| identifying special cases I had overlooked that disproved
| my conjectures.
| sudokuist wrote:
| Example?
| Nevermark wrote:
| I was playing around with prime numbers, and simple made
| up relationships between them, such as between the square
| of a prime N vs. the set of primes smaller than N, etc.
|
| It caught me out with specific examples that violated my
| conjectures. One conjecture held for all but a single
| case; another was generally true, but not for 2 and 3.
|
| In one case it thought a conjecture I made was wrong, and
| I had to push it to think through why it thought it was
| wrong until it realized the conjecture was right. As soon
| as it had its epiphany, it corrected all its logic around
| that concept.
|
| It was very simple stuff, but an interesting exercise.
|
| The part I enjoyed the most was seeing GPT-4's
| understanding move and change as we pushed back on each
| other's views. You miss out on that impressive aspect of
| GPT-4 in simpler sessions.
| riku_iki wrote:
| It's hard to judge how deep and unique your conjectures
| were.
|
| I did similar testing of GPT4, and my observation is that
| it starts failing after 3-4 levels of reasoning depth.
| sudokuist wrote:
| Have you tried formalizing your ideas with Isabelle? It
| has a constraint solver and will often find
| counterexamples to false arithmetical propositions[1].
|
| 1: https://isabelle.in.tum.de/overview.html
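|
| Even without a proof assistant, brute-forcing small cases
| catches a lot of false conjectures of this kind; a toy sketch
| (the conjecture here is deliberately silly):
|
|     def is_prime(n):
|         return n > 1 and all(n % d
|                              for d in range(2, int(n ** 0.5) + 1))
|
|     def find_counterexample(conjecture, limit=10_000):
|         # First n where the conjecture fails, or None.
|         for n in range(2, limit):
|             if not conjecture(n):
|                 return n
|         return None
|
|     # False toy conjecture: "every odd number > 1 is prime".
|     # prints 9, the first odd non-prime
|     print(find_counterexample(lambda n: n % 2 == 0 or is_prime(n)))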
| Nevermark wrote:
| I have not, thanks for the tip.
| mannykannot wrote:
| I have not been able to figure out how that would help in
| the context of this discussion. As I see it, what's very
| interesting here is that an LLM is able to do this.
| riku_iki wrote:
| Curious why you referred specifically to Isabelle, which
| looks ancient and over-engineered; there are many other
| tools and languages in this area.
|
| I am not criticizing, just curious about your opinion.
| sudokuist wrote:
| Can LLMs solve sudoku yet?
| digitcatphd wrote:
| Most things right now seem so. It seems like we are going
| through many rounds of iteration, and guessing what will work
| long term versus what is a short-term fix is frustrating.
| 3abiton wrote:
| But there are hardware limitations for storing them in memory
| SubiculumCode wrote:
| It seems to me that sparse encodings would be more efficient
| and practical for medium-term memory. Isn't the problem with
| dense embeddings their memory usage?
| galaxyLogic wrote:
| Consider what happens if we use this method in our heads:
| recursively summarize the discussion so far. It will improve
| our memory. It seems "hacky" to summarize things in your head,
| but I think that is a big part of how memory works.
| coldtea wrote:
| That's what memory is. Whether it's sensory information (what
| you remember of a place) or the points from a meeting (where
| we only remember the big-picture items, not what was said
| minute to minute), it's all summarized/compressed.
| fnordpiglet wrote:
| I would note that almost everything in computing we use today
| is super hacky stuff sufficiently abstracted and error handled
| such that it seems like it's not at all a hack.
| [deleted]
| tinco wrote:
| Why do you have the intuition that a dense embedding space
| could preserve full semantic meaning? From what I understand,
| embeddings are inherently lossy. At least with a textual
| summary you could have an agent verify that the summary
| actually accurately represents the information it's meant to
| summarize.
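|
| e.g. a second pass along these lines (sketch; `complete` is
| whatever LLM call you already have):
|
|     def summary_looks_faithful(complete, source_text, summary):
|         # Cheap self-check: ask another pass to audit the summary.
|         verdict = complete(
|             "Original text:\n" + source_text +
|             "\n\nProposed summary:\n" + summary +
|             "\n\nDoes the summary omit or distort anything "
|             "important? Start your answer with YES or NO.")
|         return verdict.strip().upper().startswith("NO")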
| chefandy wrote:
| From a technical perspective, sure: it's clear why it
| functions like that, and there's no technical reason it
| shouldn't. From a user interface perspective-- likely what
| most people would judge with intuition-- that doesn't matter.
| Processes that mimic familiar interaction patterns cause
| dissonance when they don't match our mental model of those
| interactions, and familiar conversation is about as familiar
| as it gets.
| explicitly tell them or guide them through figuring out, and
| people intuitively interact with these applications as they
| would with people because they've deliberately adopted the
| voice of an individual person. Additionally, for SaaS
| products, we're used to them maintaining state automatically.
|
| (As annoying as our input can be, this is why dev teams that
| try to solve problems for non-dev users have designers.)
| eganist wrote:
| > rather than as some hacked on process of continually
| regenerating summaries.
|
| incidentally, this isn't far off from how the human brain is
| believed to work (at least with long term memories).
|
| https://news.northwestern.edu/stories/2012/09/your-memory-is...
| sudokuist wrote:
| No one knows how the brain works and how it is connected with
| the body. Did you know your gut is directly connected with
| cognition? An unhealthy digestive system has been linked with
| several neurodegenerative diseases. Also, walking and cardio
| in general are known to create new neurons and prevent
| cognitive degeneration.
|
| It's always funny to me when people on online forums
| confidently proclaim to know what cognition and thinking is
| all about and that furthermore it can be reduced to symbol
| shuffling on computers. No one has any clue how to make
| computers intelligent and anyone that claims to know is
| either doing marketing for some "AI" company or thoroughly
| confused about what counts as biological intelligence.
| eganist wrote:
| You're 100% right, no one knows how the brain works. And
| all the elements you described are probably relevant --
| including things you didn't mention, such as personality
| changes tied to heart transplants, etc.
|
| But that's probably reading a little too deeply and
| seriously into what I said.
| sudokuist wrote:
| [flagged]
| btilly wrote:
| Nobody knows, but this model works well enough to, for
| instance, treat PTSD. Read through
| https://www.ptsd.va.gov/understand_tx/emdr.asp to verify
| that.
| coldtea wrote:
| > _No one knows how the brain works and how it is connected
| with the body_
|
| Yes, but we have some informed theories.
|
| Nobody knew how physics worked in Newton's time either, but
| they did know enough to model, to satisfaction, lots of
| phenomena like ballistics...
|
| > _Did you know your gut is directly connected with
| cognition?_
|
| Well, it has been widely known from pop-science articles for
| several decades!
|
| > _No one has any clue how to make computers intelligent_
|
| For not having "any clue", sure the LLM guys did quite well
| in getting some of the way to the goalposts...
___________________________________________________________________
(page generated 2023-09-02 23:00 UTC)