[HN Gopher] Recursively Summarizing Enables Long-Term Dialogue M...
       ___________________________________________________________________
        
       Recursively Summarizing Enables Long-Term Dialogue Memory in LLMs
        
       Author : PaulHoule
       Score  : 122 points
       Date   : 2023-09-02 17:08 UTC (5 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | yieldcrv wrote:
       | 7 people got together and wrote this paper
       | 
        | Can someone summarize why this is different from, better than, or
        | more important than what we saw 4 months ago with AutoGPT, or even
        | longer ago with the guy who got ChatGPT to make a compression
        | algorithm that other ChatGPT sessions could read and resume
        | conversations from?
        
         | [deleted]
        
         | keskival wrote:
          | It seems to be exactly the same as what everyone has been doing
          | for a long time already, and they fail to cite existing
          | implementations.
        
         | codefreakxff wrote:
          | I haven't read it yet to be certain, but my impression is that
          | rather than using lossless compression of entire chats, this
          | uses lossy summarization to get the gist of chats. There will
          | be tradeoffs between the two methods; hopefully the paper
          | covers them.
        
       | Nevermark wrote:
       | This is a little sideways to the article/discussion.
       | 
       | The short memory is a real limitation, but I have noticed most
       | critiques of GPT-4's abilities apply as much, or more, to humans.
       | 
       | I don't think anyone alive could convince me they were GPT-4 in a
        | Reverse Turing Test situation. GPT-4's fast, organized responses
        | alone hammer human abilities.
       | 
       | But even a team of humans, with 60 minutes to answer each
       | question, could find it difficult to match GPT-4's responses to
       | interesting queries. It would be a fun competition.
        
       | wintermutestwin wrote:
       | Probably my ignorance talking, but I don't understand why there
       | isn't a layer on top of an LLM that uses good old computer logic
       | to keep track of and validate data.
       | 
       | I am baffled by the fact that when I ask ChatGPT to look up some
        | stock data (for example) it spits out errors. How does it not
        | have a built-in ability to check its own output?
        
         | a-r-t wrote:
          | Because if you had a logic layer capable of validating the
          | LLM's output, you wouldn't need the LLM.
        
           | ben_w wrote:
           | I think we still need an LLM to enable the system as a whole
           | to understand vague and half-baked human input.
           | 
            | I can easily ask an LLM to write me a function in a random
            | programming language, then feed the output to a compiler, and
            | pipe errors from the compiler back to the LLM.
           | 
           | What doesn't work so well is typing "pong in java" into a
           | bash shell.
           | 
           | This isn't a perfect solution (not even for small projects),
           | but it does demonstrate that automated validation can improve
           | the output.
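
        A minimal sketch of the compile-and-retry loop described above,
        assuming a hypothetical generate_with_llm() helper that wraps
        whatever chat API is in use; javac and the Main.java naming are
        just example choices, not anything prescribed by the comment.

          import os
          import subprocess
          import tempfile

          def generate_with_llm(prompt: str) -> str:
              """Hypothetical stand-in for an LLM call; returns source code."""
              raise NotImplementedError

          def compile_feedback_loop(task: str, max_rounds: int = 3) -> str:
              prompt = ("Write a complete Java file named Main.java that "
                        f"implements: {task}")
              source = generate_with_llm(prompt)
              for _ in range(max_rounds):
                  with tempfile.TemporaryDirectory() as tmp:
                      path = os.path.join(tmp, "Main.java")
                      with open(path, "w") as f:
                          f.write(source)
                      # Compile and capture errors instead of showing them
                      # to the user.
                      result = subprocess.run(["javac", path],
                                              capture_output=True, text=True)
                  if result.returncode == 0:
                      return source  # compiles cleanly
                  # Pipe the compiler errors back to the LLM for a fix.
                  source = generate_with_llm(
                      prompt + "\n\nPrevious attempt:\n" + source
                      + "\n\nCompiler errors:\n" + result.stderr
                      + "\nReturn a corrected Main.java.")
              return source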
        
           | kendalf89 wrote:
            | Even something as simple as censoring swear words would be in
            | line with what OpenAI is trying to accomplish, but they keep
            | lobotomizing the model instead.
        
         | tomr75 wrote:
          | People have built their own solutions to validate data like
          | this.
          | 
          | It won't be one-size-fits-all due to the various types of data
          | - and if you could validate all types, why would you need the
          | LLM?
        
         | jamilton wrote:
         | What do you mean by "check its own output"?
        
       | m3kw9 wrote:
        | This wouldn't work for ChatGPT, right? All it cares about is the
        | input context.
        
         | hansvm wrote:
         | ChatGPT is used in the paper. The problem they're solving is
         | how to throw a loop around the LLM (like ChatGPT) to generate
         | an appropriate input context.
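
        A rough sketch of that loop, with chat() as a hypothetical stand-in
        for a call to ChatGPT or any other LLM; these are not the paper's
        exact prompts, just the general shape of recursive summarization
        used as dialogue memory.

          def chat(prompt: str) -> str:
              """Hypothetical stand-in for a ChatGPT/LLM API call."""
              raise NotImplementedError

          def respond_with_memory(memory: str,
                                  user_message: str) -> tuple[str, str]:
              # 1. Answer using the running summary as the long-term context.
              reply = chat(
                  "Summary of the conversation so far:\n" + memory
                  + "\n\nUser: " + user_message + "\nAssistant:")
              # 2. Recursively fold the newest exchange back into the summary,
              #    so the input context stays roughly constant in size.
              memory = chat(
                  "Update this summary of a long conversation so it also "
                  "covers the latest exchange, keeping it short.\n\n"
                  "Summary:\n" + memory
                  + "\n\nLatest exchange:\nUser: " + user_message
                  + "\nAssistant: " + reply)
              return reply, memory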
        
       | ipnon wrote:
       | Very interesting academies these researchers are based at ...
       | makes you wonder what kind of new applications this research
       | enables.
        
         | Philpax wrote:
         | What are you saying?
        
       | MadsRC wrote:
        | This seems very close to LangChain's "summary" memory
        | functionality (which seems to have existed since March 2023?) -
        | granted, I haven't read the paper just yet.
        
       | kgeist wrote:
        | I tried to build memory using recursive summarization months ago
        | with open-source models, and what I found is that with a naive
        | implementation it would often get stuck on a certain topic
        | forever, because certain bits would survive every summarization
        | round.
        
         | washadjeffmad wrote:
         | Yeah, unless this substantially mitigates amplification, even
         | when using manual chunk sizing on known materials, the context
         | still hangs onto its "dying thoughts" in a way that remarkably
         | resembles Alzheimer's.
        
           | cushpush wrote:
           | plaque buildup?
        
         | keskival wrote:
          | You can just advise it to forget (that is, skip in
          | summarization) things which seem irrelevant.
        
           | BoorishBears wrote:
            | Or you could use techniques from the 1970s, like inverse
            | document frequency (IDF), that let you skip the kind of
            | highly repetitive text that causes them to get stuck, without
            | losing important concepts.
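
        One way the IDF idea could be applied, sketched from scratch (the
        comment names no particular implementation): score each chunk by
        the average rarity of its words across all chunks and drop the
        most repetitive ones before the next summarization round.

          import math
          import re
          from collections import Counter

          def tokenize(text: str) -> list[str]:
              return re.findall(r"[a-z']+", text.lower())

          def idf_filter(chunks: list[str],
                         keep_ratio: float = 0.8) -> list[str]:
              n = len(chunks)
              # Document frequency: in how many chunks does each word appear?
              df = Counter()
              for chunk in chunks:
                  df.update(set(tokenize(chunk)))
              idf = {word: math.log(n / count) for word, count in df.items()}

              def score(chunk: str) -> float:
                  words = tokenize(chunk)
                  # Low average IDF means the chunk repeats what is everywhere.
                  return sum(idf[w] for w in words) / max(len(words), 1)

              # Keep the least repetitive chunks for the next round.
              ranked = sorted(chunks, key=score, reverse=True)
              return ranked[: max(1, int(keep_ratio * n))]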
        
         | arketyp wrote:
         | Reminds me of a "bad trip" or OCD patterns. I sometimes think
         | about how little it takes to derail the human mind, by trauma
         | or ontogenically, and how wishful the idea of a human-like AI
         | then seems.
        
       | TuringNYC wrote:
        | A bit of personal anecdote -- at work, we have thousands of
        | "Briefings" which are hour-long (sometimes day-long) in-person
        | panels. We've successfully summarized each and every briefing.
        | The messy transcripts are summarized well into five paragraphs
        | of text.
        | 
        | More topically, we also categorized each briefing (one-to-many)
        | into topics and sub-topics, with topics ending up containing
        | several dozen briefings and sub-topics about a dozen. For this _we
        | summarized the subset of associated summaries_ and tested this
        | comprehensively, and had great results with LLMs.
        | 
        | I was originally skeptical this would work, but it worked
        | beautifully. Given a sufficiently large context window we would
        | not have needed to do so, but thankfully the limited window was
        | not a problem.
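
        The two-level pattern described above, sketched with a hypothetical
        summarize(text, instruction) helper: summarize each transcript,
        then summarize only the subset of summaries associated with a
        topic.

          def summarize(text: str, instruction: str) -> str:
              """Hypothetical stand-in for one LLM summarization call."""
              raise NotImplementedError

          def summarize_briefings(transcripts: dict[str, str],
                                  topics: dict[str, list[str]]) -> dict[str, str]:
              # Level 1: one summary per messy transcript.
              per_briefing = {
                  briefing_id: summarize(
                      text,
                      "Summarize this panel transcript in five paragraphs.")
                  for briefing_id, text in transcripts.items()
              }
              # Level 2: per topic, summarize only the associated summaries.
              per_topic = {}
              for topic, briefing_ids in topics.items():
                  joined = "\n\n".join(per_briefing[b] for b in briefing_ids)
                  per_topic[topic] = summarize(
                      joined,
                      f"Summarize what these briefings say about {topic}.")
              return per_topic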
        
       | BoorishBears wrote:
       | I mentioned this in a comment a few weeks ago, but people are
       | oversimplifying the summarization part:
       | https://news.ycombinator.com/item?id=37117515
       | 
       | For a given use-case, long term memory has a subtly different
       | value proposition.
       | 
        | If I'm building a home assistant, I should be using NER to
        | identify names and build an understanding of how each person
        | likes to be spoken to in messages, or places and how they tend
        | to be commuted to.
        | 
        | If I'm building a CS bot, I should be identifying queries that
        | resulted in extended exchanges, or that led to a suddenly
        | abandoned cart.
        | 
        | Generic summarization is enough for flashy demos, but to build
        | truly useful products right now you need to go a step further.
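
        An illustration of the NER idea above; the comment names no
        library, so spaCy and the entity labels below are just one
        plausible choice. The point is to pull people and places out of
        messages so later turns can attach preferences to them.

          import spacy
          from collections import defaultdict

          # Assumes the small English model has been downloaded separately.
          nlp = spacy.load("en_core_web_sm")

          def build_profile(messages: list[str]) -> dict[str, set[str]]:
              profile = defaultdict(set)
              for message in messages:
                  for ent in nlp(message).ents:
                      # Keep people and place-like entities; ignore the rest.
                      if ent.label_ in ("PERSON", "GPE", "LOC", "FAC"):
                          profile[ent.label_].add(ent.text)
              return dict(profile)

        For example, build_profile(["Text Maria that I'm driving to the
        office late"]) might return {"PERSON": {"Maria"}, ...}, which the
        assistant could then use to track how the user refers to Maria and
        how they usually commute.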
        
       | caprock wrote:
       | This sounds a lot like the chunking concept from learning theory.
        
       | whimsicalism wrote:
       | To my intuition, all of these ways of building memory in "text
       | space" seem super hacky.
       | 
        | It seems intuitive to me that memory would be best stored in
        | dense embedding space that can preserve full semantic meaning
        | for the model, rather than as some hacked-on process of
        | continually regenerating summaries.
        | 
        | And similarly, the model needs to be trained in a setting where
        | it is _aware_ of the memory and how to use it. Preferably that
        | would be from the very beginning (i.e., during the training on
        | text).
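
        For comparison, the common halfway point between the two views:
        store past turns as dense vectors and retrieve the nearest ones at
        answer time. This is plain retrieval rather than the trained-in
        memory the comment argues for, and embed() is a hypothetical
        stand-in for any sentence-embedding model.

          import numpy as np

          def embed(text: str) -> np.ndarray:
              """Hypothetical stand-in returning a unit-norm embedding."""
              raise NotImplementedError

          class VectorMemory:
              def __init__(self) -> None:
                  self.texts: list[str] = []
                  self.vectors: list[np.ndarray] = []

              def add(self, text: str) -> None:
                  self.texts.append(text)
                  self.vectors.append(embed(text))

              def recall(self, query: str, k: int = 3) -> list[str]:
                  if not self.vectors:
                      return []
                  # Cosine similarity is a dot product for unit-norm vectors.
                  sims = np.stack(self.vectors) @ embed(query)
                  top = np.argsort(sims)[::-1][:k]
                  return [self.texts[i] for i in top]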
        
         | nottheengineer wrote:
         | It does seem hacky, but then again the whole concept of
         | conversational LLMs is. You're just asking it to add an extra
         | word to a given conversation and after a bit, it spits out an
         | end token that tells your application to hand control back to
         | the user.
         | 
         | I think latent space and text space aren't as far apart as you
         | think. LLMs are pretty stupid, but very good at speech. They
         | are good at writing code because that's very similar, but fall
         | apart in things that need some actual abstract thinking, like
         | math.
         | 
         | Those text space hacks do tend to work and stuff like "think
         | step by step" has become common because of that.
         | 
         | LoRAs are closer to what you mean and they're great at packing
         | a lot of understanding into very little data. But adjusting
         | weights for a single conversation just isn't feasible yet, so
          | we're exploring text space for that purpose. Maybe someone will
         | transfer the methods we discover in text space to embedding
         | space to make them more efficient, but that's for the future.
        
           | westurner wrote:
            | FWIU, the recent approaches are:
            | 
            | - Increase the input prompt token limit (2023-09: 32K tokens
            | in OpenAI GPT-4 Enterprise, Giraffe (Llama 2))
            | 
            | - Fine-tune [a "LoRA" atop a foundation model]
            | 
            | - TODO: ~Checkpoint w/ Copy-on-Write
        
           | og_kalu wrote:
           | >They are good at writing code because that's very similar,
           | but fall apart in things that need some actual abstract
           | thinking, like math.
           | 
           | Pretty odd assertion. LLMs are not "good at speech, bad at
           | abstract thinking".
           | 
            | What do these have to do with speech?
           | 
           | https://general-pattern-machines.github.io/
           | 
           | https://arxiv.org/abs/2212.09196
           | 
           | It doesn't even hold with your example because GPT-4 is
           | pretty good at Math, nowhere near "falling apart".
        
             | riku_iki wrote:
             | > GPT-4 is pretty good at Math, nowhere near "falling
             | apart".
             | 
              | It's good at tasks that were included in the training
              | dataset in some variation.
        
               | og_kalu wrote:
               | No it's just pretty good in general lol.
        
               | cerved wrote:
               | my experience is that it's pretty subpar
        
               | PaulHoule wrote:
                | It's kinda able to do some math tasks some of the time,
                | whereas you can use techniques from the arithmetic
                | textbook to get the right answer all of the time with
                | millions of times less CPU, even including the overhead
                | of round-tripping to ASCII numerals, which is shockingly
                | large compared to what a multiply costs.
                | 
                | Kinda "the problem" with LLMs is that they successfully
                | seduce people by seeming to get the right answer to
                | anything 80% of the time.
        
               | og_kalu wrote:
               | Math is a lot more than just Arithmetic.
        
               | PaulHoule wrote:
                | Yeah, but if you can only do arithmetic right X% of the
                | time, you aren't going to get other answers right as
                | often as would really be useful.
                | 
                | That said, LLMs have a magic ability to "short circuit"
                | and get the right answer despite not being able to get
                | the steps right. I remember scoping out designs for NLP
                | systems about 5 years ago and frequently concluding that
                | "that won't work" because information was lost at an
                | early stage; in retrospect, by short circuiting, a
                | system like that can outperform its parts, but it still
                | faces a ceiling on how accurate the answers are because
                | the reasoning _is not_ sound.
        
               | btilly wrote:
               | Human reasoning is amazingly not sound.
               | 
               | When you add in various patterns, double-checks, and
               | memorized previous results, what human reasoning can do
                | is astounding. But it is very, very far from sound.
        
               | sdenton4 wrote:
                | The arithmetic issues are well documented and understood;
                | it's a problem of sub-token manipulation, which has
                | nothing to do with reasoning. (Similar to calling blind
                | people unintelligent because they can't read the IQ
                | test.)
                | 
                | And the better LLMs can easily write code to do the
                | arithmetic that they suck at...
        
               | Nevermark wrote:
               | I have played around with GPT-4 and some fairly simple
               | but completely new math ideas. It was fabulous at
               | identifying special cases I overlooked, that disproved
               | conjectures.
        
               | sudokuist wrote:
               | Example?
        
               | Nevermark wrote:
               | I was playing around with prime numbers, and simple made
               | up relationships between them, such as between the square
               | of a prime N vs. the set of primes smaller than N, etc.
               | 
               | It caught me out with specific examples that violated my
               | conjectures. In one case the conjecture held for all but
               | one case, another conjecture was generally true but not
               | for 2 and 3.
               | 
               | In one case it thought a conjecture I made was wrong, and
               | I had to push it to think through why it thought it was
               | wrong until it realized the conjecture was right. As soon
               | as it had its epiphany, it corrected all its logic around
               | that concept.
               | 
               | It was very simple stuff, but an interesting exercise.
               | 
               | The part I enjoyed the most was seeing GPT-4's
               | understanding move and change as we pushed back on each
               | other's views. You miss out on that impressive aspect of
               | GPT-4 in simpler sessions.
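
        A toy illustration of the kind of check being described, using a
        conjecture of the same flavor rather than the commenter's actual
        one: "24 divides p^2 - 1 for every prime p" holds for every prime
        except 2 and 3, and a brute-force search finds exactly those
        exceptions.

          def is_prime(n: int) -> bool:
              if n < 2:
                  return False
              d = 2
              while d * d <= n:
                  if n % d == 0:
                      return False
                  d += 1
              return True

          def counterexamples(limit: int = 10_000) -> list[int]:
              # Primes p below the limit where p^2 - 1 is not divisible by 24.
              return [p for p in range(2, limit)
                      if is_prime(p) and (p * p - 1) % 24 != 0]

          print(counterexamples())  # prints [2, 3]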
        
               | riku_iki wrote:
                | It's hard to judge how deep and unique your conjectures
                | were.
               | 
               | I did similar testing of GPT4, and my observation is that
               | it starts failing after 3-4 levels of reasoning depth.
        
               | sudokuist wrote:
               | Have you tried formalizing your ideas with Isabelle? It
               | has a constraint solver and will often find
               | counterexamples to false arithmetical propositions[1].
               | 
               | 1: https://isabelle.in.tum.de/overview.html
        
               | Nevermark wrote:
               | I have not, thanks for the tip.
        
               | mannykannot wrote:
               | I have not been able to figure out how that would help in
               | the context of this discussion. As I see it, what's very
               | interesting here is that an LLM is able to do this.
        
               | riku_iki wrote:
                | Curious why you referred specifically to Isabelle, which
                | looks ancient and over-engineered; there are many other
                | tools and languages in this area.
                | 
                | I am not criticizing, just curious about your opinion.
        
             | sudokuist wrote:
             | Can LLMs solve sudoku yet?
        
         | digitcatphd wrote:
          | Most things right now seem so. It feels like we are going
          | through many rounds of iteration, and guessing what will work
          | long term versus what is a short-term fix is frustrating.
        
         | 3abiton wrote:
         | But there are hardware limitations for storing them in memory
        
         | SubiculumCode wrote:
          | It seems to me that sparse encodings would be more efficient
          | and practical for medium-term memory. Isn't the problem with
          | dense embeddings memory usage?
        
         | galaxyLogic wrote:
          | Consider what happens if we use this method in our heads:
          | recursively summarize the discussion so far. It will improve
          | our memory. It seems "hacky" to summarize things in your head,
          | but I think that is a big part of how memory works.
        
           | coldtea wrote:
            | That's what memory is. Whether it's sensory information
            | (what you remember of a place) or the points from a meeting
            | (where we only remember the big-picture items, not what was
            | said minute to minute), it's all summarized/compressed.
        
         | fnordpiglet wrote:
         | I would note that almost everything in computing we use today
         | is super hacky stuff sufficiently abstracted and error handled
         | such that it seems like it's not at all a hack.
        
         | [deleted]
        
         | tinco wrote:
         | Why do you have the intuition that a dense embedding space
         | could preserve full semantic meaning? From what I understand
         | from embeddings is that they are inherently lossy. At least
         | with a textual summary you could have an agent verify the
         | summary actually accurately represents the information that
         | it's meant to summarize.
        
           | chefandy wrote:
           | From a technical perspective, sure: it's clear why it
           | functions like that, and there's no technical reason it
           | shouldn't. From a user interface perspective-- likely what
           | most people would judge with intuition-- that doesn't matter.
            | Processes that mimic familiar interaction patterns cause
           | dissonance when they don't match our mental model of those
           | interactions, and familiar conversation is about as familiar
           | as you get. People we interact with know the things we
           | explicitly tell them or guide them through figuring out, and
           | people intuitively interact with these applications as they
           | would with people because they've deliberately adopted the
           | voice of an individual person. Additionally, for SaaS
           | products, we're used to them maintaining state automatically.
           | 
           | (As annoying as our input can be, this is why dev teams that
           | try to solve problems for non-dev users have designers.)
        
         | eganist wrote:
         | > rather than as some hacked on process of continually
         | regenerating summaries.
         | 
         | incidentally, this isn't far off from how the human brain is
         | believed to work (at least with long term memories).
         | 
         | https://news.northwestern.edu/stories/2012/09/your-memory-is...
        
           | sudokuist wrote:
           | No one knows how the brain works and how it is connected with
           | the body. Did you know your gut is directly connected with
           | cognition? An unhealthy digestive system has been linked with
           | several neurodegenerative diseases. Also, walking and cardio
           | in general is known to create new neurons and prevent
           | cognitive degeneration.
           | 
           | It's always funny to me when people on online forums
           | confidently proclaim to know what cognition and thinking is
           | all about and that furthermore it can be reduced to symbol
           | shuffling on computers. No one has any clue how to make
           | computers intelligent and anyone that claims to know is
           | either doing marketing for some "AI" company or thoroughly
           | confused about what counts as biological intelligence.
        
             | eganist wrote:
             | You're 100% right, no one knows how the brain works. And
             | all the elements you described are probably relevant --
             | including things you didn't mention, such as personality
             | changes tied to heart transplants, etc.
             | 
             | But that's probably reading a little too deeply and
             | seriously into what I said.
        
               | sudokuist wrote:
               | [flagged]
        
             | btilly wrote:
             | Nobody knows, but this model works well enough to, for
             | instance, treat PTSD. Read through
             | https://www.ptsd.va.gov/understand_tx/emdr.asp to verify
             | that.
        
             | coldtea wrote:
             | > _No one knows how the brain works and how it is connected
             | with the body_
             | 
             | Yes, but we have some informed theories.
             | 
              | Nobody knew how physics worked in Newton's time either,
              | but they did know enough to model lots of phenomena, like
              | ballistics, to satisfaction...
             | 
             | > _Did you know your gut is directly connected with
             | cognition?_
             | 
              | Well, it has been widely covered in pop-science articles
              | for several decades!
             | 
             | > _No one has any clue how to make computers intelligent_
             | 
             | For not having "any clue", sure the LLM guys did quite well
             | in getting some of the way to the goalposts...
        
       ___________________________________________________________________
       (page generated 2023-09-02 23:00 UTC)