[HN Gopher] The Matrix: A Bayesian learning model for LLMs
       ___________________________________________________________________
        
       The Matrix: A Bayesian learning model for LLMs
        
       Author : stoniejohnson
       Score  : 89 points
       Date   : 2024-05-04 09:27 UTC (13 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | toxik wrote:
       | Completely editorialized title. The article talks about LLMs, not
       | transformers.
        
       | drbig wrote:
        | The title of the paper is actually `The Matrix: A Bayesian
        | learning model for LLMs`, and the conclusion presented in the
        | title of this post is not to be found in the abstract... Just a
        | heads up, y'all.
        
         | zoky wrote:
         | I don't really care, I just want some vague reassurance that
         | we're probably not on the verge of launching Skynet...
        
       | dosinga wrote:
       | Conclusion from the paper is:
       | 
       | In this paper we present a new model to explain the behavior of
       | Large Language Models. Our frame of reference is an abstract
       | probability matrix, which contains the multinomial probabilities
       | for next token prediction in each row, where the row represents a
       | specific prompt. We then demonstrate that LLM text generation is
       | consistent with a compact representation of this abstract matrix
       | through a combination of embeddings and Bayesian learning. Our
       | model explains (the emergence of) In-Context learning with scale
       | of the LLMs, as also other phenomena like Chain of Thought
       | reasoning and the problem with large context windows. Finally, we
       | outline implications of our model and some directions for future
       | exploration.
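        | 
        | In code terms, the paper's setup is roughly this (a toy sketch,
        | not from the paper; the vocabulary, prompts, and random rows are
        | made up stand-ins for the real next-token distributions):
        | 
        |     import numpy as np
        | 
        |     # Toy "abstract probability matrix": one row per prompt, one
        |     # column per vocabulary token; each row is a multinomial
        |     # distribution over the next token.
        |     VOCAB = ["the", "cat", "sat", "on", "mat"]
        |     PROMPTS = ["the cat", "the cat sat on the"]
        | 
        |     rng = np.random.default_rng(0)
        |     # Random rows that each sum to 1.
        |     matrix = rng.dirichlet(np.ones(len(VOCAB)), size=len(PROMPTS))
        | 
        |     def next_token(prompt: str) -> str:
        |         """Sample the next token from the prompt's row."""
        |         row = matrix[PROMPTS.index(prompt)]
        |         return rng.choice(VOCAB, p=row)
        | 
        |     print(next_token("the cat"))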
       | 
       | Where does the "Cannot Recursively Improve" come from?
        
         | ajb wrote:
         | Looks like someone edited the title of this thread. I assume
         | "Cannot Recursively Improve" was in the old one?
        
           | dosinga wrote:
           | Yeah, looks like it. This is better!
        
       | avi_vallarapu wrote:
        | Theoretically this sounds great. I would worry about scalability
        | issues in a practical implementation of the Bayesian learning
        | model when dealing with the vast parameter space and data
        | requirements of state-of-the-art models like GPT-3 and beyond.
       | 
       | Would love to see practical implementations on large-scale
        | datasets and in varied contexts. I liked the use of Dirichlet
       | distributions to approximate any prior over multinomial
       | distributions.
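        | 
        | That conjugacy in a toy sketch (not from the paper; the counts
        | are made up): a Dirichlet prior plus observed multinomial counts
        | gives a Dirichlet posterior with updated pseudo-counts.
        | 
        |     import numpy as np
        | 
        |     # Dirichlet prior over a 3-token vocabulary: the alphas act
        |     # as pseudo-counts (uniform prior here).
        |     alpha_prior = np.array([1.0, 1.0, 1.0])
        | 
        |     # Observed next-token counts for some prompt (made up).
        |     counts = np.array([12, 3, 0])
        | 
        |     # Conjugacy: Dirichlet prior + multinomial likelihood gives
        |     # a Dirichlet posterior; just add the counts to the alphas.
        |     alpha_post = alpha_prior + counts
        | 
        |     # Posterior mean of the next-token distribution.
        |     print(alpha_post / alpha_post.sum())  # ~[0.72, 0.22, 0.06]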
        
       | ShamelessC wrote:
       | Upvote bait for LessWrong/EA advocates?
       | 
       | *runs before stones are cast*
        
       ___________________________________________________________________
       (page generated 2024-05-04 23:00 UTC)