[HN Gopher] The Matrix: A Bayesian learning model for LLMs
___________________________________________________________________
The Matrix: A Bayesian learning model for LLMs
Author : stoniejohnson
Score : 89 points
Date : 2024-05-04 09:27 UTC (13 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| toxik wrote:
| Completely editorialized title. The article talks about LLMs, not
| transformers.
| drbig wrote:
| The title of the paper is actually `The Matrix: A Bayesian
| learning model for LLMs` and the conclusion presented in the
| title of this post is not to be found in the abstract... Just a
| heads up y'all.
| zoky wrote:
| I don't really care, I just want some vague reassurance that
| we're probably not on the verge of launching Skynet...
| dosinga wrote:
| Conclusion from the paper is:
|
| In this paper we present a new model to explain the behavior of
| Large Language Models. Our frame of reference is an abstract
| probability matrix, which contains the multinomial probabilities
| for next token prediction in each row, where the row represents a
| specific prompt. We then demonstrate that LLM text generation is
| consistent with a compact representation of this abstract matrix
| through a combination of embeddings and Bayesian learning. Our
| model explains (the emergence of) In-Context learning with the
| scale of LLMs, as well as other phenomena like Chain of Thought
| reasoning and the problem with large context windows. Finally, we
| outline implications of our model and some directions for future
| exploration.
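|
| A toy sketch of that abstract matrix as I read it (my own Python
| illustration, not code from the paper; the vocabulary and prompts
| are made up): one row per prompt, each row a multinomial
| distribution over the next token, and generation is just
| sampling from the current prompt's row.
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     vocab = ["the", "cat", "sat", "on", "mat"]   # toy vocabulary
|     prompts = ["the cat", "the cat sat on the"]  # one row per prompt
|
|     # Abstract matrix: one next-token distribution per prompt,
|     # so every row sums to 1.
|     matrix = rng.dirichlet(np.ones(len(vocab)), size=len(prompts))
|
|     def next_token(prompt_idx):
|         # Text generation = sampling from that prompt's row.
|         return rng.choice(vocab, p=matrix[prompt_idx])
|
|     print(next_token(1))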
|
| Where does the "Cannot Recursively Improve" come from?
| ajb wrote:
| Looks like someone edited the title of this thread. I assume
| "Cannot Recursively Improve" was in the old one?
| dosinga wrote:
| Yeah, looks like it. This is better!
| avi_vallarapu wrote:
| Theoretically this sounds great. I would worry about scalability
| issues with the Bayesian learning model's practical
| implementation when dealing with the vast parameter space and
| data requirements of state-of-the-art models like GPT-3 and
| beyond.
|
| Would love to see practical implementations on large-scale
| datasets and in varied contexts. I liked the use of Dirichlet
| distributions to approximate any prior over multinomial
| distributions.
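|
| For reference, that conjugacy fits in a few lines (toy numbers of
| my own, not from the paper): a Dirichlet(alpha) prior over the
| next-token multinomial plus observed token counts gives a
| Dirichlet(alpha + counts) posterior, so the posterior predictive
| is just the normalized pseudo-counts.
|
|     import numpy as np
|
|     alpha = np.ones(5)                  # symmetric Dirichlet prior
|     counts = np.array([3, 0, 1, 0, 6])  # hypothetical next-token counts
|
|     posterior = alpha + counts          # conjugate update
|     predictive = posterior / posterior.sum()
|     print(predictive)  # ~[0.267 0.067 0.133 0.067 0.467]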
| ShamelessC wrote:
| Upvote bait for LessWrong/EA advocates?
|
| *runs before stones are cast*
___________________________________________________________________
(page generated 2024-05-04 23:00 UTC)