[HN Gopher] Multimodal Neurons in Pretrained Text-Only Transformers
___________________________________________________________________
Multimodal Neurons in Pretrained Text-Only Transformers
Author : og_kalu
Score : 62 points
Date : 2023-08-04 12:25 UTC (2 days ago)
(HTM) web link (huggingface.co)
(TXT) w3m dump (huggingface.co)
| ttpphd wrote:
| As a sensory scientist it's really frustrating to watch computer
| scientists appropriate our terminology and use it incorrectly.
| zvolsky wrote:
| Your frustration is understandable, but there is no fault on
| either side. Language evolves and different fields borrow and
 | evolve the meanings of words all the time. Even the `neuron`
 | you're thinking of doesn't carry the word's original meaning.
| amelius wrote:
| I watched Transformers as a kid, and I'm equally frustrated.
| YeGoblynQueenne wrote:
| Is this flippant remark an example of the kind of
| conversation that you want to find on the internet?
| amelius wrote:
| We're talking about names, terminology here. What deep
| conversations are you expecting to find, and can't they be
| intermingled with lighter remarks?
| doctor_eval wrote:
| I agree that a light sprinkle of humour can make a topic
| more approachable.
|
| That said, I keep thinking these posts are about
| electricity.
| YeGoblynQueenne wrote:
| Maybe I'm over-reacting. My apologies.
|
| "Deep conversation"- I like how dang puts it: curious
| conversation.
| BaseballPhysics wrote:
 | IMO it's worse: they aren't just appropriating the terminology.
 | There's a strain of folks in tech who think that somehow AI
 | researchers have actually uncovered how the brain works.
|
| The "neural net" started out as an analogy for a particular
| kind of statistical model. Now that analogy is being mistaken
| for some sort of fundamental truth.
|
 | And if you suggest otherwise, you're told it's the
 | _technologists_ who have it right, not the neuroscientists.
| naasking wrote:
| > Now that analogy is being mistaken for some sort of
| fundamental truth.
|
| I think some of the pushback you receive is because you're
| calling it a mistake when you yourself just admitted that you
| can't know if it's a mistake because of our incomplete
| knowledge of neurology.
|
| On the other side of this, there are straightforward
| arguments that there is some deep connection here. Since LLMs
| infer statistical relationships of languages produced by
| human brains, they are in a real sense building a statistical
| model of how the human brain processes language. It could be
| the case that this is merely an approximation of some kind,
| but it could also be exactly how it works.
| YeGoblynQueenne wrote:
| >> Since LLMs infer statistical relationships of languages
| produced by human brains, they are in a real sense building
| a statistical model of how the human brain processes
| language.
|
| I think we understand very well that a statistical model
| can accurately predict the behaviour of a system and still
| have nothing to do with how the system operates internally.
|
| In a sense, that's the big advantage of building
| statistical models: you don't need to consider the
| internals of the system you're modelling.
| naasking wrote:
| > I think we understand very well that a statistical
| model can accurately predict the behaviour of a system
| and still have nothing to do with how the system operates
| internally.
|
 | I agree that's true of _some_ statistical models, but I
 | don't think it's true of Bayesian models. Solomonoff
| Induction has shown that Bayesian inference will
| reproduce any underlying computable function given a
| suitable prior.
|
| And this should be obvious given this simple sketch of
| the argument: classical logic is just Bayesian inference
| with all probabilities pinned to 0 and 1, and a model of
| a system in classical logic _is_ a model of how it
 | operates. Therefore you just need to run Bayes' rule
| long enough for the probabilities to converge.
|
| Many papers have shown that LLMs perform Bayesian
| inference.
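 |
 | To make the 0/1 point concrete, here is a rough numeric
 | sketch (just an illustration, not taken from any of those
 | papers): with the relevant probabilities pinned to 0 and 1,
 | Bayes' rule reproduces modus tollens.
 |
 |   def posterior(prior_h, lik_h, lik_not_h):
 |       # P(H | observation) via Bayes' rule, given the likelihood
 |       # of the observation under H and under not-H
 |       p_obs = lik_h * prior_h + lik_not_h * (1 - prior_h)
 |       return lik_h * prior_h / p_obs
 |
 |   # "H implies E" is encoded as P(E|H) = 1; observing not-E
 |   # means the likelihoods are 1 - P(E|.).
 |   prior_h = 0.5
 |   lik_h = 1 - 1.0      # P(not-E | H) = 0, since H guarantees E
 |   lik_not_h = 1 - 0.3  # P(not-E | not-H): not-H sometimes yields E
 |
 |   print(posterior(prior_h, lik_h, lik_not_h))
 |   # -> 0.0: H -> E plus an observation of not-E rules out H,
 |   # exactly as modus tollens would.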
| BaseballPhysics wrote:
| > I think some of the pushback you receive is because
| you're calling it a mistake when you yourself just admitted
| that you can't know if it's a mistake because of our
| incomplete knowledge of neurology.
|
| I'm not the one making the affirmative claim absent
| evidence. I'm the one pointing out the lack of evidence for
| the claim.
|
| If someone says "there are aliens on the far side of the
| moon," it's not a mistake to say "there is no evidence for
| your extraordinary claim".
|
| > On the other side of this, there are straightforward
| arguments that there is some deep connection here.
|
| No, there isn't.
|
| Humans can perform math. Computers can perform math. No one
| would claim that's evidence computers think like humans or
| vice versa.
|
| > they are in a real sense building a statistical model of
| how the human brain processes language.
|
| And there you go, making exactly the kind of claim I'm
| talking about.
|
| I'm going to make this very clear: _there is absolutely no
| evidence that supports this claim_. Period.
| doctor_eval wrote:
| I agree with most of your points and I think there's a
| good argument that CS should use different terms that
| don't overlap with biology, but I don't agree with this:
|
| > there is absolutely no evidence that supports this
| claim. Period
|
| The evidence is that it understands and can respond with
 | _human_ language, which arose from purely biological
 | processes. So until we understand how this happens, we
| can't say for sure that these things are not statistical
| models of part of the human brain. Maybe these models are
| picking up on Chomsky's universal grammar or maybe they
| are emulating brains. We just don't know.
|
| It might not be strong evidence, it might be indirect,
| but it is not a total and irrefutable _absence_ of
| evidence.
| BaseballPhysics wrote:
| > The evidence is that it understands and can respond
| with human language which arose from purely biological
| processes
|
| If you don't understand how this isn't evidence, I
| honestly don't know what to do.
|
| > So until we understand how this happens we can't say
| for sure that these things are not statistical models of
| part of the human brain.
|
| _I never said that._
|
 | I said _you can't affirmatively say they are_, and
| further, that because we don't understand how the human
| brain works, the fact that we can make computers perform
| tasks that humans can is not evidence in favour of the
| idea that those computers are in some way modelling the
| way the human brain actually works. And that is the claim
| I've seen many people make. In fact that's the claim you
| just made.
|
| > We just don't know.
|
| On that we agree.
| naasking wrote:
| > Humans can perform math. Computers can perform math. No
| one would claim that's evidence computers think like
| humans or vice versa.
|
| But we would very sensibly claim that computers _can_
| think like humans when suitably programmed, and humans
| _can_ compute like computers. And the claim here is that
| learning the relationships between words _is_ an
| understanding of language, and natural language reflects
| human cognition.
|
| > I'm going to make this very clear: there is absolutely
| no evidence that supports this claim. Period.
|
 | Again, that's incorrect. In what other science could you
 | produce a model that reproduces, with near-perfect
 | accuracy, what the system being modelled would generate,
 | and still have people insist that it doesn't really model
 | the operation of that system? That's an inconsistent
 | standard of evidence, IMO.
|
 | In any case, there have been a few studies demonstrating
| strong correlations in activation patterns between the
| human brain and neural networks. These are correlations,
| but correlations are evidence.
|
| Furthermore, I think you're failing to understand the
| argument. Human languages were invented by humans. They
| are necessarily suited to the human mind, reflecting some
| fundamental structure and operation of the human brain.
|
| It would be a fairly dramatic coincidence if other,
| random formal systems were well suited to reproducing
| natural language. In fact, the most obvious inference is
| that LLMs are likely inferring semantic models that
| encapsulate how humans categorize and think, which is why
| LLMs can translate text between human languages. This
| would not be possible if languages did not have a common
| underlying semantic structure.
| [deleted]
| [deleted]
| CuriouslyC wrote:
 | That's not a fair characterization. There are patterns that
 | emerge in complex systems when certain basic conditions hold.
 | It doesn't matter whether it's neurons or code: if the system
 | has those properties, it will also have those dynamics. This is
| a lot easier to understand in the abstract, whereas a
| neuroscientist might ask "why does the brain do XYZ" because
| they don't understand system dynamics.
| BaseballPhysics wrote:
| I'd suggest paying closer attention to conversations about
| ChatGPT and purported sentience before claiming my comments
| are a mischaracterization.
|
| I can't count the number of times I've seen people argue
| that LLMs are proof that the human mind is just a
| statistical model.
|
 | Meanwhile, any neuroscientist will be the first to tell
| you that we know surprisingly little about how neurons
| actually work together within the human brain, and
| certainly don't understand how consciousness emerges.
| mandmandam wrote:
| Seconded. I even got called arrogant for _very politely_
| pointing out that this was wrong once, because the arguer
| had been in a computer science class, and so "knew what
| they were talking about".
|
| Neurons are complex. It's arrogant to think we fully
| understand them.
| YeGoblynQueenne wrote:
| >> There are patterns that emerge in complexity given
| certain basic things hold true.
|
| Do you mean that in the abstract sense, or do you know what
 | patterns those are, and when they "emerge"? Could you
| describe them? For example, can you give a mathematical
| formula for them?
| bippihippi1 wrote:
 | I think it's partly marketing hype to convince people it's
 | cooler than it is, and that AI companies are not stealing
 | your work by training models on it. AI doesn't generate
 | content; all models learned by gradient descent are kernel
 | machines. It's just a black-box version of a function that
 | approximates points in the training set. If they convince
 | people AI is close to sentient and creative, it seems less
 | like copyright infringement.
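 |
 | (The kernel-machine claim is presumably a nod to Domingos'
 | 2020 "approximately a kernel machine" result. As a rough
 | sketch of what that phrase means - a generic Nadaraya-Watson
 | smoother on toy data, not the paper's path kernel -
 | predictions are similarity-weighted averages of the
 | training targets:
 |
 |   import numpy as np
 |
 |   def rbf_kernel(x, xi, bandwidth=0.5):
 |       # similarity between a query point and one training point
 |       return np.exp(-np.sum((x - xi) ** 2) / (2 * bandwidth ** 2))
 |
 |   def predict(x, train_x, train_y):
 |       # weighted average of training targets, weights set by
 |       # similarity of the query to each training input
 |       weights = np.array([rbf_kernel(x, xi) for xi in train_x])
 |       return weights @ train_y / weights.sum()
 |
 |   train_x = np.array([[0.0], [1.0], [2.0]])
 |   train_y = np.array([0.0, 1.0, 4.0])
 |   print(predict(np.array([1.5]), train_x, train_y))  # ~2.5
 |
 | A "black box that approximates points in the training set"
 | is exactly that shape.)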
| esalman wrote:
| Electrical engineer here, equally frustrated about convolution.
| sorenjan wrote:
| What's wrong about using the term convolution in ML?
| krackers wrote:
| It's actually cross-correlation, not convolution? (The two
 | are only equal if the kernel is symmetric.)
| sorenjan wrote:
 | It's just a matter of mirroring the kernel, and since the
 | kernels are learned I don't really see the big deal. I
 | guess the usage of the term convolution comes from image
 | processing and other DSP, where it used to be actual
 | convolution; the operation is basically the same even if
 | you no longer care about the actual values in each kernel.
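 |
 | A quick 1-D numpy check of the "mirroring" point (just an
 | illustration; the kernel values here are made up):
 |
 |   import numpy as np
 |
 |   signal = np.array([1.0, 2.0, 3.0, 4.0])
 |   kernel = np.array([1.0, 0.0, -1.0])  # asymmetric, so they differ
 |
 |   # ML "conv" layers compute correlation (no flip):
 |   conv = np.convolve(signal, kernel, mode="valid")
 |   corr = np.correlate(signal, kernel, mode="valid")
 |   print(conv)  # [2. 2.]
 |   print(corr)  # [-2. -2.]
 |   # flipping the kernel makes correlation match convolution:
 |   print(np.correlate(signal, kernel[::-1], mode="valid"))  # [2. 2.]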
| esalman wrote:
 | Yes. Convolution itself has a lot of very useful
 | properties in signal processing (transformation between
 | the time and frequency domains for easier computation with
 | LTI systems, etc.), but those are lost in the ML
 | literature.
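 |
 | The frequency-domain property referred to (convolution in
 | time is pointwise multiplication of spectra) is easy to
 | check in a few lines of numpy; the signals here are just
 | made-up examples:
 |
 |   import numpy as np
 |
 |   x = np.array([1.0, 2.0, 3.0, 4.0])
 |   h = np.array([0.5, -1.0, 0.25])
 |
 |   direct = np.convolve(x, h)           # time-domain convolution
 |   n = len(x) + len(h) - 1              # zero-pad so circular == linear
 |   via_fft = np.fft.ifft(np.fft.fft(x, n) * np.fft.fft(h, n)).real
 |   print(np.allclose(direct, via_fft))  # True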
| novaRom wrote:
 | In modern deep learning, very few high-impact papers contain
 | any scientific theoretical explanation. It's more like trial
 | and error: "we found A works and it improves B". Tech reports.
| _0ffh wrote:
 | In a very old paper (>20 years ago) I once read the sentence
 | "Machine learning research is mostly experimental math".
|
 | There have always been two kinds of ML research, the
 | "experimental math" kind and the "understanding the math"
 | kind, with a degree of overlap.
| cubefox wrote:
| Ever since machine learning became practically useful with
 | OCR, the experimentalists took over more and more. It's
| now mostly about what works instead of about proving
| theorems.
| petesergeant wrote:
| My father, a chartered civil engineer, was very upset when I
 | got a Software Engineering degree.
| dpflan wrote:
| Do you mind highlighting the appropriated terminology and what
| would be better terms to use?
| tel wrote:
| "Superposition" seems to be another term used for this.
| Although, it implies a mechanism instead of just a behavior.
| kelipso wrote:
 | Superposition was appropriated from physics, and the physics
| sense of the word was probably appropriated from another
| field.
| imchillyb wrote:
 | Superposition comes from geology, meaning one stratum over
| another.
___________________________________________________________________
(page generated 2023-08-06 23:01 UTC)