[HN Gopher] Multimodal Neurons in Pretrained Text-Only Transformers
       ___________________________________________________________________
        
       Multimodal Neurons in Pretrained Text-Only Transformers
        
       Author : og_kalu
       Score  : 62 points
       Date   : 2023-08-04 12:25 UTC (2 days ago)
        
 (HTM) web link (huggingface.co)
 (TXT) w3m dump (huggingface.co)
        
       | ttpphd wrote:
       | As a sensory scientist it's really frustrating to watch computer
       | scientists appropriate our terminology and use it incorrectly.
        
         | zvolsky wrote:
         | Your frustration is understandable, but there is no fault on
         | either side. Language evolves and different fields borrow and
         | evolve the meanings of words all the time. Not even the
         | `neuron` that you're thinking of is the original meaning of the
         | word.
        
         | amelius wrote:
         | I watched Transformers as a kid, and I'm equally frustrated.
        
           | YeGoblynQueenne wrote:
           | Is this flippant remark an example of the kind of
           | conversation that you want to find on the internet?
        
             | amelius wrote:
             | We're talking about names, terminology here. What deep
             | conversations are you expecting to find, and can't they be
             | intermingled with lighter remarks?
        
               | doctor_eval wrote:
               | I agree that a light sprinkle of humour can make a topic
               | more approachable.
               | 
               | That said, I keep thinking these posts are about
               | electricity.
        
               | YeGoblynQueenne wrote:
               | Maybe I'm over-reacting. My apologies.
               | 
               | "Deep conversation"- I like how dang puts it: curious
               | conversation.
        
         | BaseballPhysics wrote:
         | IMO it's worse: they aren't just co-opting the terminology.
         | There's a strain of folks in tech who think that somehow AI
         | researchers have actually uncovered how the brain works.
         | 
         | The "neural net" started out as an analogy for a particular
         | kind of statistical model. Now that analogy is being mistaken
         | for some sort of fundamental truth.
         | 
         | And if you suggest otherwise, you're told it's the
         | _technologists_ who have it right, not the neuroscientists.
        
           | naasking wrote:
           | > Now that analogy is being mistaken for some sort of
           | fundamental truth.
           | 
           | I think some of the pushback you receive is because you're
           | calling it a mistake when you yourself just admitted that you
           | can't know if it's a mistake because of our incomplete
           | knowledge of neurology.
           | 
           | On the other side of this, there are straightforward
           | arguments that there is some deep connection here. Since LLMs
           | infer statistical relationships of languages produced by
           | human brains, they are in a real sense building a statistical
           | model of how the human brain processes language. It could be
           | the case that this is merely an approximation of some kind,
           | but it could also be exactly how it works.
        
             | YeGoblynQueenne wrote:
             | >> Since LLMs infer statistical relationships of languages
             | produced by human brains, they are in a real sense building
             | a statistical model of how the human brain processes
             | language.
             | 
             | I think we understand very well that a statistical model
             | can accurately predict the behaviour of a system and still
             | have nothing to do with how the system operates internally.
             | 
             | In a sense, that's the big advantage of building
             | statistical models: you don't need to consider the
             | internals of the system you're modelling.
        
               | naasking wrote:
               | > I think we understand very well that a statistical
               | model can accurately predict the behaviour of a system
               | and still have nothing to do with how the system operates
               | internally.
               | 
               | I agree that's true of _some_ statistical models, but I
               | don't think it's true of Bayesian models. Solomonoff
               | Induction has shown that Bayesian inference will
               | reproduce any underlying computable function given a
               | suitable prior.
               | 
               | And this should be obvious given this simple sketch of
               | the argument: classical logic is just Bayesian inference
               | with all probabilities pinned to 0 and 1, and a model of
               | a system in classical logic _is_ a model of how it
               | operates. Therefore you just need to run Bayes' rule
               | long enough for the probabilities to converge.
               | 
               | Many papers have shown that LLMs perform Bayesian
               | inference.
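               | 
               | As a toy sketch of that convergence claim (just Bayes'
               | rule over a small hypothesis class; nothing specific to
               | LLMs or to Solomonoff's construction):
               | 
               |     import numpy as np
               | 
               |     # Hypothesis class: all 16 Boolean functions of two inputs,
               |     # written as truth tables over the four possible inputs.
               |     inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
               |     hyps = [tuple((n >> i) & 1 for i in range(4)) for n in range(16)]
               |     true_fn = hyps[0b0110]                     # the system is XOR
               | 
               |     post = np.full(len(hyps), 1 / len(hyps))   # uniform prior
               |     noise = 0.1                                # observation flip rate
               |     rng = np.random.default_rng(0)
               | 
               |     for _ in range(200):                       # noisy input/output pairs
               |         i = rng.integers(4)
               |         y = true_fn[i] ^ int(rng.random() < noise)
               |         lik = np.array([1 - noise if h[i] == y else noise for h in hyps])
               |         post = lik * post
               |         post /= post.sum()
               | 
               |     # The posterior concentrates on the true truth table.
               |     print(hyps[np.argmax(post)] == true_fn)    # True, with high probability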
        
             | BaseballPhysics wrote:
             | > I think some of the pushback you receive is because
             | you're calling it a mistake when you yourself just admitted
             | that you can't know if it's a mistake because of our
             | incomplete knowledge of neurology.
             | 
             | I'm not the one making the affirmative claim absent
             | evidence. I'm the one pointing out the lack of evidence for
             | the claim.
             | 
             | If someone says "there are aliens on the far side of the
             | moon," it's not a mistake to say "there is no evidence for
             | your extraordinary claim".
             | 
             | > On the other side of this, there are straightforward
             | arguments that there is some deep connection here.
             | 
             | No, there isn't.
             | 
             | Humans can perform math. Computers can perform math. No one
             | would claim that's evidence computers think like humans or
             | vice versa.
             | 
             | > they are in a real sense building a statistical model of
             | how the human brain processes language.
             | 
             | And there you go, making exactly the kind of claim I'm
             | talking about.
             | 
             | I'm going to make this very clear: _there is absolutely no
             | evidence that supports this claim_. Period.
        
               | doctor_eval wrote:
               | I agree with most of your points and I think there's a
               | good argument that CS should use different terms that
               | don't overlap with biology, but I don't agree with this:
               | 
               | > there is absolutely no evidence that supports this
               | claim. Period
               | 
               | The evidence is that it understands and can respond with
               | _human_ language which arose from purely biological
               | processes. So until we understand how this happens we
               | can't say for sure that these things are not statistical
               | models of part of the human brain. Maybe these models are
               | picking up on Chomsky's universal grammar or maybe they
               | are emulating brains. We just don't know.
               | 
               | It might not be strong evidence, it might be indirect,
               | but it is not a total and irrefutable _absence_ of
               | evidence.
        
               | BaseballPhysics wrote:
               | > The evidence is that it understands and can respond
               | with human language which arose from purely biological
               | processes
               | 
               | If you don't understand how this isn't evidence, I
               | honestly don't know what to do.
               | 
               | > So until we understand how this happens we can't say
               | for sure that these things are not statistical models of
               | part of the human brain.
               | 
               |  _I never said that._
               | 
               | I said _you can't affirmatively say they are_, and
               | further, that because we don't understand how the human
               | brain works, the fact that we can make computers perform
               | tasks that humans can is not evidence in favour of the
               | idea that those computers are in some way modelling the
               | way the human brain actually works. And that is the claim
               | I've seen many people make. In fact that's the claim you
               | just made.
               | 
               | > We just don't know.
               | 
               | On that we agree.
        
               | naasking wrote:
               | > Humans can perform math. Computers can perform math. No
               | one would claim that's evidence computers think like
               | humans or vice versa.
               | 
               | But we would very sensibly claim that computers _can_
               | think like humans when suitably programmed, and humans
               | _can_ compute like computers. And the claim here is that
               | learning the relationships between words _is_ an
               | understanding of language, and natural language reflects
               | human cognition.
               | 
               | > I'm going to make this very clear: there is absolutely
               | no evidence that supports this claim. Period.
               | 
               | Again, that's incorrect. In what other science could you
               | produce a model that nearly 100% accurately reproduces
               | what the system being modelled would generate, and people
               | would insist on saying that that doesn't really model the
               | operation of that system? Inconsistent standards of
               | evidence IMO.
               | 
               | In any case, there have been a few studies demonstrating
               | strong correlations in activation patterns between the
               | human brain and neural networks. These are correlations,
               | but correlations are evidence.
               | 
               | Furthermore, I think you're failing to understand the
               | argument. Human languages were invented by humans. They
               | are necessarily suited to the human mind, reflecting some
               | fundamental structure and operation of the human brain.
               | 
               | It would be a fairly dramatic coincidence if other,
               | random formal systems were well suited to reproducing
               | natural language. In fact, the most obvious inference is
               | that LLMs are likely inferring semantic models that
               | encapsulate how humans categorize and think, which is why
               | LLMs can translate text between human languages. This
               | would not be possible if languages did not have a common
               | underlying semantic structure.
        
               | [deleted]
        
               | [deleted]
        
           | CuriouslyC wrote:
           | That's not a fair characterization. There are patterns that
           | emerge in complexity given certain basic things hold true. It
           | doesn't matter whether it's neurons or code, if the system
           | has those property, it will also have those dynamics. This is
           | a lot easier to understand in the abstract, whereas a
           | neuroscientist might ask "why does the brain do XYZ" because
           | they don't understand system dynamics.
        
             | BaseballPhysics wrote:
             | I'd suggest paying closer attention to conversations about
             | ChatGPT and purported sentience before claiming my comments
             | are a mischaracterization.
             | 
             | I can't count the number of times I've seen people argue
             | that LLMs are proof that the human mind is just a
             | statistical model.
             | 
              | Meanwhile, any neuroscientist will be the first to tell
             | you that we know surprisingly little about how neurons
             | actually work together within the human brain, and
             | certainly don't understand how consciousness emerges.
        
               | mandmandam wrote:
               | Seconded. I even got called arrogant for _very politely_
               | pointing out that this was wrong once, because the arguer
               | had been in a computer science class, and so "knew what
               | they were talking about".
               | 
               | Neurons are complex. It's arrogant to think we fully
               | understand them.
        
             | YeGoblynQueenne wrote:
             | >> There are patterns that emerge in complexity given
             | certain basic things hold true.
             | 
             | Do you mean that in the abstract sense, or do you know what
             | patterns those are, and when they "emerge"? Could you
             | describe them? For example, can you give a mathematical
             | formula for them?
        
           | bippihippi1 wrote:
           | I think it's partly marketing hype to convince people it's
           | cooler than it is, and that AI companies are not stealing
           | your work by training models on it. AI doesn't generate
           | content; all models learned by gradient descent are kernel
           | machines. It's just a black-box version of a function that
           | approximates points in the training set. If they convince
           | people AI is close to sentient and creative, it seems less
           | like copyright infringement.
        
         | esalman wrote:
         | Electrical engineer here, equally frustrated about convolution.
        
           | sorenjan wrote:
            | What's wrong with using the term convolution in ML?
        
             | krackers wrote:
             | It's actually cross-correlation, not convolution? (The two
             | are only equal if the kernel is symmetric.)
        
               | sorenjan wrote:
               | It's just a matter of mirroring the kernel, and since the
               | kernels are learned I don't really see the big deal. I
               | guess the usage of the term convolution comes from image
               | processing and other DSP where it used to be actual
               | convolution, and now the operation is basically the same,
               | even if you don't really care about the actual values in
               | each kernel.
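               | 
               | Roughly, in SciPy terms (a toy sketch; the function names
               | are SciPy's, not anything from the article):
               | 
               |     import numpy as np
               |     from scipy.signal import convolve2d, correlate2d
               | 
               |     x = np.random.rand(5, 5)   # input patch
               |     k = np.random.rand(3, 3)   # asymmetric kernel
               | 
               |     # What a "conv" layer computes is cross-correlation; true
               |     # convolution is cross-correlation with the mirrored kernel.
               |     assert np.allclose(convolve2d(x, k, mode="valid"),
               |                        correlate2d(x, k[::-1, ::-1], mode="valid"))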
        
               | esalman wrote:
               | Yes. Convolution itself has a lot of very useful
               | properties in signal processing (transformation between
               | the time and frequency domains for easier computation with
               | LTI systems, etc.), but those are lost in the ML
               | literature.
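               | 
               | For instance, the convolution theorem (a toy 1D NumPy
               | sketch, not from the article):
               | 
               |     import numpy as np
               | 
               |     a = np.random.rand(64)        # signal
               |     b = np.random.rand(8)         # impulse response
               |     n = len(a) + len(b) - 1       # length of the full linear convolution
               | 
               |     direct  = np.convolve(a, b)   # time-domain convolution
               |     via_fft = np.fft.irfft(np.fft.rfft(a, n) * np.fft.rfft(b, n), n)
               | 
               |     # Convolution in time is multiplication in frequency.
               |     assert np.allclose(direct, via_fft)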
        
         | novaRom wrote:
         | In modern Deep Learning, very few high-impact papers contain
         | any scientific or theoretical explanation. It's more like trial
         | and error: "we found A works and it improves B". Tech reports.
        
           | _0ffh wrote:
           | In a very old paper (>20 years ago) I once read the sentence
           | "Machine learning research is mostly experimental math".
           | 
           | There have always been two kinds of ML research, the
           | "experimental math" kind and the "understanding the math"
           | kind, with a degree of overlap.
        
             | cubefox wrote:
             | Ever since machine learning became practically useful with
              | OCR, the experimentalists took over more and more. It's
             | now mostly about what works instead of about proving
             | theorems.
        
         | petesergeant wrote:
         | My father, a chartered civil engineer, was very upset when I
         | got a Software Engineering degree
        
         | dpflan wrote:
         | Do you mind highlighting the appropriated terminology and what
         | would be better terms to use?
        
           | tel wrote:
           | "Superposition" seems to be another term used for this.
           | Although, it implies a mechanism instead of just a behavior.
        
             | kelipso wrote:
              | Superposition was appropriated from physics, and the physics
             | sense of the word was probably appropriated from another
             | field.
        
               | imchillyb wrote:
                | Superposition comes from geology, meaning one stratum
                | over another.
        
       ___________________________________________________________________
       (page generated 2023-08-06 23:01 UTC)