[HN Gopher] We Found an Neuron in GPT-2
       ___________________________________________________________________
        
       We Found an Neuron in GPT-2
        
       Author : todsacerdoti
       Score  : 86 points
       Date   : 2023-02-16 17:01 UTC (5 hours ago)
        
 (HTM) web link (clementneo.com)
 (TXT) w3m dump (clementneo.com)
        
       | unshavedyak wrote:
       | I wonder if this stuff will ever be applicable to a person and a
       | laptop (or if it is now?).
       | 
       | Ie this seems like such a cool area to be in but the data volumes
       | required are huge, complex, etc. Code is simple, cheap, lean, etc
       | by comparison.
       | 
       | Do we have any insight on how this area of research could be
       | usable with less hardware and data? Is there a visible future
       | where a guy and a laptop can make a big program? _(without
       | depending on tech getting small/cheap in 50 years or w/e)_
        
         | fragmede wrote:
         | Does this hypothetical laptop have a GPU? StableDiffusion is in
         | this realm of "stuff" and is runnable on consumer GPU systems.
         | It's a bit of trouble to get set up if you're not a python
         | dev (and kinda still is if you are) but it's a pretty neat ML
         | model to play around with.
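         | 
         | A rough sketch of what that looks like, assuming the
         | diffusers package and the runwayml/stable-diffusion-v1-5
         | checkpoint (both are assumptions here, not from the thread):
         | 
         |     # Rough sketch, untested: Stable Diffusion via the
         |     # diffusers package; model id and settings are guesses.
         |     import torch
         |     from diffusers import StableDiffusionPipeline
         | 
         |     pipe = StableDiffusionPipeline.from_pretrained(
         |         "runwayml/stable-diffusion-v1-5",
         |         torch_dtype=torch.float16,  # halves VRAM use
         |     ).to("cuda")  # needs a CUDA-capable GPU
         | 
         |     image = pipe("an astronaut riding a horse").images[0]
         |     image.save("out.png")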
        
         | nl wrote:
         | You could run this analysis on a laptop very easily.
         | 
         | The pile-10k dataset they used for analysis is 33MB, and GPT2
         | runs ok on a CPU. For the full 10K analysis it's probably
         | quicker to get a GPU though.
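         | 
         | To give a sense of scale, a rough sketch of that setup,
         | assuming the Hugging Face transformers and datasets packages
         | and the NeelNanda/pile-10k dataset id (package choice and
         | exact id are assumptions here):
         | 
         |     # Rough sketch, untested: GPT-2 small plus the ~33MB
         |     # pile-10k sample, all on CPU.
         |     from transformers import GPT2LMHeadModel, GPT2TokenizerFast
         |     from datasets import load_dataset
         | 
         |     tok = GPT2TokenizerFast.from_pretrained("gpt2")
         |     model = GPT2LMHeadModel.from_pretrained("gpt2")  # 124M
         |     model.eval()  # CPU inference is fine for single prompts
         | 
         |     ds = load_dataset("NeelNanda/pile-10k", split="train")
         |     print(len(ds), ds[0]["text"][:80])  # field name assumed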
        
       | nailer wrote:
       | > The choice depends on whether the word that comes after starts
       | with a vowel or not
       | 
       | Or an aitch / h, but only sometimes.
        
         | jakear wrote:
         | More accurately, it's whether the word that comes after starts
         | with a vowel _sound_. This is why `an 'istoric` is correct,
         | and `a historic` is correct, but `an historic` is incorrect
         | (as famously used by Stephen Colbert).
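         | 
         | A toy illustration of the letter-vs-sound difference (the
         | word lists below are hand-picked for the demo, not a real
         | phonetic lookup):
         | 
         |     # Article choice tracks the following *sound*, so a
         |     # letter-based rule misfires on "hour", "university", etc.
         |     VOWEL_SOUND = {"hour", "honest", "heir"}
         |     CONSONANT_SOUND = {"university", "user", "one"}
         | 
         |     def article_by_letter(word):
         |         return "an" if word[0].lower() in "aeiou" else "a"
         | 
         |     def article_by_sound(word):
         |         if word.lower() in VOWEL_SOUND:
         |             return "an"
         |         if word.lower() in CONSONANT_SOUND:
         |             return "a"
         |         return article_by_letter(word)
         | 
         |     for w in ["apple", "hour", "university"]:
         |         print(w, article_by_letter(w), article_by_sound(w))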
        
           | edgyquant wrote:
           | "I" is a vowel, so your rule about sound doesn't apply
           | here. It's also not a rule I've ever heard; in school we
           | were taught that only vowels get "an" before them.
        
       | tqi wrote:
       | > We started out with the question: How does GPT-2 know when to
       | use the word an over a? The choice depends on whether the word
       | that comes after starts with a vowel or not, but GPT-2 is only
       | capable of predicting one word at a time. We still don't have a
       | full answer...
       | 
       | I'm not sure I understand why this is an open question. While I
       | get that GPT-2 is predicting only one word at a time, it doesn't
       | seem that surprising that there might be cases where there is a
       | dominant bigram (ie "an apple" in the case of their example
       | prompt) that would trigger an "an" prediction, without actually
       | predicting the following word first.
       | 
       | Am I missing something?
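       | 
       | One way to poke at this (a sketch assuming the Hugging Face
       | transformers package, not whatever tooling the authors used,
       | and an illustrative prompt): compare the probability GPT-2
       | assigns to " a" versus " an" when the context strongly implies
       | the next noun.
       | 
       |     # Sketch: compare P(" a") vs P(" an") as GPT-2's next token
       |     # for a prompt where the upcoming noun is strongly implied.
       |     import torch
       |     from transformers import GPT2LMHeadModel, GPT2TokenizerFast
       | 
       |     tok = GPT2TokenizerFast.from_pretrained("gpt2")
       |     model = GPT2LMHeadModel.from_pretrained("gpt2")
       |     model.eval()
       | 
       |     prompt = "I climbed up the apple tree and picked"
       |     ids = tok(prompt, return_tensors="pt").input_ids
       |     with torch.no_grad():
       |         probs = model(ids).logits[0, -1].softmax(-1)
       | 
       |     for word in [" a", " an"]:  # both are single GPT-2 tokens
       |         print(repr(word), float(probs[tok.encode(word)[0]]))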
        
         | laszlokorte wrote:
         | From my naive point of view it seems obvious that at any
         | point where both "a" and "an" would fit, the model randomly
         | selects one of them and by doing so reduces the set of
         | possible nouns to follow.
        
         | lalos wrote:
         | Not to mention that the corpus will mostly have the correct
         | article for common words like apple. Train it on essays by
         | ESL students and you'll get something else.
        
         | prox wrote:
         | Wouldn't this also be correlated somehow in its vector space?
        
       | niccl wrote:
       | so evidence for a grandmother cell
       | (https://en.wikipedia.org/wiki/Grandmother_cell)
        
       | radus wrote:
       | The way they're going about this investigation is reminiscent of
       | how we figure things out in biology and neuroscience. Perhaps a
       | biologist won't do the best job at fixing a radio [1], but they
       | might do alright debugging neural networks.
       | 
       | 1: https://www.cell.com/cancer-
       | cell/pdf/S1535-6108(02)00133-2.p...
        
       | rhelz wrote:
       | Since complex systems are composed of simpler systems, it
       | seems like for any sufficiently complex system you'd be able
       | to find subsets of it which are isomorphic to any sufficiently
       | simple system.
        
       | LudwigNagasena wrote:
       | It's notable how successful LLMs are despite the lack of any
       | linguistic tools in their architectures. It would be interesting
       | to know how different a model would be if it operated on eg
       | dependency trees instead of the linear list of tokens. Surely,
       | the question of "a/an" would be solved with ease as the model
       | would be required to come up with a noun token before choosing
       | its determiner. I wonder if the developers of LLMs explored those
       | approaches but found them infeasible due to large preprocessing
       | times, immaturity of such tools and/or little benefit.
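       | 
       | For reference, the kind of structure that would mean, sketched
       | with spaCy as one off-the-shelf dependency parser (purely
       | illustrative, not a claim about what LLM developers have
       | actually tried):
       | 
       |     # Sketch: a dependency-tree view of a sentence via spaCy
       |     # (needs: python -m spacy download en_core_web_sm).
       |     import spacy
       | 
       |     nlp = spacy.load("en_core_web_sm")
       |     doc = nlp("I picked an apple from the tree.")
       | 
       |     for token in doc:
       |         # each token points at its head, e.g. "an" -> "apple"
       |         print(token.text, token.dep_, token.head.text)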
        
         | lalos wrote:
         | Not relevant but:
         | 
         | "Every time I fire a linguist, the performance of the speech
         | recognizer goes up". - Frederick Jelinek
        
           | jojobas wrote:
           | Surely the remaining linguists muster up some improvements
           | out of fear for their jobs!
        
       ___________________________________________________________________
       (page generated 2023-02-16 23:00 UTC)