[HN Gopher] We Found an Neuron in GPT-2
___________________________________________________________________
We Found an Neuron in GPT-2
Author : todsacerdoti
Score : 86 points
Date : 2023-02-16 17:01 UTC (5 hours ago)
(HTM) web link (clementneo.com)
(TXT) w3m dump (clementneo.com)
| canadianfella wrote:
| [dead]
| unshavedyak wrote:
| I wonder if this stuff will ever be applicable to a person and a
| laptop (or if it is now?).
|
| Ie this seems like such a cool area to be in but the data volumes
| required are huge, complex, etc. Code is simple, cheap, lean, etc
| by comparison.
|
| Do we have any insight on how this area of research could be
| usable with less hardware and data? Is there a visible future
| where a guy and a laptop can make a big program? _(without
| depending on tech getting small/cheap in 50 years or w/e)_
| fragmede wrote:
| Does this hypothetical laptop have a GPU? StableDiffusion is in
| this realm of "stuff" and is runnable on consumer GPU systems.
| It's a bit of trouble to get set up if you're not a Python
| dev (and kinda still is if you are) but it's a pretty neat ML
| model to play around with.
| nl wrote:
| You could run this analysis on a laptop very easily.
|
| The pile-10k dataset they used for analysis is 33MB, and GPT2
| runs ok on a CPU. For the full 10K analysis it's probably
| quicker to get a GPU though.
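|
| A minimal sketch of that setup (assuming the Hugging Face
| transformers and datasets libraries, and the "NeelNanda/pile-10k"
| dataset id on the Hub; check the exact id before relying on it):
|
|     # Load GPT-2 and the pile-10k text samples on a CPU-only box.
|     import torch
|     from transformers import GPT2LMHeadModel, GPT2TokenizerFast
|     from datasets import load_dataset
|
|     tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
|     model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
|     dataset = load_dataset("NeelNanda/pile-10k", split="train")
|
|     with torch.no_grad():
|         text = dataset[0]["text"]
|         enc = tokenizer(text, return_tensors="pt",
|                         truncation=True, max_length=512)
|         logits = model(**enc).logits
|     print(logits.shape)  # (1, seq_len, 50257)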
| nailer wrote:
| > The choice depends on whether the word that comes after starts
| with a vowel or not
|
| Or an aitch / h, but only sometimes.
| jakear wrote:
| More accurately, it's whether the word that comes after starts
| with a vowel _sound_. This is why `an 'istoric` is correct,
| and `a historic` is correct, but `an historic` is incorrect (as
| famously used by Stephen Colbert).
| edgyquant wrote:
| I is a vowel, so your rule about sound doesn't apply here.
| It's also not a rule I've ever heard; in school we were taught
| that only words starting with vowels get 'an' before them.
| tqi wrote:
| > We started out with the question: How does GPT-2 know when to
| use the word an over a? The choice depends on whether the word
| that comes after starts with a vowel or not, but GPT-2 is only
| capable of predicting one word at a time. We still don't have a
| full answer...
|
| I'm not sure I understand why this is an open question. While I
| get that GPT-2 is predicting only one word at a time, it doesn't
| seem that surprising that there might be cases where there is a
| dominant bigram (ie "an apple" in the case of their example
| prompt) that would trigger an "an" prediction, without actually
| predicting the following word first.
|
| Am I missing something?
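|
| One way to poke at that (a rough sketch, assuming the Hugging
| Face transformers library; the prompt below is made up, not the
| one from the article) is to look at the one-step probabilities
| the model assigns to " a" versus " an":
|
|     # Compare next-token probabilities of " a" vs " an".
|     import torch
|     from transformers import GPT2LMHeadModel, GPT2TokenizerFast
|
|     tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
|     model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
|
|     prompt = "I climbed up the tree and picked"
|     ids = tokenizer(prompt, return_tensors="pt").input_ids
|     with torch.no_grad():
|         logits = model(ids).logits[0, -1]  # next-token logits
|     probs = torch.softmax(logits, dim=-1)
|
|     a_id = tokenizer.encode(" a")[0]
|     an_id = tokenizer.encode(" an")[0]
|     print("P( a)  =", round(probs[a_id].item(), 4))
|     print("P( an) =", round(probs[an_id].item(), 4))
|
| That only shows what the single-step distribution looks like for
| one prompt; it doesn't by itself say whether the preference comes
| from memorized bigrams or something more structural.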
| laszlokorte wrote:
| From my naive point of view it seems obvious that at any point
| where both an "a" or a "an" would fit the model randomly
| selects one of them and by doing so reduces the set of possible
| nouns to follow.
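|
| That intuition can be checked with two forward passes (a sketch
| with made-up example words, assuming the transformers library):
| ending the same context with " an" rather than " a" shifts
| probability toward vowel-initial nouns.
|
|     # Probability of a vowel- vs consonant-initial noun
|     # after the same context ending in " a" or " an".
|     import torch
|     from transformers import GPT2LMHeadModel, GPT2TokenizerFast
|
|     tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
|     model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
|
|     def first_token_prob(context, word):
|         # Probability of the first BPE token of `word`
|         # directly after `context`.
|         ids = tokenizer(context, return_tensors="pt").input_ids
|         with torch.no_grad():
|             logits = model(ids).logits[0, -1]
|         probs = torch.softmax(logits, dim=-1)
|         return probs[tokenizer.encode(word)[0]].item()
|
|     for article in (" a", " an"):
|         ctx = "She went to the shop and bought" + article
|         print(article,
|               "apple:", round(first_token_prob(ctx, " apple"), 4),
|               "book:", round(first_token_prob(ctx, " book"), 4))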
| lalos wrote:
| Not to mention that the corpus will mostly have the correct
| case for common words like apple. Train it on essays from
| ESL students and you'll get something else.
| prox wrote:
| Wouldn't this also be correlated somehow in its vector space?
| [deleted]
| niccl wrote:
| so evidence for a grandmother cell
| (https://en.wikipedia.org/wiki/Grandmother_cell)
| radus wrote:
| The way they're going about this investigation is reminiscent of
| how we figure things out in biology and neuroscience. Perhaps a
| biologist won't do the best job at fixing a radio [1], but they
| might do alright debugging neural networks.
|
| 1: https://www.cell.com/cancer-cell/pdf/S1535-6108(02)00133-2.p...
| rhelz wrote:
| Since complex systems are composed of simpler systems, it seems
| like for any sufficiently complex system you'd be able to find
| subsets of it which are isomorphic to any sufficiently simple
| system.
| LudwigNagasena wrote:
| It's notable how successful LLMs are despite the lack of any
| linguistic tools in their architectures. It would be interesting
| to know how different a model would be if it operated on eg
| dependency trees instead of the linear list of tokens. Surely,
| the question of "a/an" would be solved with ease as the model
| would be required to come up with a noun token before choosing
| its determiner. I wonder if the developers of LLMs explored those
| approaches but found them infeasible due to large preprocessing
| times, immaturity of such tools and/or little benefit.
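|
| For a feel of what that kind of preprocessed input would look
| like, here is a sketch using spaCy's small English model (the
| sentence is made up; any dependency parser would do):
|
|     # Dependency-tree view: each token points to its head.
|     # Setup: pip install spacy
|     #        python -m spacy download en_core_web_sm
|     import spacy
|
|     nlp = spacy.load("en_core_web_sm")
|     doc = nlp("She picked an apple from the old tree.")
|
|     for token in doc:
|         # The determiner "an" attaches to the noun "apple", so a
|         # model working over the tree would already have the noun
|         # in hand when it emits the determiner.
|         print(token.text, token.dep_, "->", token.head.text)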
| lalos wrote:
| Not relevant but:
|
| "Every time I fire a linguist, the performance of the speech
| recognizer goes up". - Frederick Jelinek
| jojobas wrote:
| Surely the remaining linguists muster up some improvements
| out of fear for their jobs!
___________________________________________________________________
(page generated 2023-02-16 23:00 UTC)