[HN Gopher] Mapping the semantic void: Strange goings-on in GPT ...
___________________________________________________________________
Mapping the semantic void: Strange goings-on in GPT embedding
spaces
Author : georgehill
Score : 87 points
Date : 2023-12-19 13:48 UTC (9 hours ago)
(HTM) web link (www.lesswrong.com)
(TXT) w3m dump (www.lesswrong.com)
| PaulHoule wrote:
| Should try asking it to define things in French, Korean, etc.
| empath-nirvana wrote:
| That is _profoundly_ interesting, and I think a serious study of
| that is going to reveal a lot about human (or at least English-
| speaking) psychology.
| kridsdale1 wrote:
| I would love to compare the results across many languages to
| find the language concepts that are truly universal and
| fundamental to all people.
|
| I suspect they will be the "antiquated" or "child-like"
| concepts identified here: the things that matter to hunter-
| gatherer people.
| lainga wrote:
| sort-of related is https://en.wikipedia.org/wiki/Swadesh_list
| JPLeRouzic wrote:
| Please can someone explain the gist of this article to a mere
| human with no PhD in AI?
| JPLeRouzic wrote:
| I asked ChatGPT, and it reluctantly told me:
|
| _" The phenomena discussed in the text are more about the
| inherent characteristics of the model's representation space
| rather than the input data being flawed."_
|
| It took three attempts; the first two were merely rephrasings of
| the abstract.
| ta988 wrote:
| Tokens (parts of words, words, symbols) are represented as
| vectors.
|
| They say there are "nokens": vectors that don't correspond to any
| real token, but that seem to represent something because they are
| organized in that space in a regular manner.
|
| They think those vectors may have something interesting about
| them because they seem to be somewhat organized and related to
| each other.
| ta988 wrote:
| Also, they seem to correspond to things that exist in our world,
| and curiously to things that have existed for a long time rather
| than very recent ones.
| JPLeRouzic wrote:
| Please could you point out where these strange definitions
| are found in the article?
| great_psy wrote:
| Search for the string "children" in the context of
| children's books/stories
| JPLeRouzic wrote:
| Thanks Great_psy.
| lainga wrote:
| LindySpace?
| JPLeRouzic wrote:
| Thanks for taking care to explain. But aren't all embedding
| vectors that represent something organized like that?
| ta988 wrote:
| "This is described, and then the remaining expanse of the
| embedding space is explored by using simple prompts to
| elicit definitions for non-token custom embedding vectors
| (so-called "nokens")"
|
| so they make new vectors
| netruk44 wrote:
| At a high level:
|
| It's an article observing how GPT-J token embeddings are
| positioned in 'embedding space', with some connections drawn to
| GPT 3.
|
| They then experiment with having GPT-J provide definitions for
| "nokens" (a "noken" is basically a made-up embedding created by
| modifying the embedding of a real token), to see what happens
| when the model is presented with a novel embedding from outside
| the trained embedding space, and how it interprets it.
|
| Diving in a little more:
|
| An observation from the article is that the first letter of the
| word represented by an embedding is actually encoded into that
| embedding, such that you can identify which letter any
| embedding's word starts with about 98% of the time.
|
| They observe this linear relationship and devise an experiment.
| They ask GPT-J what letter the word "icon" starts with, and it
| correctly replies "I". They then create a "noken" for the word
| "icon" and modify it so that it no longer represents that the
| word starts with "I" and ask GPT-J again what letter this
| "noken" starts with. GPT-J then incorrectly replies that the
| word "icon" does not start with the letter "I".
|
| So then they pose the question "can the model still define the
| word 'broccoli' even if we shift the first letter away from
| 'B'?", and the answer is almost always yes: changing the
| semantic "first letter" of an embedding doesn't change the
| model's ability to understand the embedding.
|
| They then do a series of experiments asking GPT-J to provide the
| "typical definition for the word '<noken>'", with the example
| word 'hate' being used. They gradually shift the first letter
| away from 'H' and see that, up until very high levels of
| modification, the model can still give a normal definition for
| the modified 'hate' noken ("a strong feeling of dislike or
| hostility.").
|
| At extreme modifications to the noken, the model starts
| providing strange definitions ("a person who is not a member of
| a particular group.", "a period of time during which a person
| or thing is in a state of being").
|
| They repeat the experiment and note that this behavior of the
| definition remaining stable up until 'collapse' is common across
| many tokens:
|
| > ...usually ending up with something about a person who isn't
| a member of a group by k = 100, having passed through one or
| more other themes involving things like Royal families, places
| of refuge, small round holes and yellowish-white things, to
| name a few of the baffling tropes that began to appear
| regularly.
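|
| To make the "first letter is linearly encoded" idea concrete,
| here is a rough sketch of the kind of linear probe that could
| recover it. This isn't the article's code; the scikit-learn
| logistic-regression probe and the token-filtering rule are my
| own assumptions:
|
|     import numpy as np
|     from sklearn.linear_model import LogisticRegression
|     from sklearn.model_selection import train_test_split
|     from transformers import AutoModelForCausalLM, AutoTokenizer
|
|     # GPT-J's input embedding matrix: one 4096-d row per token.
|     # (This downloads the full 6B-parameter checkpoint.)
|     tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
|     model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")
|     emb = model.get_input_embeddings().weight.detach().numpy()
|
|     # Label each alphabetic token with its first letter
|     # (this filtering rule is my assumption, not the article's).
|     X, y = [], []
|     for tok_id in range(len(tok)):
|         text = tok.decode([tok_id]).strip()
|         if text and text[0].isascii() and text[0].isalpha():
|             X.append(emb[tok_id])
|             y.append(text[0].lower())
|     X, y = np.array(X), np.array(y)
|
|     X_tr, X_te, y_tr, y_te = train_test_split(
|         X, y, test_size=0.2, random_state=0)
|     probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
|     print("held-out first-letter accuracy:", probe.score(X_te, y_te))
|
| If the article's observation holds, the held-out accuracy should
| land somewhere near the 98% figure, at least for word-like tokens.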
| uoaei wrote:
| Don't believe everything you read on the internet just because
| it's written in an authoritative voice on a website you've
| heard of.
| JPLeRouzic wrote:
| Ah, ah, I agree!
| wing-_-nuts wrote:
| I try to keep up with the space over on r/LocalLLaMA and I've
| never seen such a word salad. I was convinced the author was
| writing satire until I got a few paragraphs in and it just
| ...didn't stop.
| JPLeRouzic wrote:
| Thanks, that's my impression also: it seems to me they did not
| discover a general feature of LLMs; rather, they discovered a
| feature they themselves introduced into their instance of GPT-J.
|
| That said, I am a noob about LLMs.
| kgc wrote:
| Would be interesting to see how changing the tokenization
| strategy -- whole words or even phrases instead of traditional
| tokens -- changes the results.
| nybsjytm wrote:
| Based on the first three figures, it seems like the author isn't
| familiar at all with probability in high-dimensional spaces,
| where these phenomena are exactly what you'd expect. Because of
| that, the figure with intersecting spheres is a big
| misinterpretation. I stopped reading there, but I think it's good
| to generally ignore things you find on lesswrong!
| bryan0 wrote:
| Can you please explain why this distribution of embeddings is
| expected for high-dim spaces?
| nybsjytm wrote:
| It's an example of concentration of measure:
| https://en.wikipedia.org/wiki/Concentration_of_measure
|
| It's easy to verify this for yourself in a simple example in
| something like python. Draw from a multivariate normal
| distribution (or almost anything) in a 1000-dimensional space
| and look at the norm of your vector. If you do this a lot of
| times you'll see that the answer is close to the same each
| time. This even holds true if you replace the vector norm by
| the distance to an arbitrary fixed point.
|
| edit: here's some code:
|
|     import numpy
|     import matplotlib.pyplot
|
|     norms = []
|     dists = []
|     for i in range(5000):
|         # a random 4096-dimensional Gaussian vector
|         vec = numpy.random.normal(size=4096)
|         norms.append(numpy.linalg.norm(vec))
|         # distance to an arbitrary fixed point (the all-ones vector)
|         dists.append(numpy.linalg.norm(vec - numpy.ones(4096)))
|
|     # both histograms come out sharply concentrated around one value
|     matplotlib.pyplot.hist(norms)
|     matplotlib.pyplot.show()
|     matplotlib.pyplot.hist(dists)
|     matplotlib.pyplot.show()
| ttul wrote:
| One way to think about high-dimensional spaces is that there is
| a huge amount of volume relative to the amount of data occupying
| the space: every added dimension gives points another direction
| in which to spread apart. As you increase the dimensionality,
| coordinates representing meaningful data spread out, leading to
| a bizarre situation in which most data points in the space are
| about as far apart as a randomly chosen pair of data points,
| even when the data points you are measuring the distance between
| should intuitively be "close" to each other.
|
| For instance, consider a unit cube in n dimensions. The
| volume of this cube (1 unit on each side) remains constant,
| but the distance from the center to a corner increases with
| the square root of the number of dimensions. This implies
| that in high dimensions, most of the volume of a hypercube is
| located at its corners. This is bizarre to comprehend because
| it's hard to stop imagining a "cube" in the 3D sense.
|
| Algorithms that rely on distance measures (like k-nearest
| neighbors or clustering) may perform poorly in high-
| dimensional spaces without dimensionality reduction
| techniques like PCA (Principal Component Analysis) or t-SNE
| (t-Distributed Stochastic Neighbor Embedding). And even with
| these techniques, you can get very misleading results.
| nybsjytm wrote:
| >For instance, consider a unit cube in n dimensions. The
| volume of this cube (1 unit on each side) remains constant,
| but the distance from the center to a corner increases with
| the square root of the number of dimensions. This implies
| that in high dimensions, most of the volume of a hypercube
| is located at its corners. This is bizarre to comprehend
| because it's hard to stop imagining a "cube" in the 3D
| sense.
|
| This is incorrect. If you put a little box of small side
| length x at each corner, then each box has volume x^n and
| there are 2^n many of them, so the total volume they take
| up is 2^n * x^n = (2x)^n, which is very small. Maybe what you
| meant is that most of the volume of a high-dimensional
| hypersphere is near its surface.
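|
| A quick numerical check of both statements (my own sketch, not
| from the article):
|
|     n = 4096     # dimension, matching GPT-J's embedding width
|     x = 0.1      # side length of a small box at each corner
|
|     # Total volume of the 2^n corner boxes: (2x)^n, vanishingly
|     # small for any x < 1/2.
|     print((2 * x) ** n)          # underflows to 0.0
|
|     # By contrast, the fraction of a unit ball's volume within
|     # eps of its surface is 1 - (1 - eps)^n: essentially all of it.
|     eps = 0.01
|     print(1 - (1 - eps) ** n)    # ~1.0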
| ttul wrote:
| You are correct and thank you for the correction.
| kridsdale1 wrote:
| I can kind of imagine it. There's more "concentration of
| surface area, per volume" in the sharp corners of a 3-cube
| than there is in the center of a face.
|
| Surface area is 2-volume.
| BoiledCabbage wrote:
| Is your argument that a set of points/tokens uniformly randomly
| distributed in 4096 dimensions would produce the same Euclidean
| distributions/shapes?
|
| If not, then your point isn't very clear to me.
| nybsjytm wrote:
| Yes, but the distribution doesn't have to be uniform. It can
| be almost anything.
| harveywi wrote:
| Maybe a dumb question: Shouldn't these be called immersions
| instead of embeddings?
| nybsjytm wrote:
| No, since there isn't a differentiable structure. And even the
| use of "embedding" doesn't really match the way mathematicians
| would use the word.
| great_psy wrote:
| So I am a bit confused about the part where you go k distance out
| from the centroids.
|
| Since there are ~5000 dimensions, in which of those dimensions
| are we moving k out?
|
| Is the idea that you just move out, in all dimensions at once,
| such that the final Euclidean distance is k?
|
| Seems that's how they get multiple samples at those distances.
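|
| Something like this, in other words (just a sketch of my guess,
| not necessarily what the post actually does):
|
|     import numpy as np
|
|     def sample_at_distance(centroid, k, n_samples=5, seed=0):
|         """Sample points exactly k (Euclidean) from the centroid,
|         each in an independent, uniformly random direction."""
|         rng = np.random.default_rng(seed)
|         dirs = rng.normal(size=(n_samples, centroid.shape[0]))
|         dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
|         return centroid + k * dirs
|
|     centroid = np.zeros(4096)  # stand-in for the mean token embedding
|     nokens = sample_at_distance(centroid, k=5.0)
|     print(np.linalg.norm(nokens - centroid, axis=1))  # all ~5.0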
|
| Either way, I think it's more interesting to go out in specific
| dimensions. Ideally there would be a mapping between each
| dimension and something inherent about the token, like the part
| where a dimension corresponds to the first letter of the token.
|
| We went through this discovery phase when we were generating
| images using autoencoders. Same idea: some of those dimensions
| would correspond to certain features of the image, so moving
| along them would change the image output in some predictable way.
|
| Either way, I think the overall structure of those spaces says
| something about how the human brain works (given that we invented
| the language). I'm interested to see whether anything
| neurological can be derived from those vector embeddings.
| panarky wrote:
| _> Ideally there is a mapping ..._
|
| The complexity and interdependence of dimensions within
| embeddings make it practically impossible to ascribe specific,
| human-understandable meanings to individual elements or
| dimensions.
|
| This research actually adds to that complexity. Rather than
| making the meaning of individual dimensions more
| understandable, it shows that the embedding space has a
| strange, layered structure.
|
| It suggests a peculiar, almost nonsensical organization of
| concepts at different distances from the central point of
| typical token embeddings, which doesn't make it easier to
| pinpoint what each dimension means.
|
| It emphasizes that embeddings capture information in a
| distributed and highly contextual manner. The fact that
| embeddings for "nokens" can lead to arbitrary or bizarre
| categorizations when taken out of the typical token zone
| underscores that embedding dimensions don't have
| straightforward, easily interpretable meanings.
| kridsdale1 wrote:
| My interpretation is that the nokens are novel (un-coined)
| coordinates in a categorized zone of the vector space. Future
| linguists will plant their flags in noken territory as we need
| new words. This already happened with "noken"!
|
| Noken-space is all the undefined territory full of noise
| equivalent to the actual visual noise we see in Stable
| Diffusion when you travel along a vector away from an island
| of coherence. To put it poetically, concept space is a vast
| hyperspace sea of random garbage (like unallocated RAM) and
| there are tiny islands or planets of meaning and value that
| look like things that matter to humanity.
|
| I don't see the categorizations as being very bizarre.
| Putting on my amateur anthropology hat, each one described in
| the paper was clearly an important topic to a "primitive"
| person. Concerned with survival, people would talk about
| group dynamics, sharp things, plants/animals, things that
| look like infections (small flat round yellow white), and
| places. These are pretty much all you need to talk about.
|
| Looking at an English LLM vector space to understand
| fundamental principles of language is like looking at human
| DNA and trying to understand the first eukaryotes. It's been
| complicated by effectively infinite generations of
| specializations adding noise.
|
| The probing work described here identifies some common
| principles that survive through the generations. A commenter
| above wondered if this was discovering something about the
| nature of the brain. I think it's the nature of culture.
| Etymology is the product of culture and history.
| mhink wrote:
| > Since there are ~5000 dimensions, in which of those
| dimensions are we moving k out?
|
| > Is the idea that you just move out, in all dimensions at
| once, such that the final Euclidean distance is k?
|
| If I understand correctly, the basic idea is that in earlier
| experiments, they were able to use a relatively-simple
| technique to come up with what they call a "probe vector" in
| the embedding space which represents "the property of a word
| starting with <letter>". For any given token, the authors
| established that it was 98% probable that the token's embedding
| vector would be closer (by cosine similarity) to the "probe
| vector" representing the first letter of that word than any
| other probe vector. This is shown in the first graph of the
| section "A puzzling discovery".
|
| With that in mind, the diagram below that should start making
| more sense: "emb" is a particular token's embedding vector and
| "probe" is the probe vector for the token's first letter.
| "emb_proj" is the projection of "emb" onto "probe".
|
| What they're doing is tweaking the network weights by
| subtracting multiples of `emb_proj` from `emb` (where the
| specific multiple is the parameter K), and then seeing how it
| behaves differently for different values of K.
|
| Their original observation when doing this was that it reliably
| caused the model to claim that the first letter of the tweaked
| word was not the letter in question. In _this_ article, they're
| trying to figure out how far they can push the tweak and still
| get reasonably accurate definitions of a token.
|
| What they discovered is that when they push a token's embedding
| vector further and further out along its "first-letter vector"
| and ask the network to define that word, the definitions it
| provides seem to follow particular themes during different
| regimes of K.
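|
| As a toy sketch of that tweak (random vectors standing in for a
| real embedding row and probe vector; the actual experiment edits
| GPT-J's embedding matrix):
|
|     import numpy as np
|
|     def tweak_embedding(emb, probe, k):
|         """Subtract k multiples of emb's projection onto the probe
|         ("first letter") direction, per the description above."""
|         unit = probe / np.linalg.norm(probe)
|         emb_proj = np.dot(emb, unit) * unit  # component along the probe
|         return emb - k * emb_proj
|
|     # Random stand-ins: a real token's embedding would start out
|     # clearly aligned with its first-letter probe, unlike these.
|     rng = np.random.default_rng(0)
|     emb, probe = rng.normal(size=4096), rng.normal(size=4096)
|
|     for k in (0, 1, 5, 100):
|         tweaked = tweak_embedding(emb, probe, k)
|         cos = np.dot(tweaked, probe) / (
|             np.linalg.norm(tweaked) * np.linalg.norm(probe))
|         print(f"k={k:>3}  cosine similarity to probe: {cos:+.3f}")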
| kridsdale1 wrote:
| This is the first post on LessWrong that I found interesting and
| not insipid navel-gazing.
|
| My interpretation of the finding is that the LLM training has
| revealed the primordial nature of human language, akin to the
| primary function of genetic code predominately being to ensure
| the ongoing mechanical homeostasis of the organism.
|
| Language's evolutionary purpose is to drive group-fitness. Group
| management fundamentals are thus the predominant themes in the
| undefined latent space.
|
| Who is in our tribe? Who should we trust? Who should we kill? Who
| leads us?
|
| These matters likely dominated human speech for a thousand
| millennia.
| andrewflnr wrote:
| I would at minimum want to see the analysis replicated across a
| variety of languages before I started drawing conclusions about
| "the primordial nature of human language". In particular,
| languages with no known relation to English; Chinese would be a
| good one, far from the Indo-European tree and with a big
| corpus. The membership-centric interpretation is certainly
| appealing, but really it's too appealing to let ourselves take
| it seriously without counterbalancing it with a high standard of
| evidence.
| __loam wrote:
| > My interpretation of the finding is that the LLM training has
| revealed the primordial nature of human language, akin to the
| primary function of genetic code predominately being to ensure
| the ongoing mechanical homeostasis of the organism.
|
| We've truly reached peak hype. It's a token predictor, dude.
| flir wrote:
| Someone shoot me down if I'm wrong:
|
| In this model (one of infinitely many possible models) a word is
| a point in 4096-space.
|
| This article is trying to tease out the structure of those
| points, and suggesting we might be able to conclude something
| about natural language from that structure - like looking at the
| large-scale structure of the Universe and deriving information
| about the Big Bang.
|
| Obvious questions: is the large-scale structure conserved across
| languages?
|
| What happens if we train on random tokens - a corpus of noise?
| Does structure still emerge?
|
| It might be interesting, it might be an artifact. I'd be curious
| to know what happens when you only examine complete words.
| torvaney wrote:
| Conserved enough to do word translation:
| https://engineering.fb.com/2018/08/31/ai-research/unsupervis...
| jakedahn wrote:
| Has anyone done this analysis for other LLMs, like Llama 2?
| chpatrick wrote:
| I think that's really interesting, but one thing I'm not sure
| about is that a lot of tokens are just fragments of words, like
| "ei", not whole words. Can we really draw conclusions about
| those embeddings?
___________________________________________________________________
(page generated 2023-12-19 23:02 UTC)