[HN Gopher] Mapping the semantic void: Strange goings-on in GPT ...
       ___________________________________________________________________
        
       Mapping the semantic void: Strange goings-on in GPT embedding
       spaces
        
       Author : georgehill
       Score  : 87 points
       Date   : 2023-12-19 13:48 UTC (9 hours ago)
        
 (HTM) web link (www.lesswrong.com)
 (TXT) w3m dump (www.lesswrong.com)
        
       | PaulHoule wrote:
        | Should try asking it to define things in French, Korean, etc.
        
       | empath-nirvana wrote:
       | That is _profoundly_ interesting, and I think a serious study of
        | that is going to reveal a lot about human (or at least
        | English-speaking) psychology.
        
         | kridsdale1 wrote:
         | I would love to compare the results across many languages to
         | find the language concepts that are truly universal and
         | fundamental to all people.
         | 
         | I suspect they will be the "antiquated" or "child-like"
         | concepts identified here: the things that matter to hunter-
         | gatherer people.
        
           | lainga wrote:
           | sort-of related is https://en.wikipedia.org/wiki/Swadesh_list
        
       | JPLeRouzic wrote:
       | Please can someone explain the gist of this article to a mere
       | human with no PhD in AI?
        
         | JPLeRouzic wrote:
          | I asked ChatGPT, and it reluctantly told me:
         | 
         |  _" The phenomena discussed in the text are more about the
         | inherent characteristics of the model's representation space
         | rather than the input data being flawed."_
         | 
          | It took three attempts; the first two were merely
          | rephrasings/paraphrasings of the abstract.
        
         | ta988 wrote:
         | Tokens (parts of words, words, symbols) are represented as
         | vectors.
         | 
          | They say there are "nokens": made-up vectors that don't
          | correspond to any actual token, yet seem to represent something
          | because they are organized in that space in a regular manner.
         | 
         | They think those vectors may have something interesting about
         | them because they seem to be somewhat organized and related to
         | each other.
        
           | ta988 wrote:
            | Also, they seem to correspond to things that exist in our
            | world, and curiously to things that have existed for a long
            | time rather than more recent ones.
        
             | JPLeRouzic wrote:
             | Please could you point out where these strange definitions
             | are found in the article?
        
               | great_psy wrote:
               | Search for the string "children" in the context of
                | children's books/stories
        
               | JPLeRouzic wrote:
               | Thanks Great_psy.
        
             | lainga wrote:
             | LindySpace?
        
           | JPLeRouzic wrote:
            | Thanks for taking care to explain. But don't all embedding
            | vectors represent something in an organized way?
        
             | ta988 wrote:
             | "This is described, and then the remaining expanse of the
             | embedding space is explored by using simple prompts to
             | elicit definitions for non-token custom embedding vectors
             | (so-called "nokens")"
             | 
              | So they make new vectors that don't correspond to any
              | existing token.
        
         | netruk44 wrote:
         | At a high level:
         | 
         | It's an article observing how GPT-J token embeddings are
         | positioned in 'embedding space', with some connections drawn to
         | GPT 3.
         | 
          | They then experiment with having GPT-J provide definitions for
          | "nokens" (a "noken" is basically a made-up embedding created by
          | modifying the embedding generated from a real token), to see
          | how the model interprets a novel embedding that lies outside
          | the trained embedding space.
         | 
         | Diving in a little more:
         | 
          | An observation from the article is that the first letter of the
          | word represented by an embedding is actually encoded into that
          | embedding, such that you can tell from the embedding alone which
          | letter a token starts with, about 98% of the time.
         | 
         | They observe this linear relationship and devise an experiment.
         | They ask GPT-J what letter the word "icon" starts with, and it
         | correctly replies "I". They then create a "noken" for the word
         | "icon" and modify it so that it no longer represents that the
         | word starts with "I" and ask GPT-J again what letter this
         | "noken" starts with. GPT-J then incorrectly replies that the
         | word "icon" does not start with the letter "I".
         | 
          | So then they pose the question "can the model still define the
          | word 'broccoli' even if we shift the first letter away from
          | 'B'?", and the answer is almost always yes: changing the
          | semantic "first letter" of an embedding doesn't change the
          | model's ability to understand the embedding.
         | 
          | They then do a series of experiments asking GPT-J to provide
          | the "typical definition for the word '<noken>'", with the
          | example word 'hate' being used. They gradually shift the first
          | letter away from 'H' and see that, up until very high levels of
          | modification, the model can still give a normal definition for
          | the modified 'hate' noken ("a strong feeling of dislike or
          | hostility.").
         | 
         | At extreme modifications to the noken, the model starts
         | providing strange definitions ("a person who is not a member of
         | a particular group.", "a period of time during which a person
         | or thing is in a state of being").
         | 
          | They repeat the experiment and note that this behavior of the
          | definition remaining stable up until 'collapse' is common
          | across many tokens:
         | 
         | > ...usually ending up with something about a person who isn't
         | a member of a group by k = 100, having passed through one or
         | more other themes involving things like Royal families, places
         | of refuge, small round holes and yellowish-white things, to
         | name a few of the baffling tropes that began to appear
         | regularly.
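          | 
          | To make the noken construction concrete, here's a rough sketch
          | in numpy (my own reconstruction, not the authors' code) of the
          | "subtract multiples of the projection" idea, where `emb` is a
          | real token embedding, `probe` is the first-letter direction,
          | and `k` is the shift parameter:
          | 
          |     import numpy as np
          |     
          |     def make_noken(emb, probe, k):
          |         # Hypothetical helper; names are for illustration only.
          |         # Remove k times the component of `emb` that lies along
          |         # the `probe` direction.
          |         probe_unit = probe / np.linalg.norm(probe)
          |         emb_proj = np.dot(emb, probe_unit) * probe_unit
          |         return emb - k * emb_proj
          | 
          | The resulting vector is then used in place of the real token's
          | embedding when prompting GPT-J for a definition.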
        
         | uoaei wrote:
         | Don't believe everything you read on the internet just because
         | it's written in an authoritative voice on a website you've
         | heard of.
        
           | JPLeRouzic wrote:
            | Ha ha, I agree!
        
         | wing-_-nuts wrote:
          | I try to keep up with the space over on r/LocalLLaMA and I've
         | never seen such a word salad. I was convinced the author was
         | writing satire until I got a few paragraphs in and it just
         | ...didn't stop.
        
           | JPLeRouzic wrote:
            | Thanks, that's my impression also: it seems to me they did
            | not discover a general feature of LLMs; rather, they
            | discovered a feature they introduced themselves in their
            | instance of GPT-J.
            | 
            | That said, I am a noob about LLMs.
        
       | kgc wrote:
       | Would be interesting to see how changing the tokenization
       | strategy -- whole words or even phrases instead of traditional
       | tokens -- changes the results.
        
       | nybsjytm wrote:
       | Based on the first three figures, it seems like the author isn't
       | familiar at all with probability in high-dimensional spaces,
       | where these phenomena are exactly what you'd expect. Because of
       | that, the figure with intersecting spheres is a big
       | misinterpretation. I stopped reading there, but I think it's good
       | to generally ignore things you find on lesswrong!
        
         | bryan0 wrote:
         | Can you please explain why this distribution of embeddings is
         | expected for high-dim spaces?
        
           | nybsjytm wrote:
           | It's an example of concentration of measure:
           | https://en.wikipedia.org/wiki/Concentration_of_measure
           | 
            | It's easy to verify this for yourself with a simple example
            | in something like Python. Draw from a multivariate normal
           | distribution (or almost anything) in a 1000-dimensional space
           | and look at the norm of your vector. If you do this a lot of
           | times you'll see that the answer is close to the same each
           | time. This even holds true if you replace the vector norm by
           | the distance to an arbitrary fixed point.
           | 
            | edit: here's some code:
            | 
            |     # The norms of random 4096-d Gaussian vectors concentrate
            |     # tightly around sqrt(4096) = 64, and distances to a fixed
            |     # point concentrate just as tightly (around another value).
            |     import numpy
            |     import matplotlib.pyplot
            | 
            |     norms = []
            |     dists = []
            |     for i in range(5000):
            |         vec = numpy.random.normal(size=4096)
            |         norms.append(numpy.linalg.norm(vec))
            |         dists.append(numpy.linalg.norm(vec - numpy.ones(4096)))
            |     matplotlib.pyplot.hist(norms)
            |     matplotlib.pyplot.show()
            |     matplotlib.pyplot.hist(dists)
            |     matplotlib.pyplot.show()
        
           | ttul wrote:
            | One way to think about high-dimensional spaces is that there
            | is a huge amount of volume relative to the amount of data
            | placed in the space. Yes, there are infinitely many points in
            | any space; however, each point has far more coordinates than
            | in a lower-dimensional space. As you increase the
            | dimensionality, points representing meaningful data spread
            | out, leading to a bizarre situation in which most data points
            | in the space are about as far apart as a randomly chosen pair
            | of data points, even if the data points you are measuring the
            | distance between should be "close" to each other.
           | 
           | For instance, consider a unit cube in n dimensions. The
           | volume of this cube (1 unit on each side) remains constant,
           | but the distance from the center to a corner increases with
           | the square root of the number of dimensions. This implies
           | that in high dimensions, most of the volume of a hypercube is
           | located at its corners. This is bizarre to comprehend because
           | it's hard to stop imagining a "cube" in the 3D sense.
           | 
           | Algorithms that rely on distance measures (like k-nearest
           | neighbors or clustering) may perform poorly in high-
           | dimensional spaces without dimensionality reduction
           | techniques like PCA (Principal Component Analysis) or t-SNE
           | (t-Distributed Stochastic Neighbor Embedding). And even with
           | these techniques, you can get very misleading results.
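            | 
            | A quick toy check of the "everything is roughly equidistant"
            | effect (my own sketch, not taken from the article):
            | 
            |     import numpy as np
            |     
            |     rng = np.random.default_rng(0)
            |     a = rng.normal(size=(500, 4096))  # 500 random points
            |     b = rng.normal(size=(500, 4096))  # 500 more random points
            |     d = np.linalg.norm(a - b, axis=1)  # one distance per pair
            |     print(d.mean(), d.std())  # std is only ~1% of the mean
            | 
            | Nearly every random pair sits at almost the same distance,
            | which is why raw nearest-neighbour distances carry so little
            | information in thousands of dimensions.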
        
             | nybsjytm wrote:
             | >For instance, consider a unit cube in n dimensions. The
             | volume of this cube (1 unit on each side) remains constant,
             | but the distance from the center to a corner increases with
             | the square root of the number of dimensions. This implies
             | that in high dimensions, most of the volume of a hypercube
             | is located at its corners. This is bizarre to comprehend
             | because it's hard to stop imagining a "cube" in the 3D
             | sense.
             | 
              | This is incorrect. If you put a little box of small side
              | length x at each corner, then each box has volume x^n and
              | there are 2^n many of them, so the total volume they take
              | up is 2^n * x^n = (2x)^n, which is very small. Maybe what
              | you meant is that most of the volume of a high-dimensional
              | hypersphere is near its surface.
        
               | ttul wrote:
               | You are correct and thank you for the correction.
        
             | kridsdale1 wrote:
             | I can kind of imagine it. There's more "concentration of
             | surface area, per volume" in the sharp corners of a 3-cube
             | than there is in the center of a face.
             | 
             | Surface area is 2-volume.
        
         | BoiledCabbage wrote:
         | Is your argument that a set of points/tokens uniformly randomly
         | distributed in 4096 dimensions would produce the same Euclidean
         | distributions/shapes?
         | 
         | If not, then your point isn't very clear to me.
        
           | nybsjytm wrote:
           | Yes, but the distribution doesn't have to be uniform. It can
           | be almost anything.
        
       | harveywi wrote:
       | Maybe a dumb question: Shouldn't these be called immersions
       | instead of embeddings?
        
         | nybsjytm wrote:
         | No, since there isn't a differentiable structure. And even the
         | use of "embedding" doesn't really match the way mathematicians
         | would use the word.
        
       | great_psy wrote:
        | So I am a bit confused about the part where you go a distance k
        | out from the centroids.
        | 
        | Since there are ~5000 dimensions, in which of those dimensions
        | are we moving k out?
        | 
        | Is the idea that you just move out in all dimensions such that
        | the final Euclidean distance is k? (See the sketch at the end of
        | this comment for roughly what I mean.)
        | 
        | That seems to be how they get multiple samples at those
        | distances.
       | 
        | Either way, I think it's more interesting to go out in specific
        | dimensions. Ideally there is a mapping between each dimension and
        | something inherent about the token, like the part where a
        | direction corresponds with the first letter of the token.
       | 
        | We went through this discovery phase when we were generating
        | images using autoencoders. Same idea: some of those dimensions
        | would correspond to certain features of the image, so moving
        | along them would change the image output in some predictable way.
       | 
        | Either way, I think the overall structure of those spaces says
        | something about how the human brain works (given we invented the
        | language). I'm interested to see if anything neurological can be
        | derived from those vector embeddings.
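        | 
        | Here's roughly what I mean by "moving out in all dimensions",
        | as a guess at how the sampling might work (not the article's
        | actual code; `centroid` and `k` are just placeholders):
        | 
        |     import numpy as np
        |     
        |     def sample_at_distance(centroid, k):
        |         # Hypothetical helper: pick a uniformly random direction,
        |         # then step exactly distance k away from the centroid.
        |         direction = np.random.standard_normal(centroid.shape)
        |         direction /= np.linalg.norm(direction)
        |         return centroid + k * direction
        | 
        | Repeating that with fresh random directions would give multiple
        | samples at each distance k.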
        
         | panarky wrote:
         | _> Ideally there is a mapping ..._
         | 
         | The complexity and interdependence of dimensions within
         | embeddings make it practically impossible to ascribe specific,
         | human-understandable meanings to individual elements or
         | dimensions.
         | 
         | This research actually adds to that complexity. Rather than
         | making the meaning of individual dimensions more
         | understandable, it shows that the embedding space has a
         | strange, layered structure.
         | 
         | It suggests a peculiar, almost nonsensical organization of
         | concepts at different distances from the central point of
         | typical token embeddings, which doesn't make it easier to
         | pinpoint what each dimension means.
         | 
         | It emphasizes that embeddings capture information in a
         | distributed and highly contextual manner. The fact that
         | embeddings for "nokens" can lead to arbitrary or bizarre
         | categorizations when taken out of the typical token zone
         | underscores that embedding dimensions don't have
         | straightforward, easily interpretable meanings.
        
           | kridsdale1 wrote:
            | My interpretation is that the nokens are novel (un-coined)
            | coordinates in a categorized zone of the vector space. Future
            | linguists will plant their flag in noken territory as we need
            | new words. This already happened with "noken"!
           | 
           | Noken-space is all the undefined territory full of noise
           | equivalent to the actual visual noise we see in Stable
           | Diffusion when you travel along a vector away from an island
           | of coherence. To put it poetically, concept space is a vast
           | hyperspace sea of random garbage (like unallocated RAM) and
           | there are tiny islands or planets of meaning and value that
           | look like things that matter to humanity.
           | 
           | I don't see the categorizations as being very bizarre.
           | Putting on my amateur anthropology hat, each one described in
           | the paper was clearly an important topic to a "primitive"
           | person. Concerned with survival, people would talk about
           | group dynamics, sharp things, plants/animals, things that
           | look like infections (small flat round yellow white), and
           | places. These are pretty much all you need to talk about.
           | 
           | Looking at an English LLM vector space to understand
           | fundamental principles of language is like looking at human
           | DNA and trying to understand the first eukaryotes. It's been
           | complicated by effectively infinite generations of
           | specializations adding noise.
           | 
           | The probing work described here identifies some common
           | principles that survive through the generations. A commenter
           | above wondered if this was discovering something about the
           | nature of the brain. I think it's the nature of culture.
           | Etymology is the product of culture and history.
        
         | mhink wrote:
         | > Since there are ~5000 dimensions, in which of those
         | dimensions are we moving k out ?
         | 
         | > Is the idea you just move, out, in all dimensions such that
         | the final Euclidean distance is k ?
         | 
         | If I understand correctly, the basic idea is that in earlier
          | experiments, they were able to use a relatively simple
         | technique to come up with what they call a "probe vector" in
         | the embedding space which represents "the property of a word
         | starting with <letter>". For any given token, the authors
         | established that it was 98% probable that the token's embedding
         | vector would be closer (by cosine similarity) to the "probe
         | vector" representing the first letter of that word than any
         | other probe vector. This is shown in the first graph of the
         | section "A puzzling discovery".
         | 
         | With that in mind, the diagram below that should start making
         | more sense: "emb" is a particular token's embedding vector and
         | "probe" is the probe vector for the token's first letter.
         | "emb_proj" is the projection of "emb" onto "probe".
         | 
         | What they're doing is tweaking the network weights by
         | subtracting multiples of `emb_proj` from `emb` (where the
         | specific multiple is the parameter K), and then seeing how it
         | behaves differently for different values of K.
         | 
         | Their original observation when doing this was that it reliably
         | caused the model to claim that the first letter of the tweaked
          | word was not the letter in question. In _this_ article,
          | they're trying to figure out how far they can push the tweak
          | and still get reasonably accurate definitions of a token.
         | 
         | What they discovered is that when they push a token's embedding
         | vector further and further out along its "first-letter vector"
         | and ask the network to define that word, the definitions it
         | provides seem to follow particular themes during different
         | regimes of K.
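          | 
          | As a toy illustration of that 98% check (my sketch; it assumes
          | you already have one probe vector per letter, however those
          | were actually derived):
          | 
          |     import numpy as np
          |     
          |     def predicted_first_letter(emb, probes):
          |         # probes: hypothetical dict mapping each letter to its
          |         # probe vector. Return the letter whose probe has the
          |         # highest cosine similarity with the embedding.
          |         def cos(u, v):
          |             denom = np.linalg.norm(u) * np.linalg.norm(v)
          |             return np.dot(u, v) / denom
          |         return max(probes, key=lambda c: cos(emb, probes[c]))
          | 
          | The claim is that this simple comparison recovers the right
          | first letter for about 98% of GPT-J's tokens.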
        
       | kridsdale1 wrote:
       | This is the first post in LessWrong that I found interesting and
       | not insipid navel gazing.
       | 
       | My interpretation of the finding is that the LLM training has
       | revealed the primordial nature of human language, akin to the
       | primary function of genetic code predominately being to ensure
       | the ongoing mechanical homeostasis of the organism.
       | 
       | Language's evolutionary purpose is to drive group-fitness. Group
       | management fundamentals are thus the predominant themes in the
       | undefined latent space.
       | 
       | Who is in our tribe? Who should we trust? Who should we kill? Who
       | leads us?
       | 
       | These matters likely dominated human speech for a thousand
       | millennia.
        
         | andrewflnr wrote:
         | I would at minimum want to see the analysis replicated across a
         | variety of languages before I started drawing conclusions about
         | "the primordial nature of human language". In particular,
         | languages with no known relation to English; Chinese would be a
         | good one, far from the Indo-European tree and with a big
          | corpus. The membership-centric interpretation is certainly
          | appealing, but really it's too appealing to let ourselves take
          | it seriously
         | without counterbalancing it with a high standard of evidence.
        
         | __loam wrote:
         | > My interpretation of the finding is that the LLM training has
         | revealed the primordial nature of human language, akin to the
         | primary function of genetic code predominately being to ensure
         | the ongoing mechanical homeostasis of the organism.
         | 
         | We've truly reached peak hype. It's a token predictor dude.
        
       | flir wrote:
       | Someone shoot me down if I'm wrong:
       | 
        | In this model (one of infinitely many possible models) a word
        | is a point in 4096-space.
       | 
       | This article is trying to tease out the structure of those
       | points, and suggesting we might be able to conclude something
       | about natural language from that structure - like looking at the
       | large-scale structure of the Universe and deriving information
       | about the Big Bang.
       | 
       | Obvious questions: is the large-scale structure conserved across
       | languages?
       | 
       | What happens if we train on random tokens - a corpus of noise?
       | Does structure still emerge?
       | 
       | It might be interesting, it might be an artifact. I'd be curious
       | to know what happens when you only examine complete words.
        
         | torvaney wrote:
         | Conserved enough to do word translation:
         | https://engineering.fb.com/2018/08/31/ai-research/unsupervis...
        
       | jakedahn wrote:
        | Has anyone done this analysis for other LLMs, like Llama 2?
        
       | chpatrick wrote:
        | I think that's really interesting, but one thing I'm not sure
        | about is that a lot of tokens are just fragments of words, like
        | "ei", not whole words. Can we really draw conclusions about
       | those embeddings?
        
       ___________________________________________________________________
       (page generated 2023-12-19 23:02 UTC)