[HN Gopher] Let's try to understand AI monosemanticity
___________________________________________________________________
Let's try to understand AI monosemanticity
Author : bananaflag
Score : 48 points
Date : 2023-11-27 21:04 UTC (1 hour ago)
(HTM) web link (www.astralcodexten.com)
(TXT) w3m dump (www.astralcodexten.com)
| turtleyacht wrote:
| By the same token, thinking in memes all the time may be a form
| of impoverished cognition.
|
| Or, is it enhanced cognition, on the part of the interpreter
| having to unpack much from little?
| aatd86 wrote:
| Some kind of single context abstract interpretation maybe.
| throwanem wrote:
| Darmok and Jalad at Mar-a-Lago.
| erikerikson wrote:
| Before finishing my read, I need to register an objection to the
| opening, which reads to me as implying it is the only means:
|
| > Researchers simulate a weird type of pseudo-neural-tissue,
| "reward" it a little every time it becomes a little more like the
| AI they want, and eventually it becomes the AI they want.
|
| This isn't the only way. Backpropagation is a hack around the
| oversimplification of neural models. By adding a sense of
| location into the network, you get linearly inseparable functions
| learned just fine.
|
| Hopfield networks with Hebbian learning are sufficient and are
| implemented by the existing proofs of concept we have.
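|
| A minimal sketch of the Hebbian storage-and-recall idea referenced
| above, assuming NumPy; the patterns and sizes here are invented
| for illustration, not taken from any particular proof of concept:
|
|   import numpy as np
|
|   # Store two +/-1 patterns with the Hebbian outer-product rule.
|   patterns = np.array([
|       [ 1,  1,  1,  1, -1, -1, -1, -1],
|       [ 1, -1,  1, -1,  1, -1,  1, -1],
|   ])
|   n = patterns.shape[1]
|   W = np.zeros((n, n))
|   for p in patterns:
|       W += np.outer(p, p)
|   np.fill_diagonal(W, 0)        # no self-connections
|
|   # Recall: start from a corrupted cue and iterate the update.
|   state = patterns[0].copy()
|   state[0] = -1                 # flip one bit
|   for _ in range(5):
|       state = np.sign(W @ state).astype(int)
|   print(state)                  # recovers the first stored pattern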
| s1gnp0st wrote:
| > Shouldn't the AI be keeping the concept of God, Almighty
| Creator and Lord of the Universe, separate from God-zilla?
|
| This seems wrong. God-zilla is using the concept of God as a
| superlative modifier. I would expect a neuron involved in the
| concept of godhood to activate whenever any metaphorical "god-
| of-X" concept is being used.
| Sniffnoy wrote:
| I mean, it's not actually. It's just a somewhat unusual
| transcription (well, originally somewhat unusual, now obviously
| it's the official English name) of what might be more usually
| transcribed as "Gojira".
| s1gnp0st wrote:
| Ah, I thought the Japanese word was just "jira". My mistake.
| postmodest wrote:
| That's an entirely different monster.
| eichin wrote:
| Indeed, though not an entirely unrelated one - per
| https://en.wikipedia.org/wiki/Jira_(software)#Naming the
| inspiration path was Bugzilla -> Godzilla -> Gojira ->
| Jira (which is why Confluence keeps correcting me when I
| try to spell it JIRA)
| VinLucero wrote:
| I see what you did there.
| lukev wrote:
| There's actually a somewhat reasonable analogy to human cognitive
| processes here, I think, in the sense that humans tend to form
| concepts defined by their connectivity to other concepts (cf.
| Ferdinand de Saussure & structuralism).
|
| Human brains are also a "black box" in the sense that you can't
| scan/dissect one to build a concept graph.
|
| Neural nets do seem to have some sort of emergent structural
| concept graph; in the case of LLMs it's largely informed by human
| language (because that's what they're trained on). To an extent,
| we can observe this empirically through their output even if the
| first principles are opaque.
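|
| One crude way to make that "concept graph" point concrete: treat
| learned embedding vectors as the concepts and link any pair whose
| vectors point in a similar direction. The vectors below are
| invented for illustration, not taken from any real model:
|
|   import numpy as np
|
|   # Invented stand-in embeddings; only the method matters here.
|   emb = {
|       "dog":    np.array([0.9, 0.1, 0.0]),
|       "wolf":   np.array([0.8, 0.2, 0.1]),
|       "banana": np.array([0.0, 0.1, 0.9]),
|   }
|
|   def cosine(a, b):
|       return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
|
|   # Link concepts whose vectors point roughly the same way.
|   edges = [(u, v) for u in emb for v in emb
|            if u < v and cosine(emb[u], emb[v]) > 0.8]
|   print(edges)  # [('dog', 'wolf')]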
| _as_text wrote:
| I just skimmed through it for now, but it has seemed kinda
| natural to me for a few months now that there would be a deep
| connection between neural networks and differential or algebraic
| geometry.
|
| Each ReLU layer is just a (quasi-)linear transformation, and a
| pass through two layers is basically also a linear
| transformation. If you want some piece of information to stay
| (numerically) intact as it passes through the network, you are
| saying you want it to be processed in the same way by each layer.
| The groups of linear transformations that
| "all process information in the same way, and their compositions
| do, as well" are basically the Lie groups. Anyone else ever had
| this thought?
|
| I imagine if nothing catastrophic happens we'll have a really
| beautiful theory of all this someday, which I won't create, but
| maybe I'll be able to understand it after a lot of hard work.
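|
| A tiny NumPy sketch of the (quasi-)linearity point, in case it
| helps: two linear layers compose into one linear map, and with a
| ReLU in between the map is only linear within the region picked
| out by the sign pattern of the hidden units (everything here is
| illustrative):
|
|   import numpy as np
|
|   rng = np.random.default_rng(0)
|   W1 = rng.normal(size=(4, 4))
|   W2 = rng.normal(size=(4, 4))
|   x = rng.normal(size=4)
|
|   # Two linear layers collapse into a single linear map:
|   print(np.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x))       # True
|
|   # With a ReLU in between, the map is piecewise linear; locally
|   # the ReLU acts as a 0/1 diagonal matrix chosen by the sign
|   # pattern of the hidden activations.
|   h = W1 @ x
|   mask = np.diag((h > 0).astype(float))
|   relu_h = np.maximum(h, 0.0)
|   print(np.allclose(W2 @ relu_h, (W2 @ mask @ W1) @ x))  # True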
| shermantanktop wrote:
| As described in the post, this seems quite analogous to the
| operation of a bloom filter, except each "bit" is more than a
| single bit's worth of information, and the match detection has to
| do some thresholding/ranking to select a winner.
|
| That said, the post is itself clearly summarizing much more
| technical work, so my analogy is resting on shaky ground.
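|
| For what it's worth, a bare-bones sketch of that Bloom filter
| analogy (hash choice and sizes are arbitrary): every stored item
| sets a few positions in a shared array, positions get reused
| across items, and lookup has to judge a match from that shared
| evidence.
|
|   import hashlib
|
|   M, K = 64, 3    # array size and hashes per item, arbitrary
|
|   def positions(item: str):
|       d = hashlib.sha256(item.encode()).digest()
|       return [int.from_bytes(d[2*i:2*i+2], "big") % M
|               for i in range(K)]
|
|   bits = [0] * M
|   for item in ["god", "godzilla", "banana"]:
|       for p in positions(item):
|           bits[p] = 1         # positions are shared across items
|
|   def score(item: str) -> float:
|       # Softer "match detection": the fraction of the item's
|       # positions that are set; a plain Bloom filter thresholds
|       # this at 1.0.
|       ps = positions(item)
|       return sum(bits[p] for p in ps) / len(ps)
|
|   print(score("godzilla"), score("mothra")) # 1.0, likely < 1.0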
| gmuslera wrote:
| At least the first part reminded me of Hyperion and how AIs
| evolved there (I think the actual explanation is in The Fall of
| Hyperion), smaller but more interconnected "code".
|
| Not sure about the actual implementation, but at least for us,
| concepts or words are neither pure nor isolated; they have
| multiple meanings that collapse into specific ones as you put
| several together.
| daveguy wrote:
| All this anthropomorphizing of activation networks strikes me as
| very odd. None of these neurons "want" to do anything. They
| respond to specific input. Maybe humans are the same, but in the
| case of artificial neural networks we at least know it's a simple
| mathematical function. Also, an artificial neuron is nothing like
| a biological neuron. At the most basic level, artificial neurons
| don't "fire" except in direct response to inputs. Biological
| neurons fire _because of their internal state_, state which is
| modified by biological signaling chemicals. It's like comparing
| apples to gorillas.
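|
| To make "simple mathematical function" concrete, this is roughly
| all there is to one artificial neuron (a ReLU unit, as a sketch):
|
|   import numpy as np
|
|   def neuron(x, w, b):
|       # A weighted sum of the inputs pushed through a fixed
|       # nonlinearity; no internal state, no spontaneous firing,
|       # the same input always gives the same output.
|       return max(0.0, float(np.dot(w, x) + b))   # ReLU
|
|   w, b = np.array([0.5, -0.3, 0.8]), 0.1
|   print(neuron(np.array([1.0, 2.0, 0.5]), w, b))  # ~0.4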
___________________________________________________________________
(page generated 2023-11-27 23:00 UTC)