[HN Gopher] "Attention", "Transformers", in Neural Network "Larg...
___________________________________________________________________
"Attention", "Transformers", in Neural Network "Large Language
Models"
Author : macleginn
Score : 21 points
Date   : 2023-12-24 21:10 UTC (1 hour ago)
(HTM) web link (bactra.org)
(TXT) w3m dump (bactra.org)
| low_tech_love wrote:
| Really interesting, I like the kind of "stream of consciousness"
| approach to the content, it's refreshing. What's also interesting
| is the fact that the author felt the need to apologize and
| preface it with some forced deference due to some kind of
| internet bashing he certainly received. I hope this doesn't
| discourage him from continuing to publish his notes (although I
| think it will). Why are we getting so human-phobic?
| defrost wrote:
| It's an understandable deference when stumbling through a huge
| new field and its freshly minted jargon, tidying up and tying
| that jargon to long-standing terms in older fields.
|
| "As near as I can tell when the new guard says X they're pretty
| much talking about what we called Y"
|
| Does 'attention' on the AI bleeding edge really correspond to
| kernel smoothing | mapping attenuation | damping?
|
| This is (one of) the elephants in a darkened room that Cosma is
| groping around and showing his thoughts as he goes.
|
| > I hope this doesn't discourage him to keep publishing his
| notes
|
| Doubtful; aside from the inevitable attenuation with age, he's
| been airing his thoughts for at least two decades, e.g. his
| wonderful little:
|
| _A Rare Blend of Monster Raving Egomania and Utter Batshit
| Insanity_ (2002)
|
| http://bactra.org/reviews/wolfram/
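The kernel-smoothing correspondence defrost asks about can be made concrete. In Shalizi's framing, softmax attention over a query and a set of key/value pairs is a Nadaraya-Watson kernel regression with an exponential dot-product kernel. A minimal sketch on toy data (shapes and names are illustrative, not drawn from any particular model):

```python
import numpy as np

def kernel_smooth(q, keys, values):
    """Nadaraya-Watson estimate with kernel K(q, k) = exp(q . k)."""
    w = np.exp(keys @ q)           # unnormalized kernel weights
    return (w / w.sum()) @ values  # weighted average of the values

def attention(q, keys, values):
    """Single-query softmax attention (unscaled, for clarity)."""
    scores = keys @ q
    w = np.exp(scores - scores.max())  # numerically stable softmax
    return (w / w.sum()) @ values

rng = np.random.default_rng(0)
q = rng.normal(size=4)
keys = rng.normal(size=(6, 4))
values = rng.normal(size=(6, 3))

# The two computations coincide up to floating-point error:
# subtracting the max inside the softmax cancels in the ratio.
assert np.allclose(kernel_smooth(q, keys, values),
                   attention(q, keys, values))
```

The max-subtraction in `attention` is the only difference, and it cancels on normalization, so the two functions compute the same weighted average.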
| panarchy wrote:
| It is nice, and it's interesting how, if you go read something
| like Einstein's general relativity paper, you find (or at least
| I did) that it's actually quite similar in style and not so
| dense.
| brcmthrowaway wrote:
| Biggest takeaway: extraction of prompts seems to be complete
| bullshit.
| haltist wrote:
| This person doesn't understand that large neural networks are
| somewhat conscious and a stepping stone to AGI. Why else would
| OpenAI be worth so much money if it wasn't a stepping stone to
| AGI? No one can answer this without making it obvious they do not
| understand that large numbers can be conscious and sentient.
| Checkmate atheists.
| ChainOfFools wrote:
| I probably would agree with the unsnarkified version of what
| you're saying to some extent, but I think it's worth mentioning
| that the argument you seem to be dismissing can take a much
| stronger form, questioning latent premises about free will by
| proposing that _neither_ computers nor humans are sentient,
| that they are both entirely deterministic and utimately amount
| to interference patterns of ancient thermodynamic gradients
| created in the formation of the universe.
| seydor wrote:
| And what do the different heads represent? Why are query, key,
| and value simply linear transforms of the input?
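On the literal question: in a standard transformer layer Q, K, and V are indeed just three learned linear maps of the same input, and each head simply gets its own set of maps, letting it learn a different similarity between tokens. A minimal single-head sketch (dimensions and weight names are illustrative assumptions, not taken from any specific paper):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Three linear "views" of the same input X; the only
    # nonlinearity enters through the softmax.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # scaled dot products
    return softmax(scores) @ V  # each output row is a convex mix of rows of V

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 8))  # 5 tokens, model dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
assert out.shape == (5, 8)
```

A multi-head layer would run several such blocks in parallel with independent `Wq, Wk, Wv` per head and concatenate the results; the linearity of the maps keeps the layer cheap while the per-head weights give each head a distinct learned notion of relevance.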
___________________________________________________________________
(page generated 2023-12-24 23:00 UTC)