[HN Gopher] How do transformers work?
___________________________________________________________________
How do transformers work?
Author : willem19x
Score : 83 points
Date : 2024-02-04 17:24 UTC (5 hours ago)
(HTM) web link (nintyzeros.substack.com)
(TXT) w3m dump (nintyzeros.substack.com)
| hasty_pudding wrote:
| Aren't they aliens? So their technology to transform between
| vehicle and robot would be somewhat alien to humans.
| DaveExeter wrote:
| Transformers use magnetic fields to transform electricity.
|
| That's why Tesla's AC system beat out Edison's DC system.
| speed_spread wrote:
| Even as a kid, I was scared of thinking about what happened to
| humans in the vehicles when they transformed. You'd always be
| one bad pivot away from being crushed to a pulp.
| mccrory wrote:
| Does this apply to both types of transformers? Decepticons and
| Autobots?
| sbarre wrote:
| I would actually love to see an article on the design process
| for transforming toys like Transformers.
|
| It seems like such a cool design feat to come up with those. At
| least I thought so as a kid...
| LordAtlas wrote:
| There's more to them than meets the eye.
| myself248 wrote:
| How do you determine the required permeability of the core
| material?
| the_doctah wrote:
| You have something against Maximals and Predacons?
| visarga wrote:
| No, they use two coils around a metallic core. I made a few when
| I was a kid. I know.
| quickthrower2 wrote:
| No, they are a type that takes a monad as a type parameter and
| creates a new monad that extends the original monad's
| capabilities. The downside is the need to use lift all the time
| to access the underlying monad's operations.
| __loam wrote:
| Time for the weekly LLM explainer.
| nerdponx wrote:
| Writing a "how does X work" post is a good way to learn about
| X. And when X is very popular, lots of people will be writing
| and sharing those posts, both for their own study and just for
| clicks.
| __loam wrote:
| Hence, the weekly LLM explainer at the top of the hackernews
| feed.
| totetsu wrote:
| How does that work?
| sva_ wrote:
| Waiting for @dang to post a list
| Beefin wrote:
| I want to see a "how do state space models work" paper.
| visarga wrote:
| Oh that RNN-CNN duality is such a nice math trick. You can do
| recurrence in parallel.
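|
| (A minimal sketch of that duality, assuming a toy scalar state
| space h_t = a*h_{t-1} + b*x_t with y_t = h_t; the names and
| sizes below are illustrative only. The recurrence unrolls into
| a convolution of x with the kernel [b, a*b, a^2*b, ...], so the
| whole output can be computed in parallel.)
|
|     import numpy as np
|
|     # Toy scalar state space: h_t = a*h_{t-1} + b*x_t, y_t = h_t
|     a, b = 0.9, 1.0
|     x = np.random.randn(16)
|
|     # Recurrent (sequential) view
|     h, y_rec = 0.0, []
|     for x_t in x:
|         h = a * h + b * x_t
|         y_rec.append(h)
|
|     # Convolutional (parallel) view: kernel k_i = a**i * b
|     k = (a ** np.arange(len(x))) * b
|     y_conv = np.convolve(x, k)[:len(x)]
|
|     assert np.allclose(y_rec, y_conv)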
| wojciem wrote:
| Is it only me, or after reading this article with a lot of high-
| level, vague phrases and anecdotes - skipping the actual essence
| of many smart tricks making transformers computationally
| efficient - it is actually harder to grasp how transformers
| "really work".
|
| I recommend videos from Andrej Karpathy on this topic. Well
| delivered, clearly explaining main techniques and providing
| python implementation
| kgeist wrote:
| There's also the type of article where the first half is easily
| understandable by a layman, but then it suddenly drops a lot of
| jargon and math formulas and you get completely lost.
| Lerc wrote:
| A friend once described these kinds of descriptions by analogy
| with a recipe that went:
|
| Recipe for buns. First you need flour; this is a white,
| fine-grained powder produced from ground wheat that can be
| acquired in exchange for money (a standardised convention for
| storing value) at a store which contains many such products.
| When mixed with the raising agent and other ingredients, you
| should remove the buns from the oven when golden brown.
| Lerc wrote:
| Agreed. I have made my own Shakespeare babbler following
| Karpathy's videos. I have a decent understanding of the
| structure and process, but I don't really grasp how they work.
|
| It's obvious how the error reduces, but I feel like there's
| something semantic going on that isn't directly expressed in
| the code.
| bilsbie wrote:
| Is it normal not to come away from these with any kind of
| intuition like "ah, that's why transformers work so well"?
|
| Instead I just come away feeling like it's a Frankenstein of
| different matrices and statistics.
| hiddencost wrote:
| The reason they're good is that they're not recurrent: you can
| do attention over all of the sequence at once.
|
| Recurrent models had two flaws: (1) the vanishing gradient
| problem and (2) they weren't parallelizable.
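|
| (A rough sketch of what "attention over all of the sequence at
| once" looks like - plain NumPy, with shapes and names chosen
| only for illustration. Every position attends to every other
| position through a couple of matrix multiplications, with no
| step-by-step loop over time.)
|
|     import numpy as np
|
|     def attention(Q, K, V):
|         # Q, K, V: (seq_len, d). All positions are processed at
|         # once; there is no recurrence over the sequence.
|         d = Q.shape[-1]
|         scores = Q @ K.T / np.sqrt(d)       # (seq_len, seq_len)
|         w = np.exp(scores - scores.max(axis=-1, keepdims=True))
|         w /= w.sum(axis=-1, keepdims=True)  # row-wise softmax
|         return w @ V                        # (seq_len, d)
|
|     x = np.random.randn(8, 16)              # 8 tokens, 16 dims
|     out = attention(x, x, x)                # self-attention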
| fancyfredbot wrote:
| Why do we add the positional encoding to the original vector
| instead of appending it? Doesn't adding positional data to the
| embedding destroy information?
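|
| (For reference, the additive sinusoidal scheme from the
| original "Attention Is All You Need" paper looks roughly like
| this; the toy dimensions are illustrative only. The encoding is
| summed with the token embeddings rather than concatenated.)
|
|     import numpy as np
|
|     def sinusoidal_pe(seq_len, d_model):
|         # PE[pos, 2i]   = sin(pos / 10000**(2i/d_model))
|         # PE[pos, 2i+1] = cos(pos / 10000**(2i/d_model))
|         pos = np.arange(seq_len)[:, None]
|         i = np.arange(d_model // 2)[None, :]
|         angles = pos / 10000 ** (2 * i / d_model)
|         pe = np.zeros((seq_len, d_model))
|         pe[:, 0::2] = np.sin(angles)
|         pe[:, 1::2] = np.cos(angles)
|         return pe
|
|     emb = np.random.randn(8, 16)        # toy token embeddings
|     x = emb + sinusoidal_pe(8, 16)      # added, not appended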
| bluescrn wrote:
| Wasn't there a better name that could have been used for these
| things?
|
| One that wasn't already used for an extremely common electrical
| device, as well as a toy/movie franchise?
| zone411 wrote:
| Articles like this pop up often on HN but I don't think they can
| lead to genuine comprehension. Doubtful that people outside ML
| circles will dive into Mamba or analogous architectures if they
| supplant transformers. They're just harder to understand.
| jimmySixDOF wrote:
| So if this article is of interest to you, this was a good
| discussion of a nice 3D visualization of a small LLM (NanoGPT)
| in motion ...
|
| LLM Visualization https://news.ycombinator.com/item?id=38505211
| (https://bbycroft.net/llm)
___________________________________________________________________
(page generated 2024-02-04 23:00 UTC)