[HN Gopher] How do transformers work?
       ___________________________________________________________________
        
       How do transformers work?
        
       Author : willem19x
       Score  : 83 points
       Date   : 2024-02-04 17:24 UTC (5 hours ago)
        
 (HTM) web link (nintyzeros.substack.com)
 (TXT) w3m dump (nintyzeros.substack.com)
        
       | hasty_pudding wrote:
        | Aren't they aliens? So their technology to transform between
       | vehicle and robot would be somewhat alien to humans.
        
         | DaveExeter wrote:
         | Transformers use magnetic fields to transform electricity.
         | 
         | That's why Tesla's AC system beat out Edison's DC system.
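          | 
          | A minimal Python sketch of that relation, with toy numbers
          | of my own (nothing here is from the article): an ideal
          | transformer scales voltage by its turns ratio, and only for
          | AC, since a steady DC current produces no changing flux.
          | 
          |   # Ideal transformer: V_s / V_p = N_s / N_p
          |   primary_turns = 100
          |   secondary_turns = 2000
          |   primary_voltage = 120.0   # volts AC on the primary coil
          | 
          |   turns_ratio = secondary_turns / primary_turns
          |   secondary_voltage = primary_voltage * turns_ratio
          |   print(f"{turns_ratio:.0f}x step-up: {secondary_voltage:.0f} V")
          |   # 20x step-up: 2400 V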
        
         | speed_spread wrote:
         | Even as a kid, I was scared of thinking about what happened to
         | humans in the vehicles when they transformed. You'd always be
         | one bad pivot away from being crushed to a pulp.
        
       | mccrory wrote:
       | Does this apply to both types of transformers? Decepticons and
       | Autobots?
        
         | sbarre wrote:
         | I would actually love to see an article on the design process
         | for transforming toys like Transformers.
         | 
         | It seems like such a cool design feat to come up with those. At
         | least I thought so as a kid...
        
         | LordAtlas wrote:
         | There's more to them than meets the eye.
        
         | myself248 wrote:
         | How do you determine the required permeability of the core
         | material?
        
         | the_doctah wrote:
         | You have something against Maximals and Predacons?
        
       | visarga wrote:
       | No, they use two coils around a metallic core. I made a few when
       | I was a kid. I know.
        
         | quickthrower2 wrote:
         | No, they are a type that takes a monad as a type parameter and
          | creates a new monad that extends the original monad's
          | capabilities. The downside is the need to use lift all the
          | time to access the underlying monad.
        
       | __loam wrote:
       | Time for the weekly LLM explainer.
        
         | nerdponx wrote:
         | Writing a "how does X work" post is a good way to learn about
         | X. And when X is very popular, lots of people will be writing
         | and sharing those posts, both for their own study and just for
         | clicks.
        
           | __loam wrote:
           | Hence, the weekly LLM explainer at the top of the hackernews
           | feed.
        
           | totetsu wrote:
           | How does that work?
        
         | sva_ wrote:
         | Waiting for @dang to post a list
        
       | Beefin wrote:
       | i want to see a "how do State Spaces work" paper
        
         | visarga wrote:
         | Oh that RNN-CNN duality is such a nice math trick. You can do
         | recurrence in parallel.
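          | 
          | A minimal NumPy sketch of that duality (toy matrices of my
          | own, a plain linear state-space model rather than any
          | particular paper's parameterisation): the same outputs fall
          | out of a step-by-step recurrence and a convolution.
          | 
          |   import numpy as np
          | 
          |   rng = np.random.default_rng(0)
          |   d, T = 4, 8                        # state size, seq length
          |   A = 0.3 * rng.normal(size=(d, d))  # toy LTI matrices
          |   B = rng.normal(size=(d, 1))
          |   C = rng.normal(size=(1, d))
          |   u = rng.normal(size=T)             # input sequence
          | 
          |   # RNN view: one state update per step, inherently serial.
          |   x, y_rec = np.zeros((d, 1)), []
          |   for t in range(T):
          |       x = A @ x + B * u[t]
          |       y_rec.append((C @ x).item())
          | 
          |   # CNN view: the same map is a convolution with kernel
          |   # K_k = C A^k B, so every output can be computed at once.
          |   K = np.array([(C @ np.linalg.matrix_power(A, k) @ B).item()
          |                 for k in range(T)])
          |   y_conv = [np.dot(K[:t+1][::-1], u[:t+1]) for t in range(T)]
          | 
          |   print(np.allclose(y_rec, y_conv))  # True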
        
       | wojciem wrote:
        | Is it just me, or does reading this article - full of
        | high-level, vague phrases and anecdotes, skipping the actual
        | essence of the clever tricks that make transformers
        | computationally efficient - actually make it harder to grasp
        | how transformers "really work"?
        | 
        | I recommend Andrej Karpathy's videos on this topic: well
        | delivered, clearly explaining the main techniques and
        | providing a Python implementation.
        
         | kgeist wrote:
          | There's also the type of article where the first half is
          | easily understandable by a layman, but then it suddenly
          | drops a lot of jargon and math formulas and you get
          | completely lost.
        
           | Lerc wrote:
            | A friend once described these kinds of descriptions by
            | analogy with a recipe that went:
            | 
            | Recipe for buns: first you need flour. This is a white,
            | fine-grained powder produced from ground wheat, which can
            | be acquired by exchanging money (a standardised convention
            | for storing value) at a store, which contains many such
            | products. When mixed with the raising agent and other
            | ingredients, you should remove the buns from the oven when
            | golden brown.
        
         | Lerc wrote:
          | Agreed, I have made my own Shakespeare babbler following
         | Karpathy's videos. I have a decent understanding of the
         | structure and process but I don't really grasp how they work.
         | 
         | It's obvious how the error reduces, but I feel like there's
          | something semantic going on that isn't directly expressed in
         | the code.
        
       | bilsbie wrote:
       | Is it normal not to come away from these with any kind of
        | intuition like "ah, that's why transformers work so well"?
       | 
       | Instead I just come away feeling like it's a Frankenstein of
       | different matrices and statistics.
        
         | hiddencost wrote:
         | The reason they're good is that they're not recurrent: you can
         | do attention over all of the sequence at once.
         | 
          | Recurrent models had two flaws: (1) the vanishing gradient
          | problem, and (2) they weren't parallelizable.
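          | 
          | A minimal NumPy sketch of that point (toy shapes of my own,
          | one head, no masking): the whole sequence is handled by a
          | few matrix multiplies instead of a token-by-token loop.
          | 
          |   import numpy as np
          | 
          |   rng = np.random.default_rng(0)
          |   T, d = 6, 8                    # toy seq length and width
          |   X = rng.normal(size=(T, d))    # one embedding per token
          | 
          |   # Toy projections (learned in a real model).
          |   Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
          |   Q, K, V = X @ Wq, X @ Wk, X @ Wv
          | 
          |   # Scaled dot-product attention: every token attends to
          |   # every other token at once; no recurrence through time.
          |   scores = Q @ K.T / np.sqrt(d)           # (T, T)
          |   w = np.exp(scores - scores.max(axis=-1, keepdims=True))
          |   w /= w.sum(axis=-1, keepdims=True)      # row-wise softmax
          |   out = w @ V                             # (T, d) outputs
          | 
          |   print(out.shape)  # (6, 8)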
        
       | fancyfredbot wrote:
       | Why do we add the positional encoding to the original vector
       | instead of appending it? Doesn't adding positional data to the
       | embedding destroy information?
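        | 
        | For concreteness, a minimal NumPy sketch (toy sizes of my own)
        | of the "add" step with the sinusoidal encoding from the
        | original Transformer paper:
        | 
        |   import numpy as np
        | 
        |   T, d = 6, 8                      # toy seq length and width
        |   rng = np.random.default_rng(0)
        |   emb = rng.normal(size=(T, d))    # toy token embeddings
        | 
        |   # Sinusoidal positional encoding: sines and cosines at
        |   # geometrically spaced frequencies, one vector per position.
        |   pos = np.arange(T)[:, None]              # (T, 1)
        |   two_i = np.arange(0, d, 2)[None, :]      # (1, d/2)
        |   angle = pos / (10000.0 ** (two_i / d))
        |   pe = np.zeros((T, d))
        |   pe[:, 0::2] = np.sin(angle)
        |   pe[:, 1::2] = np.cos(angle)
        | 
        |   # The model sees the sum, not a concatenation.
        |   x = emb + pe
        |   print(x.shape)  # (6, 8)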
        
       | bluescrn wrote:
       | Wasn't there a better name that could have been used for these
       | things?
       | 
       | One that wasn't already used for an extremely common electrical
       | device, as well as a toy/movie franchise?
        
       | zone411 wrote:
       | Articles like this pop up often on HN but I don't think they can
       | lead to genuine comprehension. Doubtful that people outside ML
       | circles will dive into Mamba or analogous architectures if they
       | supplant transformers. They're just harder to understand.
        
       | jimmySixDOF wrote:
        | If this article is of interest to you, this was a good
        | discussion of a nice 3D visualization of a small LLM (NanoGPT)
        | in motion:
       | 
       | LLM Visualization https://news.ycombinator.com/item?id=38505211
       | (https://bbycroft.net/llm)
        
       ___________________________________________________________________
       (page generated 2024-02-04 23:00 UTC)