[HN Gopher] Let's Think Dot by Dot: Hidden Computation in Transf...
       ___________________________________________________________________
        
       Let's Think Dot by Dot: Hidden Computation in Transformer Language
       Models
        
       Author : Jimmc414
       Score  : 67 points
       Date   : 2024-04-27 19:28 UTC (3 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | diziet wrote:
       | This is a surprising result to me, given that (in my mind) the
       | method simply does a few more forward passes, without encoding
       | or transferring meaningful state between passes.
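       |
       | Roughly how I picture the setup, as a hypothetical prompt
       | pair (the paper's models are trained specifically to exploit
       | the fillers; this is only meant to show the format):
       |
       |     question = "Q: <3SUM-style instance>"  # placeholder
       |     # direct: question -> answer, no intermediate tokens
       |     prompt_direct = question + " A:"
       |     # filler: same question plus a run of meaningless '.'
       |     # tokens, i.e. extra forward passes before the answer
       |     prompt_filler = question + " " + ". " * 50 + "A:"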
        
         | sdenton4 wrote:
         | You get embeddings at every activation layer of the network, at
         | every token. That's extra state accessible to the network when
         | running in recurrent 'generate the next token' mode.
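         |
         | As a toy sketch (hypothetical sizes; the real cache holds
         | per-head key/value tensors, not tuples):
         |
         |     # every emitted token, filler or not, leaves keys and
         |     # values behind at every layer, and every later token
         |     # can attend to all of them
         |     n_layers = 24                     # hypothetical model
         |     kv_cache = [[] for _ in range(n_layers)]
         |
         |     def emit_token(tok):
         |         for layer_kv in kv_cache:
         |             layer_kv.append((tok, "k_vec", "v_vec"))
         |
         |     for _ in range(50):               # 50 filler '.' tokens
         |         emit_token(".")
         |     # -> 50 more attendable (key, value) slots per layer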
        
           | ehsanu1 wrote:
           | How much extra state and computation is it per token exactly?
           | Can we account for the improvement in just those terms?
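           |
           | Back of the envelope for a hypothetical GPT-2-medium-
           | sized model (my numbers, not the paper's):
           |
           |     n_layers, d_model = 24, 1024   # hypothetical
           |     ctx = 512                      # tokens in context
           |     # extra state: cached keys + values at every layer
           |     state = 2 * n_layers * d_model       # ~49k floats
           |     # extra compute: ~2 FLOPs per parameter, plus a
           |     # rough attention-over-context term
           |     params = 12 * n_layers * d_model ** 2  # ~300M
           |     flops = 2 * params + 2 * n_layers * ctx * d_model
           |     print(state, flops)   # ~49k floats, ~0.6 GFLOPs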
        
         | ehsanu1 wrote:
         | I've only read the abstract, but I also find this strange.
         | I wonder if it's just tapping into computational paths that
         | are already available when tokens sit farther apart, because
         | the positional encodings were trained that way. If so, that
         | makes the reasoning/modeling powers of LLMs even more
         | impressive and inscrutable.
        
         | dist-epoch wrote:
         | You can transfer some state just through the dots. The dot
         | count could mean "the first n ideas don't work, analyze the
         | (n+1)th one; if that one is also bad, emit another dot."
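         |
         | Toy version of that reading, where the run of dots acts as a
         | unary counter over candidate ideas (my speculation, not a
         | mechanism the paper claims):
         |
         |     candidates = ["idea_a", "idea_b", "idea_c"]
         |
         |     def works(idea):              # hypothetical check
         |         return idea == "idea_c"
         |
         |     dots = ""
         |     while not works(candidates[len(dots)]):
         |         dots += "."               # reject, move to the next
         |     print(dots, candidates[len(dots)])    # .. idea_c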
        
         | pyinstallwoes wrote:
         | Can't anything be compressed into one word by comparison?
        
       | rgbrgb wrote:
       | I found a nice walkthrough of this paper, as a Twitter thread
       | by the first author, here:
       | https://twitter.com/jacob_pfau/status/1783951795238441449
        
       ___________________________________________________________________
       (page generated 2024-04-27 23:00 UTC)