[HN Gopher] Transformers without normalization
       ___________________________________________________________________
        
       Transformers without normalization
        
       Author : kaycebasques
       Score  : 35 points
       Date   : 2025-07-24 14:48 UTC (8 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | gnabgib wrote:
       | Discussion (260 points, 4 months ago, 32 comments)
       | https://news.ycombinator.com/item?id=43369633
        
       | godelski wrote:
       | Other than the title being a bit misleading, I think the paper is
       | good. I say misleading because they replace Layer Normalization
       | with a tanh function, which still bounds the range to [-1,1].
       | Plenty of people would call that normalization (an unfortunately
       | overloaded term).
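       | 
       | For readers who want the mechanism concretely: as I understand
       | the paper, DyT replaces each LayerNorm with an elementwise
       | y = gamma * tanh(alpha * x) + beta, where alpha is a learnable
       | scalar and gamma/beta are per-channel affine parameters. What
       | follows is a minimal pure-Python illustration under that
       | assumption, not the authors' code; the class and function names
       | and the initial alpha of 0.5 are my own choices. It contrasts
       | the elementwise squashing with LayerNorm, which normalizes each
       | token by its mean and variance across features.

```python
import math


class DyT:
    """Sketch of Dynamic Tanh: y = gamma * tanh(alpha * x) + beta, elementwise."""

    def __init__(self, dim, alpha_init=0.5):
        # alpha_init=0.5 is an assumed starting value; alpha is a single
        # scalar that would be learned during training.
        self.alpha = alpha_init
        self.gamma = [1.0] * dim  # per-channel scale, initialized to 1
        self.beta = [0.0] * dim   # per-channel shift, initialized to 0

    def __call__(self, x):
        # x: one token's feature vector (list of floats of length dim).
        # No mean/variance is computed; each value is squashed on its own.
        return [g * math.tanh(self.alpha * v) + b
                for v, g, b in zip(x, self.gamma, self.beta)]


def layer_norm(x, eps=1e-5):
    """Plain LayerNorm (without affine params) for comparison."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


dyt = DyT(dim=3)
print(dyt([0.0, 1.0, 100.0]))         # large inputs saturate near 1
print(layer_norm([0.0, 1.0, 100.0]))  # each output depends on the whole vector
```

       | The tanh output is bounded to [-1, 1] only before gamma and beta
       | are applied, and no per-token statistics are computed, which is
       | the main practical difference from LayerNorm.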
       | 
       | While the result isn't too surprising, the paper has a good
       | ablation study that helps build confidence in the mechanism. It's
       | simple and quick to implement, but I don't see that as a
       | disadvantage. Arguably this is not novel, but sometimes it is
       | worth revisiting things when the rest of the environment has
       | changed, and I think the study's thoroughness makes it useful to
       | the community.
       | 
       | The project page is here[0], which gives a very quick overview
       | of the paper.
       | 
       | [0] https://jiachenzhu.github.io/DyT/
        
         | giancarlostoro wrote:
         | > (an unfortunately overloaded term)
         | 
         | I mentioned normalization in an interview and they had no idea
         | what I was talking about given my context. They were thinking
         | of database normalization; I was thinking of DATA
         | normalization, where you uppercase every input such as an
         | email, so that casing doesn't matter when users log in, since
         | you also uppercase it when you check against the database. I'm
         | sure there are a zillion other normalization methods for
         | different things.
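         | 
         | A minimal sketch of the uppercase-on-write, uppercase-on-check
         | idea described above. The function names and the dict standing
         | in for the database are hypothetical; a real system would use
         | an actual users table.

```python
def normalize_email(raw):
    # Uppercase so "Alice@Example.com" and "alice@example.com" compare equal.
    return raw.upper()


# Hypothetical in-memory stand-in for a users table, keyed by normalized email.
users = {}


def register(email, user_id):
    # Normalize once when storing.
    users[normalize_email(email)] = user_id


def find_user(email):
    # Normalize the login input the same way before checking the "database".
    return users.get(normalize_email(email))


register("Alice@Example.com", 1)
print(find_user("ALICE@example.COM"))  # 1
```

         | Python's str.casefold() is intended for caseless matching and
         | handles some non-ASCII cases that upper() does not; for plain
         | ASCII addresses either works, as long as the same function is
         | used on both the write and the check.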
        
       ___________________________________________________________________
       (page generated 2025-07-24 23:01 UTC)