[HN Gopher] YaFSDP: a sharded data parallelism framework, faster...
       ___________________________________________________________________
        
       YaFSDP: a sharded data parallelism framework, faster for
       pre-training LLMs
        
       Author : wiradikusuma
       Score  : 122 points
       Date   : 2024-06-18 11:54 UTC (11 hours ago)
        
        
       | dayeye2006 wrote:
       | Any idea what the main tricks are for achieving gains over
       | FSDP?
        
         | albertzeyer wrote:
         | The blog post seems to contain more details and the core ideas:
         | https://medium.com/yandex/yafsdp-a-tool-for-faster-llm-train...
        
           | az226 wrote:
           | Odd that they don't expand on this:
           | 
           | In Yandex's pre-trainings, the implementation of YaFSDP along
           | with other memory optimization strategies resulted in a speed
           | gain of 45%.
        
       | codetrotter wrote:
       | I was surprised to see that the Ya part meant "Yet another". I
       | mean, I've seen it before in many acronyms. But it's pretty
       | tongue in cheek of them to do that here since one would expect it
       | was just because it was made by Yandex.
        
         | shadow28 wrote:
         | Doesn't Yandex itself come from "Yet Another Indexer"?
        
           | codetrotter wrote:
           | Ah, so it does as well! I only knew that it was a portmanteau
           | of "Я" and "index". As in "I index". Which it also is.
        
             | alexey-salmin wrote:
             | There's a third explanation of "Iandex" being "iazykovoi
             | indeks", i.e. "language-aware index". The Russian language
             | has complicated morphology, with three genders and six
             | grammatical cases, somewhat similar to Latin. Searching by
             | exact word match almost never gives good results, and
             | neither Yahoo nor AltaVista could offer anything better in
             | 1997, which is why Yandex was built.
        
               | aristus wrote:
               | You mean, Yet Another Human-Organized Ontology?
        
         | mikrl wrote:
         | I was expecting it to be a Russian acronym starting with the
         | letter Я, which is pronounced "Ya". It acquired its backward-R
         | glyph when it was changed from an old Slavic letter I cannot
         | draw.
        
           | deaddodo wrote:
           | What do you mean that you "cannot draw" them? This is a
           | digital medium and both (well one, the other is half
           | supported) variants are valid Unicode glyphs:
           | 
           |  / E
           | 
           | Or do you mean you literally can't draw them?
        
             | Tade0 wrote:
             | It's an idiomatic expression in Slavic languages,
             | indicating that the shape is particularly complex.
        
             | mikrl wrote:
             | I could not reproduce it by hand without a reference, nor do
             | I have a keyboard installed which offers it as a symbol,
             | nor was I going to look it up to add spice to a
             | shower-thought-tier HN comment.
             | 
             | I see you have provided it, making it more accessible for
             | my future use, at least on the timeframe of this thread
             | being in my recent HN activity.
        
       ___________________________________________________________________
       (page generated 2024-06-18 23:00 UTC)