[HN Gopher] Implementing Weight-Decomposed Low-Rank Adaptation (...
       ___________________________________________________________________
        
       Implementing Weight-Decomposed Low-Rank Adaptation (DoRA) from
       Scratch
        
       Author : rasbt
       Score  : 47 points
       Date   : 2024-02-18 18:50 UTC (4 hours ago)
        
 (HTM) web link (magazine.sebastianraschka.com)
 (TXT) w3m dump (magazine.sebastianraschka.com)
        
       | jasonjmcghee wrote:
       | This is a bit of a misleading title. Why not use the original?
       | 
       | "Improving LoRA: Implementing Weight-Decomposed Low-Rank
       | Adaptation (DoRA) from Scratch"
       | 
       | (If it's too long, just drop the "Improving LoRA: " part)
        
         | rasbt wrote:
         | Thanks, fixed!
        
       | murkt wrote:
       | Hooray, no more confusion with LoRa the radio!
        
         | 3abiton wrote:
         | I'm waiting for physicists to have their gripe with the
         | acronym.
        
         | stavros wrote:
         | Yes, but think of the explorer!
        
       | sorenjan wrote:
       | Speaking of LoRA, what happened with ZipLoRA? It's supposed to be
       | a better way of merging multiple LoRAs, and the results look good
       | in their examples. Is it being used anywhere?
       | 
       | https://ziplora.github.io/
        
         | rasbt wrote:
          | Not sure, but in general it looks like ZipLoRA is only useful
          | in specific contexts, like when you have two different tasks
          | you want to optimize for (e.g., style and content in a vision
          | setting). DoRA is more general: it basically normalizes and
          | scales the LoRA matrices to get much better performance.
          | According to the paper, it even works well at low ranks,
          | which effectively makes it even more parameter-efficient than
          | OG LoRA.
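          | 
          | A rough sketch of that idea in PyTorch (not the article's
          | code; the names are just illustrative): the pretrained
          | weight is split into a per-column magnitude and a direction,
          | the low-rank update B @ A is added to the direction, and the
          | result is renormalized and rescaled by the learned
          | magnitudes.
          | 
          |   import torch
          |   import torch.nn as nn
          |   import torch.nn.functional as F
          | 
          |   class DoRALinear(nn.Module):
          |       # Sketch only -- not the article's implementation.
          |       # Wraps a frozen, pretrained nn.Linear layer.
          |       def __init__(self, linear, rank=8, alpha=16.0):
          |           super().__init__()
          |           self.linear = linear
          |           for p in self.linear.parameters():
          |               p.requires_grad = False
          |           out_dim, in_dim = linear.weight.shape
          |           self.A = nn.Parameter(
          |               torch.randn(rank, in_dim) / rank**0.5)
          |           self.B = nn.Parameter(torch.zeros(out_dim, rank))
          |           self.scaling = alpha / rank
          |           # magnitude m: column-wise norm of pretrained W
          |           self.m = nn.Parameter(
          |               linear.weight.norm(p=2, dim=0, keepdim=True))
          | 
          |       def forward(self, x):
          |           delta = self.scaling * (self.B @ self.A)
          |           v = self.linear.weight + delta       # direction
          |           v = v / v.norm(p=2, dim=0, keepdim=True)
          |           return F.linear(x, self.m * v, self.linear.bias)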
        
           | sorenjan wrote:
            | I just read the article; nice write-up! I think it would
            | benefit from a short explanation of what the magnitude
            | vector (m) and the directional matrix (V) are, since I'm
            | not familiar with that kind of decomposition.
            | 
            | Not related to the article but tangentially relevant: would
            | it be possible to train a LoRA or DoRA with a high rank,
            | then use SVD to see whether the rank is too high and
            | truncate to a better value of r? Maybe even use different
            | ranks for different layers after some training?
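            | 
            | One way to sanity-check that (just a sketch; it assumes the
            | trained factors A and B are at hand, and the 0.90 energy
            | cut-off is arbitrary) would be to look at how fast the
            | singular values of the merged update B @ A decay:
            | 
            |   import torch
            | 
            |   def suggest_rank(B, A, energy=0.90):
            |       # If most of the energy sits in the first few
            |       # singular values of the merged update, the chosen
            |       # rank was likely higher than necessary.
            |       delta_w = B @ A                    # (out_dim, in_dim)
            |       s = torch.linalg.svdvals(delta_w)  # descending
            |       frac = torch.cumsum(s**2, dim=0) / (s**2).sum()
            |       return int((frac < energy).sum().item()) + 1
            | 
            |   # Toy example: a rank-16 adapter whose update is
            |   # effectively only rank 4.
            |   out_dim, in_dim, r = 256, 128, 16
            |   B = torch.randn(out_dim, 4) @ torch.randn(4, r) * 0.1
            |   A = torch.randn(r, in_dim)
            |   print(suggest_rank(B, A))  # ~3-4, far below r=16
            | 
            | Truncating would then amount to keeping only the top
            | singular directions (e.g. rebuilding B and A from the
            | truncated SVD), possibly with a different r per layer.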
        
             | rasbt wrote:
              | Thanks for the feedback. Clarifying definitely wouldn't
              | hurt. I added a paragraph and a new figure at the top of
              | the DoRA section:
              | https://magazine.sebastianraschka.com/i/141797214/introducin...
              | 
              | I haven't tried what you were suggesting, but it actually
              | sounds plausible. Interesting idea!
        
       | gliched_robot wrote:
        | This is very cool and will change the way we do LoRA now.
        
       ___________________________________________________________________
       (page generated 2024-02-18 23:00 UTC)