[HN Gopher] LoRA+: Efficient Low Rank Adaptation of Large Models
       ___________________________________________________________________
        
       LoRA+: Efficient Low Rank Adaptation of Large Models
        
       Author : veryluckyxyz
       Score  : 160 points
       Date   : 2024-04-28 13:41 UTC (9 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | yau8edq12i wrote:
       | What an unfortunate name... I initially thought this was about
       | wireless communication. https://en.wikipedia.org/wiki/LoRa
        
         | bee_rider wrote:
         | The idea of low-rank approximations is not new; truncated SVDs,
         | for example, have been used for many decades.
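
To make the low-rank idea above concrete: a rank-1 truncated SVD keeps only the largest singular value and its two singular vectors. Below is a minimal sketch in plain Python using power iteration. The helper names are hypothetical, and real code would normally call a linear-algebra library instead of hand-rolling this.

```python
import math

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def rank1_approx(M, iters=100):
    """Return (sigma, u, v) so that sigma * u * v^T is the best rank-1 fit to M.

    Assumes M is nonzero and the all-ones start vector is not orthogonal
    to the top right singular vector.
    """
    Mt = transpose(M)
    v = [1.0] * len(M[0])
    for _ in range(iters):
        # One power-iteration step on M^T M, then renormalise.
        v = matvec(Mt, matvec(M, v))
        n = norm(v)
        v = [x / n for x in v]
    u = matvec(M, v)
    sigma = norm(u)          # largest singular value
    u = [x / sigma for x in u]
    return sigma, u, v

# Storage: an n x k matrix needs n*k numbers, its rank-1 factors need n + k + 1.
M = [[3.0, 4.0], [6.0, 8.0]]  # exactly rank 1, so the approximation is exact
sigma, u, v = rank1_approx(M)
approx = [[sigma * ui * vj for vj in v] for ui in u]
```

LoRA applies the same storage argument to a weight update: instead of learning a full matrix, it learns two thin factors whose product has low rank.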
        
           | yau8edq12i wrote:
           | The acronym LoRA used in the context of deep learning (2021)
           | is about seven years younger than the radio communication
           | protocol called LoRa (2014). Type "lora" in a search engine
           | and see what you get.
        
             | kleiba wrote:
             | In the first 10 search results, I now get a mix of results
             | for either of the two technologies when searching with
             | Google.
        
         | blamestross wrote:
         | Yeah, I would prefer they didn't obfuscate the actually useful
         | search term.
        
         | sorenjan wrote:
         | This gets mentioned here every time an article about LoRA is
         | posted. Sometimes acronyms mean multiple things; they're not
         | in the same field, so the risk of confusion beyond short
         | headlines is negligible.
         | 
         | It's a bit like someone reading a bicycling article and
         | getting annoyed that FTP means Functional Threshold Power
         | instead of File Transfer Protocol, or reading about machine
         | learning and getting confused that MLP doesn't mean My Little
         | Pony.
        
           | rakoo wrote:
           | "computer science" and "bicycles" aren't the same domain,
           | it's fine to have the same acronym.
           | 
           | "computer science" and "tv shows" aren't the same domain,
           | it's fine to have the same acronym.
           | 
           | "computer science" and "computer science" are the same
           | domain, it's not a good idea to use the same acronym.
        
             | WithinReason wrote:
             | Large Models is in the title so it's obviously not about
             | radio
        
               | GuB-42 wrote:
               | The acronym is also spelled out in the title: LoRA = Low
               | Rank Adaptation.
        
               | squigz wrote:
               | Context is hard!
        
               | rakoo wrote:
               | So instead of LoRa and anything else, everyone now has to
               | say LoRa (the communication protocol) or LoRa (the large
               | model thing). Having to add context all the time makes
               | everything so much simpler!
        
               | WithinReason wrote:
               | Low rank adaptation is abbreviated LoRA
        
               | squigz wrote:
               | Or potentially include the necessary context in the title
               | of the post.
        
               | rakoo wrote:
               | Large models is not spelled out in full, and doesn't
               | explicitly say it's _not_ about the communication
               | protocol.
        
             | dragonwriter wrote:
             | > "computer science" and "computer science" are the same
             | domain, it's not a good idea to use the same acronym.
             | 
             | But "radio communication" is not "computer science", even
             | though people sometimes plug radio transceivers into
             | computers, just like "tv shows" aren't "computer science"
             | just because people sometimes view or store their shows on
             | a computer, and "bicycles" aren't "computer science"
             | because sometimes people mount computers on their bikes.
        
             | nostrademons wrote:
             | "Computer science" isn't really one domain anymore - the
             | field split into several subdomains in the 2010s. Just try
             | to get a job as a "computer scientist" now - the recruiter
             | would be like "No, are you a web developer? Mobile
             | developer? Backend developer? Data scientist? Data
             | engineer? Cloud engineer? AI engineer? Machine learning
             | developer?"
        
           | the__alchemist wrote:
           | I think the reason this keeps coming up is encoded in your
           | second sentence, in conjunction with the HN medium: LoRa and
           | LoRA are both, unfortunately, things that the target audience
           | are likely to be interested in and/or knowledgeable about, but
           | a general audience is not.
           | 
           | Also, both use a non-standard case mix.
        
           | 1024core wrote:
           | > Sometimes acronyms mean multiple things
           | 
           | Exactly. Like WiFi: from ancient times it has meant "Wife's
           | Fidelity".
        
             | teaearlgraycold wrote:
             | Tell my WiFi love her
        
             | EGreg wrote:
             | Finally! Some people have been screaming to change the
             | acronym since 2001. But this tech-bro group didn't
             | listen. Such hubris!
             | 
             | https://en.m.wikipedia.org/wiki/Crypto_naming_controversy
        
           | IshKebab wrote:
           | Yes but radio protocols and AI methods are a lot closer than
           | most overlapping acronyms. This is obvious from the fact that
           | it gets mentioned every time an article about LoRA is posted.
        
           | mattlondon wrote:
           | But these are clearly both in the same field, as everyone
           | keeps mentioning it here! So clearly there is
           | confusion. It certainly tricked me on first reading - "ah
           | cool - efficient lora+ that sounds cool... Ah wait no it's
           | just some machine learning spam"
        
           | yau8edq12i wrote:
           | > This gets mentioned here every time an article about LoRA is
           | posted.
           | 
           | I wonder why!
        
         | rytill wrote:
         | That's LoRa. This is LoRA.
        
         | kcorbitt wrote:
         | This specific variant "LoRA+" described in this paper is even
         | harder to search for. I was doing some research on this
         | technique recently and it turns out that "Lora+" matches with
         | "Lora" in Discord search, which is quite unhelpful. :)
        
           | SquareWheel wrote:
           | Discord search is one of the worst I've ever used. They remap
           | words like "localization" to "local", which makes it
           | impossible to search for more specific terms.
        
       | cuuupid wrote:
       | I'm struggling to understand from this paper whether the approach
       | is better in the general sense (all cases, with wider models
       | seeing greater benefits) or purely for wider models (with
       | narrower models seeing detriment)?
       | 
       | If it's the former, this could effectively halve finetuning cost
       | overnight, which would go a significant way towards enabling a
       | wider array of use cases for LoRA.
        
       | batterseapower wrote:
       | The other recent improvement suggested for LoRA is DoRA:
       | https://magazine.sebastianraschka.com/p/lora-and-dora-from-s....
       | It really does seem to strongly outperform LoRA - see also
       | https://www.answer.ai/posts/2024-04-26-fsdp-qdora-llama3.htm...
        
         | WithinReason wrote:
         | The two methods seem to be independent, wonder if you can
         | combine them for even better performance.
         | 
         | Interestingly both seem to indirectly modify the optimisation
         | process, in my opinion effectively trying to fix a bad
         | optimiser. Seems like we still have a long way to go after
         | Adam...
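
For context on how LoRA+ touches the optimisation process: as I read the paper, its central change is to train the two adapter matrices with different learning rates, with the rate for B set to a fixed multiple of the rate for A. A minimal sketch of that grouping in plain Python follows. The function name, the `lora_A` / `lora_B` naming convention and the ratio of 16 are illustrative assumptions, not the authors' code.

```python
def lora_plus_param_groups(named_params, base_lr=1e-4, lr_ratio=16.0):
    """Split adapter parameters into two optimiser groups.

    named_params: iterable of (name, parameter) pairs.
    Assumption: adapter matrices are identified by "lora_A" / "lora_B"
    in their names; everything else is treated as frozen and skipped.
    """
    group_a, group_b = [], []
    for name, param in named_params:
        if "lora_B" in name:
            group_b.append(param)
        elif "lora_A" in name:
            group_a.append(param)
    return [
        {"params": group_a, "lr": base_lr},             # A: base learning rate
        {"params": group_b, "lr": base_lr * lr_ratio},  # B: scaled-up rate
    ]

# Placeholder strings stand in for real tensors in this sketch.
groups = lora_plus_param_groups([
    ("layer0.lora_A", "a0"),
    ("layer0.lora_B", "b0"),
    ("layer0.weight", "w0"),  # frozen base weight, ignored
])
```

A list of dicts with `params` and `lr` keys is the shape most optimiser libraries accept for per-group learning rates, so the same grouping should carry over to Adam or any other optimiser.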
        
           | neodypsis wrote:
           | > Seems like we still have a long way to go after Adam...
           | 
           | A preprint on arXiv suggests that Adam works better than SGD
           | for training LLMs due to the issue of class imbalance [0]. It
           | appears that scaling the gradient step helps with training;
           | for example, see another approach suggested in [1].
           | 
           | 0. https://arxiv.org/pdf/2402.19449
           | 1. https://arxiv.org/pdf/2402.02347
        
         | josalhor wrote:
         | I just skimmed over LoRA+ and DoRA and I see no reason why
         | these improvements could not go hand in hand. Actually, LoRA+
         | seems to be about efficient training while DoRA seems about
         | improving the ability to actually learn, making it
         | significantly more robust. Although I still have my questions
         | on how the improvements of LoRA+ would be applied to the
         | magnitude vector.
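
For readers unfamiliar with the magnitude vector mentioned above: as I understand the DoRA paper, the adapted weight is the LoRA-updated matrix with each column normalised to unit length and then rescaled by a separately learned magnitude. A minimal sketch in plain Python with hypothetical names; matrices are lists of rows, and this is an illustration rather than the reference implementation.

```python
import math

def dora_weight(W0, B, A, m):
    """Return m * (W0 + B @ A) / ||W0 + B @ A||, with the norm taken per column.

    W0: frozen base weight (rows x cols)
    B:  rows x r adapter matrix, A: r x cols adapter matrix
    m:  learned magnitude vector, one entry per column
    """
    rows, cols, r = len(W0), len(W0[0]), len(A)
    # Direction component before normalisation: V = W0 + B @ A
    V = [[W0[i][j] + sum(B[i][k] * A[k][j] for k in range(r))
          for j in range(cols)] for i in range(rows)]
    col_norm = [math.sqrt(sum(V[i][j] ** 2 for i in range(rows)))
                for j in range(cols)]
    return [[m[j] * V[i][j] / col_norm[j] for j in range(cols)]
            for i in range(rows)]

W0 = [[3.0, 0.0], [4.0, 1.0]]   # column norms are 5 and 1
B = [[0.0], [0.0]]              # zero-initialised, so B @ A starts at 0
A = [[1.0, 1.0]]
unchanged = dora_weight(W0, B, A, m=[5.0, 1.0])   # m set to the column norms
rescaled = dora_weight(W0, B, A, m=[10.0, 1.0])   # doubles column 0 only
```

With B at zero and m equal to the column norms of W0, the adapted weight equals W0. The open question in the comment is what learning rate m should get once A and B are trained at different rates.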
        
         | Ger_Onimo wrote:
         | I've just started playing with DoRAs for fine-tuning TTS models
         | towards particular styles of speech, and they're working
         | extremely well!
        
           | allpaca wrote:
           | Can you tell us more about it? Have you reported the results
           | of your experiments in a post?
        
             | mysfi wrote:
             | Count me interested here as well, especially if it is about
             | the style of speech. I had a fun project in mind that
             | involved the style of speech.
        
         | cooljoseph wrote:
         | Those blog posts are pretty bad. Just read the original paper,
         | https://arxiv.org/pdf/2402.09353. The key section is 4.1.
        
       | axpy906 wrote:
       | In 2024 are folks still swapping out LoRA adapters? Is this still
       | relevant?
        
         | bckr wrote:
         | Why would it not be?
        
         | ac2u wrote:
         | Can't tell if your tone is inquisitive or incredulous :)
         | 
         | If the latter, please point out the alternatives.
        
       | youssefabdelm wrote:
       | A better name would've probably been FastLoRA or something
        
         | throwaway2562 wrote:
         | fLORA
        
         | mobilemidget wrote:
         | I was expecting to read about LOng RAnge radio communication.
         | 
         | https://en.wikipedia.org/wiki/LoRa
        
       | ironbound wrote:
       | I've had success with GaLore: Memory-Efficient LLM Training by
       | Gradient Low-Rank Projection https://arxiv.org/abs/2403.03507
        
       | allpaca wrote:
       | This is old, having been released in February... Why do you talk
       | about it now?
        
       ___________________________________________________________________
       (page generated 2024-04-28 23:01 UTC)