[HN Gopher] LoRA+: Efficient Low Rank Adaptation of Large Models
___________________________________________________________________
LoRA+: Efficient Low Rank Adaptation of Large Models
Author : veryluckyxyz
Score : 160 points
Date : 2024-04-28 13:41 UTC (9 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| yau8edq12i wrote:
| What an unfortunate name... I initially thought this was about
| wireless communication. https://en.wikipedia.org/wiki/LoRa
| bee_rider wrote:
| The idea of low rank approximations is not new; truncated SVDs,
| for example, have been used for many decades.
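A minimal sketch of the truncated SVD mentioned above, assuming NumPy is available. The function name `truncated_svd_approx` and the synthetic matrix are illustrative and not from the thread. Keeping the top k singular triplets gives the best rank-k approximation in the Frobenius norm (Eckart-Young theorem).

```python
import numpy as np

def truncated_svd_approx(M, k):
    """Return the rank-k approximation of M built from its top-k singular triplets."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Scale the first k left singular vectors by their singular values,
    # then project back with the first k right singular vectors.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
# Hypothetical data: a 50x40 matrix that is exactly rank 3, plus small noise.
M = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
M_noisy = M + 1e-3 * rng.standard_normal(M.shape)

M_k = truncated_svd_approx(M_noisy, k=3)
rel_err = np.linalg.norm(M_noisy - M_k) / np.linalg.norm(M_noisy)
print(f"relative error of rank-3 approximation: {rel_err:.2e}")
```

Storing the three factors instead of the full matrix takes k * (50 + 40 + 1) numbers rather than 50 * 40, which is the same saving that low-rank adapters rely on.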
| yau8edq12i wrote:
| The acronym LoRA used in the context of deep learning (2021)
| is about seven years younger than the radio communication
| protocol called LoRa (2014). Type "lora" in a search engine
| and see what you get.
| kleiba wrote:
| In the first 10 search results, I now get a mix of results
| for either of the two technologies when searching with
| Google.
| blamestross wrote:
| Yeah, I would prefer they didn't obfuscate the actually useful
| search term
| sorenjan wrote:
| This gets mentioned here every time an article about LoRA is
| posted. Sometimes acronyms mean multiple things; they're not
| in the same field, so the risk of confusion beyond short
| headlines is negligible.
|
| It's a bit like someone reading a bicycling article and
| getting annoyed that FTP means Functional Threshold Power
| instead of File Transfer Protocol, or reading about machine
| learning and getting confused that MLP doesn't mean My Little
| Pony.
| rakoo wrote:
| "computer science" and "bicycles" aren't the same domain,
| it's fine to have the same acronym.
|
| "computer science" and "tv shows" aren't the same domain,
| it's fine to have the same acronym.
|
| "computer science" and "computer science" are the same
| domain, it's not a good idea to use the same acronym.
| WithinReason wrote:
| Large Models is in the title so it's obviously not about
| radio
| GuB-42 wrote:
| The acronym is also spelled out in the title: LoRA = Low
| Rank Adaptation.
| squigz wrote:
| Context is hard!
| rakoo wrote:
| So instead of LoRa and anything else, everyone now has to
| say LoRa (the communication protocol) or LoRa (the large
| model thing). Having to add context all the time makes
| everything so much simpler!
| WithinReason wrote:
| Low rank adaptation is abbreviated LoRA
| squigz wrote:
| Or potentially include the necessary context in the title
| of the post.
| rakoo wrote:
| Large models is not spelled out in full, and doesn't
| explicitly say it's _not_ about the communication
| protocol.
| dragonwriter wrote:
| > "computer science" and "computer science" are the same
| domain, it's not a good idea to use the same acronym.
|
| But "radio communication" is not "computer science", even
| though people sometimes plug radio transceivers into
| computers, just like "tv shows" aren't "computer science"
| just because people sometimes view or store their shows on
| a computer, and "bicycles" aren't "computer science"
| because sometimes people mount computers on their bikes.
| nostrademons wrote:
| "Computer science" isn't really one domain anymore - the
| field split into several subdomains in the 2010s. Just try
| to get a job as a "computer scientist" now - the recruiter
| would be like "No, are you a web developer? Mobile
| developer? Backend developer? Data scientist? Data
| engineer? Cloud engineer? AI engineer? Machine learning
| developer?"
| the__alchemist wrote:
| I think the reason this keeps coming up is encoded in your
| second sentence, in conjunction with the HN medium: LoRa and
| LoRA are both, unfortunately, things that the target audience
| are likely to be interested in and/or knowledgeable about, but
| a general audience is not.
|
| Also, both use a non-standard case mix.
| 1024core wrote:
| > Sometimes acronyms mean multiple things
|
| Exactly. Like WiFi: from ancient times it has meant "Wife's
| Fidelity".
| teaearlgraycold wrote:
| Tell my WiFi love her
| EGreg wrote:
| Finally! Some people have been screaming to change the
| acronym since 2001. But these tech bros didn't
| listen. Such hubris!
|
| https://en.m.wikipedia.org/wiki/Crypto_naming_controversy
| IshKebab wrote:
| Yes but radio protocols and AI methods are a lot closer than
| most overlapping acronyms. This is obvious from the fact that
| it gets mentioned every time an article about LoRA is posted.
| mattlondon wrote:
| But these are clearly both in the same field, as everyone
| keeps mentioning it here! So clearly there is
| confusion. It certainly tricked me on first reading - "ah
| cool - efficient lora+ that sounds cool... Ah wait no it's
| just some machine learning spam"
| yau8edq12i wrote:
| > This gets mentioned here every time an article about LoRA is
| posted.
|
| I wonder why!
| rytill wrote:
| That's LoRa. This is LoRA.
| kcorbitt wrote:
| This specific variant "LoRA+" described in this paper is even
| harder to search for. I was doing some research on this
| technique recently and it turns out that "Lora+" matches with
| "Lora" in Discord search, which is quite unhelpful. :)
| SquareWheel wrote:
| Discord search is one of the worst I've ever used. They remap
| words like "localization" to "local", which makes it
| impossible to search for more specific terms.
| cuuupid wrote:
| I'm struggling to understand from this paper whether the approach
| is better in the general sense (all cases, with wider models
| seeing greater benefits) or purely for wider models (with
| narrower models seeing detriment)?
|
| If it's the former this could effectively halve finetuning cost
| overnight which would go a significant way towards enabling a
| wider array of use cases for LoRA.
| batterseapower wrote:
| The other recent improvement suggested for LoRA is DoRA:
| https://magazine.sebastianraschka.com/p/lora-and-dora-from-s....
| It really does seem to strongly outperform LoRA - see also
| https://www.answer.ai/posts/2024-04-26-fsdp-qdora-llama3.htm...
| WithinReason wrote:
| The two methods seem to be independent, wonder if you can
| combine them for even better performance.
|
| Interestingly both seem to indirectly modify the optimisation
| process, in my opinion effectively trying to fix a bad
| optimiser. Seems like we still have a long way to go after
| Adam...
| neodypsis wrote:
| > Seems like we still have a long way to go after Adam...
|
| A preprint on arXiv suggests that Adam works better than SGD
| for training LLMs due to the issue of class-imbalance [0]. It
| appears that scaling the gradient step helps with the
| training, for example, see another approach suggested in [1].
|
| 0. https://arxiv.org/pdf/2402.19449
| 1. https://arxiv.org/pdf/2402.02347
| josalhor wrote:
| I just skimmed over LoRA+ and DoRA and I see no reason why
| these improvements could not go hand in hand. Actually, LoRA+
| seems to be about efficient training while DoRA seems about
| improving the ability to actually learn, making it
| significantly more robust. Although I still have my questions
| on how the improvements of LoRA+ would be applied to the
| magnitude vector.
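A rough sketch of the training change LoRA+ is built around, namely giving the adapter matrix B a larger learning rate than A. Everything concrete here is an assumption for illustration: the helper `lora_plus_sgd_step`, the toy squared-error loss, the shapes, and the ratio of 16. It also uses plain SGD to keep the arithmetic visible, whereas the comments above discuss Adam.

```python
import numpy as np

def lora_plus_sgd_step(A, B, grad_A, grad_B, lr_A, lr_ratio):
    """One plain SGD step where B's learning rate is lr_ratio times A's."""
    lr_B = lr_ratio * lr_A
    return A - lr_A * grad_A, B - lr_B * grad_B

rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 2
W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = 0.01 * rng.standard_normal((r, d_in))   # adapter "down" matrix, small random init
B = np.zeros((d_out, r))                    # adapter "up" matrix, zero init
x = rng.standard_normal(d_in)
target = rng.standard_normal(d_out)

# Toy loss: 0.5 * ||(W + B A) x - target||^2
err = (W + B @ A) @ x - target
grad_B = np.outer(err, A @ x)     # dL/dB, shape (d_out, r)
grad_A = np.outer(B.T @ err, x)   # dL/dA, shape (r, d_in); zero here because B starts at 0

A_new, B_new = lora_plus_sgd_step(A, B, grad_A, grad_B, lr_A=1e-3, lr_ratio=16.0)
```

With a real optimizer the same effect is usually obtained by putting A and B in separate parameter groups with different learning rates. How that ratio should carry over to DoRA's magnitude vector is the open question raised in the comment above, and this sketch does not address it.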
| Ger_Onimo wrote:
| I've just started playing with DoRAs for fine-tuning TTS models
| towards particular styles of speech, and they're working
| extremely well!
| allpaca wrote:
| Can you tell us more about it? Have you reported the results
| of your experiments in a post?
| mysfi wrote:
| Count me interested here as well, especially if it is about
| the style of speech. I had a fun project in mind that
| involved the style of speech.
| cooljoseph wrote:
| Those blog posts are pretty bad. Just read the original paper,
| https://arxiv.org/pdf/2402.09353. The key section is 4.1.
| axpy906 wrote:
| In 2024 are folks still swapping out LoRA adapters? Is this still
| relevant?
| bckr wrote:
| Why would it not be?
| ac2u wrote:
| Can't tell if your tone is inquisitive or incredulous :)
|
| If the latter, please point out the alternatives.
| youssefabdelm wrote:
| A better name would've probably been FastLoRA or something
| throwaway2562 wrote:
| fLORA
| mobilemidget wrote:
| I was expecting to read about LOng RAnge radio communication.
|
| https://en.wikipedia.org/wiki/LoRa
| ironbound wrote:
| I've had success with GaLore: Memory-Efficient LLM Training by
| Gradient Low-Rank Projection https://arxiv.org/abs/2403.03507
| allpaca wrote:
| This is old, having been released in February... Why do you talk
| about it now?
___________________________________________________________________
(page generated 2024-04-28 23:01 UTC)