[HN Gopher] Show HN: Finetune LLaMA-7B on commodity GPUs using your own text
___________________________________________________________________
Show HN: Finetune LLaMA-7B on commodity GPUs using your own text
I've been playing around with https://github.com/zphang/minimal-
llama/ and https://github.com/tloen/alpaca-
lora/blob/main/finetune.py, and wanted to create a simple UI where
you can just paste text, tweak the parameters, and finetune the
model quickly using a modern GPU. To prepare the data, simply
separate your text with two blank lines. There's an inference tab,
so you can test how the tuned model behaves. This is my first
foray into the world of LLM finetuning, Python, Torch,
Transformers, LoRA, PEFT, and Gradio. Enjoy!
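To make that data-prep convention concrete, here's a rough sketch of
what the splitting amounts to (the tool's actual loader may handle
whitespace differently; this is just the stated convention):

    # Split a pasted corpus into training samples on two blank lines,
    # i.e. three consecutive newlines between samples.
    def split_samples(raw_text):
        samples = [s.strip() for s in raw_text.split("\n\n\n")]
        return [s for s in samples if s]  # drop empty chunks

    with open("my_corpus.txt") as f:
        samples = split_samples(f.read())
    print(len(samples), "training samples")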
Author : lxe
Score : 407 points
Date : 2023-03-22 04:15 UTC (18 hours ago)
| antman wrote:
| What do the training screenshot examples try to accomplish? In
| what way would the model be different after the example fine-
| tuning?
| rafaguin wrote:
| Same question here!
| lxe wrote:
| Just added a screenshot of the inference UI. I just finished
| using it to tune on a subset of
| https://huggingface.co/datasets/Anthropic/hh-rlhf, and it seems
| to be working.
| callesgg wrote:
| Now, use this library to "bootstrap the smarts of LLaMA from its
| own smartness" like this:
|
| 1. Ask it things. Let it answer.
|
| 2. Ask it to find errors in the answer it produced and to
| correct them.
|
| 3. Use the original prompt and the corrected output as training
| data.
|
| This should, with each iteration, make the model less and less
| likely to output statements that are self-contradictory or
| obviously wrong, until the model can no longer spot its own
| faults.
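|
| A loose sketch of what that loop might look like (model.generate
| is a stand-in for whatever inference call you use; the critique
| wording is purely illustrative):
|
|     def self_correct_round(model, prompts):
|         # 1. Ask. 2. Have it critique and correct its own answer.
|         # 3. Keep (prompt, corrected answer) pairs as training data.
|         pairs = []
|         for p in prompts:
|             answer = model.generate(p)
|             critique = (f"Question: {p}\nAnswer: {answer}\n"
|                         "Find any errors in the answer above and "
|                         "write a corrected answer.")
|             corrected = model.generate(critique)
|             pairs.append((p, corrected))
|         return pairs  # feed these into the finetuner as a new dataset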
| 8jy89hui wrote:
| You should try using a larger model like llama-33b or even
| GPT-3 for the feedback. That way you might be able to condense
| knowledge from those really big models into a smaller model.
| jointpdf wrote:
| Or it will twist itself into a giant hairball of contorted
| logic, like GPT3.5 does when I (a human) encourage it to
| explain its errors.
| tysam_and wrote:
| This is a cool idea in theory and I think could be useful in
| certain kinds of circumstances, but this particular
| instantiation would likely go into a bad bias spiral.
|
| This is somewhat similar to how GANs try to learn the density
| of the underlying data, but here you do not have the underlying
| data as a reference, if that makes sense. It's sort of like
| filling a mattress with helium instead of air. Sure, the
| mattress will be lighter, but that does not mean you will float
| on it, if that makes any sense at all.
|
| Hope that helps as a cogent answer to this question.
| jkeisling wrote:
| For those skeptical of the above comment, this technique
| absolutely works and powers production-grade models like
| Anthropic's Claude. There's plenty of literature on this, but
| here are a couple papers that might be helpful for people doing
| their own training:
|
| - Constitutional AI: by Anthropic, an "RLAIF" technique that
| creates the preference model for "finding errors" based on a set
| of around 70 "principles" the AI uses to check its own output,
| not human feedback like in ChatGPT. This technique taught the
| Claude bot to avoid harmful output with few to no manual
| harmfulness labels! https://arxiv.org/abs/2212.08073. Not sure
| if there's a HuggingFace implementation with LoRA / PEFT yet
| like there is for regular RLHF, so somebody may still need to
| implement this for LLaMA
|
| - Self-Instruct: creates artificial instruction-tuning training
| data from an untuned base model, starting from a tiny seed of
| prompts, and filters out the bad ones before fine-tuning.
| Manages to approach InstructGPT performance with only ~100
| human labels. https://arxiv.org/abs/2212.10560
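|
| If anyone wants to try the Self-Instruct filtering step, the gist
| is "drop generated instructions too similar to ones you already
| have" (the paper uses a ROUGE-L overlap threshold; difflib below
| is just a stdlib approximation of the same idea):
|
|     import difflib
|
|     def keep_instruction(candidate, existing, threshold=0.7):
|         # Reject candidates that overlap too much with prior
|         # instructions, so the synthetic dataset stays diverse.
|         return all(
|             difflib.SequenceMatcher(None, candidate, old).ratio()
|             < threshold
|             for old in existing
|         )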
| Drakim wrote:
| I recall reading that when training AlphaZero they would start
| pitting it against itself, doing millions of games in a few
| days, which worked great because there is an external metric
| (who wins the chess game) that would objectively be a good
| measure to train towards.
|
| But if you let an AI's approval be the metric, things turn a
| lot more fuzzy and subjective. The goal is not actually "to
| write a good answer without error" but actually "to write an
| answer that is approved by the AI". Those are very different
| goals, and as you keep using it you'll get a bigger and bigger
| divergence, until eventually the AI is just answering complete
| garbage nonsense that precisely hits certain sweet spots in the
| grading AI.
|
| This divergence of the target vs the actual human goal is a
| pretty interesting problem in AI safety research. I love the
| example where an AI trained to stay alive as long as possible
| in Tetris realized that pausing the game was the best strategy.
| aqme28 wrote:
| You're describing a GAN basically.
|
| But yeah, you're going to need an objective metric or human
| input otherwise the system is going to diverge in strange
| ways.
| Dwedit wrote:
| That wasn't an AI, that was a "make the numbers go up"
| (lexicographic ordering) system with TAS rewinding for short-
| term brute-forcing.
| MattPalmer1086 wrote:
| Interesting, but the core point remains true. The algorithm
| optimises for something which may not entirely coincide
| with the creator's intentions.
| newswasboring wrote:
| I honestly think I might do this experiment, just to see what
| comes out. I know it will be utter garbage, but it will
| probably be interesting utter garbage.
| callesgg wrote:
| Please do :)
|
| The correction prompt is very important; it will definitely
| determine the outcome of the process, and a bad correction
| prompt will obviously lead to a garbage result.
|
| Training in steps with different prompts might be of value.
| The first step might be to fix contradictions, then factual
| errors if that is an issue. This is an idea I got when
| viewing the output of LLaMA: it often contains
| contradictions (e.g. an example I have seen is "Peter is a
| boy and he is part of the Gama sorority"). Asking it to fix
| those types of issues would be a good first step.
|
| But I suspect that this type of training would need to be
| mixed with original training data. Otherwise the
| restructuring in the model caused by the new training would
| most likely garble the rest of the model.
| syntaxing wrote:
| Is there any library that allows you to train with a Mac M1/M2? I
| know it will be slower, but I'd rather spend money on a Mac Studio
| than multiple graphics cards to get around the VRAM limitation.
| speedgoose wrote:
| For training you could rent a GPU server for a short period.
| tempaccount420 wrote:
| Renting is always less economical than owning. You can always
| sell your Apple Studio and still pay less.
| te_chris wrote:
| Or use something like google vertex to run a docker job on
| gpus
| tysam_and wrote:
| Personally I would recommend Colab or another notebook
| environment like a Lambda machine.
|
| Much cheaper, and simpler than a bare-metal machine. Data
| ingress/egress is hard, though for Colab you can just mount
| a Gdrive.
|
| Unfortunately the API for training on M* chips (via MPS) is
| apparently still extremely buggy, so we have a ways to go
| before that is fully mainstream. And yes, I know that
| PyTorch just mainlined their MPS support last week
| too... but from what I've heard the low-level interface
| itself still needs some work. D:
| lxe wrote:
| So far I've been using Lambda Labs, vast.ai, and RunPod
| to rent machines. A 3090 is about 30 cents an hour.
| nwoli wrote:
| Just FYI, Colab is way, way more expensive now after the
| "credits" update than it was a year ago. Lambda Labs is
| about as cheap at this point
| capableweb wrote:
| If you're fine with using other individuals machines
| (meaning, you don't care about data privacy for the
| training set), using vast.ai is probably the cheapest way
| to do it today. But, quality of machines/network speed
| vary greatly as the machines are hosted by individuals
| around the world.
| te_chris wrote:
| Vertex has a training service, rather than spinning up a
| notebook. If you can dockerize the training job, you just
| need to upload the container and data to GCS, then it's
| point-and-click to run once -- I assume it's some sort of
| Kubeflow or whatever in the background.
| Tepix wrote:
| Very interesting project, thanks. I see that DeepSpeed is on your
| TODO list. I wonder what the biggest LLaMA model is that can be
| fine-tuned on 2x RTX 3090 & 128GB RAM.
| lxe wrote:
| You can probably finetune a 13b one with that. Try these
| scripts: https://github.com/zphang/minimal-llama/#minimal-llama
| meghan_rain wrote:
| What if I want to finetune with long documents? Say, AI papers
| that are ~10 pages long on average? How would they be
| tokenized, given that max_seq_length is 512?
| amrb wrote:
| Split your training data into chunks of text that make
| sense. A random dataset example:
| https://huggingface.co/datasets/imdb
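|
| A rough sketch of token-aware chunking with a HuggingFace
| tokenizer (the repo name here is just one commonly used for
| LLaMA weights; substitute your own):
|
|     from transformers import AutoTokenizer
|
|     tok = AutoTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
|
|     def chunk(text, max_tokens=512):
|         # Greedily pack paragraphs into chunks under the limit.
|         chunks, current = [], ""
|         for para in text.split("\n\n"):
|             candidate = (current + "\n\n" + para).strip()
|             if current and len(tok.encode(candidate)) > max_tokens:
|                 chunks.append(current)
|                 current = para
|             else:
|                 current = candidate
|         if current:
|             chunks.append(current)
|         return chunks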
| meghan_rain wrote:
| Thanks, what does "making sense" mean? Being logically
| coherent (e.g. a paragraph of text in a document)?
|
| And does the training then create windows of ngrams on
| those chunks? Or what is the input/output?
|
| The reason I ask: If I had question/answer pairs, the
| question is the input, the answer is the output.
|
| What is the "output" when the input is just a (logically
| coherent) chunk of text?
| lxe wrote:
| > What is the "output" when the input is just a
| (logically coherent) chunk of text?
|
| It probably won't change much if it's just a single
| sample. If you put in a large corpus of samples that
| repeat on the same theme, then the model will be "tuned"
| to repeat that theme. If you increase the number of
| epochs, you can overtrain it, meaning that it will just
| spit out the training data text.
| holoduke wrote:
| I am really an AI noob. But let's say I tried the 7B model for
| translation purposes with below-acceptable results. Can I train
| the model on 1 million translated sentences to improve the
| quality of the translation output?
| Method-X wrote:
| Can you give an example of how you prompted the model? Your
| issue is probably related to that, but I would need an example
| to be sure. I've found the 7b Alpaca model [1] to work
| surprisingly well! Here's how you're supposed to prompt it:
|
| Below is an instruction that describes a task. Write a response
| that appropriately completes the request.
|
| ### Instruction: {instruction}
|
| ### Response:
|
| or
|
| Below is an instruction that describes a task, paired with an
| input that provides further context. Write a response that
| appropriately completes the request.
|
| ### Instruction: {instruction}
|
| ### Input: {input}
|
| ### Response:
|
| [1] https://github.com/cocktailpeanut/dalai
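|
| In code it's just string templating; a sketch of both variants
| quoted above:
|
|     def alpaca_prompt(instruction, input_text=None):
|         # Build the Alpaca prompt, with or without an input field.
|         if input_text:
|             header = ("Below is an instruction that describes a task, "
|                       "paired with an input that provides further "
|                       "context. Write a response that appropriately "
|                       "completes the request.")
|             return (f"{header}\n\n### Instruction: {instruction}"
|                     f"\n\n### Input: {input_text}\n\n### Response:")
|         header = ("Below is an instruction that describes a task. "
|                   "Write a response that appropriately completes "
|                   "the request.")
|         return (f"{header}\n\n### Instruction: {instruction}"
|                 f"\n\n### Response:")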
| benob wrote:
| Yes you can finetune the model with reference output for any
| kind of language task. For translation, you are better off
| starting from a model specifically trained for that purpose
| such as facebook/nllb-200-distilled-1.3B. It will be faster and
| more accurate.
| wsgeorge wrote:
| > For translation, you are better off starting from a model
| specifically trained for that purpose
|
| Yes, but wasn't the whole point of the recent LLM research to
| show that you didn't need to fine-tune for a specific task?
| Closi wrote:
| Sure, but you will get better results at a smaller number
| of parameters from a specifically trained model right now
| if you are trying to train/host it yourself.
|
| Remember that GPT-3 is 175 billion parameters, so many times
| bigger than both the above models (and GPT-4 is rumoured to
| be bigger still), which also allows it to be more
| generalisable.
|
| If GPT-3 was trained at 7 billion parameters, it might also
| lose its language translation capabilities.
| inportb wrote:
| You could _just_ use a much bigger model to perform
| arbitrary tasks without fine-tuning.
| reissbaker wrote:
| This is awesome! I noticed it said a prereq is >16GB VRAM -- is
| that >= 16GB, or is it really explicitly greater than 16? Would
| be sweet to be able to finetune locally on, say, a 3080.
| capableweb wrote:
| I gave this a try and it seems to max out using about 12GB of
| VRAM on a RTX 3090 Ti.
| capableweb wrote:
| Tried the 30b-hf set too, but it was too much (24GB available).
| 13b-hf works fine, maxing out at 17GB.
| [deleted]
| [deleted]
| Taek wrote:
| Looks like you need about 120 GB to fine-tune the 65B model with
| this code at a sequence length of 512. How does the memory usage
| scale as the sequence length grows?
| joshxyz wrote:
| man i hope there are online calculators that let people
| visualize the costs of training these things.
| lxe wrote:
| Lots of VRAM, but the method is the same. Here's someone who
| finetuned llama-30b on the Alpaca dataset, for example:
| https://github.com/deep-diver/Alpaca-LoRA-Serve
| arpowers wrote:
| Really takes that much vram??
| ioedward wrote:
| Normally people split up the model across multiple GPUs, i.e.
| model/tensor parallelism.
| lxe wrote:
| > How does the memory usage scale as the sequence length grows?
|
| That's a good question. I was under the assumption it's
| linearly proportional, but I can test it out I guess.
| Taek wrote:
| I suspect it's linear with a small constant factor.
| albertzeyer wrote:
| The attention implementation is such that memory scales
| quadratically with sequence length. Overall, this is still a
| small factor compared to just the model weights, but at some
| seq lengths it would dominate.
|
| By using flash attention, you can get the memory requirement
| down to scaling linearly with sequence length.
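|
| Back-of-the-envelope for the quadratic term, assuming fp16
| scores and 7B's 32 heads x 32 layers (an upper bound: with
| activation checkpointing far fewer score matrices are live at
| once):
|
|     def attn_scores_bytes(seq_len, n_heads=32, n_layers=32, fp_bytes=2):
|         # Total size of the seq x seq attention score matrices,
|         # batch size 1, if every layer's scores were kept.
|         return n_layers * n_heads * seq_len ** 2 * fp_bytes
|
|     for s in (512, 2048):
|         print(s, attn_scores_bytes(s) / 2 ** 30, "GiB")
|     # 512 -> 0.5 GiB, 2048 -> 8.0 GiB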
| larodi wrote:
| hmm... I would think of Prolog or other rule-based systems as
| doing inference, but not of a neural network (which is
| essentially a mesh of matrix multiplications and functions).
|
| This statement that NNs do inference is not entirely correct
| IMHO.
| selfhoster11 wrote:
| I think you are unfairly downvoted for this, probably because I
| have just as strong an opinion in the other direction: I see GPT
| (especially -4) as a kind of "killer Prolog-style inference
| engine".
|
| How does Prolog work? You:
|
| * pass it some predicates
| * specify rules about what the relationships mean and how
|   various things are computed from the data in the predicates
| * query a variable
| * an answer pops out
|
| How can GPT do this task? You:
|
| * pass it some predicates (structured machine-readable syntax
|   or natural-language sentences)
| * specify rules (in natural-language sentences, though it helps
|   to iterate on the wording a bit to make the rules more rigid,
|   and more likely to provide the correct output ~every time) --
|   you don't normally need to specify relationships explicitly
|   because GPT can usually figure them out
| * include some additional "massaging" wording to get it
|   reproducibly outputting the kind of result you want
| * query a variable (tell it what to find out/infer from the data)
| * an answer pops out (in human-readable language or structured
|   syntax)
|
| In some ways, they are very much alike. And GPT is much more
| natural to program with than Prolog.
| happycube wrote:
| They get there in _completely_ different ways though. If
| there's no answer, Prolog will fail out, and GPT* will
| usually make (stuff) up.
| 6gvONxR4sf7o wrote:
| Inference is a well-established term in the field for a variety
| of things, not just the kind of inference you're referring to.
| Hedepig wrote:
| Slightly tangential: is there some kind of crowdsourced effort to
| build training data for fine-tuning? Alpaca used training data
| generated with GPT-3.5, so there are terms-of-use restrictions.
| Metus wrote:
| https://open-assistant.io
| Hedepig wrote:
| This looks good. Is the training data in the repo itself?
| amrb wrote:
| You can get datasets here, depending on the training you want
| to do: https://huggingface.co/datasets/
| Phemist wrote:
| IANAL, but wouldn't feeding the training data into Alpaca and
| then having it output similar text constitute new data that is
| not copyrighted by Stanford/OpenAI/Facebook? (It would be a
| significantly creative and novel work to get the prompts
| working correctly.) Obviously you are also not bound by the
| OpenAI terms of use, and I'm not sure if Stanford's terms of
| use are as broad and well-defined...
| Hedepig wrote:
| The terms of ChatGPT/GPT-3.5 explicitly state that one cannot
| use their data to construct competitive models
| NavinF wrote:
| Not enforceable. All they can do is ban your ChatGPT
| account.
| amrb wrote:
| I'm not a lawyer, but if a successful company was built
| off ChatGPT's output without having a contract in
| place, I could see the totally moral megacorps trying
| to legally take an ownership stake.
|
| Even recently the US Copyright Office has asked you to list
| any parts built with AI, as we don't have the laws in
| place to cover this:
|
| https://www.copyright.gov/ai/
| YetAnotherNick wrote:
| > Not enforceable
|
| Are you willing to assign an upper limit on this
| probability and bet on it?
| throwaway1851 wrote:
| OpenAI disclaims ownership interest in the model output.
| If a subscriber (who has a contractual relationship with
| OpenAI) chooses to generate outputs that _could_ be used
| to train a competing model, and chooses to share those
| outputs with third parties, that is not prohibited by the
| agreement. Further, the data being shared belongs to the
| subscriber and can be licensed however they desire
| (though actually, model outputs may not be copyrightable
| at all). If a third party who does not have a contractual
| relationship with OpenAI chooses to take this data and
| train a competing model, they are using the data under
| valid license and have breached no obligation to OpenAI.
| Phemist wrote:
| Yes, but 1) you would not be using their data, and 2) you
| are not bound by their ToS if you never signed up for their
| service, right?
| la64710 wrote:
| Can this be used to programmatically train the model? I've got a
| big website that I would like LLaMA to chew on and be aware of
| the content there, so it can answer questions.
| capableweb wrote:
| Yes. Check `main.py`, rewrite it to load the text from whatever
| you want.
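|
| Something like this hypothetical sketch (assumes requests and
| BeautifulSoup) gets you from pages to the two-blank-line format
| the tool expects, however main.py ends up loading it:
|
|     import requests
|     from bs4 import BeautifulSoup
|
|     urls = ["https://example.com/page1", "https://example.com/page2"]
|     docs = []
|     for url in urls:
|         soup = BeautifulSoup(requests.get(url).text, "html.parser")
|         docs.append(soup.get_text(" ", strip=True))
|
|     with open("site_corpus.txt", "w") as f:
|         f.write("\n\n\n".join(docs))  # two blank lines between samples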
| spunker540 wrote:
| Is there a consensus yet on when you should fine-tune vs when you
| should use prompt engineering?
|
| It seems not that hard to include in your prompt "here are some
| examples, please write like this and follow the style set here".
| While that may make every completion request more expensive (more
| tokens), fine-tuning these models can be quite expensive too.
|
| I'm curious if there are other trade-offs besides cost -- maybe
| the quality achieved is better with fine-tuning? Very interested
| to see how it all plays out. On the one hand, a massive model like
| GPT-4 can probably be prompted to match any style quite well,
| albeit at a cost; on the other, fine-tuning a cheap model may get
| exactly what you want.
| stu2b50 wrote:
| Empirically, while the ability of LLMs to learn zero-shot is
| impressive, it's significantly worse than fine-tuning. An
| obvious example is LLaMA itself: it's quite hard to get useful
| instruction-following behavior out of it; it requires a
| significant amount of prompt engineering and is still brittle
| at that.
|
| Fine-tuning on just 52k examples (Alpaca) makes a night-and-day
| difference in usability for instruction following.
| wazer5 wrote:
| Even GPT-4 can only handle a few pages of text as a prompt for
| examples. In most cases you'd want to fine-tune.
| spunker540 wrote:
| I guess I'm just curious what the killer use-cases are for
| fine tuning. For example it seems like overkill to fine tune
| a Shakespeare model, because you can just say "write like
| Shakespeare" and it already knows what you want.
| ttoinou wrote:
| I guess you'd want to fine tune for content that wasn't
| already parsed before
| underlines wrote:
| to my understanding there are 4 levels to add
| information:
|
| 1. train a model
|
| 2. fine tune a model
|
| 3. create embeddings for a model
|
| 4. use few shot prompt examples at inference time
|
| These have decreasing resource needs, but also decreasing
| quality.
|
| For example, the GPT-3 API (not yet the GPT-4 API) has an
| embeddings endpoint you can run over your own text, for example
| your own source code documentation. You then retrieve the most
| relevant chunks for a query and include them in the prompt, so
| GPT-3 "knows" your source code docs and answers specifically
| with that in mind.
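|
| A minimal retrieval sketch of that pattern (2023-era openai
| Python client; ada-002 vectors are roughly unit-norm, so a dot
| product stands in for cosine similarity; names are illustrative):
|
|     import numpy as np
|     import openai
|
|     def embed(texts):
|         resp = openai.Embedding.create(
|             model="text-embedding-ada-002", input=texts)
|         return np.array([d["embedding"] for d in resp["data"]])
|
|     doc_chunks = ["..."]  # your documentation, pre-chunked
|     doc_vecs = embed(doc_chunks)
|
|     def answer(question):
|         q = embed([question])[0]
|         best = doc_chunks[int(np.argmax(doc_vecs @ q))]
|         prompt = f"Docs:\n{best}\n\nQuestion: {question}\nAnswer:"
|         return openai.Completion.create(
|             model="text-davinci-003", prompt=prompt, max_tokens=200)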
| akrymski wrote:
| How does Llama compare to the actually open source Flan UL2?
| yousnail wrote:
| my instance doesn't seem impressed:
|
| So, I stumbled upon this Simple LLaMA FineTuner project by
| Aleksey Smolenchuk, claiming to be a beginner-friendly tool for
| fine-tuning the LLaMA-7B language model using the LoRA method via
| the PEFT library. It supposedly runs on a regular Colab Tesla T4
| instance for smaller datasets and sample lengths.
|
| The so-called "intuitive" UI lets users manage datasets, adjust
| parameters, and train/evaluate models. However, I can't help but
| question the actual value of such a tool. Is it just an attempt
| to dumb down the process for newcomers? Are there any plans to
| cater to more experienced users?
|
| The guide provided is straightforward, but it feels like a
| solution in search of a problem. I'm skeptical about the impact
| this tool will have on NLP fine-tuning.
| bjord wrote:
| maybe put the bit it said in quotes? I didn't read closely
| enough myself the first time; it took your subsequent comments
| to make me realize what you'd done
| fbdab103 wrote:
| So you are annoyed that something targeted for beginners does
| not also cater to experts?
| yousnail wrote:
| me? re-read that s'il vous plait
| lxe wrote:
| > I can't help but question the actual value of such a tool. Is
| it just an attempt to dumb down the process for newcomers?
|
| Actually, you've hit the nail on the head here. I wanted
| something where I, a complete beginner, can quickly play around
| with data, parameters, finetune, iterate, without investing too
| much time.
|
| That's also why I've annotated all the training parameters in
| the code and UI -- so beginners like me can understand what
| each slider does to their tuning and to their generation.
| Taek wrote:
| This is exactly the sweet spot I'm looking for. Technical
| enough that I can play around, simplified enough that I'm
| investing an hour or two of my time instead of a whole
| weekend.
| bbor wrote:
| I get that you /can/ use an LLM to generate troll feedback for
| random projects... but why?
| yousnail wrote:
| I was just excited that I got it working at all :/
| sea_temple wrote:
| Is it possible to fine-tune using CPU only?
| lxe wrote:
| I haven't tried it myself, and I haven't actually heard of anyone
| attempting it with a model this large.
| kiraaa wrote:
| You could, but it will be very slow. Oh, and make sure the code
| is CPU-compatible.
| mattfrommars wrote:
| Great job OP. Yesterday, after managing to run a local instance of
| alpaca.cpp and reading more about what Alpaca is and how it got
| fine-tuned on LLaMA, I began to wonder what it would take to fine-
| tune with my own set of data.
|
| With no real knowledge of LLMs, and having only recently started
| to understand what LLM terms mean, such as 'model, inference, LLM
| model, instruction set, fine tuning', what else do you think is
| required to make a tool like yours?
|
| This is for education purposes, and I'd love to take a jab at
| creating something like this and writing an inference engine --
| such as the dev behind LLaMA inference in Rust.
|
| > I am not familiar with HuggingFace libraries at all; why were
| they important in your implementation?
|
| > Gradio -- I believe it is the UI that allows plugging in
| different LLM models; I am familiar with text-generation-webui
| on GitHub that uses Gradio.
|
| > LoRA I think further fine-tunes a model -- just like how LLaMA
| got fine-tuned on an instruction set to produce the Alpaca model.
| lxe wrote:
| > With no real knowledge of LLMs, and having only recently
| started to understand what LLM terms mean, such as 'model,
| inference, LLM model, instruction set, fine tuning', what else
| do you think is required to make a tool like yours?
|
| This was me a few weeks ago. I got interested in all this when
| FlexGen (https://github.com/FMInference/FlexGen) was announced,
| which allowed running inference with the OPT model on consumer
| hardware. I'm an avid user of Stable Diffusion, and I wanted to
| see if I can have an SD equivalent of ChatGPT.
|
| Not understanding the details of hyperparameters or
| terminology, I basically asked ChatGPT to explain to me what
| these things are:
|
|     Explain to someone who is a software engineer with limited
|     knowledge of ML terms or linear algebra, what is "feed
|     forward" and "self-attention" in the context of ML and large
|     language models. Provide examples when possible.
|
| I did the same with all the other terms I didn't understand,
| like "ADAM optimizer", "gradient", etc. I relied on it very
| heavily and cross-referenced the answers.
|
| Looking at other people's code and just tinkering with things
| on my own really helped.
|
| Through the FlexGen discord I've discovered
| https://github.com/oobabooga/text-generation-webui where I
| spent days just playing around with models. This got me into
| the huggingface ecosystem -- their transformers library is an
| easy way to get started. I joined a few other discords, like
| LLaMA Unofficial, RWKV, Eleuther AI, Together, Hivemind and
| Petals.
|
| I bookmarked a bunch of resources but it's very sporadic. Here
| are some:
|
| - https://github.com/zphang/minimal-llama/#peft-fine-tuning-
| wi...
|
| - https://github.com/togethercomputer/OpenChatKit
|
| - https://www.cstroik.com/index.php/2023/02/18/finetuning-
| an-a...
|
| - https://github.com/huggingface/peft
|
| - https://github.com/kingoflolz/mesh-transformer-
| jax/blob/mast...
|
| - https://github.com/oobabooga/text-generation-webui
|
| - https://github.com/hizkifw/WebChatRWKVstic
|
| - https://github.com/ggerganov/whisper.cpp
|
| - https://github.com/qwopqwop200/GPTQ-for-LLaMa
|
| - https://github.com/oobabooga/text-generation-
| webui/issues/14...
|
| - https://github.com/bigscience-workshop/petals
|
| - https://github.com/alpa-projects/alpa
| nathanasmith wrote:
| I've built up a large personal library of hand made Anki
| flashcard decks over the years. This looks like just what I need
| to train a model on those decks.
| ALittleLight wrote:
| Really interesting, thanks for sharing, I'm excited to try this
| out.
|
| Would it also be possible to just train the model from scratch on
| commodity hardware and how big of a difference in training time
| would that be?
___________________________________________________________________
(page generated 2023-03-22 23:02 UTC)