A simple guide to fine-tuning Llama 2

Sam L'Huillier, July 24, 2023, 6 min read

In this guide, I show how you can fine-tune Llama 2 to be a dialog summarizer!

Last weekend, I wanted to fine-tune Llama 2 (which now reigns supreme on the Open LLM Leaderboard) on my own collection of Google Keep notes. Each of my notes has both a title and a body, so I wanted to train Llama to generate a body from a given title.

This first part of the tutorial covers fine-tuning Llama 2 on the samsum dialog summarization dataset using Hugging Face libraries. I find that while Hugging Face has built a superb library in transformers, their guides tend to overcomplicate things for the average Joe.
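For a sense of what the model actually trains on, here is a minimal sketch of turning one samsum record into a training string. It assumes the record's `dialogue` and `summary` fields and reuses the "Summarize this dialog: ... --- Summary:" template from the inference prompt in step 4. The helper name `build_example` and the sample record are hypothetical, and the quickstart notebook's own preprocessing (tokenization, padding, label handling) may differ.

```python
# Hypothetical helper: formats one samsum-style record into prompt + target text.
# Assumption: the template matches the evaluation prompt used in step 4.
PROMPT_TEMPLATE = "Summarize this dialog:\n{dialog}\n---\nSummary:\n"


def build_example(record, eos_token="</s>"):
    """Return (prompt, full_text) for a record with 'dialogue' and 'summary' keys.

    The prompt is what the model sees at inference time; full_text appends the
    reference summary plus an end-of-sequence marker so the model learns where
    the summary stops.
    """
    prompt = PROMPT_TEMPLATE.format(dialog=record["dialogue"])
    full_text = prompt + record["summary"] + eos_token
    return prompt, full_text


# Made-up record in the samsum shape (not taken from the dataset).
record = {
    "dialogue": "A: Lunch at noon?\nB: Sure, see you there.",
    "summary": "A and B will have lunch at noon.",
}
prompt, full_text = build_example(record)
print(full_text)
```

At inference time you feed only the prompt part and let the model generate the rest, which is what the evaluation prompt in step 4 does.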
The second part, fine-tuning on custom data, is coming at the end of the week!

To get started, get yourself an A10, A10G, or A100 (or any GPU with at least 24 GB of memory). If you're not sure where to start, the Brev Cloud makes it easy to access each of these GPUs!

1. Download the model

Clone Meta's Llama inference repo (which contains the download script), then run the script from inside the repo:

```bash
git clone https://github.com/facebookresearch/llama.git
cd llama
bash download.sh
```

It'll prompt you for the URL Meta sent you by email. If you haven't signed up yet, request access from Meta first; they are surprisingly quick at sending the email! For this guide, you only need to download the 7B model.

2. Convert the model to Hugging Face format

The conversion script lives in the transformers source tree, so clone the repo (a plain `pip install` does not give you a `transformers` directory to `cd` into), install it, and run the script:

```bash
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .
pip install sentencepiece  # needed by the Llama tokenizer
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir models_hf/7B
```

This gives us a Hugging Face model that we can fine-tune with the Hugging Face libraries!

3. Run the fine-tuning notebook

Clone the llama-recipes repo:

```bash
git clone https://github.com/facebookresearch/llama-recipes.git
```

Then open quickstart.ipynb in your preferred notebook interface. I use JupyterLab:

```bash
pip install jupyterlab
jupyter lab  # run this in the repo you want to work in
```

Then just run the whole notebook. Make sure you change the line

```python
model_id="./models_hf/7B"
```

to the path of the model you converted in step 2. And that's that! You will end up with a LoRA fine-tuned model.

4. Run inference on your fine-tuned model

The catch is that the training run saves only the LoRA adapter weights, not the full model, so we need to load the adapter weights on top of the base model. I struggled for a bit to find the right documentation for this, but eventually worked it out!
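Why does training leave you with only adapter weights? LoRA freezes each adapted base weight matrix W and learns two small matrices, A (r x d_in) and B (d_out x r); the layer then effectively uses W + (alpha / r) * (B @ A). The pure-Python sketch below illustrates that merge and the resulting size difference. It is a conceptual illustration, not PEFT's implementation. The rank r = 8 and the choice of adapting `q_proj` and `v_proj` are assumptions for the arithmetic (check the LoRA config in your notebook), alongside the Llama 2 7B shapes of hidden size 4096 and 32 layers.

```python
# Conceptual sketch of LoRA (not PEFT's code): pure Python, no torch needed.


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, a, b, alpha, r):
    """Return W + (alpha / r) * (B @ A), the weight the adapted layer effectively uses."""
    scale = alpha / r
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_param_count(d_in, d_out, r):
    """Trainable parameters in one adapter: A holds r * d_in, B holds d_out * r."""
    return r * d_in + d_out * r


# Toy merge: 3x3 identity base weight, rank-1 adapter, alpha = 2.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
a = [[1.0, 0.0, 0.0]]      # r x d_in  = 1 x 3
b = [[1.0], [0.0], [0.0]]  # d_out x r = 3 x 1
merged = merge_lora(w, a, b, alpha=2, r=1)

# Size arithmetic under the assumptions above (r = 8, q_proj + v_proj, 32 layers).
hidden, layers, rank = 4096, 32, 8
adapter_params = layers * 2 * lora_param_count(hidden, hidden, rank)
full_matrix_params = hidden * hidden
print(merged[0], adapter_params, full_matrix_params)
```

Under these assumptions the adapter for the whole model is about 4.2 million parameters: fewer than a single 4096 x 4096 projection (about 16.8 million) and roughly 0.06% of the ~6.7 billion in the base model. That is why the checkpoint directory holds only the adapter, and why inference has to load the base model first and apply the adapter on top.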
Import libraries:

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel
```

Load the tokenizer and base model:

```python
model_id = "./models_hf/7B"  # path to the model you converted in step 2
tokenizer = LlamaTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(
    model_id, load_in_8bit=True, device_map="auto", torch_dtype=torch.float16
)
```

Load the adapter from where you saved it after training:

```python
# Wrap the base model with the LoRA adapter saved by the notebook
# (replace the path with your own adapter directory)
model = PeftModel.from_pretrained(model, "/root/llama-recipes/samsungsumarizercheckpoint")
```

Run inference:

```python
eval_prompt = """
Summarize this dialog:
A: Hi Tom, are you busy tomorrow's afternoon?
B: I'm pretty sure I am. What's up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a puppy for my son.
B: That will make him so happy.
A: Yeah, we've discussed it many times. I think he's ready now.
B: That's good. Raising a dog is a tough issue. Like having a baby ;-)
A: I'll get him one of those little dogs.
B: One that won't grow up too big;-)
A: And eat too much;-))
B: Do you know which one he would like?
A: Oh, yes, I took him there last Monday. He showed me one that he really liked.
B: I bet you had to drag him away.
A: He wanted to take it home right away ;-).
B: I wonder what he'll name it.
A: He said he'd name it after his dead hamster - Lemmy - he's a great Motorhead fan :-)))
---
Summary:
"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    output = model.generate(**model_input, max_new_tokens=100)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Next in this series, I'll show you how you can format your own dataset to train Llama 2 on a custom task! Message me on Twitter if you want me to hurry up on this!