https://brev.dev/blog/fine-tuning-llama-2

 
 

A simple guide to fine-tuning Llama 2

Sam L'Huillier

July 24, 2023, 6 min read

In this guide, I show how you can fine-tune Llama 2 to be a dialog
summarizer!

Last weekend, I wanted to fine-tune Llama 2 (which now reigns supreme
on the Open LLM Leaderboard) on my own collection of Google Keep
notes. Each of my notes has both a title and a body, so I wanted to
train Llama to generate a body from a given title.

This first part of the tutorial covers fine-tuning Llama 2 on the
SAMSum dialog summarization dataset using Hugging Face libraries. I
find that while Hugging Face has built a superb library in
transformers, their guides tend to overcomplicate things for the
average Joe. The second part, fine-tuning on custom data, is coming
at the end of the week!

To get started, get yourself an A10, A10G, or A100 (or any GPU with
at least 24 GB of GPU memory). If you're not sure where to start, the
Brev Cloud makes it easy to access each of these GPUs!
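Why 24 GB? A back-of-the-envelope estimate helps: the weights alone
take (number of parameters) x (bytes per parameter). The sketch below
rounds the 7B model to 7 billion parameters and deliberately ignores
activations, the KV cache, LoRA/optimizer state and CUDA overhead,
so treat the numbers as a lower bound rather than a real requirement.

```python
# Rough, weights-only memory estimate for a model checkpoint.
# Assumption: "7B" is rounded to 7e9 parameters. Activations, KV cache,
# optimizer state and framework overhead are NOT included.

BYTES_PER_PARAM = {
    "fp32": 4,
    "fp16": 2,
    "int8": 1,
}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in ("fp32", "fp16", "int8"):
        print(f"7B weights in {dtype}: {weight_memory_gb(7e9, dtype):.1f} GB")
```

In fp16 the weights already need about 14 GB, and loading them in
8-bit (as done in the inference step) brings that to about 7 GB,
which leaves headroom on a 24 GB card for everything else.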

1. Download the model

Clone Meta's Llama inference repo (which contains the download
script):

git clone https://github.com/facebookresearch/llama.git

Then move into the repo and run the download script:

cd llama
bash download.sh

It'll prompt you to enter the URL that Meta sent you by email. If
you haven't signed up yet, request access from Meta first. They are
surprisingly quick at sending you the email!

For this guide, you only need to download the 7B model.
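The download script checks the files it fetched with md5sum against a
checklist.chk file in the model folder (as far as I can tell from the
script). If you later copy the weights to another machine, you can
re-run that check without md5sum. This is a minimal stdlib sketch,
assuming checklist.chk uses md5sum's usual "<hash>  <filename>"
format; md5_of and verify_checklist are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-GB checkpoints never sit fully in RAM."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checklist(model_dir: str) -> dict[str, bool]:
    """Check every '<md5>  <filename>' line of model_dir/checklist.chk.

    Returns {filename: True/False}; False means missing or mismatched.
    (Assumes md5sum's output format; not an official Meta tool.)
    """
    root = Path(model_dir)
    results = {}
    for line in (root / "checklist.chk").read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary-mode entries with '*'
        target = root / name
        results[name] = target.exists() and md5_of(target) == expected
    return results
```

Point it at the 7B folder inside your download directory, e.g.
verify_checklist("/path/to/downloaded/llama/weights/<7B folder>"),
and look for any False entries.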

2. Convert model to Hugging Face format

git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .

python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir models_hf/7B

This gives us a Hugging Face model that we can fine-tune with the
Hugging Face libraries!
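Before spending GPU time on training, it's worth a quick sanity check
that the converted folder describes the model you expect. The output
directory should contain a config.json; the sketch below compares a
few of its fields against the Llama 2 7B architecture (4096 hidden
size, 32 layers, 32 heads, 11008 MLP size, 32000-token vocabulary).
check_converted_config is a hypothetical helper, and the expected
values are stated here as an assumption about the 7B config.

```python
import json
from pathlib import Path

# Assumed Llama 2 7B architecture values, as they should appear in config.json.
EXPECTED_7B = {
    "hidden_size": 4096,
    "intermediate_size": 11008,
    "num_hidden_layers": 32,
    "num_attention_heads": 32,
    "vocab_size": 32000,
}


def check_converted_config(model_dir: str, expected: dict = EXPECTED_7B) -> list[str]:
    """Return mismatch messages; an empty list means the config looks right."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    problems = []
    for key, want in expected.items():
        got = config.get(key)
        if got != want:
            problems.append(f"{key}: expected {want}, found {got}")
    return problems
```

For example, check_converted_config("models_hf/7B") should return an
empty list; any message usually means the wrong weights folder or
model size was passed to the conversion script.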

3. Run the fine-tuning notebook

Clone the llama-recipes repo:

git clone https://github.com/facebookresearch/llama-recipes.git

Then open the quickstart.ipynb notebook in your preferred notebook
interface. I use JupyterLab, like so:

pip install jupyterlab
jupyter lab # in the repo you want to work in

Before you run it, make sure you change the line:

model_id="./models_hf/7B"

so that it points to the model you converted. Then just run the whole
notebook. And that's that! You will end up with a LoRA fine-tuned
model.
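LoRA (low-rank adaptation) keeps each original weight matrix W frozen
and trains only a small update B @ A next to it, where A is r x d_in
and B is d_out x r for a small rank r. That is why the trainable part
is tiny. The arithmetic below assumes rank-8 adapters on the q_proj
and v_proj matrices of all 32 decoder layers, which is my reading of
the quickstart's LoraConfig; check the values in your copy of the
notebook before relying on these numbers.

```python
# Assumed setup: r=8 LoRA adapters on q_proj and v_proj (both 4096 x 4096
# in Llama 2 7B) in each of the 32 decoder layers. Illustrative only.


def lora_params_per_matrix(d_in: int, d_out: int, r: int) -> int:
    """Trainable params for one adapted matrix: A is r x d_in, B is d_out x r."""
    return r * d_in + d_out * r


def total_lora_params(hidden: int, layers: int, matrices_per_layer: int, r: int) -> int:
    """Total adapter params when every adapted matrix is hidden x hidden."""
    return layers * matrices_per_layer * lora_params_per_matrix(hidden, hidden, r)


full = 4096 * 4096
lora = lora_params_per_matrix(4096, 4096, r=8)
print(f"one 4096x4096 matrix: {full:,} params; its rank-8 adapter: {lora:,}")
print(f"all adapters (32 layers x 2 matrices): {total_lora_params(4096, 32, 2, 8):,}")
```

Under these assumptions the adapters add about 4.2 million trainable
parameters, a fraction of a percent of the roughly 7 billion frozen
ones, which is why the saved result is so small.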

4. Run inference on your fine-tuned model

The catch here is that the notebook (via Hugging Face's PEFT library)
only saves the LoRA adapter weights, not the full model. So we need to
load the base model first and then apply the adapter weights on top
of it. I struggled for a bit to find the right documentation for
this, but eventually worked it out!
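Conceptually, "loading the adapter into the full model" means adding
the low-rank update back to the frozen weights: each adapted layer
computes W x + (alpha / r) * B (A x). To my understanding, PEFT keeps
the adapter separate like this at inference time by default, and its
merge_and_unload() method folds (alpha / r) * B @ A into W once. The
toy pure-Python check below uses made-up 2x3 matrices to show that
both routes produce the same output.

```python
# Toy check: a separate LoRA adapter vs. the same adapter merged into W.
# All sizes and values are made up for illustration.


def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def scale(X, s):
    return [[s * a for a in row] for row in X]


W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]  # frozen base weight (d_out=2, d_in=3)
A = [[0.5, -1.0, 0.0]]                   # r x d_in, with r=1
B = [[2.0], [1.0]]                       # d_out x r
alpha, r = 2.0, 1
s = alpha / r                            # LoRA scaling factor

x = [1.0, 2.0, 3.0]

# Route 1: keep the adapter separate and add its contribution.
base = matvec(W, x)
delta = matvec(B, matvec(A, x))
y_separate = [b + s * d for b, d in zip(base, delta)]

# Route 2: fold the scaled update into the weights once, then multiply.
W_merged = add(W, scale(matmul(B, A), s))
y_merged = matvec(W_merged, x)

print(y_separate, y_merged)
```

Merging trades the ability to swap adapters for a single plain weight
matrix per layer, which removes the extra adapter matmul at inference.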

Import libraries:

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel, PeftConfig

Load the tokenizer and model:

model_id = "./models_hf/7B"
tokenizer = LlamaTokenizer.from_pretrained(model_id)
# load_in_8bit=True needs the bitsandbytes (and accelerate) packages installed
model = LlamaForCausalLM.from_pretrained(model_id, load_in_8bit=True, device_map="auto", torch_dtype=torch.float16)
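load_in_8bit=True asks transformers to store the linear-layer weights
as 8-bit integers via bitsandbytes, roughly halving memory compared
with fp16. The real bitsandbytes LLM.int8() scheme is more involved
(per-vector scales, and outlier features kept in fp16); the sketch
below only illustrates the basic absmax idea it builds on. The helper
names are hypothetical.

```python
# Simplified absmax int8 quantization (NOT the full LLM.int8() algorithm).


def absmax_quantize(values):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    scale = max(abs(v) for v in values) / 127
    if scale == 0:  # all-zero input: any scale works, codes are all zero
        return [0] * len(values), 1.0
    codes = [round(v / scale) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats; each error is at most scale / 2."""
    return [c * scale for c in codes]


codes, s = absmax_quantize([0.5, -1.27, 0.01])
print(codes, dequantize(codes, s))
```

Each weight then costs one byte plus a shared scale, at the price of a
small rounding error per value.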

Load the adapter from the directory where the notebook saved it after training:

model = PeftModel.from_pretrained(model, "/root/llama-recipes/samsungsumarizercheckpoint")
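If the outputs look like garbage, one thing to rule out is an adapter
trained against a different base model. PEFT writes an
adapter_config.json next to the adapter weights, and its
base_model_name_or_path field records the base model used during
training. This hypothetical helper just pulls out a few fields so you
can compare them with the model_id you loaded.

```python
import json
from pathlib import Path


def describe_adapter(adapter_dir: str) -> dict:
    """Read selected fields from the adapter_config.json saved by PEFT."""
    cfg = json.loads((Path(adapter_dir) / "adapter_config.json").read_text())
    keys = ("base_model_name_or_path", "r", "lora_alpha", "target_modules")
    return {key: cfg.get(key) for key in keys}
```

For example, describe_adapter("/root/llama-recipes/samsungsumarizercheckpoint")
should report a base model path that matches the converted 7B model.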

Run inference:

eval_prompt = """
Summarize this dialog:
A: Hi Tom, are you busy tomorrow's afternoon?
B: I'm pretty sure I am. What's up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a puppy for my son.
B: That will make him so happy.
A: Yeah, we've discussed it many times. I think he's ready now.
B: That's good. Raising a dog is a tough issue. Like having a baby ;-)
A: I'll get him one of those little dogs.
B: One that won't grow up too big;-)
A: And eat too much;-))
B: Do you know which one he would like?
A: Oh, yes, I took him there last Monday. He showed me one that he really liked.
B: I bet you had to drag him away.
A: He wanted to take it home right away ;-).
B: I wonder what he'll name it.
A: He said he'd name it after his dead hamster - Lemmy  - he's  a great Motorhead fan :-)))
---
Summary:
"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    output_ids = model.generate(**model_input, max_new_tokens=100)[0]
    print(tokenizer.decode(output_ids, skip_special_tokens=True))
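Because Llama is a decoder-only model, generate returns the prompt
tokens followed by the new ones, so the printed text starts with the
whole dialog again. One option is to slice the output ids past the
prompt length (model_input["input_ids"].shape[1]) before decoding;
another, shown in this hypothetical stdlib helper, is to keep only
the text after the last "Summary:" marker.

```python
def extract_summary(decoded: str, marker: str = "Summary:") -> str:
    """Return the trimmed text after the last marker, or the whole text if absent."""
    _, found, after = decoded.rpartition(marker)
    return after.strip() if found else decoded.strip()


text = "Summarize this dialog:\nA: hi\n---\nSummary:\nA greets B."
print(extract_summary(text))
```

Using the last occurrence means a dialog that itself contains the
word "Summary:" does not cut the output short.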

Next in this series, I'll show you how you can format your own
dataset to train Llama 2 on a custom task! Message me on Twitter if
you want to get me to hurry up on this!


2261 Market St #4066, San Francisco, CA 94114
(c) 2023 Brev.dev, Inc. All rights reserved.