Fine-Tune a Llama 3 Model with LoRA and PEFT

Adapt a pre-trained Llama 3 model for a custom task using Low-Rank Adaptation (LoRA). This guide uses the Hugging Face ecosystem (`transformers`, `peft`, `trl`) to efficiently fine-tune on a single GPU.

Level: intermediate | Time: 1 hour | Steps: 5

The play

Set Up Your Environment
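Install the Hugging Face libraries used for fine-tuning: `transformers` for the model and tokenizer, `datasets` for data handling, `peft` for LoRA, `accelerate` for device placement and mixed-precision training, `bitsandbytes` for 4-bit quantization, and `trl` for the supervised fine-tuning trainer (`SFTTrainer`). For example: `pip install transformers datasets peft accelerate bitsandbytes trl`. The starter code passes several options directly to `SFTTrainer`, as older `trl` releases allowed; recent releases moved those options into `SFTConfig`, so check which `trl` version you install.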
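Because the `trl` API in particular has changed across releases, it can help to confirm what is actually installed before running anything. The following is a minimal sketch that uses only the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, not part of any Hugging Face library.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names from the install step (hypothetical helper below, stdlib only)
REQUIRED = ("transformers", "datasets", "peft", "accelerate", "bitsandbytes", "trl")


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version string, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:14} {ver or 'NOT INSTALLED'}")
```

Any entry printed as `NOT INSTALLED` needs to be installed before the starter code will import.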
Prepare Your Dataset
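`SFTTrainer` expects each training example as a single, already-formatted string. Create a JSONL file (`dataset.jsonl`) where each line is a JSON object with a `text` key holding one instruction-response pair rendered in the chat format of the model you are tuning. For Llama 3 Instruct, each turn starts with a `<|start_header_id|>role<|end_header_id|>` header and ends with `<|eot_id|>` (the `[INST] ... [/INST]` markup is the Llama 2 and Mistral format, not Llama 3's).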
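A malformed line usually only surfaces once `load_dataset` or the trainer fails, so a quick pre-flight check can save a run. The following is a minimal sketch, assuming the `text` field and the Llama 3 `<|eot_id|>` terminator described above. The function names `validate_lines` and `validate_jsonl` are hypothetical, and the `<|eot_id|>` check is only a heuristic for "probably the right chat format".

```python
import json


def validate_lines(lines, field="text"):
    """Return (line_number, problem) tuples; an empty list means every line looks usable."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines carry no example; skipped here
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict) or not isinstance(record.get(field), str):
            problems.append((lineno, f"missing string field {field!r}"))
        elif "<|eot_id|>" not in record[field]:
            # Heuristic: Llama 3 Instruct turns end with <|eot_id|>
            problems.append((lineno, "no <|eot_id|> terminator (not Llama 3 chat format?)"))
    return problems


def validate_jsonl(path, field="text"):
    """Validate a JSONL file on disk, line by line."""
    with open(path, encoding="utf-8") as f:
        return validate_lines(f, field)


if __name__ == "__main__":
    for lineno, problem in validate_jsonl("dataset.jsonl"):
        print(f"line {lineno}: {problem}")
```

Splitting the logic into `validate_lines` keeps it testable on in-memory strings, while `validate_jsonl` is the entry point for the real file.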
Load Quantized Model and Tokenizer
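To make fine-tuning feasible on consumer hardware, load the base model in 4-bit precision (NF4) using `bitsandbytes`. Storing weights in 4 bits instead of 16 cuts the weight memory of the 8B model by roughly a factor of four: from about 16 GB in bf16 to roughly 5 to 6 GB once quantization constants and the layers left unquantized are included. Llama 3 is a gated model, so request access on its Hugging Face model page and then log in with `huggingface-cli login`.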
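The factor-of-four saving is simple arithmetic on bits per parameter. The following is a back-of-envelope sketch, assuming a round 8 billion parameters. It counts weights only and ignores activations, optimizer state, quantization constants, and layers that stay in higher precision (such as the embedding and output layers), so treat the results as lower bounds. The function name is hypothetical.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Assumption: a round 8 billion parameters (Llama 3 8B has slightly more)
N_PARAMS = 8_000_000_000

if __name__ == "__main__":
    print(f"bf16 weights:  {weight_memory_gb(N_PARAMS, 16):.1f} GB")  # 16.0 GB
    print(f"4-bit weights: {weight_memory_gb(N_PARAMS, 4):.1f} GB")   # 4.0 GB
```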
Configure LoRA with PEFT
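Instead of updating every model parameter (full fine-tuning), LoRA freezes the base weights and trains small low-rank "adapter" matrices added alongside selected layers. Use the `peft` library to create a `LoraConfig` that sets the rank `r`, the scaling factor `lora_alpha`, and the layers to adapt. The attention projections (`q_proj`, `k_proj`, `v_proj`, `o_proj`) are a common minimum. The starter code also adapts the MLP projections (`gate_proj`, `up_proj`, `down_proj`), which trains more parameters.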
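To see how small "small" is, note the structure of the update. For a frozen linear layer with weight $W$ of shape `d_out x d_in`, LoRA adds two matrices, $A$ (`r x d_in`) and $B$ (`d_out x r`). The layer output becomes $Wx + \frac{\alpha}{r} B A x$, which is a scale of 32/16 = 2 with the starter code's settings. `peft` initializes $B$ to zeros, so training starts from the unmodified base model. Each adapted layer therefore adds `r * (d_in + d_out)` trainable parameters.

The following sketch counts them, assuming the Llama 3 8B shapes from its published config: hidden size 4096, MLP size 14336, 32 layers, and grouped-query attention with 8 key/value heads of dimension 128, so `k_proj` and `v_proj` output 1024 features. The helper names are hypothetical.

```python
# Assumed Llama 3 8B shapes (hidden_size, intermediate_size, kv heads * head_dim, layers)
HIDDEN, INTERMEDIATE, KV_DIM, N_LAYERS = 4096, 14336, 1024, 32

# (in_features, out_features) for each projection named in the starter code's LoraConfig
LAYER_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(d_in, d_out, r):
    """Trainable parameters LoRA adds to one layer: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r


def total_lora_params(targets, r, n_layers=N_LAYERS):
    """Sum over the targeted projections in one decoder layer, times the number of layers."""
    per_layer = sum(lora_params(*LAYER_SHAPES[name], r) for name in targets)
    return per_layer * n_layers


if __name__ == "__main__":
    attention_only = ["q_proj", "k_proj", "v_proj", "o_proj"]
    print(f"attention only, r=16: {total_lora_params(attention_only, 16):,}")  # 13,631,488
    print(f"all seven,      r=16: {total_lora_params(LAYER_SHAPES, 16):,}")    # 41,943,040
```

Even with all seven projections adapted, this is about 42 million trainable parameters, against roughly 8 billion frozen ones.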
Run the Supervised Fine-Tuning
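Use the `SFTTrainer` from the `trl` library to orchestrate training. It combines the model, tokenizer, dataset, and LoRA configuration; when given a `peft_config`, it wraps the model with LoRA adapters so that only the adapter weights are updated. Call `.train()` to start fine-tuning. Then save the result, which is a small adapter rather than a full copy of the model.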
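How many optimizer updates a run performs depends on the batch settings. With the starter code's `per_device_train_batch_size=2` and `gradient_accumulation_steps=2` on one GPU, each update sees 4 examples, so the three-row demo dataset produces a single update per epoch. The following is a rough sketch of that arithmetic. The helper names are hypothetical, and the step count is an approximation, since `Trainer`'s exact count can differ slightly at epoch boundaries. The warmup helper mirrors the round-up conversion that `TrainingArguments` applies to `warmup_ratio` when `warmup_steps` is not set.

```python
import math


def effective_batch_size(per_device, grad_accum, n_devices=1):
    """Number of examples contributing to each optimizer update."""
    return per_device * grad_accum * n_devices


def approx_optimizer_steps(n_examples, per_device, grad_accum, epochs=1, n_devices=1):
    """Approximate optimizer updates for a run (at least one per epoch)."""
    batch = effective_batch_size(per_device, grad_accum, n_devices)
    per_epoch = max(math.ceil(n_examples / batch), 1)
    return per_epoch * epochs


def warmup_steps(total_steps, warmup_ratio):
    """Warmup length from a ratio, rounded up."""
    return math.ceil(total_steps * warmup_ratio)


if __name__ == "__main__":
    print(effective_batch_size(2, 2))             # 4
    print(approx_optimizer_steps(3, 2, 2))        # 1 (the demo dataset)
    print(approx_optimizer_steps(1000, 2, 2, 3))  # 750
```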

Starter code

import json
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig
from trl import SFTTrainer
from datasets import load_dataset

# Llama 3 is gated: make sure your Hugging Face account has been granted access.
# Run `huggingface-cli login` in your terminal before executing this script.

# 1. Create a small demo dataset in the Llama 3 Instruct chat format.
# The tokenizer adds <|begin_of_text|> automatically, so it is omitted here.
def format_example(question, answer):
    return (
        f"<|start_header_id|>user<|end_header_id|>\n\n{question}<|eot_id|>"
        f"<|start_header_id|>assistant<|end_header_id|>\n\n{answer}<|eot_id|>"
    )

qa_pairs = [
    ("What is the capital of France?", "The capital of France is Paris."),
    ("Who wrote 'To Kill a Mockingbird'?", "Harper Lee wrote 'To Kill a Mockingbird'."),
    ("What is the formula for water?", "The chemical formula for water is H2O."),
]
# json.dumps handles quoting and escaping, so every line is valid JSON
with open("dataset.jsonl", "w", encoding="utf-8") as f:
    for question, answer in qa_pairs:
        f.write(json.dumps({"text": format_example(question, answer)}) + "\n")

# 2. Model and Tokenizer setup
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    token=True  # reuse the token saved by `huggingface-cli login` (use_auth_token is deprecated)
)
model.config.use_cache = False # Recommended for training

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = 'right'

# 3. LoRA Configuration
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# 4. Load Dataset
dataset = load_dataset("json", data_files="dataset.jsonl", split="train")

# 5. Training Arguments
args = TrainingArguments(
    output_dir="llama3-8b-sft-tuned",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    optim="paged_adamw_32bit",
    logging_steps=1,
    save_strategy="epoch",
    learning_rate=2e-4,
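    bf16=True, # Matches bnb_4bit_compute_dtype; needs a bf16-capable GPU (Ampere or newer)
    # On older GPUs, use fp16=True and bnb_4bit_compute_dtype=torch.float16 instead
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    lr_scheduler_type="constant_with_warmup", # plain "constant" ignores warmup_ratio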
)

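# 6. Initialize Trainer
# These keyword arguments follow the older SFTTrainer API. Recent trl releases
# move dataset_text_field, max_seq_length and packing into SFTConfig and rename
# tokenizer to processing_class.
trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    dataset_text_field="text",
    peft_config=lora_config,
    max_seq_length=1024,
    tokenizer=tokenizer,
    # Packing concatenates short examples into full max_seq_length blocks. Enable it
    # on real data; this three-row demo set is too small to fill even one block.
    packing=False,
)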

# 7. Start Fine-Tuning
print("Starting Fine-Tuning...")
trainer.train()

# 8. Save the trained adapter
print("Saving LoRA adapter...")
trainer.save_model("llama3-8b-sft-adapter")

print("Fine-Tuning complete.")
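The `packing` option of `SFTTrainer` trades per-example padding for fixed-length blocks. The following is a simplified sketch of the constant-length packing strategy that older `trl` releases used (newer releases may pack differently). Tokenized examples are joined into one stream with a separator token (typically the EOS token) after each, the stream is cut into `seq_length` chunks, and a final partial chunk is dropped. The function name and token ids are hypothetical. The sketch also shows why packing yields nothing when the whole dataset is shorter than one block.

```python
def pack_constant_length(tokenized_examples, seq_length, sep_token_id):
    """Concatenate token-id lists (each followed by a separator) and split into
    seq_length chunks, dropping a trailing chunk that is too short."""
    stream = []
    for ids in tokenized_examples:
        stream.extend(ids)
        stream.append(sep_token_id)
    chunks = []
    for start in range(0, len(stream), seq_length):
        chunk = stream[start:start + seq_length]
        if len(chunk) == seq_length:
            chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    print(pack_constant_length([[1, 2, 3], [4, 5]], seq_length=3, sep_token_id=0))
    # [[1, 2, 3], [0, 4, 5]]  (the leftover [0] is dropped)
    print(pack_constant_length([[7] * 30] * 3, seq_length=1024, sep_token_id=0))
    # []  (about 90 tokens cannot fill a 1024-token block)
```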