fine-tuning, lora, peft, llama-3, transformers, python, nlp, model-customization
Fine-Tune a Llama 3 Model with LoRA and PEFT
Adapt a pretrained Llama 3 model to a custom task using Low-Rank Adaptation (LoRA). This guide uses the Hugging Face ecosystem (`transformers`, `peft`, `trl`) together with 4-bit quantization from `bitsandbytes` (the QLoRA approach), so an 8B model can be fine-tuned on a single GPU.
Intermediate · 1 hour · 5 steps
The play
- Set Up Your Environment: Install six Python libraries. `transformers` provides the models and tokenizers, `datasets` handles data loading, `peft` implements LoRA, `accelerate` handles device placement (required by `device_map="auto"`), `bitsandbytes` provides 4-bit quantization, and `trl` supplies the supervised fine-tuning trainer.
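- Prepare Your Dataset: Instruction-tuning data is usually a set of conversations, and each one must be rendered as text in the model's chat template. Create a JSONL file (`dataset.jsonl`) in which each line is a JSON object with a `text` key holding one formatted instruction-response pair. For Llama 3 Instruct, that means the `<|start_header_id|>role<|end_header_id|>` headers and `<|eot_id|>` terminators, not the `[INST]`/`[/INST]` markers used by Llama 2 and Mistral.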
- Load Quantized Model and Tokenizer: To make fine-tuning feasible on consumer hardware, load the pretrained model in 4-bit NF4 precision using `bitsandbytes`. Storing the quantized weights in 4 bits instead of 16 cuts their memory by roughly a factor of four. The Llama 3 repositories are gated, so accept the license on the model page and log in with `huggingface-cli login` before loading.
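- Configure LoRA with PEFT: Instead of updating all model parameters (full fine-tuning), LoRA freezes the pretrained weights and trains a small pair of low-rank "adapter" matrices for each targeted layer; `r` sets their rank and `lora_alpha / r` scales their contribution. Use the `peft` library to create a `LoraConfig` listing the layers to adapt: often the attention projections `q_proj`, `k_proj`, `v_proj`, and `o_proj`, optionally joined by the MLP projections `gate_proj`, `up_proj`, and `down_proj`, as in the starter code.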
- Run the Supervised Fine-Tuning: `SFTTrainer` from the `trl` library runs the training loop. Given the model, tokenizer, dataset, and LoRA configuration, it attaches the LoRA adapters, tokenizes the `text` field, and trains with the settings in `TrainingArguments`. Call `.train()` to start fine-tuning, then `save_model()` to write the adapter weights.
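Before running the script, it can help to confirm that every library from the setup step is installed. This is a minimal sketch using only the standard library; `check_packages` and `REQUIRED_PACKAGES` are illustrative names, not part of any of these libraries.

```python
import importlib.metadata
import importlib.util

# The six libraries from the setup step; their pip names match their import names.
REQUIRED_PACKAGES = ["transformers", "datasets", "peft", "accelerate", "bitsandbytes", "trl"]


def check_packages(packages=REQUIRED_PACKAGES):
    """Map each package to its installed version, "unknown", or None if missing."""
    status = {}
    for name in packages:
        # find_spec locates a top-level module without importing it.
        if importlib.util.find_spec(name) is None:
            status[name] = None
            continue
        try:
            status[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # Importable, but no installed distribution metadata under this name.
            status[name] = "unknown"
    return status


if __name__ == "__main__":
    for name, version in check_packages().items():
        print(f"{name:<14}{version or 'MISSING'}")
```

Recording the printed versions is worthwhile because the `trl` and `transformers` APIs used in the starter code have changed across releases.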
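The chat-formatted `text` field from the Prepare Your Dataset step can be generated instead of typed by hand. This sketch assumes the Llama 3 Instruct prompt format published by Meta (`<|start_header_id|>`, `<|end_header_id|>`, `<|eot_id|>`); `format_llama3` and `write_jsonl` are hypothetical helper names. For real data, `tokenizer.apply_chat_template` in `transformers` is the more reliable source of the exact template.

```python
import json

# Llama 3 Instruct prompt-format tokens (assumed from Meta's published format;
# compare with tokenizer.apply_chat_template for your checkpoint).
HEADER = "<|start_header_id|>{role}<|end_header_id|>\n\n"
EOT = "<|eot_id|>"


def format_llama3(instruction, response, system=None):
    """Render one instruction/response pair as Llama 3 chat text.

    <|begin_of_text|> is left out because the Llama 3 tokenizer prepends it
    when the text is tokenized with special tokens enabled.
    """
    text = ""
    if system:
        text += HEADER.format(role="system") + system + EOT
    text += HEADER.format(role="user") + instruction + EOT
    text += HEADER.format(role="assistant") + response + EOT
    return text


def write_jsonl(pairs, path):
    """Write one {"text": ...} JSON object per line, as load_dataset("json") expects."""
    with open(path, "w", encoding="utf-8") as f:
        for instruction, response in pairs:
            record = {"text": format_llama3(instruction, response)}
            f.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    print(format_llama3("What is the capital of France?", "The capital of France is Paris."))
```

If you build the `text` field with `apply_chat_template` instead, note that its Llama 3 output starts with `<|begin_of_text|>`, so check that the BOS token is not added a second time during tokenization.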
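To put the memory savings of 4-bit loading into numbers, here is a rough, weights-only estimate. It assumes about 8.0 billion parameters for Llama 3 8B and treats every parameter as stored at the same bit width, which understates the real 4-bit footprint for the reasons noted in the comments; `weight_memory_gb` is an illustrative helper.

```python
# Back-of-the-envelope weight memory for an ~8B-parameter model.
# Assumption: every parameter is stored at the given bit width. In practice
# bitsandbytes quantizes only linear layers, keeps embeddings and the output
# head in higher precision, and stores extra quantization constants, so the
# real 4-bit footprint is somewhat larger. Activations, LoRA weights,
# gradients and optimizer state are not included.


def weight_memory_gb(num_params, bits_per_param):
    """Memory in gigabytes (10**9 bytes) needed to hold the weights."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    n = 8.0e9  # approximate parameter count of Llama 3 8B
    for label, bits in [("bf16", 16), ("nf4", 4)]:
        print(f"{label}: {weight_memory_gb(n, bits):.1f} GB")
```

The estimate comes to about 16 GB in bf16 versus about 4 GB in 4-bit, before activations and the LoRA optimizer state are added.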
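The LoRA step becomes concrete if you count the trainable parameters for the `LoraConfig` in the starter code (`r=16`, seven target modules). A layer with an `out × in` weight gains `r × (in + out)` trainable parameters. The layer dimensions below are assumed from the published Llama 3 8B configuration; confirm them against `model.config` for your checkpoint. The helper names are illustrative.

```python
# Trainable-parameter count for the LoRA config in the starter code.
# Assumed Llama 3 8B dimensions (from its published config; verify with
# model.config): hidden 4096, MLP 14336, 32 layers, 8 KV heads x 128 dims.
HIDDEN, INTERMEDIATE, LAYERS, KV_DIM = 4096, 14336, 32, 8 * 128

# (in_features, out_features) of each targeted linear layer in one block.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, r):
    """LoRA trains B (out x r) and A (r x in); the frozen W gets B @ A added."""
    return r * (in_features + out_features)


def total_lora_params(r, targets=TARGET_SHAPES, layers=LAYERS):
    per_layer = sum(lora_params(i, o, r) for i, o in targets.values())
    return per_layer * layers


if __name__ == "__main__":
    r, alpha = 16, 32
    print("trainable LoRA params:", total_lora_params(r))  # 41943040
    print("update scale alpha/r:", alpha / r)  # 2.0
    print("q_proj: full", HIDDEN * HIDDEN, "vs LoRA", lora_params(HIDDEN, HIDDEN, r))
```

About 42 million trainable parameters is roughly 0.5% of the 8B model, which is why the adapters and their optimizer state fit alongside the 4-bit weights. Calling `print_trainable_parameters()` on the `peft` model should report a trainable count in line with this figure.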
Starter code
```python
import json

import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from trl import SFTTrainer

# Requires a Hugging Face token with access to the gated Llama 3 repository.
# Run `huggingface-cli login` in your terminal before executing this script.

# 1. Create a small dummy dataset in the Llama 3 Instruct chat format.
#    The tokenizer prepends <|begin_of_text|> itself, so the text omits it.
qa_pairs = [
    ("What is the capital of France?", "The capital of France is Paris."),
    ("Who wrote 'To Kill a Mockingbird'?", "Harper Lee wrote 'To Kill a Mockingbird'."),
    ("What is the formula for water?", "The chemical formula for water is H2O."),
]


def to_llama3_text(question, answer):
    return (
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{question}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
        f"{answer}<|eot_id|>"
    )


with open("dataset.jsonl", "w", encoding="utf-8") as f:
    for question, answer in qa_pairs:
        f.write(json.dumps({"text": to_llama3_text(question, answer)}) + "\n")

# 2. Model and tokenizer setup (4-bit NF4 weights, bf16 compute)
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    token=True,  # use the token saved by `huggingface-cli login`
)
model.config.use_cache = False  # the KV cache is unused in training and conflicts with gradient checkpointing

tokenizer = AutoTokenizer.from_pretrained(model_id, token=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# 3. LoRA configuration: adapt the attention and MLP projections
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# 4. Load the dataset
dataset = load_dataset("json", data_files="dataset.jsonl", split="train")

# 5. Training arguments
args = TrainingArguments(
    output_dir="llama3-8b-sft-tuned",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,  # effective batch size per GPU: 2 x 2 = 4
    gradient_checkpointing=True,
    optim="paged_adamw_32bit",
    logging_steps=1,
    save_strategy="epoch",
    learning_rate=2e-4,
    bf16=True,  # matches bnb_4bit_compute_dtype; on pre-Ampere GPUs use torch.float16 there and fp16=True here
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    lr_scheduler_type="constant_with_warmup",  # plain "constant" ignores warmup_ratio
)

# 6. Initialize the trainer.
#    These keyword arguments follow older trl releases; newer releases move
#    dataset_text_field, max_seq_length and packing into SFTConfig and rename
#    some of them, so pin trl or adapt this call to your installed version.
trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    dataset_text_field="text",
    peft_config=lora_config,
    max_seq_length=1024,
    tokenizer=tokenizer,
    packing=False,  # three short examples cannot fill a 1024-token packed sequence; enable for larger datasets
)

# 7. Start fine-tuning
print("Starting fine-tuning...")
trainer.train()

# 8. Save the trained LoRA adapter (not the full model)
print("Saving LoRA adapter...")
trainer.save_model("llama3-8b-sft-adapter")
print("Fine-tuning complete.")
```
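The batch-size and warmup settings in `TrainingArguments` interact in a way that is easy to misread. The sketch below approximates, for a single GPU, how the number of optimizer steps and warmup steps follow from those settings; `schedule` is a hypothetical helper and its step arithmetic is a simplification, not the `Trainer` source. Warmup only takes effect with a scheduler that supports it, such as `constant_with_warmup`; the plain `constant` scheduler ignores `warmup_ratio`.

```python
import math


def schedule(num_sequences, per_device_batch=2, grad_accum=2, epochs=1, warmup_ratio=0.03):
    """Approximate step counts on a single GPU for the given TrainingArguments values.

    Simplification (assumption): batches per epoch = ceil(N / batch size),
    optimizer steps per epoch = batches // grad_accum (at least 1), and
    warmup steps = ceil(total steps * warmup_ratio).
    """
    effective_batch = per_device_batch * grad_accum
    batches_per_epoch = math.ceil(num_sequences / per_device_batch)
    steps_per_epoch = max(batches_per_epoch // grad_accum, 1)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {"effective_batch": effective_batch, "total_steps": total_steps, "warmup_steps": warmup_steps}


if __name__ == "__main__":
    print(schedule(1000))  # {'effective_batch': 4, 'total_steps': 250, 'warmup_steps': 8}
    print(schedule(3))     # {'effective_batch': 4, 'total_steps': 1, 'warmup_steps': 1}
```

With a hypothetical 1,000 training sequences, the starter settings give an effective batch of 4, 250 optimizer steps, and 8 warmup steps; the three-example toy dataset trains for a single step.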
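Packing concatenates short examples into fixed-length training sequences, which only helps when the dataset has enough tokens to fill them. The sketch below is a simplified model of the packing done by older `trl` releases: examples are joined with an EOS token, then cut into `max_seq_length` chunks, and the short remainder is dropped. Newer releases pack differently, and `packed_sequence_count` and the token counts are illustrative assumptions.

```python
def packed_sequence_count(example_lengths, max_seq_length=1024, separator_tokens=1):
    """Number of full packed sequences under a simplified packing model.

    Assumption: each example is followed by `separator_tokens` EOS tokens, all
    examples are concatenated, and only complete max_seq_length chunks are kept.
    """
    total_tokens = sum(length + separator_tokens for length in example_lengths)
    return total_tokens // max_seq_length


if __name__ == "__main__":
    toy = [30, 35, 30]   # illustrative token counts for three short examples
    larger = [500] * 10  # illustrative: ten 500-token examples
    print(packed_sequence_count(toy))     # 0
    print(packed_sequence_count(larger))  # 4
```

Under this model, three examples of a few dozen tokens each yield zero 1,024-token sequences, so packing is best left off for a toy dataset and switched on for real data.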