Llama 4

Meta's open-source LLM family offers state-of-the-art performance comparable to proprietary models. This democratizes advanced AI, enabling practitioners to fine-tune and deploy powerful models on their own infrastructure, fostering innovation and reducing vendor lock-in.

intermediate30 min5 steps

The play

Review Llama Model Capabilities
Access Meta's official Llama resources or Hugging Face model cards to understand the performance benchmarks, available model sizes, and specific use cases for the Llama family of models.
Request Model Access
Submit a request for access to the Llama models through Meta's official portal or Hugging Face, as these models often require acceptance of their responsible use policy. Ensure your Hugging Face account is linked after approval.
Set Up Your Development Environment
Install PyTorch and the Hugging Face Transformers library. A GPU with sufficient VRAM is highly recommended for running Llama models efficiently.
Load a Llama Model and Perform Inference
Use the Hugging Face `transformers` library to load a specific Llama model and tokenizer. Run a basic text generation task to verify your setup.
Explore Customization and Deployment
Investigate options for fine-tuning Llama models on your custom datasets using techniques like LoRA, or deploy the models on your own cloud infrastructure (e.g., AWS, Azure, GCP) to build tailored AI applications.

Starter code

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# NOTE: Replace 'meta-llama/Llama-2-7b-chat-hf' with the specific Llama model you have access to.
# Ensure you have logged into Hugging Face CLI with a token that has access to Llama models.
# hf_token = "YOUR_HF_TOKEN" # Or login via 'huggingface-cli login'

model_name = "meta-llama/Llama-2-7b-chat-hf" # Example Llama 2 model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

prompt = "Hello, I am a large language model. What can I help you with today?"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=100, num_return_sequences=1)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Source

Articlellama.meta.com