Article·llama.meta.com
llmopen-sourceresearchmachine-learningevaluation
Llama 4
Meta's open-source LLM family offers state-of-the-art performance comparable to proprietary models. This democratizes advanced AI, enabling practitioners to fine-tune and deploy powerful models on their own infrastructure, fostering innovation and reducing vendor lock-in.
intermediate30 min5 steps
The play
- Review Llama Model CapabilitiesAccess Meta's official Llama resources or Hugging Face model cards to understand the performance benchmarks, available model sizes, and specific use cases for the Llama family of models.
- Request Model AccessSubmit a request for access to the Llama models through Meta's official portal or Hugging Face, as these models often require acceptance of their responsible use policy. Ensure your Hugging Face account is linked after approval.
- Set Up Your Development EnvironmentInstall PyTorch and the Hugging Face Transformers library. A GPU with sufficient VRAM is highly recommended for running Llama models efficiently.
- Load a Llama Model and Perform InferenceUse the Hugging Face `transformers` library to load a specific Llama model and tokenizer. Run a basic text generation task to verify your setup.
- Explore Customization and DeploymentInvestigate options for fine-tuning Llama models on your custom datasets using techniques like LoRA, or deploy the models on your own cloud infrastructure (e.g., AWS, Azure, GCP) to build tailored AI applications.
Starter code
import torch from transformers import AutoTokenizer, AutoModelForCausalLM # NOTE: Replace 'meta-llama/Llama-2-7b-chat-hf' with the specific Llama model you have access to. # Ensure you have logged into Hugging Face CLI with a token that has access to Llama models. # hf_token = "YOUR_HF_TOKEN" # Or login via 'huggingface-cli login' model_name = "meta-llama/Llama-2-7b-chat-hf" # Example Llama 2 model tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto") prompt = "Hello, I am a large language model. What can I help you with today?" input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device) output = model.generate(input_ids, max_new_tokens=100, num_return_sequences=1) print(tokenizer.decode(output[0], skip_special_tokens=True))
Source