
Tags: hugging-face, inference-api, transformers, nlp, text-generation, python, ai-quickstart, model-hub

Run Your First AI Model with Hugging Face's Inference API

Hugging Face hosts thousands of AI models. Call the hosted Inference API to run a model without managing any infrastructure, or use the `transformers` library to run it locally. You don't need a GPU to get started.

beginner | 15 min | 4 steps

The play

Find a Model on the Hub
Navigate to the Hugging Face Hub. Use the filters on the left to find a model for your task. For this guide, we'll use a text generation model. A good starting point is `distilgpt2`, a smaller version of GPT-2.
Get Your API Token
Sign up or log in to your Hugging Face account. Go to your profile, then click 'Settings' -> 'Access Tokens'. Create a new token with the 'read' role. Copy this token, as you'll need it to authenticate API requests.
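
One way to keep the token out of your source code is to read it from an environment variable, as the starter code does. The sketch below assumes the variable is named `HF_API_TOKEN` (the name the starter code uses). The helper name `auth_headers` is hypothetical and not part of any Hugging Face library.

```python
import os


def auth_headers(env=None):
    """Build the Authorization header from the HF_API_TOKEN environment variable.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("HF_API_TOKEN", "").strip()
    if not token:
        # Fail early instead of sending a request with a placeholder token.
        raise RuntimeError("HF_API_TOKEN is not set")
    return {"Authorization": f"Bearer {token}"}
```

Failing early here is a design choice: a missing token surfaces as a clear local error rather than as an authentication failure from the API.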
Query the Inference API
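Send a POST request to the model's API endpoint, for example with cURL or with Python's `requests` library as in the starter code. Replace `YOUR_TOKEN_HERE` with the token you just copied, or set it in the `HF_API_TOKEN` environment variable. The model runs on Hugging Face's infrastructure, and the API returns the result as JSON.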
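
The request in this step has three parts: the endpoint URL, a bearer-token header, and a JSON body with an `inputs` field. As a rough sketch, the code below assembles them with Python's standard library so each piece can be inspected without sending anything. `build_request` is a hypothetical helper name, `hf_dummy` is a placeholder token, and the URL pattern is the one the starter code uses.

```python
import json
import urllib.request

# Endpoint pattern taken from the starter code's API_URL.
API_BASE = "https://api-inference.huggingface.co/models"


def build_request(model_id, prompt, token):
    """Assemble (but do not send) the POST request for a text prompt."""
    body = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/{model_id}",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Build a request with a placeholder token and inspect it; no network call is made.
request = build_request("distilgpt2", "My favorite movie is", "hf_dummy")
print(request.get_method(), request.full_url)
```

To actually send the request, pass it to `urllib.request.urlopen(request, timeout=30)` with a real token and decode the JSON reply.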
Run a Model Locally with `transformers`
For more control, use the `transformers` library to download and run the model directly in your own environment. This is ideal for development, fine-tuning, and offline use. First, install the library: `pip install transformers torch`.
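
The pipeline returns a list with one dictionary per generated sequence, and the starter code reads the text from its `generated_text` key. By default that text begins with the prompt itself. As a minimal sketch, the hypothetical helper `continuation` below (not part of `transformers`) strips the prompt so only the new text remains. The sample output is hand-written for illustration so the snippet runs without downloading a model.

```python
def continuation(prompt, outputs):
    """Return only the newly generated text for each sequence.

    `outputs` is expected to look like the text-generation pipeline result:
    a list of dicts, each with a "generated_text" key.
    """
    results = []
    for item in outputs:
        text = item["generated_text"]
        # Drop the prompt only when the text actually starts with it.
        if text.startswith(prompt):
            text = text[len(prompt):]
        results.append(text.strip())
    return results


# Illustrative, hand-written sample in the pipeline's output shape.
sample = [{"generated_text": "My favorite movie is a comedy from the 90s."}]
print(continuation("My favorite movie is", sample))
```

With a real pipeline, you would pass the prompt and the value returned by `generator(...)` in place of `sample`.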

Starter code

# This script demonstrates two ways to use Hugging Face models.
# Install required libraries first: pip install requests transformers torch

import os

import requests
from transformers import pipeline

# --- Method 1: Hugging Face Inference API (Cloud) ---

# Get your token from https://huggingface.co/settings/tokens
# It's best to set this as an environment variable
API_TOKEN = os.getenv("HF_API_TOKEN", "YOUR_TOKEN_HERE")
API_URL = "https://api-inference.huggingface.co/models/distilgpt2"

headers = {
    "Authorization": f"Bearer {API_TOKEN}",
    "Content-Type": "application/json"
}

def query_api(payload):
    # POST the JSON payload; the timeout keeps the script from hanging indefinitely.
    response = requests.post(API_URL, headers=headers, json=payload, timeout=30)
    return response.json()

print("--- Querying Inference API ---")
if API_TOKEN == "YOUR_TOKEN_HERE":
    print("Please set your Hugging Face API token to run the API example.")
else:
    api_payload = {"inputs": "My favorite movie is"}
    api_output = query_api(api_payload)
    print(api_output)

print("\n" + "="*30 + "\n")

# --- Method 2: `transformers` library (Local) ---

print("--- Running model locally with transformers ---")
print("Downloading model (this might take a moment)...")

# Initialize the pipeline. This downloads the model on the first run.
generator = pipeline('text-generation', model='distilgpt2')

# max_length counts the prompt tokens as well as the newly generated ones.
local_output = generator(
    "My favorite movie is",
    max_length=30,
    num_return_sequences=1,
)

print("\nLocal generation result:")
print(local_output[0]['generated_text'])