Cloudflare Launches AI Gateway — Route, Cache, and Monitor LLM Calls

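Route, cache, and monitor your LLM API calls using Cloudflare AI Gateway. Placing the gateway between your application and your LLM providers improves reliability through failover, reduces costs by serving repeated requests from cache (by up to 90%, depending on how often requests repeat), and provides the observability that production AI applications need.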

intermediate | 30 min | 5 steps

The play

Create Your AI Gateway Instance
Log in to your Cloudflare Dashboard, navigate to the 'AI Gateway' section, and create a new gateway. Note down the unique Gateway URL provided, as this will be your new LLM endpoint.
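As a rough illustration of what that Gateway URL looks like, the sketch below assembles it from its parts. The URL layout is taken from the starter code in this guide; the function name `aig_url` and the example identifiers are hypothetical placeholders.

```shell
# Hypothetical helper: build a provider-specific AI Gateway URL.
# Arguments: account ID, gateway name, provider slug, provider path.
aig_url() {
  printf 'https://gateway.ai.cloudflare.com/v1/%s/%s/%s/%s\n' "$1" "$2" "$3" "$4"
}

# Example with placeholder values (not real identifiers):
aig_url "my-account-id" "my-gateway" "openai" "chat/completions"
```

Keeping the URL in one helper means the account ID and gateway name live in a single place if you later rename the gateway.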
Configure LLM Providers and Routing
Within your new AI Gateway settings, add the API keys for your desired LLM providers (e.g., OpenAI, Anthropic). Define routing rules that name a primary provider and one or more failover providers, so that a request is retried against the next provider automatically when the primary one fails.
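One way to express a primary and a failover provider in a single request is the gateway's universal endpoint, which, as far as Cloudflare's documentation describes it, accepts a JSON array of provider requests and tries them in order. The sketch below only prints such a body. The function name `aig_fallback_body`, the environment variable names, the Anthropic model ID, and the `max_tokens` value are assumptions chosen for illustration; check the field names against the current AI Gateway docs before relying on them.

```shell
# Hypothetical helper: print a universal-endpoint request body that lists
# OpenAI first and Anthropic second. Keys are read from the environment.
aig_fallback_body() {
  cat <<EOF
[
  {
    "provider": "openai",
    "endpoint": "chat/completions",
    "headers": {
      "Authorization": "Bearer ${OPENAI_API_KEY:-}",
      "Content-Type": "application/json"
    },
    "query": {
      "model": "gpt-3.5-turbo",
      "messages": [{"role": "user", "content": "Tell me a joke."}]
    }
  },
  {
    "provider": "anthropic",
    "endpoint": "v1/messages",
    "headers": {
      "x-api-key": "${ANTHROPIC_API_KEY:-}",
      "anthropic-version": "2023-06-01",
      "Content-Type": "application/json"
    },
    "query": {
      "model": "claude-3-haiku-20240307",
      "max_tokens": 256,
      "messages": [{"role": "user", "content": "Tell me a joke."}]
    }
  }
]
EOF
}

# To send it (not executed here), POST the body to the gateway root,
# without a provider suffix:
# aig_fallback_body | curl -sS -X POST \
#   "https://gateway.ai.cloudflare.com/v1/${ACCOUNT_ID}/${GATEWAY_NAME}" \
#   -H "Content-Type: application/json" --data @-
```

The order of the array entries is the failover order, so swapping the two objects swaps the primary and the backup.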
Enable Caching for Cost Savings
Activate caching within the AI Gateway settings. Set a cache expiration (TTL) that matches how quickly your answers go stale, either globally or for specific routes, so that repeated identical requests are answered from cache instead of triggering another billable LLM call.
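To confirm that caching is actually taking effect, it can help to look at the response headers. The sketch below assumes the gateway reports cache hits in a `cf-aig-cache-status` response header and accepts a per-request TTL in a `cf-aig-cache-ttl` request header, as described in Cloudflare's AI Gateway documentation at the time of writing; verify both names before depending on them. The function name `aig_cache_status` is hypothetical.

```shell
# Hypothetical helper: read HTTP response headers on stdin and print the
# value of cf-aig-cache-status (for example HIT or MISS), if present.
aig_cache_status() {
  # Strip carriage returns, then match the header name case-insensitively.
  tr -d '\r' | awk -F': *' 'tolower($1) == "cf-aig-cache-status" { print $2 }'
}

# Example (not executed here): request a one-hour TTL for this call,
# dump only the response headers, and print the cache status.
# curl -sS -D - -o /dev/null -X POST "$GATEWAY_URL" \
#   -H "Authorization: Bearer $OPENAI_API_KEY" \
#   -H "Content-Type: application/json" \
#   -H "cf-aig-cache-ttl: 3600" \
#   -d "$BODY" | aig_cache_status
```

Sending the same request twice and seeing the status change between the two calls is a quick sanity check that the cache settings are applied.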
Implement Rate Limiting (Optional)
Configure rate limits per API key, route, or user. This helps prevent abuse, control spending, and protect your LLM providers from being overwhelmed.
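Once a rate limit is in place, clients need to cope with rejected requests. The sketch below assumes the gateway answers with HTTP 429 when a limit is exceeded and retries with exponential backoff. The function name `retry_on_429` and its argument order are hypothetical, and the wrapped command is expected to print only an HTTP status code.

```shell
# Hypothetical helper: run a command that prints an HTTP status code and
# retry with exponential backoff while that code is 429.
# Usage: retry_on_429 MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
retry_on_429() {
  _max=$1
  _delay=$2
  shift 2
  _attempt=1
  while :; do
    _status=$("$@")
    if [ "$_status" != "429" ]; then
      # Any other status is handed back to the caller unchanged.
      printf '%s\n' "$_status"
      return 0
    fi
    if [ "$_attempt" -ge "$_max" ]; then
      # Still rate limited after the last allowed attempt.
      printf '%s\n' "$_status"
      return 1
    fi
    sleep "$_delay"
    _delay=$((_delay * 2))
    _attempt=$((_attempt + 1))
  done
}

# Example (not executed here): a command that prints only the status code.
# call_gateway() {
#   curl -sS -o /dev/null -w '%{http_code}' -X POST "$GATEWAY_URL" \
#     -H "Authorization: Bearer $OPENAI_API_KEY" \
#     -H "Content-Type: application/json" -d "$BODY"
# }
# retry_on_429 4 1 call_gateway
```

Doubling the delay keeps a burst of clients from retrying in lockstep against a limit that has not yet reset; a production version would likely add jitter and cap the delay.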
Update Application to Use Gateway
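Modify your application's code to direct all LLM API calls to your Cloudflare AI Gateway URL instead of the provider's own endpoint. Ensure your application continues to pass the LLM provider's API key in the header that provider expects (for OpenAI, `Authorization: Bearer YOUR_OPENAI_API_KEY`), since the provider still uses it for authentication.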
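Because the authentication header is simply passed through, its name depends on the provider: OpenAI expects `Authorization: Bearer`, while Anthropic's API expects `x-api-key`. The sketch below captures that difference in one place. The function name `aig_auth_header` is hypothetical, and only these two providers are covered.

```shell
# Hypothetical helper: print the authentication header a provider expects.
# Arguments: provider slug, API key.
aig_auth_header() {
  case "$1" in
    openai)    printf 'Authorization: Bearer %s\n' "$2" ;;
    anthropic) printf 'x-api-key: %s\n' "$2" ;;
    *)         printf 'unknown provider: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Example (not executed here): only the URL changes when moving a call
# behind the gateway; the provider header stays the same.
# curl -sS -X POST "$GATEWAY_URL" \
#   -H "$(aig_auth_header openai "$OPENAI_API_KEY")" \
#   -H "Content-Type: application/json" -d "$BODY"
```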

Starter code

# Send a chat completion through the gateway instead of calling api.openai.com directly.
# Replace {ACCOUNT_ID}, {GATEWAY_NAME}, and YOUR_OPENAI_API_KEY with your own values.
curl -X POST \
  https://gateway.ai.cloudflare.com/v1/{ACCOUNT_ID}/{GATEWAY_NAME}/openai/chat/completions \
  -H "Authorization: Bearer YOUR_OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Tell me a joke."}
    ]
  }'