Tags: infrastructure, cloudflare, ai-gateway, llm, api-management, caching

Cloudflare Launches AI Gateway — Route, Cache, and Monitor LLM Calls

Route, cache, and monitor your LLM API calls through Cloudflare AI Gateway. Provider failover improves reliability, caching can cut API costs by up to 90% (the actual saving depends on how often identical requests recur), and request logging gives production AI applications the observability they need.

Level: intermediate | Time: 30 min | Steps: 5
The play
  1. Create Your AI Gateway Instance
    Log in to your Cloudflare Dashboard, navigate to the 'AI Gateway' section, and create a new gateway. Note down the unique Gateway URL provided, as this will be your new LLM endpoint.
  2. Configure LLM Providers and Routing
    Within your new AI Gateway settings, add the API keys for your desired LLM providers (e.g., OpenAI, Anthropic). Define routing rules to specify primary and failover providers for automatic resilience.
  3. Enable Caching for Cost Savings
    Activate caching within the AI Gateway settings. Set a cache expiration policy (TTL), per route or globally, so that identical requests are served from the cache instead of triggering another billed LLM call.
  4. Implement Rate Limiting (Optional)
    Configure rate limits per API key, route, or user. This helps prevent abuse, control spending, and protect your LLM providers from being overwhelmed.
  5. Update Application to Use Gateway
    Modify your application's code to send all LLM API calls to your Cloudflare AI Gateway URL instead of the provider's base URL. Keep passing the provider's own API key with each request: in the `Authorization` header for OpenAI, or in whichever header the provider expects (Anthropic, for example, uses `x-api-key`).
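Step 5 mostly comes down to swapping the provider's base URL for the gateway's. The following is a minimal sketch, assuming the URL layout used in the starter code (account ID, then gateway name, then provider, then the provider's usual path). The function name `gateway_url` is hypothetical and not part of any Cloudflare tooling.

```shell
# Hypothetical helper: build an AI Gateway URL for a given provider path.
# Assumes the URL layout shown in the starter code; adjust if yours differs.
gateway_url() {
  # $1 = account ID, $2 = gateway name, $3 = provider (e.g. openai), $4 = provider path
  # "${4#/}" drops one leading slash so "/chat/completions" and "chat/completions" both work.
  printf 'https://gateway.ai.cloudflare.com/v1/%s/%s/%s/%s\n' \
    "$1" "$2" "$3" "${4#/}"
}
```

For example, `gateway_url "$ACCOUNT_ID" "$GATEWAY_NAME" openai chat/completions` prints the endpoint that the starter request targets, so an application can read its base URL from one place and be switched back to the provider directly if needed.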
Starter code
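# Replace these placeholder values with your own before running.
ACCOUNT_ID="YOUR_ACCOUNT_ID"
GATEWAY_NAME="YOUR_GATEWAY_NAME"
OPENAI_API_KEY="YOUR_OPENAI_API_KEY"

# Same body and auth header as a direct OpenAI call; only the base URL changes.
curl -X POST \
  "https://gateway.ai.cloudflare.com/v1/${ACCOUNT_ID}/${GATEWAY_NAME}/openai/chat/completions" \
  -H "Authorization: Bearer ${OPENAI_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Tell me a joke."}
    ]
  }'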
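The cache expiration from step 3 can also be adjusted per request by adding a header to the starter request. The sketch below assumes the request header names `cf-aig-cache-ttl` (seconds) and `cf-aig-skip-cache`, as recalled from Cloudflare's AI Gateway documentation; verify them against the current docs before relying on them. The helper name `cache_header` is hypothetical.

```shell
# Hypothetical helper: print one header line for per-request cache control.
# Header names are assumptions taken from Cloudflare's AI Gateway docs; verify before use.
cache_header() {
  # $1 = TTL in seconds, or the word "skip" to bypass the cache for this request
  if [ "$1" = "skip" ]; then
    printf '%s\n' "cf-aig-skip-cache: true"
  else
    printf '%s\n' "cf-aig-cache-ttl: $1"
  fi
}
```

Adding `-H "$(cache_header 3600)"` to the starter curl command would ask the gateway to cache that response for an hour, while `-H "$(cache_header skip)"` would request a fresh response from the provider.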