Cloudflare Launches AI Gateway — Route, Cache, and Monitor LLM Calls

Cloudflare AI Gateway is a proxy that sits between your application and your LLM providers. It improves reliability, cuts costs by caching responses, and adds observability for production AI applications, all behind a single endpoint that works across multiple AI providers.

Intermediate | 15 min | 5 steps

The play

1. Enable Cloudflare AI Gateway
In the Cloudflare dashboard, select your account and enable AI Gateway. This creates a unique gateway endpoint for your applications.
2. Configure an LLM Provider
In the AI Gateway settings, add and configure the Large Language Model (LLM) provider you want to use (e.g., OpenAI, Anthropic, Google). You will need your provider API key to authenticate requests.
3. Update Application API Endpoints
Change your application code so that every LLM API request goes to your Cloudflare AI Gateway endpoint instead of directly to the provider's API. Routing traffic through the gateway is what enables caching, rate limiting, and observability.
4. Implement Caching and Rate Limiting
Configure caching policies and rate limits in the AI Gateway settings. Caching reduces costs by serving stored responses for identical requests, and rate limits protect your LLM APIs from abuse.
5. Monitor Usage and Performance
Use the AI Gateway's built-in observability features, including request/response logging and analytics dashboards, to monitor LLM usage and performance and to identify areas for optimization.
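
The "Update Application API Endpoints" step amounts to swapping the provider's base URL for the gateway's. As a minimal sketch, assuming the URL layout used in the starter code (account ID, gateway ID, provider name, then the provider's own path), a small shell helper can assemble the endpoint. The function name `gateway_url` is hypothetical, and `ACCOUNT_ID` and `GATEWAY_ID` are placeholders for your own values.

```shell
# Hypothetical helper: build a gateway endpoint from its parts.
# Assumes the layout .../v1/<account>/<gateway>/<provider>/<path>.
gateway_url() {
  account_id="$1"   # your Cloudflare account ID
  gateway_id="$2"   # the gateway created in the dashboard
  provider="$3"     # e.g. "openai"
  api_path="$4"     # the provider's own path, e.g. "chat/completions"
  printf 'https://gateway.ai.cloudflare.com/v1/%s/%s/%s/%s\n' \
    "$account_id" "$gateway_id" "$provider" "$api_path"
}

# Print the endpoint that the starter request targets.
gateway_url "ACCOUNT_ID" "GATEWAY_ID" "openai" "chat/completions"
```

Keeping the URL construction in one place means switching gateways or providers is a one-line change in the application.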

Starter code

curl -X POST "https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/GATEWAY_ID/openai/chat/completions" \
     -H "Authorization: Bearer YOUR_OPENAI_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{
           "model": "gpt-3.5-turbo",
           "messages": [{"role": "user", "content": "What is the capital of France?"}],
           "temperature": 0.7
         }'
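
Caching can also be influenced per request rather than only in the gateway settings. The sketch below wraps the request above in a function and adds a cache TTL header. It assumes the gateway honors a `cf-aig-cache-ttl` request header with a value in seconds; that header name is an assumption here and should be verified against Cloudflare's current documentation. The function name `chat_with_ttl` and the `CURL` override are hypothetical conveniences.

```shell
# Sketch: send the starter request with a per-request cache TTL.
# ASSUMPTION: the gateway reads the "cf-aig-cache-ttl" header (seconds).
# CURL may be overridden (e.g. CURL=echo) for a dry run without network access.
chat_with_ttl() {
  ttl_seconds="$1"
  "${CURL:-curl}" -sS -X POST \
    "https://gateway.ai.cloudflare.com/v1/ACCOUNT_ID/GATEWAY_ID/openai/chat/completions" \
    -H "Authorization: Bearer ${OPENAI_API_KEY:-YOUR_OPENAI_API_KEY}" \
    -H "Content-Type: application/json" \
    -H "cf-aig-cache-ttl: ${ttl_seconds}" \
    -d '{
          "model": "gpt-3.5-turbo",
          "messages": [{"role": "user", "content": "What is the capital of France?"}],
          "temperature": 0.7
        }'
}

# Usage (makes a real request, so it is left commented out):
# chat_with_ttl 3600
```

If the header behaves as assumed, identical requests repeated within the TTL would be served from the gateway's cache instead of reaching the provider.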

Source

Article: test.aaas.com