Multi-provider routing with fallback, rate limiting, and caching
Route between OpenAI, Anthropic, and custom providers. Easy to add new backends.
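The extension interface for custom backends isn't spelled out in this section, so the following is only a sketch: it assumes a provider takes a ProviderConfig and exposes an async complete(request) coroutine, mirroring the Gateway.complete() call in the quick start below. The class, method signature, and return shape are assumptions, not the library's documented API.

from llm_gateway import ProviderConfig

class EchoProvider:
    """Toy backend that echoes the last user message back to the caller."""

    def __init__(self, config: ProviderConfig):
        self.config = config

    async def complete(self, request):
        # The required return shape is assumed here; the real gateway
        # presumably expects its own response type.
        last_user = request.messages[-1]["content"]
        return {"content": f"echo: {last_user}", "model": request.model}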
Automatic failover when providers error. Unhealthy providers are temporarily skipped.
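The loop below is a standalone sketch of that idea, not llm_gateway's internal code: try providers in order, skip any still inside a cooldown window, and quarantine a provider for a short period after it fails. Provider objects are assumed to expose .config.name and an async complete() method; the cooldown length is arbitrary.

import time

UNHEALTHY_COOLDOWN_S = 30.0  # arbitrary cooldown chosen for the sketch

async def complete_with_failover(providers, request, unhealthy_until):
    last_error = None
    for provider in providers:
        name = provider.config.name
        if unhealthy_until.get(name, 0.0) > time.monotonic():
            continue  # provider is still in its cooldown window; skip it
        try:
            return await provider.complete(request)
        except Exception as exc:
            last_error = exc
            # Quarantine the provider so the next requests skip it for a while.
            unhealthy_until[name] = time.monotonic() + UNHEALTHY_COOLDOWN_S
    raise RuntimeError("all providers failed or are marked unhealthy") from last_error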
Token bucket and sliding window limiters. Per-minute and per-hour limits with burst support.
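As a rough illustration of the token-bucket half of this (the sliding-window variant is analogous), the sketch below expresses a per-minute limit with burst support as a refill rate plus a bucket capacity; it is not the library's implementation.

import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# 60 requests per minute sustained, with bursts of up to 10 at once.
limiter = TokenBucket(rate=60 / 60.0, capacity=10)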
LRU cache with TTL. Semantic key generation from messages, model, and temperature.
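A minimal sketch of the key-generation step, assuming the semantic key is a stable hash over the request fields that determine the completion; the exact hashing scheme is an assumption, not necessarily what llm_gateway does. With keys like this, the cache itself can be an ordinary LRU map whose entries also carry an expiry timestamp (the ttl_seconds value from the quick-start config).

import hashlib
import json

def cache_key(messages, model, temperature) -> str:
    # Serialize deterministically so identical requests map to the same key.
    payload = json.dumps(
        {"messages": messages, "model": model, "temperature": temperature},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

key = cache_key([{"role": "user", "content": "Hello!"}], "gpt-4", 0.0)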
Track token usage and cost per provider. Optimize spend with cost-aware routing.
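The accounting behind this is simple enough to sketch: multiply token counts by a per-1K-token price table, and let cost-aware routing pick the lowest price among healthy providers. The prices below are placeholders for illustration, not real provider pricing.

PRICE_PER_1K_TOKENS = {  # placeholder numbers, not real pricing
    "openai":    {"prompt": 0.03,  "completion": 0.06},
    "anthropic": {"prompt": 0.015, "completion": 0.075},
}

def request_cost(provider: str, prompt_tokens: int, completion_tokens: int) -> float:
    prices = PRICE_PER_1K_TOKENS[provider]
    return (
        (prompt_tokens / 1000) * prices["prompt"]
        + (completion_tokens / 1000) * prices["completion"]
    )

def cheapest_provider(healthy: list[str]) -> str:
    # Cost-aware routing in its simplest form: lowest prompt price per token.
    return min(healthy, key=lambda p: PRICE_PER_1K_TOKENS[p]["prompt"])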
Latency, success rate, cache hits, and more. Per-provider health tracking.
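One way to keep those per-provider numbers, sketched here rather than taken from the library: record each call's outcome and latency in a bounded window per provider and derive success rate and average latency from that window.

from collections import defaultdict, deque

class ProviderHealth:
    def __init__(self, window: int = 100):
        # provider name -> recent (ok, latency_ms) samples
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, provider: str, ok: bool, latency_ms: float) -> None:
        self.samples[provider].append((ok, latency_ms))

    def success_rate(self, provider: str) -> float:
        window = self.samples[provider]
        return sum(ok for ok, _ in window) / len(window) if window else 1.0

    def avg_latency_ms(self, provider: str) -> float:
        window = self.samples[provider]
        return sum(ms for _, ms in window) / len(window) if window else 0.0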
Round robin: even distribution across healthy providers.
Lowest latency: route to the fastest-responding provider.
Cost-based: prefer the cheapest provider per token.
Priority: a primary provider with a fallback chain. (Each selector is sketched just below.)
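The selectors below are a standalone sketch of those four strategies, not llm_gateway's API: each takes the currently healthy provider names plus whatever per-provider stats it needs and returns the provider to try first.

def round_robin(providers, counter: int) -> str:
    # Even distribution: rotate through the healthy providers.
    return providers[counter % len(providers)]

def lowest_latency(providers, avg_latency_ms: dict) -> str:
    return min(providers, key=lambda p: avg_latency_ms.get(p, float("inf")))

def cheapest(providers, price_per_token: dict) -> str:
    return min(providers, key=lambda p: price_per_token.get(p, float("inf")))

def priority(providers, preferred_order) -> str:
    # First healthy provider in the configured order; the rest form the fallback chain.
    for name in preferred_order:
        if name in providers:
            return name
    raise RuntimeError("no healthy provider available")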
import asyncio

from llm_gateway import (
    Gateway, GatewayConfig, Request,
    OpenAIProvider, ProviderConfig,
    RateLimitConfig, CacheConfig,
)

async def main() -> None:
    # Create a gateway with rate limiting and response caching
    gateway = Gateway(
        providers=[OpenAIProvider(ProviderConfig(name="openai"))],
        config=GatewayConfig(
            rate_limit=RateLimitConfig(requests_per_minute=60),
            cache=CacheConfig(ttl_seconds=3600),
        ),
    )

    # Make a request
    response = await gateway.complete(Request(
        messages=[{"role": "user", "content": "Hello!"}],
        model="gpt-4",
    ))
    print(response.content)

asyncio.run(main())
An aggregated stats snapshot, with gateway-wide totals and a per-provider breakdown, looks like this:

{
  "total_requests": 1000,
  "cached_requests": 250,
  "failed_requests": 5,
  "cache_hit_rate": 0.25,
  "avg_latency_ms": 150.5,
  "total_cost": 0.42,
  "providers": {
    "openai": {
      "success_rate": 0.995,
      "avg_latency_ms": 145.2,
      "total_cost": 0.35
    },
    "anthropic": {
      "success_rate": 0.99,
      "avg_latency_ms": 160.8,
      "total_cost": 0.07
    }
  }
}