Multi-provider routing with fallback, rate limiting, and caching
Route between OpenAI, Anthropic, and custom providers. Easy to add new backends.
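The extension interface for custom backends isn't spelled out in this section, so the following is only a sketch: it assumes a provider takes a ProviderConfig and exposes an async complete(request) coroutine, mirroring the Gateway.complete() call in the quick start below. The class, method signature, and return shape are assumptions, not the library's documented API.

from llm_gateway import ProviderConfig

class EchoProvider:
    """Toy backend that echoes the last user message back to the caller."""

    def __init__(self, config: ProviderConfig):
        self.config = config

    async def complete(self, request):
        # The required return shape is assumed here; the real gateway
        # presumably expects its own response type.
        last_user = request.messages[-1]["content"]
        return {"content": f"echo: {last_user}", "model": request.model}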
Automatic failover when providers error. Unhealthy providers are temporarily skipped.
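The loop below is a standalone sketch of that idea, not llm_gateway's internal code: try providers in order, skip any still inside a cooldown window, and quarantine a provider for a short period after it fails. Provider objects are assumed to expose .config.name and an async complete() method; the cooldown length is arbitrary.

import time

UNHEALTHY_COOLDOWN_S = 30.0  # arbitrary cooldown chosen for the sketch

async def complete_with_failover(providers, request, unhealthy_until):
    last_error = None
    for provider in providers:
        name = provider.config.name
        if unhealthy_until.get(name, 0.0) > time.monotonic():
            continue  # provider is still in its cooldown window; skip it
        try:
            return await provider.complete(request)
        except Exception as exc:
            last_error = exc
            # Quarantine the provider so the next requests skip it for a while.
            unhealthy_until[name] = time.monotonic() + UNHEALTHY_COOLDOWN_S
    raise RuntimeError("all providers failed or are marked unhealthy") from last_error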
Token bucket and sliding window limiters. Per-minute and per-hour limits with burst support.
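As a rough illustration of the token-bucket half of this (the sliding-window variant is analogous), the sketch below expresses a per-minute limit with burst support as a refill rate plus a bucket capacity; it is not the library's implementation.

import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# 60 requests per minute sustained, with bursts of up to 10 at once.
limiter = TokenBucket(rate=60 / 60.0, capacity=10)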
LRU cache with TTL. Semantic key generation from messages, model, and temperature.
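A minimal sketch of the key-generation step, assuming the semantic key is a stable hash over the request fields that determine the completion; the exact hashing scheme is an assumption, not necessarily what llm_gateway does. With keys like this, the cache itself can be an ordinary LRU map whose entries also carry an expiry timestamp (the ttl_seconds value from the quick-start config).

import hashlib
import json

def cache_key(messages, model, temperature) -> str:
    # Serialize deterministically so identical requests map to the same key.
    payload = json.dumps(
        {"messages": messages, "model": model, "temperature": temperature},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

key = cache_key([{"role": "user", "content": "Hello!"}], "gpt-4", 0.0)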
Track token usage and cost per provider. Optimize spend with cost-aware routing.
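The accounting behind this is simple enough to sketch: multiply token counts by a per-1K-token price table, and let cost-aware routing pick the lowest price among healthy providers. The prices below are placeholders for illustration, not real provider pricing.

PRICE_PER_1K_TOKENS = {  # placeholder numbers, not real pricing
    "openai":    {"prompt": 0.03,  "completion": 0.06},
    "anthropic": {"prompt": 0.015, "completion": 0.075},
}

def request_cost(provider: str, prompt_tokens: int, completion_tokens: int) -> float:
    prices = PRICE_PER_1K_TOKENS[provider]
    return (
        (prompt_tokens / 1000) * prices["prompt"]
        + (completion_tokens / 1000) * prices["completion"]
    )

def cheapest_provider(healthy: list[str]) -> str:
    # Cost-aware routing in its simplest form: lowest prompt price per token.
    return min(healthy, key=lambda p: PRICE_PER_1K_TOKENS[p]["prompt"])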
Latency, success rate, cache hits, and more. Per-provider health tracking.
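One way to keep those per-provider numbers, sketched here rather than taken from the library: record each call's outcome and latency in a bounded window per provider and derive success rate and average latency from that window.

from collections import defaultdict, deque

class ProviderHealth:
    def __init__(self, window: int = 100):
        # provider name -> recent (ok, latency_ms) samples
        self.samples = defaultdict(lambda: deque(maxlen=window))

    def record(self, provider: str, ok: bool, latency_ms: float) -> None:
        self.samples[provider].append((ok, latency_ms))

    def success_rate(self, provider: str) -> float:
        window = self.samples[provider]
        return sum(ok for ok, _ in window) / len(window) if window else 1.0

    def avg_latency_ms(self, provider: str) -> float:
        window = self.samples[provider]
        return sum(ms for _, ms in window) / len(window) if window else 0.0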
Round robin: even distribution across healthy providers.
Lowest latency: route to the fastest-responding provider.
Cost-based: prefer the cheapest provider per token.
Priority: a primary provider with a fallback chain. (Each selector is sketched just below.)
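The selectors below are a standalone sketch of those four strategies, not llm_gateway's API: each takes the currently healthy provider names plus whatever per-provider stats it needs and returns the provider to try first.

def round_robin(providers, counter: int) -> str:
    # Even distribution: rotate through the healthy providers.
    return providers[counter % len(providers)]

def lowest_latency(providers, avg_latency_ms: dict) -> str:
    return min(providers, key=lambda p: avg_latency_ms.get(p, float("inf")))

def cheapest(providers, price_per_token: dict) -> str:
    return min(providers, key=lambda p: price_per_token.get(p, float("inf")))

def priority(providers, preferred_order) -> str:
    # First healthy provider in the configured order; the rest form the fallback chain.
    for name in preferred_order:
        if name in providers:
            return name
    raise RuntimeError("no healthy provider available")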
import asyncio

from llm_gateway import (
    Gateway, GatewayConfig, Request,
    OpenAIProvider, ProviderConfig,
    RateLimitConfig, CacheConfig,
)

async def main() -> None:
    # Create a gateway with rate limiting and response caching
    gateway = Gateway(
        providers=[OpenAIProvider(ProviderConfig(name="openai"))],
        config=GatewayConfig(
            rate_limit=RateLimitConfig(requests_per_minute=60),
            cache=CacheConfig(ttl_seconds=3600),
        ),
    )

    # Make a request
    response = await gateway.complete(Request(
        messages=[{"role": "user", "content": "Hello!"}],
        model="gpt-4",
    ))
    print(response.content)

asyncio.run(main())
An aggregated stats snapshot, with gateway-wide totals and a per-provider breakdown, looks like this:

{
  "total_requests": 1000,
  "cached_requests": 250,
  "failed_requests": 5,
  "cache_hit_rate": 0.25,
  "avg_latency_ms": 150.5,
  "total_cost": 0.42,
  "providers": {
    "openai": {
      "success_rate": 0.995,
      "avg_latency_ms": 145.2,
      "total_cost": 0.35
    },
    "anthropic": {
      "success_rate": 0.99,
      "avg_latency_ms": 160.8,
      "total_cost": 0.07
    }
  }
}