LLM Gateway

Multi-provider routing with fallback, rate limiting, and caching

104 Tests Passing | Async Python | Production Ready

Request Flow

Request (messages, model) -> Gateway (rate limit, cache) -> Router (strategy, fallback) -> Providers (OpenAI, Anthropic)

Key Features

Multi-Provider

Route between OpenAI, Anthropic, and custom providers. Easy to add new backends.

Auto Fallback

Automatic failover when providers error. Unhealthy providers are temporarily skipped.
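
The failover behaviour described above can be sketched roughly as follows. This is an illustrative, synchronous sketch rather than the gateway's actual implementation: the names FallbackChain, ProviderError, and cooldown_seconds are assumptions made for the example, and the real library is async.

```python
import time


class ProviderError(Exception):
    """Raised by a provider call that failed (hypothetical name)."""


class FallbackChain:
    """Try providers in order, skipping any that failed recently."""

    def __init__(self, providers, cooldown_seconds=30.0, clock=time.monotonic):
        # providers: list of (name, callable) pairs; the callable takes a prompt.
        self.providers = providers
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.unhealthy_until = {}  # provider name -> time it may be retried

    def is_healthy(self, name):
        return self.clock() >= self.unhealthy_until.get(name, 0.0)

    def complete(self, prompt):
        errors = []
        for name, call in self.providers:
            if not self.is_healthy(name):
                continue  # still cooling down after an earlier failure
            try:
                return name, call(prompt)
            except ProviderError as exc:
                # Mark the provider unhealthy and move on to the next one.
                self.unhealthy_until[name] = self.clock() + self.cooldown_seconds
                errors.append(f"{name}: {exc}")
        raise ProviderError("all providers failed or unhealthy: " + "; ".join(errors))
```

A failed provider is skipped until its cooldown expires, after which it is tried again in its original position in the chain.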

Rate Limiting

Token bucket and sliding window limiters. Per-minute and per-hour limits with burst support.
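
To make the two limiter types concrete, here is a minimal sketch of each. The class names, constructor parameters, and the injectable clock are assumptions for illustration, not the library's API.

```python
import time
from collections import deque


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class SlidingWindow:
    """Allow at most `limit` requests in any trailing `window` seconds."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.stamps = deque()

    def allow(self):
        now = self.clock()
        # Drop timestamps that have fallen out of the window.
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()
        if len(self.stamps) < self.limit:
            self.stamps.append(now)
            return True
        return False
```

Under these assumptions, a limit of 60 requests per minute with a burst of 10 would be TokenBucket(rate=1.0, capacity=10), while SlidingWindow(limit=60, window=60) enforces the same per-minute cap without allowing credit to accumulate.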

Response Caching

LRU cache with TTL. Semantic key generation from messages, model, and temperature.
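
One way such a cache could work is sketched below, assuming the key is a hash of the canonical JSON of messages, model, and temperature. The function and class names are hypothetical, and the gateway's real key scheme may differ.

```python
import hashlib
import json
import time
from collections import OrderedDict


def cache_key(messages, model, temperature=1.0):
    """Hash the fields that determine the response into a stable key."""
    payload = json.dumps(
        {"messages": messages, "model": model, "temperature": temperature},
        sort_keys=True,  # key order inside dicts must not change the hash
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TTLLRUCache:
    def __init__(self, max_size=1000, ttl_seconds=3600, clock=time.monotonic):
        self.max_size = max_size
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.entries = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        item = self.entries.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self.clock() >= expires_at:
            del self.entries[key]  # entry has expired
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self.entries[key] = (self.clock() + self.ttl_seconds, value)
        self.entries.move_to_end(key)
        while len(self.entries) > self.max_size:
            self.entries.popitem(last=False)  # evict least recently used
```

Including temperature in the key keeps responses sampled at different temperatures from being served for one another.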

Cost Tracking

Track token usage and cost per provider. Optimize spend with cost-aware routing.
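
As a rough illustration of per-provider cost accounting, the sketch below accumulates token counts and cost from a price table. Price, CostTracker, and the per-1,000-token pricing model are assumptions for the example; real prices must come from each provider.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per 1,000 tokens (hypothetical structure; supply real prices)."""
    input_per_1k: float
    output_per_1k: float


class CostTracker:
    def __init__(self, prices):
        self.prices = prices  # provider name -> Price
        self.tokens = defaultdict(int)
        self.cost = defaultdict(float)

    def record(self, provider, input_tokens, output_tokens):
        """Add one request's usage and return its cost."""
        price = self.prices[provider]
        cost = (input_tokens * price.input_per_1k
                + output_tokens * price.output_per_1k) / 1000
        self.tokens[provider] += input_tokens + output_tokens
        self.cost[provider] += cost
        return cost

    def cheapest(self, candidates):
        """Pick the candidate with the lowest combined per-token price."""
        return min(
            candidates,
            key=lambda p: self.prices[p].input_per_1k + self.prices[p].output_per_1k,
        )
```

A cost-aware router could call cheapest() on the currently healthy providers and record() after each completed request.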

Metrics

Latency, success rate, cache hits, and more. Per-provider health tracking.

Routing Strategies

Round Robin

Even distribution across healthy providers

Lowest Latency

Route to the fastest-responding provider

Cost Optimized

Prefer the provider with the lowest cost per token

Priority

Primary provider with fallback chain
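
The four strategies above can each be expressed as a small selection function over the healthy providers. The sketch below is illustrative only: ProviderStats and its fields are hypothetical stand-ins for whatever health and pricing data the router tracks.

```python
import itertools
from dataclasses import dataclass


@dataclass
class ProviderStats:
    name: str
    healthy: bool = True
    avg_latency_ms: float = 0.0
    cost_per_1k_tokens: float = 0.0
    priority: int = 0  # lower number = preferred


def healthy(providers):
    pool = [p for p in providers if p.healthy]
    if not pool:
        raise RuntimeError("no healthy providers")
    return pool


class RoundRobin:
    """Cycle through healthy providers in order."""

    def __init__(self):
        self._counter = itertools.count()

    def pick(self, providers):
        pool = healthy(providers)
        return pool[next(self._counter) % len(pool)]


def lowest_latency(providers):
    return min(healthy(providers), key=lambda p: p.avg_latency_ms)


def cost_optimized(providers):
    return min(healthy(providers), key=lambda p: p.cost_per_1k_tokens)


def priority_chain(providers):
    """Return healthy providers in order: primary first, then fallbacks."""
    return sorted(healthy(providers), key=lambda p: p.priority)
```

Every strategy filters out unhealthy providers first, so the fallback behaviour is the same regardless of which one is selected.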

Quick Start

import asyncio

from llm_gateway import (
    Gateway,
    GatewayConfig,
    Request,
    OpenAIProvider,
    ProviderConfig,
    RateLimitConfig,
    CacheConfig,
)

# Create a gateway with rate limiting and response caching
gateway = Gateway(
    providers=[OpenAIProvider(ProviderConfig(name="openai"))],
    config=GatewayConfig(
        rate_limit=RateLimitConfig(requests_per_minute=60),
        cache=CacheConfig(ttl_seconds=3600),
    ),
)


async def main():
    # complete() is awaited, so the call has to run inside a coroutine
    response = await gateway.complete(Request(
        messages=[{"role": "user", "content": "Hello!"}],
        model="gpt-4",
    ))
    print(response.content)


asyncio.run(main())

Metrics Example

{
  "total_requests": 1000,
  "cached_requests": 250,
  "failed_requests": 5,
  "cache_hit_rate": 0.25,
  "avg_latency_ms": 150.5,
  "total_cost": 0.42,
  "providers": {
    "openai": {
      "success_rate": 0.995,
      "avg_latency_ms": 145.2,
      "total_cost": 0.35
    },
    "anthropic": {
      "success_rate": 0.99,
      "avg_latency_ms": 160.8,
      "total_cost": 0.07
    }
  }
}
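
The derived fields in the example follow from the raw counters: cache_hit_rate is cached_requests divided by total_requests (250 / 1000 = 0.25), and the top-level total_cost is the sum of the per-provider costs (0.35 + 0.07 = 0.42). A minimal sketch of that aggregation, assuming hypothetical per-provider "requests" and "failures" counters that the example output does not show:

```python
def summarize(total_requests, cached_requests, providers):
    """Derive rates from raw counters.

    providers maps name -> {"requests": int, "failures": int, "total_cost": float}
    (an assumed input shape, not the gateway's internal representation).
    """
    summary = {
        "cache_hit_rate": cached_requests / total_requests if total_requests else 0.0,
        # Rounded to avoid floating-point noise in the summed cost.
        "total_cost": round(sum(p["total_cost"] for p in providers.values()), 6),
        "providers": {},
    }
    for name, p in providers.items():
        succeeded = p["requests"] - p["failures"]
        summary["providers"][name] = {
            "success_rate": succeeded / p["requests"] if p["requests"] else 0.0,
            "total_cost": p["total_cost"],
        }
    return summary
```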

104 Tests Passing | 4 Routing Strategies | 2 Rate Limiter Types