This setup provides a single OpenAI-compatible endpoint that routes requests to multiple upstream LLM providers through LiteLLM. Headroom reduces the API footprint by compressing and optimizing request context before it is sent upstream.
- Single local gateway endpoint for all clients
- Support for:
  - Anthropic
  - OpenRouter
  - Google Gemini
- Headroom-based context optimization on top of LiteLLM
- Docker Compose deployment
- Easy model aliasing for coding tools and agent frameworks
The recommended layout is:
- LiteLLM acts as the main OpenAI-compatible gateway
- Headroom is installed into the same container
- Headroom is enabled as a LiteLLM callback
- Clients connect only to LiteLLM
- LiteLLM routes requests to the correct upstream provider
This is the most robust setup because:
- LiteLLM has broad provider support and is actively maintained
- Headroom can integrate directly into LiteLLM
- Gemini works best when routed directly via LiteLLM instead of going through OpenRouter
- OpenRouter is still available for models you specifically want there
- Clients only need one base URL and one API key for the local gateway
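Because every client talks to the same base URL with the same key, a client can be very small. The following is a minimal sketch using only the Python standard library. It assumes the gateway address `http://localhost:4000/v1` and the placeholder key `sk-change-me` used in this README's examples; the function names `build_chat_request` and `send` are hypothetical helpers, not part of LiteLLM or Headroom.

```python
import json
import urllib.request

# Assumed values, taken from the examples in this README; adjust to your deployment.
GATEWAY_BASE_URL = "http://localhost:4000/v1"
GATEWAY_API_KEY = "sk-change-me"


def build_chat_request(model_alias: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completions request for the local gateway."""
    payload = {
        "model": model_alias,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{GATEWAY_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {GATEWAY_API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply (needs a running gateway)."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Switching providers then only means changing the alias passed to `build_chat_request`; the URL and key stay the same.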
Use the following policy unless you have a strong reason to do otherwise:
- Anthropic: direct via anthropic/...
- Gemini: direct via gemini/...
- OpenRouter: via openrouter/...
Gemini can often work through OpenRouter, but it adds another translation layer:
client -> LiteLLM -> OpenRouter -> Gemini
That increases the chance of compatibility issues, especially for chat formatting, tools, or multimodal requests. Direct Gemini routing through LiteLLM is usually cleaner and more predictable.
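The prefix policy above can be expressed as a small lookup, which may be handy for sanity-checking route strings before they go into a config file. This is an illustrative sketch only: the three prefixes come from the policy, while `ROUTE_PREFIXES` and `classify_route` are hypothetical names introduced here.

```python
# Prefixes from the routing policy above; the mapping itself is illustrative.
ROUTE_PREFIXES = {
    "anthropic/": "Anthropic (direct)",
    "gemini/": "Google Gemini (direct)",
    "openrouter/": "OpenRouter",
}


def classify_route(route: str) -> str:
    """Return the provider a route string targets, based on its prefix."""
    for prefix, provider in ROUTE_PREFIXES.items():
        if route.startswith(prefix):
            return provider
    raise ValueError(f"route {route!r} does not match any known prefix")
```

A route that matches none of the three prefixes raises `ValueError`, so a typo in a provider prefix is caught early rather than surfacing as a failed upstream call.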
llm-gateway/
├─ docker-compose.yaml
├─ Dockerfile
├─ litellm_config.yaml
├─ .env # private — contains API keys, never commit
├─ .env.example # template with placeholders
├─ requirements.txt # Python test dependencies (openai, anthropic, google-genai, httpx, python-dotenv)
├─ test_e2e.py # E2E integration test suite
└─ data/
   └─ headroom/
      └─ .gitkeep # SQLite store for Headroom context optimization
# 1. Copy and fill in your API keys
cp .env.example .env
# edit .env with real keys
# 2. Build and start the gateway
docker compose up --build -d
# 3. Wait for the gateway to be healthy
curl http://localhost:4000/health \
-H "Authorization: Bearer sk-your-actual-master-key"
# 4. Run the E2E test suite
pip install -r requirements.txt
python test_e2e.py

| Alias | Upstream provider + model |
|---|---|
| claude-sonnet | Anthropic claude-sonnet-4-6 |
| claude-opus | Anthropic claude-opus-4-6 |
| gemini-flash | Google Gemini gemini-3-flash-preview |
| gemini-pro | Google Gemini gemini-3.1-pro-preview |
| openrouter-minimax | OpenRouter minimax/minimax-m2.7 |
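For tooling that needs to know where an alias ends up, the table can be mirrored as a dictionary. The sketch below assumes that each upstream route is the provider prefix from the routing policy followed by the model name from the table; the actual entries in litellm_config.yaml may be written differently. `ALIASES` and `resolve_alias` are hypothetical names.

```python
# Assumption: route = routing-policy prefix + model name from the alias table.
ALIASES = {
    "claude-sonnet": "anthropic/claude-sonnet-4-6",
    "claude-opus": "anthropic/claude-opus-4-6",
    "gemini-flash": "gemini/gemini-3-flash-preview",
    "gemini-pro": "gemini/gemini-3.1-pro-preview",
    "openrouter-minimax": "openrouter/minimax/minimax-m2.7",
}


def resolve_alias(alias: str) -> str:
    """Return the assumed upstream route for a gateway alias."""
    try:
        return ALIASES[alias]
    except KeyError:
        known = ", ".join(sorted(ALIASES))
        raise KeyError(f"unknown alias {alias!r}; known aliases: {known}") from None
```

Clients never send these upstream routes themselves; they send only the alias, and the gateway performs the mapping.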
curl http://localhost:4000/v1/chat/completions \
-H "Authorization: Bearer sk-change-me" \
-H "Content-Type: application/json" \
-d '{
"model": "claude-sonnet",
"messages": [
{"role": "user", "content": "Say hello in one sentence."}
]
}'

curl http://localhost:4000/v1/chat/completions \
-H "Authorization: Bearer sk-change-me" \
-H "Content-Type: application/json" \
-d '{
"model": "gemini-flash",
"messages": [
{"role": "user", "content": "Summarize why direct Gemini routing is useful."}
]
}'

curl http://localhost:4000/v1/chat/completions \
-H "Authorization: Bearer sk-change-me" \
-H "Content-Type: application/json" \
-d '{
"model": "openrouter-minimax",
"messages": [
{"role": "user", "content": "Write a short Go function that reverses a string."}
]
}'

The test_e2e.py script validates both direct-provider connectivity and end-to-end gateway routing for all 5 models:
# Full suite (direct + gateway tests)
python test_e2e.py
# Direct provider tests only (no Docker needed)
python test_e2e.py --direct-only
# Gateway tests only (assumes container is already running)
python test_e2e.py --gateway-only --skip-health
# Custom .env path
python test_e2e.py --env /path/to/.env

Exit code 0 = all tests passed. Exit code 1 = at least one failure (check printed results).
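In a CI job or wrapper script, the invocations and exit codes above could be handled along these lines. This is a rough sketch: the flags are the ones listed above, while `build_command`, `describe_exit_code`, `run_suite`, and the mode names `"full"`, `"direct"`, and `"gateway"` are hypothetical and not part of test_e2e.py.

```python
import subprocess
import sys
from typing import List, Optional


def build_command(mode: str = "full", env_path: Optional[str] = None) -> List[str]:
    """Assemble the test_e2e.py command line for one of the documented modes."""
    command = [sys.executable, "test_e2e.py"]
    if mode == "direct":
        command.append("--direct-only")
    elif mode == "gateway":
        # Assumes the container is already running, as noted above.
        command += ["--gateway-only", "--skip-health"]
    elif mode != "full":
        raise ValueError(f"unknown mode {mode!r}")
    if env_path is not None:
        command += ["--env", env_path]
    return command


def describe_exit_code(code: int) -> str:
    """Translate the documented exit codes into a short message."""
    if code == 0:
        return "all tests passed"
    if code == 1:
        return "at least one failure (check printed results)"
    return f"unexpected exit code {code}"


def run_suite(mode: str = "full", env_path: Optional[str] = None) -> int:
    """Run the suite from the repository root and return its exit code."""
    return subprocess.run(build_command(mode, env_path)).returncode
```

A wrapper could then call `sys.exit(run_suite("gateway"))` so that the suite's exit code is passed through unchanged.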