RuntimeRacer/headroom-litellm-proxy

Headroom + LiteLLM Gateway for Anthropic, OpenRouter, and Gemini

This setup exposes a single OpenAI-compatible endpoint. LiteLLM routes each request to one of several upstream LLM providers, and Headroom compresses and optimizes the request context before it is sent upstream, which reduces the size of the upstream API requests.

Goals

  • Single local gateway endpoint for all clients
  • Support for:
    • Anthropic
    • OpenRouter
    • Google Gemini
  • Headroom-based context optimization on top of LiteLLM
  • Docker Compose deployment
  • Easy model aliasing for coding tools and agent frameworks

Architecture

The recommended layout is:

  • LiteLLM acts as the main OpenAI-compatible gateway
  • Headroom is installed into the same container
  • Headroom is enabled as a LiteLLM callback
  • Clients connect only to LiteLLM
  • LiteLLM routes requests to the correct upstream provider

Why this design?

This is the most robust setup because:

  • LiteLLM has broad provider support and is actively maintained
  • Headroom can integrate directly into LiteLLM
  • Gemini works best when routed directly via LiteLLM instead of going through OpenRouter
  • OpenRouter is still available for models you specifically want there
  • Clients only need one base URL and one API key for the local gateway

Recommended provider routing

Use the following policy unless you have a strong reason to do otherwise:

  • Anthropic: direct via anthropic/...
  • Gemini: direct via gemini/...
  • OpenRouter: via openrouter/...
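
The routing policy above comes down to a provider prefix on the model string. A minimal sketch of that naming convention follows; the helper name litellm_model_id is hypothetical and exists only for illustration, while the prefixes and model names are the ones used in this README.

```python
# Provider prefixes taken from the routing policy above.
PROVIDER_PREFIXES = {
    "anthropic": "anthropic/",
    "gemini": "gemini/",
    "openrouter": "openrouter/",
}


def litellm_model_id(provider: str, upstream_model: str) -> str:
    """Return the provider-prefixed model string (hypothetical helper)."""
    try:
        prefix = PROVIDER_PREFIXES[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return prefix + upstream_model


print(litellm_model_id("anthropic", "claude-sonnet-4-6"))
print(litellm_model_id("gemini", "gemini-3-flash-preview"))
# OpenRouter model names carry their own vendor segment after the prefix.
print(litellm_model_id("openrouter", "minimax/minimax-m2.7"))
```

Note that an OpenRouter entry ends up with two slashes (openrouter/minimax/minimax-m2.7), because the OpenRouter model name already contains a vendor segment.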

Why not Gemini through OpenRouter by default?

Gemini can often work through OpenRouter, but it adds another translation layer:

client -> LiteLLM -> OpenRouter -> Gemini

That increases the chance of compatibility issues, especially for chat formatting, tools, or multimodal requests. Direct Gemini routing through LiteLLM is usually cleaner and more predictable.


Project structure

llm-gateway/
├─ docker-compose.yaml
├─ Dockerfile
├─ litellm_config.yaml
├─ .env              # private — contains API keys, never commit
├─ .env.example      # template with placeholders
├─ requirements.txt  # Python test dependencies (openai, anthropic, google-genai, httpx, python-dotenv)
├─ test_e2e.py       # E2E integration test suite
└─ data/
   └─ headroom/      # SQLite store for Headroom context optimization
       └─ .gitkeep

Quick start

# 1. Copy and fill in your API keys
cp .env.example .env
# edit .env with real keys

# 2. Build and start the gateway
docker compose up --build -d

# 3. Check that the gateway is healthy (repeat until it responds)
curl http://localhost:4000/health \
  -H "Authorization: Bearer sk-your-actual-master-key"

# 4. Run the E2E test suite
pip install -r requirements.txt
python test_e2e.py
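
The health check in step 3 is a single request, so it may fail while the container is still starting. The sketch below polls the same /health endpoint with the same Authorization header until it answers with HTTP 200. The function names, the number of attempts, and the delay are assumptions made for illustration, not part of the repository.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def gateway_is_healthy(base_url: str, master_key: str, timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers with HTTP 200."""
    request = urllib.request.Request(
        f"{base_url}/health",
        headers={"Authorization": f"Bearer {master_key}"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status: not ready yet.
        return False


def wait_until(check: Callable[[], bool], attempts: int = 30, delay: float = 2.0) -> bool:
    """Call check() up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Example call (commented out so that importing this sketch has no side effects):
# ok = wait_until(lambda: gateway_is_healthy("http://localhost:4000",
#                                            "sk-your-actual-master-key"))
```

Keeping the polling loop separate from the HTTP call means the retry logic can be exercised without a running container.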

Available model aliases

Alias               Upstream provider  Upstream model
claude-sonnet       Anthropic          claude-sonnet-4-6
claude-opus         Anthropic          claude-opus-4-6
gemini-flash        Google Gemini      gemini-3-flash-preview
gemini-pro          Google Gemini      gemini-3.1-pro-preview
openrouter-minimax  OpenRouter         minimax/minimax-m2.7
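
To illustrate how these aliases could map onto a LiteLLM model_list, the sketch below builds the structure in Python and prints it as JSON (which YAML parsers generally accept as well). The alias and model names come from the table; the environment variable names are assumptions, and the repository's actual litellm_config.yaml may differ.

```python
import json

# alias -> provider-prefixed model string, taken from the table above
ALIASES = {
    "claude-sonnet": "anthropic/claude-sonnet-4-6",
    "claude-opus": "anthropic/claude-opus-4-6",
    "gemini-flash": "gemini/gemini-3-flash-preview",
    "gemini-pro": "gemini/gemini-3.1-pro-preview",
    "openrouter-minimax": "openrouter/minimax/minimax-m2.7",
}

# ASSUMPTION: these variable names are illustrative; the repository's .env
# file may use different ones.
API_KEY_ENV = {
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def build_model_list(aliases: dict[str, str]) -> list[dict]:
    """Build one model_list entry per alias."""
    entries = []
    for alias, model in aliases.items():
        provider = model.split("/", 1)[0]  # text before the first slash
        entries.append({
            "model_name": alias,
            "litellm_params": {
                "model": model,
                # LiteLLM resolves "os.environ/NAME" from the environment.
                "api_key": f"os.environ/{API_KEY_ENV[provider]}",
            },
        })
    return entries


config = {"model_list": build_model_list(ALIASES)}
print(json.dumps(config, indent=2))
```

Clients only ever see the alias in model_name; the provider-specific string stays inside the gateway configuration.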

Testing

Anthropic

curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-change-me" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet",
    "messages": [
      {"role": "user", "content": "Say hello in one sentence."}
    ]
  }'

Google

curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-change-me" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-flash",
    "messages": [
      {"role": "user", "content": "Summarize why direct Gemini routing is useful."}
    ]
  }'

OpenRouter

curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-change-me" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openrouter-minimax",
    "messages": [
      {"role": "user", "content": "Write a short Go function that reverses a string."}
    ]
  }'
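
The same requests can be issued from Python. The following is a minimal standard-library sketch that mirrors the curl calls above; the helper names build_chat_request and send are hypothetical, and the response parsing assumes the usual OpenAI-style chat completion shape (choices[0].message.content).

```python
import json
import urllib.request


def build_chat_request(base_url: str, api_key: str, model: str,
                       prompt: str) -> urllib.request.Request:
    """Build the same POST /v1/chat/completions request as the curl examples."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call (requires the gateway to be running):
# request = build_chat_request("http://localhost:4000", "sk-change-me",
#                              "claude-sonnet", "Say hello in one sentence.")
# print(send(request))
```

Only the model field changes between providers, so the same helper covers claude-sonnet, gemini-flash, and openrouter-minimax.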

E2E test suite

The test_e2e.py script validates both direct-provider connectivity and end-to-end gateway routing for all five model aliases:

# Full suite (direct + gateway tests)
python test_e2e.py

# Direct provider tests only (no Docker needed)
python test_e2e.py --direct-only

# Gateway tests only (assumes container is already running)
python test_e2e.py --gateway-only --skip-health

# Custom .env path
python test_e2e.py --env /path/to/.env

Exit code 0 means all tests passed. Exit code 1 means at least one test failed (check the printed results).
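
For readers who want to build a similar script, the flags and exit codes described above could be wired up roughly as follows. This is a hypothetical sketch, not the actual test_e2e.py: the mutual exclusion of --direct-only and --gateway-only, the default .env path, and all function names are assumptions.

```python
import argparse


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="E2E checks (hypothetical sketch)")
    # ASSUMPTION: the two scope flags are treated as mutually exclusive.
    scope = parser.add_mutually_exclusive_group()
    scope.add_argument("--direct-only", action="store_true",
                       help="run only the direct provider tests")
    scope.add_argument("--gateway-only", action="store_true",
                       help="run only the gateway tests")
    parser.add_argument("--skip-health", action="store_true",
                        help="skip the gateway health check")
    # ASSUMPTION: ".env" in the working directory is the default.
    parser.add_argument("--env", default=".env", help="path to the .env file")
    return parser.parse_args(argv)


def selected_suites(args) -> list[str]:
    """Return which test groups to run for the given flags."""
    if args.direct_only:
        return ["direct"]
    if args.gateway_only:
        return ["gateway"]
    return ["direct", "gateway"]


def exit_code(results: dict[str, bool]) -> int:
    """0 if every test passed, 1 if at least one failed."""
    return 0 if all(results.values()) else 1
```

A script built this way would end with something like sys.exit(exit_code(results)), which lets CI pipelines rely on the exit status alone.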

About

Small Docker setup that combines the headroom-ai proxy with the LiteLLM API gateway.
