Smart Routers use a fast LLM to intelligently route each request to the best candidate model based on the query content.
- Request arrives with model name set to your Smart Router
- Designator LLM analyzes the query and selects the best candidate
- Request forwards to the selected model
- Response returns to the client (with routing metadata in headers)
Client Request → Smart Router → Designator LLM → Best Candidate → Response
Designator LLM decision: "This is a coding question, route to Claude Sonnet"
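The flow above can be sketched in a few lines of Python. This is a simplified illustration, not LLM Relay's implementation: `route_request` and `keyword_designator` are hypothetical names, the designator is a toy keyword classifier standing in for a real LLM call, and falling back to the first candidate when the designator returns an unknown name is an assumption.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins: a real designator is an LLM call and each
# candidate is a provider request. Plain functions keep the sketch runnable.
Designator = Callable[[str, List[str]], str]
Backend = Callable[[str], str]


def route_request(
    query: str,
    candidates: Dict[str, Backend],
    designator: Designator,
    router_name: str,
) -> Tuple[str, Dict[str, str]]:
    """Ask the designator for a candidate, forward the query, attach metadata."""
    choice = designator(query, list(candidates))
    if choice not in candidates:
        # Assumption: fall back to the first candidate on an unknown answer.
        choice = next(iter(candidates))
    body = candidates[choice](query)
    headers = {
        "X-LLM-Relay-Router": router_name,
        "X-LLM-Relay-Routed-To": choice,
    }
    return body, headers


def keyword_designator(query: str, names: List[str]) -> str:
    """Toy classifier standing in for the designator LLM."""
    coding_words = ("python", "function", "code", "bug")
    if any(word in query.lower() for word in coding_words):
        return "anthropic/claude-sonnet-4-5"
    return "openai/gpt-4o"
```

Swapping `keyword_designator` for a function that calls a fast model gives the same shape as the real router: one extra call to choose, then one call to the chosen candidate.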
In the Admin UI:
- Go to Smart Routers in the sidebar
- Click New Router
- Configure:
| Field | Description |
|---|---|
| Name | The model name clients will use (e.g., smart-router) |
| Description | Optional description for the Admin UI |
| Designator Model | Fast model that picks the best candidate |
| Candidates | List of models to choose from |
| Strategy | per_request (decide each time) or session (sticky) |
| Tags | Optional tags for cost attribution |
- Click Save
Router Name: smart
Designator Model: gemini/gemini-2.0-flash (fast and cheap)
Candidates:
- anthropic/claude-sonnet-4-5 - Best for coding and analysis
- openai/gpt-4o - Good all-rounder
- groq/llama-3.3-70b-versatile - Fast for simple queries
Strategy: per_request
Now use it:
curl http://localhost:11434/api/chat \
  -d '{"model": "smart", "messages": [{"role": "user", "content": "Write a Python function to parse JSON"}]}'

With these candidates, the designator should route coding questions to Claude Sonnet, general questions to GPT-4o, and simple queries to Llama.
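The same request can be made from Python using only the standard library. This is a minimal sketch: the endpoint and payload mirror the curl command, while the helper names `build_chat_request` and `send`, and the explicit `Content-Type` header, are assumptions added for illustration. The response body format is not specified here, so it is returned as raw text.

```python
import json
import urllib.request
from typing import Tuple


def build_chat_request(
    prompt: str,
    model: str = "smart",
    base_url: str = "http://localhost:11434",
) -> urllib.request.Request:
    """Build a POST to /api/chat with the Smart Router's name as the model."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumption, not in the curl example
        method="POST",
    )


def send(request: urllib.request.Request) -> Tuple[str, str]:
    """Send the request; return (routed-to header value, raw response body)."""
    with urllib.request.urlopen(request) as response:
        routed_to = response.headers.get("X-LLM-Relay-Routed-To", "")
        return routed_to, response.read().decode("utf-8")
```

Calling `send(build_chat_request("Write a Python function to parse JSON"))` against a running instance returns the selected candidate alongside the response text.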
Choose a designator model that's:
- Fast - The designator call adds latency to each routed request
- Cheap - It runs on every request under per_request, and once per session under session
- Good at classification - Needs to understand query intent
Recommended designators:
- gemini/gemini-2.0-flash - Very fast, good reasoning
- groq/llama-3.3-70b-versatile - Extremely fast
- anthropic/claude-haiku-4-5 - Good balance
Avoid using expensive models (Opus, o1) as designators.
The designator evaluates each request independently. Best for:
- Varied query types
- Maximum routing accuracy
- Stateless applications
The designator picks a model on the first request, then sticks with it for the session. Best for:
- Conversational contexts
- Reducing designator calls
- Consistent model behavior within a conversation
A session is identified by a hash of the conversation history.
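To illustrate how sticky routing can work, here is a hedged sketch of a session cache keyed by a conversation hash. How LLM Relay computes its hash is not described here. This example assumes the key is derived from the opening message so that it stays stable as the history grows; `session_key` and `StickyRouter` are hypothetical names.

```python
import hashlib
import json
from typing import Callable, Dict, List

Message = Dict[str, str]


def session_key(messages: List[Message]) -> str:
    """Hash the opening message (assumed definition of the session hash)."""
    anchor = messages[:1]
    encoded = json.dumps(anchor, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class StickyRouter:
    """Calls the designator once per session, then reuses its choice."""

    def __init__(self, designator: Callable[[List[Message]], str]) -> None:
        self._designator = designator
        self._choices: Dict[str, str] = {}
        self.designator_calls = 0

    def pick(self, messages: List[Message]) -> str:
        key = session_key(messages)
        if key not in self._choices:
            # First request of this session: pay for one designator call.
            self.designator_calls += 1
            self._choices[key] = self._designator(messages)
        return self._choices[key]
```

Later turns of the same conversation reuse the cached choice, which is where the saving in designator calls comes from.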
Model Intelligence enriches the designator's knowledge about each candidate model by gathering comparative assessments from the web.
- Ensure ChromaDB is configured (`CHROMA_URL`)
- Configure a web search provider (SearXNG recommended)
- In the Smart Router edit modal, enable Model Intelligence
- Configure:
- Search Provider: SearXNG, Perplexity, or Jina
- Summarizer Model: Model to summarize gathered intelligence
When enabled, LLM Relay:
- Searches for reviews and comparisons of your candidate models
- Extracts relevant assessments about strengths and weaknesses
- Stores summaries in ChromaDB
- Includes this context when the designator makes routing decisions
This helps the designator make more informed choices based on real-world model performance data.
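The last step, including stored assessments in the designator's context, might look roughly like the sketch below. The prompt wording, the function name `build_designator_prompt`, and the plain dictionary standing in for summaries retrieved from ChromaDB are all assumptions. The actual prompt used by LLM Relay is not documented here.

```python
from typing import Dict, List


def build_designator_prompt(
    query: str,
    candidates: List[str],
    intelligence: Dict[str, str],
) -> str:
    """Compose a designator prompt that lists each candidate with its summary."""
    lines = [
        "Pick the best model for the user query. Reply with one model name only.",
        "",
        "Candidates:",
    ]
    for name in candidates:
        # Candidates without a stored summary still appear, with a placeholder.
        summary = intelligence.get(name, "no assessment available")
        lines.append(f"- {name}: {summary}")
    lines += ["", f"User query: {query}"]
    return "\n".join(lines)
```

Without Model Intelligence, the `intelligence` mapping would be empty and the designator would see only the candidate names.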
Click Refresh Intelligence in the router's edit modal to re-gather model assessments. Do this when:
- You add new candidate models
- Significant time has passed (models improve)
- You want updated comparative data
Every response includes routing metadata:
X-LLM-Relay-Router: smart
X-LLM-Relay-Routed-To: anthropic/claude-sonnet-4-5
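A client can read these headers to log which candidate handled each request. The helper below is a small sketch. The name `routing_metadata` is hypothetical, and the lookup is case-insensitive because HTTP header names are not case-sensitive.

```python
from typing import Mapping, Optional, Tuple


def routing_metadata(headers: Mapping[str, str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (router name, selected model) from response headers, or None if absent."""
    lowered = {key.lower(): value for key, value in headers.items()}
    return lowered.get("x-llm-relay-router"), lowered.get("x-llm-relay-routed-to")
```

Both values come back as `None` when the headers are missing, for example when the request named a model directly instead of a Smart Router.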
The Dashboard and Usage pages show:
- Which router was used
- Which candidate was selected
- Designator token usage (tracked separately)
- Start simple - Begin with 2-3 candidates, add more as needed
- Monitor routing - Check the Dashboard to see if routing matches expectations
- Tune candidates - Remove models that are rarely selected
- Use descriptive names - Help the designator with clear model descriptions in the purpose field
- Consider costs - The designator adds token usage to every request
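For the monitoring and tuning steps, one option is to tally the `X-LLM-Relay-Routed-To` values collected from responses and flag candidates that are seldom chosen. This is a sketch under stated assumptions: the function names are hypothetical and the 5% threshold is an arbitrary illustration, not a recommended value.

```python
from collections import Counter
from typing import Dict, Iterable, List


def selection_shares(routed_to_values: Iterable[str]) -> Dict[str, float]:
    """Fraction of requests that went to each model."""
    counts = Counter(routed_to_values)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {model: n / total for model, n in counts.items()}


def rarely_selected(
    shares: Dict[str, float],
    candidates: List[str],
    threshold: float = 0.05,  # arbitrary cut-off chosen for illustration
) -> List[str]:
    """Candidates whose share is below the threshold, including never-selected ones."""
    return [name for name in candidates if shares.get(name, 0.0) < threshold]
```

Candidates returned by `rarely_selected` are the ones worth reviewing before removing them from the router.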
- Check if your queries are too similar
- Try a smarter designator model
- Enable Model Intelligence for better context
- Switch to a faster designator (Groq, Gemini Flash)
- Use the `session` strategy to reduce designator calls
- Check designator model availability
- Verify the designator model is configured and accessible
- Check provider API key is set
- Look for errors in container logs