Smart Augmentors enhance LLM requests with real-time web content. Every request is automatically augmented with search results and scraped web pages, giving the LLM access to current information.
- Request arrives with model name set to your Smart Augmentor
- Designator LLM generates an optimized search query
- Web search finds relevant results
- Reranking orders URLs by relevance to the query
- Scraping extracts content from top URLs
- Context injection adds the content to the system prompt
- Target model receives the augmented request
Client Request → Augmentor → Designator → Search → Rerank → Scrape
↓ ↓ ↓ ↓
"AI news 2024" SearXNG Cross-Encoder Trafilatura
↓
Inject into System Prompt
↓
Target Model
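The flow above can be sketched as a single function. This is an illustrative outline only, not the relay's actual code: `SearchResult`, `augment_system_prompt`, and the four callables (`make_query`, `search`, `rerank`, `scrape`) are hypothetical names standing in for the designator, search provider, reranker, and scraper.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SearchResult:
    url: str
    title: str
    snippet: str


def augment_system_prompt(
    system_prompt: str,
    user_text: str,
    make_query: Callable[[str], str],                    # designator LLM (hypothetical)
    search: Callable[[str, int], list],                  # search provider (hypothetical)
    rerank: Callable[[str, list], list],                 # reranker (hypothetical)
    scrape: Callable[[str], str],                        # scraper (hypothetical)
    max_search_results: int = 6,
    max_scrape_urls: int = 3,
) -> str:
    """Run designator -> search -> rerank -> scrape and return the augmented prompt."""
    query = make_query(user_text)
    results = search(query, max_search_results)
    top = rerank(query, results)[:max_scrape_urls]
    pages = [f"### From: {r.url}\n{scrape(r.url)}" for r in top]
    context = "\n\n".join(pages)
    # Assumption: the context is appended after any existing system prompt.
    return f"{system_prompt}\n\n{context}".strip()
```

Passing the stages in as callables keeps the sketch independent of any particular provider, which mirrors how the search, rerank, and scrape providers are configured separately.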
- Search Provider: SearXNG (self-hosted, no API key), Perplexity (requires API key), or Jina (requires API key)
In the Admin UI:
- Go to Smart Augmentors in the sidebar
- Click New Augmentor
- Configure:
| Field | Description |
|---|---|
| Name | The model name clients will use (e.g., search-claude) |
| Target Model | The underlying model to call with augmented context |
| Designator Model | Fast model that generates search queries |
| Search Provider | SearXNG, Perplexity, or Jina |
| Max Search Results | Number of search results to fetch |
| Max Scrape URLs | Number of URLs to scrape for full content |
| Max Context Tokens | Token limit for injected context |
| Scraper Provider | Built-in (trafilatura) or Jina Reader |
- Click Save
Augmentor Name: search-claude
Target Model: anthropic/claude-sonnet-4-5
Designator Model: gemini/gemini-2.0-flash
Search Provider: searxng
Max Search Results: 6
Max Scrape URLs: 3
Max Context Tokens: 8000
Now use it:
curl http://localhost:11434/api/chat \
  -d '{"model": "search-claude", "messages": [{"role": "user", "content": "What happened in AI this week?"}]}'

The augmentor will:
- Generate query: "AI news this week January 2025"
- Search and get 6 results
- Rerank URLs by relevance
- Scrape top 3 URLs
- Inject content into Claude's context
- Return Claude's response with current information
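The same request can be built from Python using only the standard library. This is a minimal sketch: `build_chat_request` is a hypothetical helper, the URL and payload mirror the curl example, and the explicit JSON `Content-Type` header is an addition that the curl command does not send.

```python
import json
import urllib.request


def build_chat_request(
    question: str,
    model: str = "search-claude",
    base_url: str = "http://localhost:11434",
) -> urllib.request.Request:
    """Build a POST request for /api/chat that targets a Smart Augmentor by name."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": question}]}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To send it against a running relay:
#   with urllib.request.urlopen(build_chat_request("What happened in AI this week?")) as resp:
#       print(resp.read().decode("utf-8"))
```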
Self-hosted metasearch engine. No API key required, full privacy.
Setup:
# docker-compose.yml
services:
  searxng:
    image: searxng/searxng:latest
    ports:
      - "8888:8080"

Configuration:
- Search Provider: searxng
- Search Provider URL: http://searxng:8080 (or leave empty to use the env var)
Set SEARXNG_URL in your environment.
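To check that the SearXNG instance is reachable, it can help to build a query URL by hand. The sketch below assumes SearXNG's JSON output (`format=json`), which usually has to be enabled under `search.formats` in SearXNG's `settings.yml`. `searxng_search_url` is a hypothetical helper, and the `http://localhost:8888` fallback is an assumption based on the port mapping in the compose file above; whether the relay itself uses this endpoint is not stated here.

```python
import os
from typing import Optional
from urllib.parse import urlencode


def searxng_search_url(query: str, base_url: Optional[str] = None) -> str:
    """Return a SearXNG search URL that asks for JSON results."""
    # Fall back to SEARXNG_URL, then to the host-side port from docker-compose (assumption).
    base = base_url or os.environ.get("SEARXNG_URL", "http://localhost:8888")
    return f"{base.rstrip('/')}/search?{urlencode({'q': query, 'format': 'json'})}"
```

From the host, the mapped port applies (`http://localhost:8888`); from another container on the same Docker network, the service name and internal port apply (`http://searxng:8080`).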
Uses Perplexity's API for search. Requires API key.
Configuration:
- Search Provider: perplexity
- Set PERPLEXITY_API_KEY in environment
Uses Jina's search API. Requires API key.
Configuration:
- Search Provider: jina
- Set JINA_API_KEY in environment
Uses httpx for fetching and trafilatura for content extraction. Works well for most sites.
Pros: No external dependencies, fast
Cons: May fail on JavaScript-heavy sites
Uses Jina's Reader API (r.jina.ai) for content extraction. Handles JavaScript rendering.
Configuration:
- Scraper Provider: jina
- Optionally set JINA_API_KEY for higher rate limits
Pros: Handles JavaScript, clean markdown output
Cons: External API dependency, rate limits without API key
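For reference, a Jina Reader call amounts to prefixing the target URL with `https://r.jina.ai/` and, optionally, sending the API key as a bearer token. The sketch below only builds the request; `jina_reader_request` is a hypothetical helper and not part of the relay.

```python
import os
import urllib.request
from typing import Optional


def jina_reader_request(url: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request that asks Jina Reader to fetch and extract `url`."""
    key = api_key or os.environ.get("JINA_API_KEY")
    # Without a key the request is still valid, but subject to lower rate limits.
    headers = {"Authorization": f"Bearer {key}"} if key else {}
    return urllib.request.Request(f"https://r.jina.ai/{url}", headers=headers)
```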
Smart Augmentors automatically rerank search results before scraping using a cross-encoder model. This ensures the most relevant URLs are scraped first.
Default Model: cross-encoder/ms-marco-MiniLM-L-6-v2 (~48MB, runs locally)
The reranker scores each URL based on how well its title and snippet match the search query, then selects the top URLs for scraping.
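The scoring step can be sketched as follows. This is an assumption-laden illustration rather than the relay's implementation: `rerank` and `overlap_scorer` are hypothetical names, and the toy word-overlap scorer stands in for a real cross-encoder so the example runs without downloading a model.

```python
from typing import Callable, Sequence


def rerank(
    query: str,
    results: list,
    score_pairs: Callable[[list], Sequence[float]],
    top_k: int = 3,
) -> list:
    """Score each result's title + snippet against the query and keep the top_k."""
    pairs = [(query, f"{r['title']} {r['snippet']}") for r in results]
    scores = score_pairs(pairs)
    ranked = sorted(zip(scores, results), key=lambda pair: pair[0], reverse=True)
    return [result for _, result in ranked[:top_k]]


def overlap_scorer(pairs: list) -> list:
    """Toy stand-in for a cross-encoder: counts query words that appear in the text."""
    return [
        float(sum(word in text.lower().split() for word in query.lower().split()))
        for query, text in pairs
    ]


# With sentence-transformers installed, a real scorer could be plugged in instead:
#   from sentence_transformers import CrossEncoder
#   model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
#   top = rerank(query, results, model.predict, top_k=3)
```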
For API-based reranking, configure:
- Rerank Provider: jina
- Set JINA_API_KEY in environment
Augmented content is injected into the system prompt:
<augmented_context>
Today's date: 2025-01-15
The following information was retrieved from the web to help answer the user's question.
Use this information to provide an accurate, up-to-date response.
## Web Search Results
### 1. AI Breakthroughs in 2025
URL: https://example.com/ai-news
Summary of the article...
---
## Scraped Content
### From: https://example.com/ai-news
Full article content here...
</augmented_context>

The target model sees this context and can use it to provide informed responses.
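A block in this format could be assembled and injected roughly as shown below. The layout follows the example above; `build_augmented_context` and `inject` are hypothetical names, and appending to an existing system message (rather than replacing it) is an assumption about how injection behaves.

```python
from datetime import date


def build_augmented_context(results, pages, today=None):
    """Format search results and scraped pages as an <augmented_context> block."""
    today = today or date.today()
    lines = [
        "<augmented_context>",
        f"Today's date: {today.isoformat()}",
        "The following information was retrieved from the web to help answer the user's question.",
        "Use this information to provide an accurate, up-to-date response.",
        "## Web Search Results",
    ]
    for i, r in enumerate(results, start=1):
        lines += [f"### {i}. {r['title']}", f"URL: {r['url']}", r["snippet"]]
    lines += ["---", "## Scraped Content"]
    for url, text in pages:
        lines += [f"### From: {url}", text]
    lines.append("</augmented_context>")
    return "\n".join(lines)


def inject(messages, context):
    """Return a copy of messages with the context added to the system prompt."""
    out = [dict(m) for m in messages]
    if out and out[0].get("role") == "system":
        # Assumption: keep the caller's system prompt and append the context.
        out[0]["content"] = f"{out[0]['content']}\n\n{context}"
    else:
        out.insert(0, {"role": "system", "content": context})
    return out
```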
Choose a fast, cheap model:
- gemini/gemini-2.0-flash - Very fast
- groq/llama-3.3-70b-versatile - Extremely fast
- anthropic/claude-haiku-4-5 - Good balance
The designator generates search queries, not final responses.
| Use Case | Search Results | Scrape URLs |
|---|---|---|
| Quick answers | 6 | 2 |
| Research | 10 | 5 |
| Deep dive | 15 | 8 |
Scraping more URLs gives the target model richer context but increases latency.
Set max_context_tokens based on your target model's context window:
- 4000-8000 for most queries
- 16000+ for deep research
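To get a feel for what a token limit means for scraped text, the sketch below trims pages to fit a budget. It relies on a rough heuristic of about 4 characters per token, which is an assumption and not how the relay necessarily counts tokens; `trim_to_token_budget` is a hypothetical helper.

```python
def trim_to_token_budget(pages, max_context_tokens, chars_per_token=4):
    """Keep pages in order, truncating once the approximate token budget is spent."""
    budget = max_context_tokens * chars_per_token  # rough character budget (heuristic)
    kept = []
    for page in pages:
        if budget <= 0:
            break
        piece = page[:budget]
        kept.append(piece)
        budget -= len(piece)
    return kept
```

Because pages are consumed in order, the highest-ranked page keeps the most text when the budget is tight.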
X-LLM-Relay-Augmentor: search-claude
X-LLM-Relay-Augmentation: search+scrape
X-LLM-Relay-Search-Query: AI news January 2025
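Assuming the three `X-LLM-Relay-*` lines above are HTTP response headers, a client could collect them as in this sketch. `augmentation_info` is a hypothetical helper that accepts any mapping with an `items()` method, such as the `headers` object of a `urllib` response.

```python
def augmentation_info(headers) -> dict:
    """Collect X-LLM-Relay-* headers into a dict keyed by the lowercased suffix."""
    prefix = "x-llm-relay-"
    return {
        name.lower()[len(prefix):]: value
        for name, value in headers.items()
        if name.lower().startswith(prefix)  # header names are case-insensitive
    }
```

Logging the `search-query` value alongside each answer makes it easier to spot cases where the designator produced a poor query.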
The Dashboard shows:
- Augmentation rate (% of requests augmented)
- Search requests
- Scrape requests
- Designator token usage
- Use fast designators - They're called on every request
- Limit scraping - 2-3 URLs is usually sufficient
- Match token limits - Don't exceed target model's context
- Monitor latency - Augmentation adds 2-5 seconds typically
- Use SearXNG - Self-hosted, no rate limits, private
- Verify search provider is configured
- Check SEARXNG_URL or provider API key
- Look for errors in container logs
- Some sites block automated access (403/401 errors)
- Try Jina Reader for JavaScript-heavy sites
- Check container logs for specific errors
- Reduce max_scrape_urls
- Use a faster designator model
- Consider caching with Smart Cache
- The designator generates queries from user input
- Try a smarter designator model
- Check if user queries are clear enough
- Reduce max_context_tokens
- Reduce max_scrape_urls
- Some scraped pages may be very long