Smart Augmentors

Smart Augmentors enhance LLM requests with real-time web content. Every request is automatically augmented with search results and scraped web pages, giving the LLM access to current information.

How It Works

Request arrives with model name set to your Smart Augmentor
Designator LLM generates an optimized search query
Web search finds relevant results
Reranking orders URLs by relevance to the query
Scraping extracts content from top URLs
Context injection adds the content to the system prompt
Target model receives the augmented request

Client Request → Augmentor → Designator → Search → Rerank → Scrape
                                ↓            ↓        ↓         ↓
                         "AI news 2024"  SearXNG  Cross-Encoder  Trafilatura
                                                        ↓
                                              Inject into System Prompt
                                                        ↓
                                                  Target Model
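
The flow above can be sketched in a few lines of Python. This is an illustration only, not the relay's actual code: the function and parameter names (`augment_request`, `designate`, `search`, `rerank`, `scrape`) are hypothetical, each stage is passed in as a plain callable, and prepending the context to an existing system message is an assumption about how injection is ordered.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SearchResult:
    url: str
    title: str
    snippet: str


def augment_request(
    messages: list[dict],
    designate: Callable[[list[dict]], str],
    search: Callable[[str], list[SearchResult]],
    rerank: Callable[[str, list[SearchResult]], list[SearchResult]],
    scrape: Callable[[str], str],
    max_scrape_urls: int = 3,
) -> list[dict]:
    """Run the stages in order and return a copy of messages with context injected."""
    query = designate(messages)        # designator LLM writes the search query
    results = search(query)            # web search
    ranked = rerank(query, results)    # most relevant results first
    pages = [(r.url, scrape(r.url)) for r in ranked[:max_scrape_urls]]

    context = "\n\n".join(f"### From: {url}\n{text}" for url, text in pages)
    block = f"<augmented_context>\n{context}\n</augmented_context>"

    # Inject into the system prompt: extend an existing one or create it.
    out = [dict(m) for m in messages]  # copy so the caller's list is untouched
    if out and out[0].get("role") == "system":
        out[0]["content"] = f"{block}\n\n{out[0]['content']}"
    else:
        out.insert(0, {"role": "system", "content": block})
    return out                         # this is what the target model receives
```

Passing the stages as callables keeps the sketch independent of any particular search, rerank, or scrape provider.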

Prerequisites

Search Provider: SearXNG (self-hosted, no API key), Perplexity (requires API key), or Jina (requires API key)

Creating a Smart Augmentor

In the Admin UI:

Go to Smart Augmentors in the sidebar
Click New Augmentor
Configure:

Field	Description
Name	The model name clients will use (e.g., `search-claude`)
Target Model	The underlying model to call with augmented context
Designator Model	Fast model that generates search queries
Search Provider	SearXNG, Perplexity, or Jina
Max Search Results	Number of search results to fetch
Max Scrape URLs	Number of URLs to scrape for full content
Max Context Tokens	Token limit for injected context
Scraper Provider	Built-in (trafilatura) or Jina Reader

Click Save

Example Configuration

Augmentor Name: search-claude

Target Model: anthropic/claude-sonnet-4-5

Designator Model: gemini/gemini-2.0-flash

Search Provider: searxng

Max Search Results: 6

Max Scrape URLs: 3

Max Context Tokens: 8000

Now use it:

curl http://localhost:11434/api/chat \
  -d '{"model": "search-claude", "messages": [{"role": "user", "content": "What happened in AI this week?"}]}'
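
The same request can be built from Python using only the standard library. This is a sketch under two assumptions that the text above does not confirm: that the relay honours Ollama's `"stream": false` flag, and that the reply follows Ollama's `/api/chat` response shape. The helper names `build_chat_request` and `send` are hypothetical.

```python
import json
import urllib.request


def build_chat_request(
    model: str,
    prompt: str,
    base_url: str = "http://localhost:11434",
) -> urllib.request.Request:
    """Build the same POST as the curl command, without sending it."""
    payload = {
        "model": model,  # the augmentor name, not the target model
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # assumption: relay accepts Ollama's non-streaming flag
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

With the relay running, `send(build_chat_request("search-claude", "What happened in AI this week?"))` would return the decoded response. A generous timeout is used because augmentation adds latency on top of the model call.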

The augmentor will:

Generate query: "AI news this week January 2025"
Search and get 6 results
Rerank URLs by relevance
Scrape top 3 URLs
Inject content into Claude's context
Return Claude's response with current information

Search Providers

SearXNG (Recommended)

Self-hosted metasearch engine. No API key required, full privacy.

Setup:

# docker-compose.yml
services:
  searxng:
    image: searxng/searxng:latest
    ports:
      - "8888:8080"

Configuration:

Search Provider: searxng
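Search Provider URL: http://searxng:8080 (or leave empty to use the SEARXNG_URL environment variable)

If the URL field is left empty, set SEARXNG_URL in your environment. With the port mapping above, http://searxng:8080 is the address inside the Compose network; from the host, SearXNG is reachable on port 8888.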
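
To check a SearXNG instance independently of the relay, you can query its JSON API directly. The sketch below uses SearXNG's `/search` endpoint with `format=json`, which only works if the `json` format is enabled in the instance's `settings.yml`. The helper names are hypothetical, and how the relay itself calls SearXNG is not specified here.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Optional


def searxng_search_url(query: str, base_url: Optional[str] = None) -> str:
    """Build the search URL, falling back to SEARXNG_URL when no base is given."""
    base = (base_url or os.environ.get("SEARXNG_URL", "")).rstrip("/")
    if not base:
        raise ValueError("no SearXNG URL: pass base_url or set SEARXNG_URL")
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return f"{base}/search?{params}"


def searxng_search(
    query: str, base_url: Optional[str] = None, max_results: int = 6
) -> list[dict]:
    """Fetch results and keep only the fields used for reranking."""
    url = searxng_search_url(query, base_url)
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = json.load(resp)
    return [
        {"url": r.get("url"), "title": r.get("title"), "snippet": r.get("content", "")}
        for r in data.get("results", [])[:max_results]
    ]
```

If `searxng_search` fails with an HTTP 403, the JSON format is most likely disabled on the instance.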

Perplexity

Uses Perplexity's API for search. Requires API key.

Configuration:

Search Provider: perplexity
Set PERPLEXITY_API_KEY in environment

Jina Search

Uses Jina's search API. Requires API key.

Configuration:

Search Provider: jina
Set JINA_API_KEY in environment

Scraper Providers

Built-in (Default)

Uses httpx for fetching and trafilatura for content extraction. Works well for most sites.

Pros: No external dependencies, fast

Cons: May fail on JavaScript-heavy sites

Jina Reader

Uses Jina's Reader API (r.jina.ai) for content extraction. Handles JavaScript rendering.

Configuration:

Scraper Provider: jina
Optionally set JINA_API_KEY for higher rate limits

Pros: Handles JavaScript, clean markdown output

Cons: External API dependency, rate limits without API key
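
The two scraper providers can be sketched side by side. This is a rough illustration, not the relay's implementation: the function names are hypothetical, the timeouts are arbitrary, and `httpx` and `trafilatura` are imported lazily so the Jina path works without them installed. The Jina Reader call relies on its documented convention of appending the target URL to `https://r.jina.ai/`.

```python
import os
import urllib.request
from typing import Optional


def builtin_scrape(url: str) -> Optional[str]:
    """Fetch with httpx, extract main content with trafilatura."""
    import httpx  # third-party, imported lazily
    import trafilatura  # third-party, imported lazily

    resp = httpx.get(url, follow_redirects=True, timeout=10.0)
    resp.raise_for_status()  # sites that block bots surface as 401/403 here
    return trafilatura.extract(resp.text)  # None when no main content is found


def jina_reader_request(url: str, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build a Jina Reader request; the API key is optional."""
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    return urllib.request.Request(f"https://r.jina.ai/{url}", headers=headers)


def jina_scrape(url: str) -> str:
    """Fetch rendered page content through Jina Reader."""
    req = jina_reader_request(url, os.environ.get("JINA_API_KEY"))
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")
```

The built-in path never executes JavaScript, which is why pages that render their content client-side can come back empty.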

Reranking

Smart Augmentors automatically rerank search results before scraping using a cross-encoder model. This ensures the most relevant URLs are scraped first.

Default Model: cross-encoder/ms-marco-MiniLM-L-6-v2 (~48MB, runs locally)

The reranker scores each URL based on how well its title and snippet match the search query, then selects the top URLs for scraping.
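
A minimal sketch of that scoring step follows. It assumes results are dicts with `title`, `snippet`, and `url` keys, and it assumes the default model is loaded through the `sentence-transformers` `CrossEncoder` class, which the text above does not confirm. The scorer is injectable, so the ranking logic can be tried without downloading the model.

```python
from typing import Callable, Sequence


def rerank(
    query: str,
    results: Sequence[dict],
    score_pairs: Callable[[list], Sequence[float]],
    top_k: int = 3,
) -> list[dict]:
    """Score (query, title + snippet) pairs and return the top_k results."""
    pairs = [(query, f"{r['title']} {r['snippet']}") for r in results]
    scores = score_pairs(pairs)
    order = sorted(range(len(results)), key=lambda i: scores[i], reverse=True)
    return [results[i] for i in order[:top_k]]


def cross_encoder_scorer(model_name: str = "cross-encoder/ms-marco-MiniLM-L-6-v2"):
    """Return a scorer backed by a local cross-encoder (downloads on first use)."""
    from sentence_transformers import CrossEncoder  # third-party, imported lazily

    model = CrossEncoder(model_name)
    return lambda pairs: [float(s) for s in model.predict(pairs)]
```

Calling `rerank(query, results, cross_encoder_scorer(), top_k=3)` would select the three URLs to scrape. Only titles and snippets are scored, so reranking happens before any page is fetched.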

Jina Reranker (Optional)

For API-based reranking, configure:

Rerank Provider: jina
Set JINA_API_KEY in environment

Context Injection

Augmented content is injected into the system prompt:

<augmented_context>
Today's date: 2025-01-15

The following information was retrieved from the web to help answer the user's question.
Use this information to provide an accurate, up-to-date response.

## Web Search Results

### 1. AI Breakthroughs in 2025
URL: https://example.com/ai-news
Summary of the article...

---

## Scraped Content

### From: https://example.com/ai-news
Full article content here...

</augmented_context>

The target model sees this context and can use it to provide informed responses.
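
The layout of the block above can be reproduced with a small formatter. This is a sketch that mirrors the example shown, not the relay's own template code. The function name and the input shapes (dicts with `title`, `url`, `snippet`, and `(url, text)` tuples) are assumptions.

```python
from datetime import date
from typing import Optional


def build_context_block(
    search_results: list[dict],
    scraped_pages: list[tuple[str, str]],
    today: Optional[date] = None,
) -> str:
    """Format search results and scraped pages as an <augmented_context> block."""
    today = today or date.today()
    lines = [
        "<augmented_context>",
        f"Today's date: {today.isoformat()}",
        "",
        "The following information was retrieved from the web to help answer the user's question.",
        "Use this information to provide an accurate, up-to-date response.",
        "",
        "## Web Search Results",
        "",
    ]
    # Numbered entries, one per search result.
    for i, r in enumerate(search_results, start=1):
        lines += [f"### {i}. {r['title']}", f"URL: {r['url']}", r["snippet"], ""]
    lines += ["---", "", "## Scraped Content", ""]
    # Full text of each scraped page, labelled with its source URL.
    for url, text in scraped_pages:
        lines += [f"### From: {url}", text, ""]
    lines.append("</augmented_context>")
    return "\n".join(lines)
```

Including the current date gives the target model a reference point for phrases such as "this week".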

Configuration Tips

Designator Model

Choose a fast, cheap model:

gemini/gemini-2.0-flash - Very fast
groq/llama-3.3-70b-versatile - Extremely fast
anthropic/claude-haiku-4-5 - Good balance

The designator generates search queries, not final responses.

Search vs Scrape Balance

Use Case	Search Results	Scrape URLs
Quick answers	6	2
Research	10	5
Deep dive	15	8

More scraping = better context but higher latency.

Token Limits

Set max_context_tokens based on your target model's context window:

4000-8000 for most queries
16000+ for deep research
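
One way to picture how a token limit constrains scraped content is the sketch below. It is hypothetical: the relay's actual truncation strategy is not described here, and the estimate of roughly 4 characters per token is a crude heuristic rather than a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate at ~4 characters per token (assumption, not a tokenizer)."""
    return (len(text) + 3) // 4


def fit_to_budget(
    pages: list[tuple[str, str]], max_context_tokens: int
) -> list[tuple[str, str]]:
    """Keep pages in ranked order, truncating the page that crosses the budget."""
    kept = []
    remaining = max_context_tokens
    for url, text in pages:
        if remaining <= 0:
            break  # budget spent: lower-ranked pages are dropped
        cost = estimate_tokens(text)
        if cost > remaining:
            text = text[: remaining * 4]  # cut the page to the remaining budget
            cost = remaining
        kept.append((url, text))
        remaining -= cost
    return kept
```

Because pages are consumed in ranked order, a single very long page near the top can crowd out the rest, which is one reason to keep the scrape count modest.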

Viewing Augmentation

Response Headers

X-LLM-Relay-Augmentor: search-claude
X-LLM-Relay-Augmentation: search+scrape
X-LLM-Relay-Search-Query: AI news January 2025
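
A client can collect these headers for logging or debugging. The helper below is a small sketch: the function name is hypothetical and it only assumes the `X-LLM-Relay-` prefix shown above. Header names are matched case-insensitively, as HTTP requires.

```python
from typing import Mapping


def augmentation_info(headers: Mapping[str, str]) -> dict:
    """Return the X-LLM-Relay-* headers keyed by their lowercase suffix."""
    prefix = "x-llm-relay-"
    return {
        name.lower()[len(prefix):]: value
        for name, value in headers.items()
        if name.lower().startswith(prefix)
    }
```

For the headers above, this yields the keys `augmentor`, `augmentation`, and `search-query`.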

Admin UI

The Dashboard shows:

Augmentation rate (% of requests augmented)
Search requests
Scrape requests
Designator token usage

Best Practices

Use fast designators - They're called on every request
Limit scraping - 2-3 URLs is usually sufficient
Match token limits - Don't exceed target model's context
Monitor latency - Augmentation adds 2-5 seconds typically
Use SearXNG - Self-hosted, no rate limits, private

Troubleshooting

No search results

Verify search provider is configured
Check SEARXNG_URL or provider API key
Look for errors in container logs
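
The first two checks can be scripted. The sketch below maps each search provider to the environment variable named in this document and reports what is missing. The function name and return format are hypothetical, and the relay may perform its own validation differently.

```python
import os
from typing import Mapping, Optional

# Environment variable each search provider relies on, per the sections above.
REQUIRED_ENV = {
    "searxng": "SEARXNG_URL",
    "perplexity": "PERPLEXITY_API_KEY",
    "jina": "JINA_API_KEY",
}


def check_search_provider(
    provider: str,
    provider_url: str = "",
    env: Optional[Mapping[str, str]] = None,
) -> list[str]:
    """Return a list of configuration problems (empty when none are found)."""
    env = os.environ if env is None else env
    if provider not in REQUIRED_ENV:
        return [f"unknown search provider: {provider!r}"]
    # SearXNG needs no env var when the URL is set on the augmentor itself.
    if provider == "searxng" and provider_url:
        return []
    var = REQUIRED_ENV[provider]
    if not env.get(var):
        return [f"{var} is not set"]
    return []
```

Run it inside the relay's container so that it sees the same environment as the relay process.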

Scraping failures

Some sites block automated access (403/401 errors)
Try Jina Reader for JavaScript-heavy sites
Check container logs for specific errors

High latency

Reduce max_scrape_urls
Use a faster designator model
Consider caching with Smart Cache

Poor search queries

The designator generates queries from user input
Try a smarter designator model
Check if user queries are clear enough

Context too large

Reduce max_context_tokens
Reduce max_scrape_urls
Some scraped pages may be very long
