feat: add configurable result limit to reduce LLM prompt size for local/small-context models

**Is your feature request related to a problem? Please describe.**
When running Vane with local LLMs (e.g. llama.cpp, llama-server) on consumer hardware, SearXNG can return 20-25+ results per query. All results are passed directly into the LLM prompt, causing excessive prompt token counts and slow inference times on small-context or quantized models.

Observed with Qwen3.5-27B-Q4_K_M on an AMD Radeon MI60:
- Up to 5,621 prompt tokens in a single LLM call
- End-to-end query time of ~190s
- 5 LLM calls per query in speed mode

A secondary issue: when multiple search engines are active, SearXNG merges and deduplicates their results before returning them to Vane. Per-query slicing alone is insufficient; the combined post-merge result set must also be capped to prevent context overflow across agentic iterations in agentMessageHistory.

**Describe the solution you'd like**
Two new optional API parameters on `/api/search`:

- `maxResultsPerQuery` — limits results per search action (web/social/academic)
- `maxTotalResults` — caps the total post-merge result set fed into agentMessageHistory, regardless of how many engines are active

Both accept positive integers and fall back gracefully to uncapped behavior when not set, preserving backward compatibility.
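The intended behavior of the two parameters could be sketched roughly as follows. This is a minimal illustration, not Vane's actual code: the `SearchResult` shape and the helper names `parseLimit`, `capResults`, and `limitResults` are hypothetical, and treating invalid values (zero, negatives, non-integers) the same as "not set" is an assumption rather than something this request specifies.

```typescript
// Hypothetical result shape; the real field names in Vane may differ.
interface SearchResult {
  title: string;
  url: string;
  content?: string;
}

// Returns a positive-integer limit, or undefined (uncapped) when the value
// is missing or invalid. Ignoring invalid values is an assumption.
function parseLimit(value: unknown): number | undefined {
  return typeof value === "number" && Number.isInteger(value) && value > 0
    ? value
    : undefined;
}

// Keeps at most `limit` items; an undefined limit leaves the list untouched.
function capResults<T>(results: T[], limit?: number): T[] {
  return limit === undefined ? results : results.slice(0, limit);
}

// Applies the per-query cap to each search action's results, then applies
// the total cap to the combined set.
function limitResults(
  perAction: SearchResult[][],
  maxResultsPerQuery?: unknown,
  maxTotalResults?: unknown,
): SearchResult[] {
  const perQuery = parseLimit(maxResultsPerQuery);
  const total = parseLimit(maxTotalResults);
  const merged = perAction.flatMap((results) => capResults(results, perQuery));
  return capResults(merged, total);
}
```

With neither parameter set, `limitResults` returns every result unchanged, which matches the backward-compatible default described above.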

**Describe alternatives you've considered**
Lowering the SearXNG `max_results` setting globally. This affects all use cases and cannot be tuned per request.

**Additional context**
After applying these limits (`maxResultsPerQuery: 5`, `maxTotalResults: 5`), observed with Qwen3.5-27B-Q4_K_M:
- Prompt tokens reduced from 5,621 to ~515
- End-to-end query time reduced from ~190s to ~43s (brave + bing engines)
- LLM calls reduced from 5 to 3
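For reference, a request body using the limits from this test might be assembled as below. This is a sketch under assumptions: only `maxResultsPerQuery` and `maxTotalResults` come from this proposal, while the `query` field and the `withLimits` helper are placeholders, and any other fields that `/api/search` requires are omitted.

```typescript
// The two optional fields proposed in this request.
interface SearchRequestLimits {
  maxResultsPerQuery?: number;
  maxTotalResults?: number;
}

// Hypothetical helper: merges the optional limits into an existing request body.
function withLimits<T extends object>(
  body: T,
  limits: SearchRequestLimits,
): T & SearchRequestLimits {
  return { ...body, ...limits };
}

// `query` is a placeholder field; other required fields are left out here.
const body = withLimits(
  { query: "example question" },
  { maxResultsPerQuery: 5, maxTotalResults: 5 },
);

// JSON payload that would be sent as the POST body to /api/search.
const payload = JSON.stringify(body);
```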

**Related PR**
Addressed by #1056
