feat: add configurable result limit to reduce LLM prompt size for local/small-context models #1059

@CrossAndHorsesRanch

Is your feature request related to a problem? Please describe.
When running Vane with local LLMs (e.g. llama.cpp, llama-server) on consumer hardware, SearXNG can return 20-25+ results per query. All results are passed directly into the LLM prompt, causing excessive prompt token counts and slow inference times on small-context or quantized models.

Observed with Qwen3.5-27B-Q4_K_M on AMD Radeon MI60:

  • Up to 5,621 prompt tokens in a single LLM call
  • End-to-end query time of ~190s
  • 5 LLM calls per query in speed mode

A secondary issue: when multiple search engines are active, results are merged and deduplicated by SearXNG before being returned to Vane. Per-query slicing alone is insufficient — the combined post-merge result set must also be capped to prevent context overflow across agentic iterations in agentMessageHistory.

Describe the solution you'd like
Two new optional API parameters on /api/search:

  • maxResultsPerQuery — limits results per search action (web/social/academic)
  • maxTotalResults — caps the total post-merge result set fed into agentMessageHistory, regardless of how many engines are active

Both accept positive integers and fall back gracefully to uncapped behavior when not set, preserving backward compatibility.
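To make the intended semantics concrete, here is a minimal sketch of how the two limits could be applied. It is not taken from the Vane codebase or from the linked PR: the `SearchResult` shape and the `parseLimit` and `applyLimits` helpers are hypothetical names used only for illustration. It also assumes that a missing or invalid value is treated as "uncapped" rather than rejected.

```typescript
// Hypothetical result shape; the real Vane/SearXNG types may differ.
interface SearchResult {
  title: string;
  url: string;
  content: string;
}

// The two optional parameters proposed in this issue.
interface ResultLimits {
  maxResultsPerQuery?: number;
  maxTotalResults?: number;
}

// Accept only positive integers; anything else means "no cap" (assumption).
function parseLimit(value: unknown): number | undefined {
  return typeof value === "number" && Number.isInteger(value) && value > 0
    ? value
    : undefined;
}

// Slice each query's results first, then cap the combined set.
function applyLimits(
  resultsPerQuery: SearchResult[][],
  limits: ResultLimits,
): SearchResult[] {
  const perQuery = parseLimit(limits.maxResultsPerQuery);
  const total = parseLimit(limits.maxTotalResults);

  const combined: SearchResult[] = [];
  for (const results of resultsPerQuery) {
    const kept = perQuery === undefined ? results : results.slice(0, perQuery);
    combined.push(...kept);
  }
  return total === undefined ? combined : combined.slice(0, total);
}
```

With `maxResultsPerQuery: 5` alone, two search actions would still contribute up to 10 results; adding `maxTotalResults: 5` bounds what reaches `agentMessageHistory` regardless of how many actions or engines ran. Omitting both leaves the current uncapped behavior unchanged.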

Describe alternatives you've considered
Lowering the SearXNG max_results setting globally — but this affects all use cases and cannot be tuned per request.

Additional context
After applying these limits (5 results per query, 5 total), observed with Qwen3.5-27B-Q4_K_M:

  • Prompt tokens reduced from 5,621 to ~515
  • End-to-end query time reduced from ~190s to ~43s (brave + bing engines)
  • LLM calls reduced from 5 to 3

Related PR
Addressed by #1056
