
Reduce token usage in conversation search #302

Merged
mbklein merged 3 commits into deploy/prototype from 5373-too-much-input-filter
Mar 25, 2025
Conversation

@mbklein mbklein commented Mar 25, 2025

Building on PR #296, which reduced cumulative token usage across multiple interactions by removing all non-current ToolMessages, this PR reduces token counts within a single interaction by:

  • Reducing redundancy in tool results, by returning only content rather than both content and an artifact containing the same data in different forms
  • Filtering out unneeded fields from the documents returned by the index

On average, this reduces tool content byte size by about 90%, and token counts per interaction by about 80%. This allows complex questions to succeed by keeping tool recursion from overwhelming the LLM's max token count.
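The two reductions above can be sketched as follows. This is a hypothetical illustration, not the PR's actual code: the kept field names (`title`, `description`, etc.) and the shape of the tool result are assumptions.

```python
# Sketch of the two token-reduction strategies described above.
# Field names and the result shape are illustrative assumptions.

# Only these document fields are passed back to the LLM.
KEPT_FIELDS = {"id", "title", "description", "subject", "work_type"}

def slim_document(doc: dict) -> dict:
    """Drop index fields the LLM does not need (embeddings, full text, ...)."""
    return {k: v for k, v in doc.items() if k in KEPT_FIELDS}

def tool_result(docs: list[dict]) -> dict:
    """Return content only -- no redundant artifact carrying the same data."""
    return {"content": [slim_document(d) for d in docs]}

# Example: a raw index hit with large fields that would inflate token counts.
hits = [
    {
        "id": "1",
        "title": "Map of Chicago",
        "embedding": [0.1] * 1024,   # large vector, useless to the LLM
        "full_text": "...",          # redundant with description
    },
]
result = tool_result(hits)
```

Dropping a 1024-element embedding and full text from every hit, and omitting the duplicate artifact entirely, is where the bulk of the ~90% byte-size reduction would come from in a sketch like this.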

The PR also adds language to the system prompt capping tool usage at 6 calls per interaction, with instructions to summarize the results gathered so far and ask the user for clarification if the model still cannot answer the question.
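A prompt-level cap like the one described might be wired in as below. The wording and the `BASE_PROMPT` variable are assumptions for illustration; the PR's actual prompt text is not reproduced here.

```python
# Hypothetical system-prompt addition capping tool usage; the exact
# wording used in the PR is an assumption.
BASE_PROMPT = "You are a search assistant for a digital collections index."

TOOL_CAP_INSTRUCTIONS = (
    "Use at most 6 tool calls per interaction. If you reach that limit "
    "without a complete answer, summarize what you have found so far and "
    "ask the user a clarifying question instead of calling more tools."
)

system_prompt = BASE_PROMPT + "\n\n" + TOOL_CAP_INSTRUCTIONS
```

A hard cap expressed in the prompt complements the token-size reductions: even with slimmer tool results, unbounded tool recursion could still exhaust the model's context window.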

@mbklein mbklein requested review from charlesLoder and kdid March 25, 2025 16:35
@mbklein mbklein merged commit 0fe80e4 into deploy/prototype Mar 25, 2025
1 check passed
@mbklein mbklein deleted the 5373-too-much-input-filter branch March 25, 2025 20:35
