Skip to content

Fix deferred tool search keyword matching#5014

Open
pandego wants to merge 1 commit intopydantic:mainfrom
pandego:fix/4994-tool-search-match
Open

Fix deferred tool search keyword matching#5014
pandego wants to merge 1 commit intopydantic:mainfrom
pandego:fix/4994-tool-search-match

Conversation

@pandego
Copy link
Copy Markdown

@pandego pandego commented Apr 7, 2026

Pre-Review Checklist

  • Any AI generated code has been reviewed line-by-line by the human PR author, who stands by it.
  • No breaking changes in accordance with the version policy.
  • Linting and type checking pass per make format and make typecheck.
  • PR title is fit for the release changelog.

Pre-Merge Checklist

  • New tests for any fix or new behavior, maintaining 100% coverage.
  • Updated documentation for new features and behaviors, including docstrings for API docs.

Summary

search_tools was matching keywords as raw substrings, which made deferred tool lookup too greedy.

This tightens matching by tokenizing tool names, descriptions, and the query, then only returning tools that match every query term as a real token. That keeps broad prefixes like github from crowding out the tool that actually matches the rest of the query, and it avoids substring hits inside words like comment for the query me.

Test plan

  • uv run pytest tests/test_tool_search.py -k 'prefers_specific_term_matches or does_not_match_substrings_inside_words or search_returns_matching_tools or search_matches_description or search_is_case_insensitive or search_returns_no_matches or search_empty_query or max_results'
  • uv run ruff check pydantic_ai_slim/pydantic_ai/toolsets/_tool_search.py tests/test_tool_search.py

@github-actions github-actions bot added size: S Small PR (≤100 weighted lines) bug Report that something isn't working, or PR implementing a fix labels Apr 7, 2026
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 4 potential issues.

View 1 additional finding in Devin Review.

Open in Devin Review

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Docstring says "matches any term" but code requires ALL terms to match

The docstring at line 161 states the search "matches any term against tool names and descriptions", implying OR semantics. However, the new implementation at line 174 uses if score == len(terms), which requires ALL search terms to be present in the entry's search_terms set (AND semantics). The old code used any(term in searchable for term in terms) which did match the "any term" description. This is a stale docstring that now contradicts the actual behavior, violating the coding guideline rule:198 ("Rename methods/functions when their behavior changes — names must reflect actual scope") from agent_docs/index.md.

(Refers to line 161)

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +169 to +174
terms = self._search_terms(keywords, None)

matches: list[dict[str, str | None]] = []
scored_matches: list[tuple[int, dict[str, str | None]]] = []
for entry in search_tool.search_index:
searchable = entry.name_lower + (' ' + entry.description_lower if entry.description_lower else '')
if any(term in searchable for term in terms):
matches.append({'name': entry.name, 'description': entry.description})
if len(matches) >= _MAX_SEARCH_RESULTS:
break
score = len(terms & entry.search_terms)
if score == len(terms):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 Non-alphanumeric keywords produce empty token set, matching ALL tools

When the model sends keywords containing only non-alphanumeric characters (e.g., "---", "!!!", "...", or whitespace-only strings like " "), _search_terms returns an empty set because _SEARCH_TOKEN_RE ([a-z0-9]+) finds no matches. The guard at line 166 (if not keywords) only checks for empty/falsy strings, so these inputs pass through. Then at line 174, len(terms & entry.search_terms) is 0 and len(terms) is also 0, so the condition score == len(terms) evaluates to 0 == 0True for every entry in the search index, causing all deferred tools to be returned. The old code was immune to this because "---".split() produces ["---"], which wouldn't substring-match any tool. The fix is to check for an empty terms set after tokenization and raise ModelRetry.

Suggested change
terms = self._search_terms(keywords, None)
matches: list[dict[str, str | None]] = []
scored_matches: list[tuple[int, dict[str, str | None]]] = []
for entry in search_tool.search_index:
searchable = entry.name_lower + (' ' + entry.description_lower if entry.description_lower else '')
if any(term in searchable for term in terms):
matches.append({'name': entry.name, 'description': entry.description})
if len(matches) >= _MAX_SEARCH_RESULTS:
break
score = len(terms & entry.search_terms)
if score == len(terms):
terms = self._search_terms(keywords, None)
if not terms:
raise ModelRetry('Please provide search keywords.')
scored_matches: list[tuple[int, dict[str, str | None]]] = []
for entry in search_tool.search_index:
score = len(terms & entry.search_terms)
if score == len(terms):
Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +177 to +178
scored_matches.sort(key=lambda item: item[0], reverse=True)
matches = [match for _, match in scored_matches[:_MAX_SEARCH_RESULTS]]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 Sorting by score is a no-op — all matched entries have identical scores

At _tool_search.py:177, scored_matches is sorted by score in descending order. However, the filter condition at line 174 is if score == len(terms), which means every entry that passes the filter has exactly the same score (len(terms)). The sort therefore never reorders anything. This appears to be scaffolding for a future scoring model (e.g., partial matches or TF-IDF weighting), but as written it's dead code that adds slight overhead and could mislead readers into thinking ranking is actually happening. The test test_tool_search_toolset_prefers_specific_term_matches has a name suggesting ranking behavior, but it actually tests that AND-filtering excludes tools missing a term — not that results are ranked.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Comment on lines +169 to +178
terms = self._search_terms(keywords, None)

matches: list[dict[str, str | None]] = []
scored_matches: list[tuple[int, dict[str, str | None]]] = []
for entry in search_tool.search_index:
searchable = entry.name_lower + (' ' + entry.description_lower if entry.description_lower else '')
if any(term in searchable for term in terms):
matches.append({'name': entry.name, 'description': entry.description})
if len(matches) >= _MAX_SEARCH_RESULTS:
break
score = len(terms & entry.search_terms)
if score == len(terms):
scored_matches.append((score, {'name': entry.name, 'description': entry.description}))

scored_matches.sort(key=lambda item: item[0], reverse=True)
matches = [match for _, match in scored_matches[:_MAX_SEARCH_RESULTS]]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 Behavioral change from OR to AND matching semantics may affect recorded VCR cassettes

The old search used any(term in searchable for term in terms) (OR — match if ANY keyword appears as a substring), while the new code uses score == len(terms) (AND — ALL keywords must appear as exact tokens). This is a significant semantic change. For example, a search for 'stock weather' would previously match both stock_price (has 'stock') and get_weather (has 'weather'), but now matches neither because no single tool has both tokens. This is likely intentional (the new tests confirm it), but it means the VCR cassettes for integration tests in tests/test_tool_search.py (lines 257-311) may need re-recording if any model sends multi-keyword queries where only a subset of terms match a given tool. The existing snapshot expectations at lines 258-310 should be verified against the cassettes.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Report that something isn't working, or PR implementing a fix size: S Small PR (≤100 weighted lines)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

buggy search in search_tools for deferred tools

1 participant