Fix deferred tool search keyword matching by pandego · Pull Request #5014 · pydantic/pydantic-ai

pandego · 2026-04-07T16:24:36Z

Closes buggy search in search_tools for deferred tools #4994

Pre-Review Checklist

Any AI generated code has been reviewed line-by-line by the human PR author, who stands by it.
No breaking changes in accordance with the version policy.
Linting and type checking pass per make format and make typecheck.
PR title is fit for the release changelog.

Pre-Merge Checklist

New tests for any fix or new behavior, maintaining 100% coverage.
Updated documentation for new features and behaviors, including docstrings for API docs.

Summary

search_tools was matching keywords as raw substrings, which made deferred tool lookup too greedy.

This tightens matching by tokenizing tool names, descriptions, and the query, then only returning tools that match every query term as a real token. That keeps broad prefixes like github from crowding out the tool that actually matches the rest of the query, and it avoids substring hits inside words like comment for the query me.

Test plan

uv run pytest tests/test_tool_search.py -k 'prefers_specific_term_matches or does_not_match_substrings_inside_words or search_returns_matching_tools or search_matches_description or search_is_case_insensitive or search_returns_no_matches or search_empty_query or max_results'
uv run ruff check pydantic_ai_slim/pydantic_ai/toolsets/_tool_search.py tests/test_tool_search.py

devin-ai-integration

Devin Review found 4 potential issues.

View 1 additional finding in Devin Review.

devin-ai-integration · 2026-04-07T16:28:16Z

pydantic_ai_slim/pydantic_ai/toolsets/_tool_search.py

🟡 Docstring says "matches any term" but code requires ALL terms to match

The docstring at line 161 states the search "matches any term against tool names and descriptions", implying OR semantics. However, the new implementation at line 174 uses if score == len(terms), which requires ALL search terms to be present in the entry's search_terms set (AND semantics). The old code used any(term in searchable for term in terms) which did match the "any term" description. This is a stale docstring that now contradicts the actual behavior, violating the coding guideline rule:198 ("Rename methods/functions when their behavior changes — names must reflect actual scope") from agent_docs/index.md.

(Refers to line 161)

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration · 2026-04-07T16:28:17Z

pydantic_ai_slim/pydantic_ai/toolsets/_tool_search.py

+        terms = self._search_terms(keywords, None)

-        matches: list[dict[str, str | None]] = []
+        scored_matches: list[tuple[int, dict[str, str | None]]] = []
        for entry in search_tool.search_index:
-            searchable = entry.name_lower + (' ' + entry.description_lower if entry.description_lower else '')
-            if any(term in searchable for term in terms):
-                matches.append({'name': entry.name, 'description': entry.description})
-                if len(matches) >= _MAX_SEARCH_RESULTS:
-                    break
+            score = len(terms & entry.search_terms)
+            if score == len(terms):


🔴 Non-alphanumeric keywords produce empty token set, matching ALL tools

When the model sends keywords containing only non-alphanumeric characters (e.g., "---", "!!!", "...", or whitespace-only strings like " "), _search_terms returns an empty set because _SEARCH_TOKEN_RE ([a-z0-9]+) finds no matches. The guard at line 166 (if not keywords) only checks for empty/falsy strings, so these inputs pass through. Then at line 174, len(terms & entry.search_terms) is 0 and len(terms) is also 0, so the condition score == len(terms) evaluates to 0 == 0 → True for every entry in the search index, causing all deferred tools to be returned. The old code was immune to this because "---".split() produces ["---"], which wouldn't substring-match any tool. The fix is to check for an empty terms set after tokenization and raise ModelRetry.

Suggested change

terms = self._search_terms(keywords, None)

matches: list[dict[str, str | None]] = []

scored_matches: list[tuple[int, dict[str, str | None]]] = []

for entry in search_tool.search_index:

searchable = entry.name_lower + (' ' + entry.description_lower if entry.description_lower else '')

if any(term in searchable for term in terms):

matches.append({'name': entry.name, 'description': entry.description})

if len(matches) >= _MAX_SEARCH_RESULTS:

break

score = len(terms & entry.search_terms)

if score == len(terms):

terms = self._search_terms(keywords, None)

if not terms:

raise ModelRetry('Please provide search keywords.')

scored_matches: list[tuple[int, dict[str, str | None]]] = []

for entry in search_tool.search_index:

score = len(terms & entry.search_terms)

if score == len(terms):

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration · 2026-04-07T16:28:18Z

pydantic_ai_slim/pydantic_ai/toolsets/_tool_search.py

+        scored_matches.sort(key=lambda item: item[0], reverse=True)
+        matches = [match for _, match in scored_matches[:_MAX_SEARCH_RESULTS]]


🚩 Sorting by score is a no-op — all matched entries have identical scores

At _tool_search.py:177, scored_matches is sorted by score in descending order. However, the filter condition at line 174 is if score == len(terms), which means every entry that passes the filter has exactly the same score (len(terms)). The sort therefore never reorders anything. This appears to be scaffolding for a future scoring model (e.g., partial matches or TF-IDF weighting), but as written it's dead code that adds slight overhead and could mislead readers into thinking ranking is actually happening. The test test_tool_search_toolset_prefers_specific_term_matches has a name suggesting ranking behavior, but it actually tests that AND-filtering excludes tools missing a term — not that results are ranked.

Was this helpful? React with 👍 or 👎 to provide feedback.

devin-ai-integration · 2026-04-07T16:28:20Z

pydantic_ai_slim/pydantic_ai/toolsets/_tool_search.py

+        terms = self._search_terms(keywords, None)

-        matches: list[dict[str, str | None]] = []
+        scored_matches: list[tuple[int, dict[str, str | None]]] = []
        for entry in search_tool.search_index:
-            searchable = entry.name_lower + (' ' + entry.description_lower if entry.description_lower else '')
-            if any(term in searchable for term in terms):
-                matches.append({'name': entry.name, 'description': entry.description})
-                if len(matches) >= _MAX_SEARCH_RESULTS:
-                    break
+            score = len(terms & entry.search_terms)
+            if score == len(terms):
+                scored_matches.append((score, {'name': entry.name, 'description': entry.description}))
+
+        scored_matches.sort(key=lambda item: item[0], reverse=True)
+        matches = [match for _, match in scored_matches[:_MAX_SEARCH_RESULTS]]


🚩 Behavioral change from OR to AND matching semantics may affect recorded VCR cassettes

The old search used any(term in searchable for term in terms) (OR — match if ANY keyword appears as a substring), while the new code uses score == len(terms) (AND — ALL keywords must appear as exact tokens). This is a significant semantic change. For example, a search for 'stock weather' would previously match both stock_price (has 'stock') and get_weather (has 'weather'), but now matches neither because no single tool has both tokens. This is likely intentional (the new tests confirm it), but it means the VCR cassettes for integration tests in tests/test_tool_search.py (lines 257-311) may need re-recording if any model sends multi-keyword queries where only a subset of terms match a given tool. The existing snapshot expectations at lines 258-310 should be verified against the cassettes.

Was this helpful? React with 👍 or 👎 to provide feedback.

fix: tighten deferred tool search matching

c19d15f

github-actions bot added size: S Small PR (≤100 weighted lines) bug Report that something isn't working, or PR implementing a fix labels Apr 7, 2026

devin-ai-integration bot reviewed Apr 7, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix deferred tool search keyword matching#5014

Fix deferred tool search keyword matching#5014
pandego wants to merge 1 commit intopydantic:mainfrom
pandego:fix/4994-tool-search-match

pandego commented Apr 7, 2026

Uh oh!

devin-ai-integration bot left a comment

Uh oh!

devin-ai-integration bot Apr 7, 2026

Uh oh!

devin-ai-integration bot Apr 7, 2026

Uh oh!

devin-ai-integration bot Apr 7, 2026

Uh oh!

devin-ai-integration bot Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		scored_matches.sort(key=lambda item: item[0], reverse=True)
		matches = [match for _, match in scored_matches[:_MAX_SEARCH_RESULTS]]

Conversation

pandego commented Apr 7, 2026

Pre-Review Checklist

Pre-Merge Checklist

Summary

Test plan

Uh oh!

devin-ai-integration bot left a comment

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant