Fix deferred tool search keyword matching#5014
Conversation
🟡 Docstring says "matches any term" but code requires ALL terms to match
The docstring at line 161 states the search "matches any term against tool names and descriptions", implying OR semantics. However, the new implementation at line 174 uses if score == len(terms), which requires ALL search terms to be present in the entry's search_terms set (AND semantics). The old code used any(term in searchable for term in terms) which did match the "any term" description. This is a stale docstring that now contradicts the actual behavior, violating the coding guideline rule:198 ("Rename methods/functions when their behavior changes — names must reflect actual scope") from agent_docs/index.md.
(Refers to line 161)
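To make the mismatch concrete, here is a minimal sketch. The `[a-z0-9]+` pattern mirrors the PR's `_SEARCH_TOKEN_RE`, and `tokens` stands in for `_search_terms`; the tool name and query are invented for illustration.

```python
import re

# Mirrors the PR's _SEARCH_TOKEN_RE; `tokens` is a stand-in for _search_terms.
_SEARCH_TOKEN_RE = re.compile(r'[a-z0-9]+')

def tokens(text: str) -> set[str]:
    return set(_SEARCH_TOKEN_RE.findall(text.lower()))

entry_terms = tokens('get_weather Returns the current weather')
query = tokens('weather forecast')

# OR semantics, as the docstring promises: any shared term is enough.
or_match = any(term in entry_terms for term in query)   # True ('weather' matches)

# AND semantics, as `score == len(terms)` implements: every term is required.
and_match = len(query & entry_terms) == len(query)      # False ('forecast' is missing)
```

Either the docstring should say "matches all terms", or the condition should be relaxed; as written they disagree.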
```diff
 terms = self._search_terms(keywords, None)
-matches: list[dict[str, str | None]] = []
+scored_matches: list[tuple[int, dict[str, str | None]]] = []
 for entry in search_tool.search_index:
-    searchable = entry.name_lower + (' ' + entry.description_lower if entry.description_lower else '')
-    if any(term in searchable for term in terms):
-        matches.append({'name': entry.name, 'description': entry.description})
-        if len(matches) >= _MAX_SEARCH_RESULTS:
-            break
+    score = len(terms & entry.search_terms)
+    if score == len(terms):
```
🔴 Non-alphanumeric keywords produce empty token set, matching ALL tools
When the model sends keywords containing only non-alphanumeric characters (e.g., "---", "!!!", "...", or whitespace-only strings like " "), _search_terms returns an empty set because _SEARCH_TOKEN_RE ([a-z0-9]+) finds no matches. The guard at line 166 (if not keywords) only checks for empty/falsy strings, so these inputs pass through. Then at line 174, len(terms & entry.search_terms) is 0 and len(terms) is also 0, so the condition score == len(terms) evaluates to 0 == 0 → True for every entry in the search index, causing all deferred tools to be returned. The old code was immune to this because "---".split() produces ["---"], which wouldn't substring-match any tool. The fix is to check for an empty terms set after tokenization and raise ModelRetry.
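A minimal reproduction of the failure mode, assuming `_search_terms` boils down to running `_SEARCH_TOKEN_RE.findall` over the lowercased input (the helper name and the entry's term set are illustrative):

```python
import re

_SEARCH_TOKEN_RE = re.compile(r'[a-z0-9]+')

def search_terms(keywords: str) -> set[str]:
    # Stand-in for _search_terms: lowercase, then keep alphanumeric runs only.
    return set(_SEARCH_TOKEN_RE.findall(keywords.lower()))

terms = search_terms('---')               # empty set: no alphanumeric runs
entry_terms = {'get', 'weather'}          # hypothetical entry.search_terms

score = len(terms & entry_terms)          # 0
matches_everything = score == len(terms)  # 0 == 0, so True for EVERY entry
```

Checking `if not terms` after tokenization, rather than `if not keywords` before it, closes the gap.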
Suggested change:

```diff
 terms = self._search_terms(keywords, None)
+if not terms:
+    raise ModelRetry('Please provide search keywords.')
 scored_matches: list[tuple[int, dict[str, str | None]]] = []
 for entry in search_tool.search_index:
     score = len(terms & entry.search_terms)
     if score == len(terms):
```
```python
scored_matches.sort(key=lambda item: item[0], reverse=True)
matches = [match for _, match in scored_matches[:_MAX_SEARCH_RESULTS]]
```
🚩 Sorting by score is a no-op — all matched entries have identical scores
At _tool_search.py:177, scored_matches is sorted by score in descending order. However, the filter condition at line 174 is if score == len(terms), which means every entry that passes the filter has exactly the same score (len(terms)). The sort therefore never reorders anything. This appears to be scaffolding for a future scoring model (e.g., partial matches or TF-IDF weighting), but as written it's dead code that adds slight overhead and could mislead readers into thinking ranking is actually happening. The test test_tool_search_toolset_prefers_specific_term_matches has a name suggesting ranking behavior, but it actually tests that AND-filtering excludes tools missing a term — not that results are ranked.
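A small sketch of why the sort cannot reorder anything; the index entries below are made up, but any entry surviving the AND filter necessarily has the same score:

```python
terms = {'github', 'issues'}

# Hypothetical (name, search_terms) pairs from a search index.
index = [
    ('list_github_issues', {'list', 'github', 'issues'}),
    ('create_github_issue', {'create', 'github', 'issues'}),
]

scored = []
for name, entry_terms in index:
    score = len(terms & entry_terms)
    if score == len(terms):        # only full matches pass...
        scored.append(score)       # ...so every recorded score is len(terms)
```

Either drop the sort until partial-match scoring exists, or make the filter accept partial matches so the score actually varies.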
```diff
 terms = self._search_terms(keywords, None)
-matches: list[dict[str, str | None]] = []
+scored_matches: list[tuple[int, dict[str, str | None]]] = []
 for entry in search_tool.search_index:
-    searchable = entry.name_lower + (' ' + entry.description_lower if entry.description_lower else '')
-    if any(term in searchable for term in terms):
-        matches.append({'name': entry.name, 'description': entry.description})
-        if len(matches) >= _MAX_SEARCH_RESULTS:
-            break
+    score = len(terms & entry.search_terms)
+    if score == len(terms):
+        scored_matches.append((score, {'name': entry.name, 'description': entry.description}))
+
+scored_matches.sort(key=lambda item: item[0], reverse=True)
+matches = [match for _, match in scored_matches[:_MAX_SEARCH_RESULTS]]
```
🚩 Behavioral change from OR to AND matching semantics may affect recorded VCR cassettes
The old search used any(term in searchable for term in terms) (OR — match if ANY keyword appears as a substring), while the new code uses score == len(terms) (AND — ALL keywords must appear as exact tokens). This is a significant semantic change. For example, a search for 'stock weather' would previously match both stock_price (has 'stock') and get_weather (has 'weather'), but now matches neither because no single tool has both tokens. This is likely intentional (the new tests confirm it), but it means the VCR cassettes for integration tests in tests/test_tool_search.py (lines 257-311) may need re-recording if any model sends multi-keyword queries where only a subset of terms match a given tool. The existing snapshot expectations at lines 258-310 should be verified against the cassettes.
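A sketch of the semantic shift using the 'stock weather' example; the tool descriptions are invented, and the token-set split below only approximates what `_search_terms` does:

```python
query_terms = ['stock', 'weather']

# Hypothetical tools; only the matching semantics matter here.
tools = {
    'stock_price': 'stock_price get the latest stock price',
    'get_weather': 'get_weather get the current weather',
}

# Old behavior: OR over raw substrings; either keyword anywhere is a hit.
old = [name for name, text in tools.items()
       if any(term in text for term in query_terms)]

# New behavior: AND over token sets; every keyword must appear as a token.
new = [name for name, text in tools.items()
       if set(query_terms) <= set(text.replace('_', ' ').split())]
```

Here `old` returns both tools while `new` returns neither, which is exactly the kind of divergence that can invalidate recorded cassettes.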
`search_tools` for deferred tools #4994

Pre-Review Checklist

- Ran `make format` and `make typecheck`.

Pre-Merge Checklist

Summary

`search_tools` was matching keywords as raw substrings, which made deferred tool lookup too greedy. This tightens matching by tokenizing tool names, descriptions, and the query, then only returning tools that match every query term as a real token. That keeps broad prefixes like `github` from crowding out the tool that actually matches the rest of the query, and it avoids substring hits inside words like `comment` for the query `me`.

Test plan

- `uv run pytest tests/test_tool_search.py -k 'prefers_specific_term_matches or does_not_match_substrings_inside_words or search_returns_matching_tools or search_matches_description or search_is_case_insensitive or search_returns_no_matches or search_empty_query or max_results'`
- `uv run ruff check pydantic_ai_slim/pydantic_ai/toolsets/_tool_search.py tests/test_tool_search.py`