feat(wikipedia): keyword ladder — retry zero-result queries with key terms#89
Merged
…er input (js/polynomial-redos)
askalf added a commit that referenced this pull request on Jun 12, 2026:
Same six questions that scored 1/6 under last night's DDG block and 4/6 yesterday morning. With the keyword ladder (#89) and dario 4.8.66 upstream, every run completes, with citation support of 0.68-1.00 and a cost of $0.04-$0.16 per query. The one FAIL (recent) completed with a fully cited answer but kept 1 source against the 3-source gate: --since=180d thins the pool by design, so this is a gate-calibration question, not a product gap.
Closes #86. MediaWiki search matches article titles and text, so it
returns zero results for the planner's long natural-language queries
("nginx fastcgi_buffer_size upstream sent too big header php-fpm fix").
That hollowed out wikipedia exactly where it matters most: as the
default fallback when the primary backend is rate-limited. Both stress
benches showed the same two questions dying with the fallback engaged
but empty-handed.

When the verbatim query finds nothing, WikipediaSearch now walks
progressively shorter keyword variants (4 -> 2 -> 1 leading content
tokens) until a rung returns results. The variants come from the new
pure module src/query-keywords.ts, which drops stopwords and generic
instruction words and keeps technical tokens like HTTP/3, php-fpm, and
fastcgi_buffer_size intact. That is at most 3 extra calls against a
keyless API, and only on the would-have-been-empty path; verbatim hits
are untouched (pinned by test). The ladder applies to ALL wikipedia
usage (direct, multi: sub-adapter, fallback), because returning
keyword-reduced results beats returning nothing in every one of those
seats.

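To make the ladder concrete, here is a minimal sketch of what a pure keyword module along these lines could look like. It is not the code in src/query-keywords.ts: the word lists, the punctuation trimming, the de-duplication, and the exact signatures of `extractKeywords` and `keywordLadder` are assumptions made for illustration. Only the 4 -> 2 -> 1 rung sizes and the kept-intact technical tokens come from the description above.

```typescript
// Hypothetical word lists; the real lists in src/query-keywords.ts are not
// shown in this PR description.
const STOPWORDS = new Set([
  "a", "an", "the", "of", "in", "on", "for", "to", "and", "or", "is", "are",
  "was", "what", "when", "why", "how", "does", "do", "with", "from", "by",
  "at", "too", "it", "that", "this",
]);
const INSTRUCTION_WORDS = new Set([
  "fix", "explain", "compare", "describe", "summarize", "find", "list", "show",
]);

const isAlnum = (ch: string): boolean => /[A-Za-z0-9]/.test(ch);

// Trim punctuation from both ends with plain loops (no backtracking regex).
// Inner "/", "-", "_" and "." survive, so HTTP/3, php-fpm and
// fastcgi_buffer_size stay whole.
function trimPunctuation(raw: string): string {
  let start = 0;
  let end = raw.length;
  while (start < end && !isAlnum(raw.charAt(start))) start++;
  while (end > start && !isAlnum(raw.charAt(end - 1))) end--;
  return raw.slice(start, end);
}

// Content tokens in original order, without stopwords, instruction words,
// or case-insensitive repeats.
function extractKeywords(query: string): string[] {
  const seen = new Set<string>();
  const keywords: string[] = [];
  for (const raw of query.split(/\s+/)) {
    const token = trimPunctuation(raw);
    if (token === "") continue;
    const key = token.toLowerCase();
    if (STOPWORDS.has(key) || INSTRUCTION_WORDS.has(key) || seen.has(key)) continue;
    seen.add(key);
    keywords.push(token);
  }
  return keywords;
}

// Progressively shorter variants built from the leading content tokens.
// Empty rungs, repeated rungs, and a rung equal to the verbatim query are
// skipped, since each of those would waste a call.
function keywordLadder(query: string, sizes: readonly number[] = [4, 2, 1]): string[] {
  const keywords = extractKeywords(query);
  const rungs: string[] = [];
  for (const size of sizes) {
    const rung = keywords.slice(0, size).join(" ");
    if (rung !== "" && rung !== query.trim() && !rungs.includes(rung)) rungs.push(rung);
  }
  return rungs;
}

console.log(
  keywordLadder("nginx fastcgi_buffer_size upstream sent too big header php-fpm fix"),
);
// [ 'nginx fastcgi_buffer_size upstream sent', 'nginx fastcgi_buffer_size', 'nginx' ]
```

Under these assumed word lists, a short question such as "What is HTTP/3?" collapses to the single rung "HTTP/3", so the ladder costs one extra call rather than three.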
Live-validated against real Wikipedia with the exact bench queries
that failed: the nginx error-string query now grounds via the ladder,
and the HTTP/3 question lands on the protocol article.

693 tests (was 682). New coverage: extractKeywords/keywordLadder (with
the bench queries as regression anchors) and the adapter's ladder walk,
verbatim-hit, and all-empty cases.

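The three adapter cases named above (ladder walk, verbatim hit, all empty) can be illustrated with a stubbed backend. This is a sketch under stated assumptions, not the adapter's actual code or its test suite: `WikiHit`, `SearchFn`, `searchWithLadder`, and `fakeBackend` are hypothetical names, the rungs are passed in precomputed, and the "Nginx" hit is fake data.

```typescript
// Hypothetical result and search-function shapes; the real WikipediaSearch
// types are not shown in this PR description.
type WikiHit = { title: string };
type SearchFn = (query: string) => Promise<WikiHit[]>;

// Try the verbatim query first, then walk the rungs in order and stop at
// the first one that returns results.
async function searchWithLadder(
  query: string,
  rungs: readonly string[],
  search: SearchFn,
): Promise<WikiHit[]> {
  const verbatim = await search(query);
  if (verbatim.length > 0) return verbatim; // verbatim hit: no extra calls
  for (const rung of rungs) {
    const hits = await search(rung);
    if (hits.length > 0) return hits; // first non-empty rung wins
  }
  return []; // every rung came back empty
}

// In-memory stand-in for the search API that records each query it receives,
// so the number of calls can be checked.
function fakeBackend(index: Map<string, WikiHit[]>): { search: SearchFn; calls: string[] } {
  const calls: string[] = [];
  const search: SearchFn = async (q) => {
    calls.push(q);
    return index.get(q) ?? [];
  };
  return { search, calls };
}

// Ladder walk: the verbatim query and the first rung miss, the second hits.
const demo = fakeBackend(
  new Map<string, WikiHit[]>([["nginx fastcgi_buffer_size", [{ title: "Nginx" }]]]),
);
searchWithLadder(
  "nginx fastcgi_buffer_size upstream sent too big header php-fpm fix",
  ["nginx fastcgi_buffer_size upstream sent", "nginx fastcgi_buffer_size", "nginx"],
  demo.search,
).then((hits) => {
  console.log(hits.length, demo.calls.length); // 1 3
});
```

Recording the calls in the stub is what lets a test pin the "verbatim hits are untouched" guarantee: a verbatim hit should leave exactly one recorded call, and an all-empty walk should leave one call plus one per rung.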
Also checked in: the morning healthy-DDG bench scoreboard (4/6 PASS).
The two failures are this exact gap: DDG re-throttled mid-bench and
the un-laddered fallback came back empty.