feat(wikipedia): keyword ladder — retry zero-result queries with key terms by askalf · Pull Request #89 · askalf/deepdive

askalf · 2026-06-12T10:24:10Z

Closes #86. MediaWiki search matches article titles/text and returns
zero for the planner's long natural-language queries ("nginx
fastcgi_buffer_size upstream sent too big header php-fpm fix"), which
hollowed out wikipedia exactly where it matters most: as the default
fallback when the primary backend is rate-limited. Both stress benches
showed the same two questions dying with the fallback engaged but
empty-handed.

When the verbatim query finds nothing, WikipediaSearch now walks
progressively shorter keyword variants (4 -> 2 -> 1 leading content
tokens, via the new pure src/query-keywords.ts: stopwords + generic
instruction words dropped, technical tokens like HTTP/3, php-fpm,
fastcgi_buffer_size kept intact) until a rung returns results. At most
3 extra calls against a keyless API, only on the would-have-been-empty
path; verbatim hits are untouched (pinned by test). Applies to ALL
wikipedia usage (direct, multi: sub-adapter, fallback) — returning
keyword-reduced results beats returning nothing in every one of those
seats.
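
The keyword extraction and the 4 -> 2 -> 1 ladder described above can be sketched roughly as follows. This is a minimal illustration, not the contents of src/query-keywords.ts: the word lists, the ASCII-only edge trimming, and the de-duplication are assumptions made for the example. Only the names extractKeywords and keywordLadder come from the PR text.

```typescript
// Hypothetical word lists; the real lists in src/query-keywords.ts are not shown in this PR.
const STOPWORDS = new Set([
  "a", "an", "the", "is", "are", "of", "to", "in", "on", "for", "with",
  "and", "or", "what", "why", "how", "does", "do",
]);
const INSTRUCTION_WORDS = new Set(["fix", "explain", "compare", "guide", "tutorial"]);

// ASCII-only check, an assumption that keeps the sketch short.
function isWordChar(ch: string): boolean {
  return /[A-Za-z0-9]/.test(ch);
}

// Drop stopwords and generic instruction words, keep everything else in order.
// Only the edges of a token are trimmed, so inner "/", "-" and "_" survive and
// HTTP/3, php-fpm and fastcgi_buffer_size stay intact.
function extractKeywords(query: string): string[] {
  const keywords: string[] = [];
  const seen = new Set<string>();
  for (const raw of query.split(/\s+/)) {
    let start = 0;
    let end = raw.length;
    while (start < end && !isWordChar(raw.charAt(start))) start++;
    while (end > start && !isWordChar(raw.charAt(end - 1))) end--;
    const token = raw.slice(start, end);
    const key = token.toLowerCase();
    if (token === "" || STOPWORDS.has(key) || INSTRUCTION_WORDS.has(key)) continue;
    if (seen.has(key)) continue; // skip repeated terms
    seen.add(key);
    keywords.push(token);
  }
  return keywords;
}

// Build progressively shorter variants from the leading content tokens.
// Variants equal to the verbatim query or to an earlier rung are skipped.
function keywordLadder(query: string, rungs: readonly number[] = [4, 2, 1]): string[] {
  const keywords = extractKeywords(query);
  const variants: string[] = [];
  for (const n of rungs) {
    const variant = keywords.slice(0, n).join(" ");
    if (variant !== "" && variant !== query.trim() && !variants.includes(variant)) {
      variants.push(variant);
    }
  }
  return variants;
}

console.log(keywordLadder("nginx fastcgi_buffer_size upstream sent too big header php-fpm fix"));
// → [ 'nginx fastcgi_buffer_size upstream sent', 'nginx fastcgi_buffer_size', 'nginx' ]
```

Keeping the extraction pure (string in, string array out) is what lets the bench queries serve as regression anchors without any network access.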

Live-validated against real Wikipedia with the exact bench queries
that failed: the nginx error-string query now grounds via the ladder,
and the HTTP/3 question lands on the protocol article.

693 tests (was 682): extractKeywords/keywordLadder (bench queries as
regression anchors), adapter ladder walk/verbatim-hit/all-empty.
Also checked in: the morning healthy-DDG bench scoreboard (4/6 PASS —
the two failures are this exact gap; DDG re-throttled mid-bench and
the un-laddered fallback came back empty).
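
The three adapter cases named in the test summary (ladder walk, verbatim hit, all rungs empty) can be illustrated with a small walk function. This is a sketch under stated assumptions: searchWithLadder, SearchFn and SearchResult are hypothetical names, and the search function is injected so the example runs without calling Wikipedia. The actual WikipediaSearch adapter is not shown in this PR.

```typescript
// Hypothetical result and search-function shapes, chosen for illustration only.
type SearchResult = { title: string; url: string };
type SearchFn = (query: string) => Promise<SearchResult[]>;

type LadderOutcome = {
  results: SearchResult[];
  usedQuery: string | null; // the query that produced results, or null if none did
  calls: number;            // total search calls made, verbatim query included
};

// Try the verbatim query first; walk the ladder only if it returns nothing,
// and stop at the first rung that returns results.
async function searchWithLadder(
  query: string,
  ladder: readonly string[],
  search: SearchFn,
): Promise<LadderOutcome> {
  let calls = 0;
  for (const candidate of [query, ...ladder]) {
    calls++;
    const results = await search(candidate);
    if (results.length > 0) {
      return { results, usedQuery: candidate, calls };
    }
  }
  return { results: [], usedQuery: null, calls };
}
```

With a three-rung ladder the worst case is one verbatim call plus three extra calls, which matches the "at most 3 extra calls" bound in the description. A verbatim hit returns after a single call and never touches the ladder.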

Same six questions that scored 1/6 under last night's DDG block and 4/6 yesterday morning. With the keyword ladder (#89) and dario 4.8.66 upstream: every run completes, citation support 0.68-1.00, cost $0.04-$0.16/query. The one FAIL (recent) completed with a fully-cited answer but kept 1 source vs the 3-source gate. --since=180d thins the pool by design, so this is a gate calibration question, not a product gap.

github-advanced-security AI found potential problems Jun 12, 2026

Comment thread src/query-keywords.ts Fixed

askalf mentioned this pull request Jun 12, 2026

Fallback pass should simplify queries for encyclopedia-style backends #86

Closed

fix(query-keywords): linear punctuation trim — no quantified regex over input (js/polynomial-redos)

e08166b
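
For context on the fix commit above: js/polynomial-redos is the code-scanning alert for regular expressions whose quantifiers can backtrack polynomially on crafted input. The flagged pattern and the actual replacement are not shown on this page, so the following is only a sketch of what a linear, regex-free punctuation trim can look like. trimPunctuation and isAsciiAlnum are hypothetical names, and treating only ASCII letters and digits as non-punctuation is an assumption.

```typescript
// True for ASCII digits and letters; everything else counts as punctuation here.
function isAsciiAlnum(code: number): boolean {
  return (
    (code >= 48 && code <= 57) || // 0-9
    (code >= 65 && code <= 90) || // A-Z
    (code >= 97 && code <= 122)   // a-z
  );
}

// Strip leading and trailing punctuation with two index scans.
// Each character is inspected at most once, so the cost is linear in the
// token length and there is no regex backtracking to exploit.
function trimPunctuation(token: string): string {
  let start = 0;
  let end = token.length;
  while (start < end && !isAsciiAlnum(token.charCodeAt(start))) start++;
  while (end > start && !isAsciiAlnum(token.charCodeAt(end - 1))) end--;
  return token.slice(start, end);
}
```

Only the edges are scanned, so punctuation inside a token is preserved: trimPunctuation("(HTTP/3)?") returns "HTTP/3".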

askalf merged commit 3dfffa0 into master Jun 12, 2026
5 checks passed

askalf deleted the feat/wikipedia-keyword-ladder branch June 12, 2026 10:28

This was referenced Jun 12, 2026

release: 0.23.0 — Wikipedia keyword ladder #90

Merged

bench: v0.23.0 validation — 5/6 PASS, 6/6 completed #91

Merged

askalf merged 2 commits into master from feat/wikipedia-keyword-ladder
