Skip to content

feat(search): news adapter (keyless Bing News RSS) + recency-aware fallback default#92

Merged
askalf merged 2 commits into
masterfrom
feat/news-adapter
Jun 12, 2026
Merged

feat(search): news adapter (keyless Bing News RSS) + recency-aware fallback default#92
askalf merged 2 commits into
masterfrom
feat/news-adapter

Conversation

@askalf

@askalf askalf commented Jun 12, 2026

Copy link
Copy Markdown
Owner

Why

The v0.23.0 validation bench scored 5/6. The one failure was the recent golden question: with DDG throttled, the wikipedia fallback returned undated/stale encyclopedia pages that the --since=180d freshness filter culled — the run completed with 1/3 sources. Recency-sensitive questions are exactly the lane hosted competitors are strongest in, and deepdive had no keyless path to dated, fresh sources.

What

  • --search=news — new keyless adapter over the Bing News RSS endpoint. Result links arrive wrapped in a bing.com/news/apiclick.aspx redirect; the adapter unwraps the url param to the real publisher URL (rejecting non-http(s) targets), so deepdive fetches and cites publishers directly. Each snippet is prefixed with the article's YYYY-MM-DD pubDate. 403/429 → SearchRateLimitError (benching + degradation visibility work as for other adapters). Hand-rolled bounded-regex RSS parsing, same stance as the arXiv/DDG adapters; zero new dependencies.
  • Recency-aware fallback default — when --since is set and no explicit --search-fallback/env is given, the default fallback becomes news,wikipedia instead of wikipedia. Explicit flag/env still wins; =none still disables.
  • Help text + README adapter table + fallback docs updated; 18 new tests (711 pass).

Bench evidence (with a disclosure)

recent passes twice with the change: single-question run 6/3 sources at 1.00 citation support ($0.090, 67s), then 5/3 at 0.80 in the full board. Full scoreboard is 6/6 — first perfect board (trajectory: 1/6 → 4/6 → 5/6 → 6/6). Committed as bench/results/2026-06-12-v0.24.0-news-fallback.md.

Disclosure: a concurrent session was editing the planner prompts in the same checkout (now feat/date-grounded-planner), so the bench dist/ contained both changes. They are mechanically disjoint (fallback adapter selection vs. prompt wording) and the news fallback is the direct mechanism behind recent's dated sources, but the 6/6 cannot be attributed to either change alone — the scoreboard file says so, and the board should be re-run at the next release tag.

askalf added 2 commits June 12, 2026 11:02
…allback default

The v0.23.0 bench's one failure was the `recent` question: DDG throttled,
and the wikipedia fallback's undated/stale pages were culled by
--since=180d, leaving 1/3 sources. Recency was the gap.

- New `news` adapter: Bing News RSS (no key). Unwraps the apiclick
  redirect to the real publisher URL (rejecting non-http(s) targets),
  decodes entity-encoded description markup before stripping it, and
  prefixes each snippet with the article's YYYY-MM-DD pubDate so the
  planner/synthesizer can see recency. 403/429 classify as
  SearchRateLimitError. Hand-rolled bounded-regex parsing, same stance
  as the arXiv/DDG adapters; no new dependencies.
- Recency-aware fallback default: with --since set (and no explicit
  --search-fallback/env), the default becomes news,wikipedia so a
  throttled primary degrades into dated, fresh sources. Explicit
  flag/env still wins; =none still disables.
- Bench: recent now passes twice (6/3 sources 1.00 support, then 5/3
  at 0.80 in the full board). Full scoreboard 6/6 — first perfect run;
  committed in bench/results/2026-06-12-v0.24.0-news-fallback.md.
@askalf askalf merged commit d6074c6 into master Jun 12, 2026
5 checks passed
@askalf askalf deleted the feat/news-adapter branch June 12, 2026 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant