Skip to content

feat(plan): date-ground the planner and critic prompts#95

Merged
askalf merged 3 commits into
masterfrom
feat/date-grounded-planner
Jun 12, 2026
Merged

feat(plan): date-ground the planner and critic prompts#95
askalf merged 3 commits into
masterfrom
feat/date-grounded-planner

Conversation

@askalf

@askalf askalf commented Jun 12, 2026

Copy link
Copy Markdown
Owner

Why

The planner had no notion of the current date. Recency-sensitive questions got sub-queries anchored to the model's training-time sense of "recent", and with --since set, the post-fetch recency filter then culled most of what those stale queries returned. Bench evidence: the recent golden question (--since=180d) kept 1 source against a 3-source minimum on the v0.23.0 validation board, deterministically.

What

  • plannerSystem() / criticSystem() now carry Today's date: YYYY-MM-DD (injectable PlanContext.now for tests).
  • Event-shaped sub-queries (releases, announcements, news, versions) are told to use absolute dates instead of "latest"/"recent" — search engines match text, not intent.
  • Conceptual/scholarly sub-queries are told to stay timeless. This counterweight rule is load-bearing: the first iteration blanket-anchored everything, and the bench caught the scholarly question's citation support dropping 0.68 → 0.44 (reproduced twice) because bare year tokens distort keyword-matched sources like arXiv/OpenAlex. Scoping the rule restored it.
  • With --since, the prompts disclose the cutoff date and that older sources get dropped, so queries are shaped to the surviving window.
  • agent.ts passes sinceMs at both planQueries/critique call sites.
  • 9 prompt-contract tests in test/plan-prompt.test.mjs.

Bench (before → after, healthy DDG both)

Before = committed v0.23.0 validation; after = bench/results/2026-06-12-v0.25.0-date-grounding.md, run from an isolated worktree at this branch (does NOT include 0.24.0's news fallback — the two changes are complementary and independently measured).

question v0.23.0 this branch
factual-lookup PASS (1.00) PASS (0.96)
technical-deep PASS (0.88) PASS (1.00)
academic PASS (0.68) PASS (0.92)
comparison PASS (1.00) PASS (1.00)
recent FAIL 1/3 sources FAIL 2/3 this run — bistable at the gate: 3/3, 2/3, 4/3, 2/3 across four runs vs the baseline's deterministic 1/3
niche-ops PASS (1.00) PASS (1.00)

Honest read on recent: clearly improved (never 1 source again, 0.75–1.00 citation support) but still flaky against its 3-source minimum on this branch alone. v0.24.0's recency-aware news fallback attacks the same weakness from the source side (its board shows recent 3/3 @ 1.00); merged together they should hold the gate. If recent still flaps on a post-merge board, the next lever is raising the planner's query count for --since runs.

askalf added 3 commits June 12, 2026 10:59
The planner had no notion of the current date, so recency-sensitive
questions got queries anchored to the model's training-time sense of
"recent" - and with --since set, the recency filter then culled most of
what those stale queries returned. Bench evidence: the "recent" golden
question (--since=180d) kept 1 source against a 3-source minimum on the
v0.23.0 validation run.

- plannerSystem()/criticSystem() now carry "Today's date: YYYY-MM-DD"
  (PlanContext.now, injectable for tests).
- Event-shaped sub-queries (releases, announcements, news, versions) are
  told to use absolute dates instead of "latest"/"recent"; conceptual and
  scholarly sub-queries are told to stay timeless - blanket year-anchoring
  measurably hurt the scholarly bench question (citation support
  0.68 -> 0.44, reproduced twice) because bare year tokens distort
  keyword-matched sources like arXiv/OpenAlex.
- When --since is set, the prompts disclose the cutoff date and that older
  sources will be dropped, so queries are shaped to the surviving window.
- agent.ts passes sinceMs at both planQueries/critique call sites.

Spot-checked live: "recent" 1/3 -> 3/3 and 4/3 sources (1.00 citation
support), "academic" recovered to PASS after the rule was scoped. Full
before/after scoreboard to follow once DDG rate-limiting on this IP
cools - the after-run hit HTTP 202 throttling mid-bench.
@askalf askalf merged commit bf1b765 into master Jun 12, 2026
5 checks passed
@askalf askalf deleted the feat/date-grounded-planner branch June 12, 2026 18:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant