fix: synthesizer closes broad-scope subgoals too aggressively (terminates overnight runs early) #159

## Summary
The synthesizer closes subgoals as `confirmed` too aggressively on broad-scope goals. In the Project 2025 overnight test (2026-05-07), it marked **all 4 subgoals confirmed after 45 tasks**, which ended a run that was supposed to last all night after only 13 minutes.

This is the calibration follow-up to #119: subgoal-done tracking is *wired* correctly, but the prompt threshold for declaring `confirmed` is too generous when the plan's `scope_class` is `broad` or `comprehensive`.

## Reproducer

```bash
uv run research start --skip-intake --local \
  --goal "Project 2025 implementation tracker: identify which specific policy proposals from the Heritage Foundation's Project 2025 document have been adopted, attempted, withdrawn, or remain pending under the current Trump administration. Organize by federal department (DOJ, DOI, EPA, DHS, State, etc.). For each tracked proposal, surface news coverage, public statements, and any pushback or legal challenges. Prioritize primary sources and date-stamp every finding." \
  --max-tasks 1000 --time-cap 10
```

Observed:
- Initial plan: scope_class=broad, 4 subgoals, 16 search tasks ✓
- 45 tasks executed, 1 drain_replan fired ✓
- Synthesis pass: `closed=[1,2,3,4]`. Gemma confidently declared every subgoal done after seeing only partial evidence on each.
- Loop terminated. Run ended in ~13 minutes.

The closed subgoals were:
1. "Identify core policy pillars and specific proposals from the Project 2025 document"
2. "Map identified policies to their respective federal departments and check for implementation status"
3. "Collect evidence of legal challenges, news coverage, and official public statements"
4. (a fourth one in the same vein)

For a 920-page policy document with proposals across 20+ federal departments, **none of these should be closeable after 45 tasks of crawling**.

## Root cause

`prompts/synthesizer.md` (the v5 trailing-JSON contract from #119) instructs:
- `confirmed` — the findings affirmatively answer the subgoal. Closes it.
- `inconclusive` — findings are insufficient, contradictory, or absent.

The prompt does NOT differentiate by scope. Gemma — which favors decisiveness — defaults to `confirmed` whenever it can write *any* affirmative answer, regardless of completeness. For a `narrow` scope this is fine; for `broad`/`comprehensive` it terminates the run prematurely.

## Acceptance Criteria

- [ ] `prompts/synthesizer.md` adds a scope-aware closure rule: when the plan's `scope_class` is `broad` or `comprehensive`, default subgoal status to `inconclusive` unless ALL of:
  - At least 5 distinct sources cited per subgoal in the corpus collected so far, AND
  - Synthesizer can articulate at least 2 specific examples per subgoal that resolve the question, AND
  - Findings span at least 3 distinct domains/entities the subgoal references (e.g., for "policies across federal departments", findings must touch ≥3 departments)
- [ ] The synthesis context includes the plan's scope_class so the synthesizer knows which threshold to apply.
- [ ] Add a unit test: feed a synthetic (fixture) 45-task `broad`-scope corpus into the synthesizer and assert that the values in the `subgoal_status` map are predominantly `inconclusive`, not `confirmed`.
- [ ] After fix: re-run the Project 2025 goal locally; expect ≥1 subgoal to remain `inconclusive` after the first synthesis, driving drain-replan to fire ≥3 times instead of stopping after 1.
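
The criteria above put the scope-aware rule in the prompt. As a reference for the intended behavior, here is a minimal Python sketch of the same three thresholds written as a deterministic check. Everything in it is hypothetical: `SubgoalEvidence`, `closure_status`, and the constant names do not come from the codebase, and it assumes the per-subgoal tallies (sources, examples, entities) are already available.

```python
from dataclasses import dataclass, field

# Scope classes that get the stricter rule (taken from the acceptance criteria).
STRICT_SCOPES = {"broad", "comprehensive"}

# Thresholds from the acceptance criteria.
MIN_SOURCES = 5    # distinct sources cited per subgoal
MIN_EXAMPLES = 2   # specific examples that resolve the question
MIN_ENTITIES = 3   # distinct domains/entities (e.g. departments) covered


@dataclass
class SubgoalEvidence:
    """Hypothetical per-subgoal tally; not a type from the codebase."""
    sources: set[str] = field(default_factory=set)
    examples: list[str] = field(default_factory=list)
    entities: set[str] = field(default_factory=set)


def closure_status(scope_class: str, proposed: str, ev: SubgoalEvidence) -> str:
    """Downgrade a proposed `confirmed` to `inconclusive` when evidence is thin.

    Narrow scopes and non-`confirmed` statuses pass through unchanged.
    """
    if proposed != "confirmed" or scope_class not in STRICT_SCOPES:
        return proposed
    meets_all = (
        len(ev.sources) >= MIN_SOURCES
        and len(ev.examples) >= MIN_EXAMPLES
        and len(ev.entities) >= MIN_ENTITIES
    )
    return "confirmed" if meets_all else "inconclusive"
```

A check like this could also back the unit test: a `broad` subgoal with two sources and one department should come back `inconclusive`, while the same evidence under `narrow` stays `confirmed`.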

## Files

- `src/research_agent/prompts/synthesizer.md` (the closure-status rules)
- `src/research_agent/orchestrator/synth.py` (pass scope_class into the synthesis context)
- `tests/test_orchestrator_synth.py` (new test for the scope-aware threshold)
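
For the `synth.py` change, a minimal sketch of threading `scope_class` into the text handed to the synthesizer. The function name, the plan-as-dict shape, and the output layout are assumptions made for illustration; the real context builder may look quite different.

```python
def build_synthesis_context(plan: dict, findings: list[str]) -> str:
    """Render the synthesis context with scope_class stated up front.

    Hypothetical helper: name, arguments, and layout are illustrative only.
    """
    # Assumption: a plan without scope_class is treated as narrow,
    # which keeps the current closure behavior for such plans.
    scope_class = plan.get("scope_class", "narrow")

    lines = [f"scope_class: {scope_class}", "", "Subgoals:"]
    for number, subgoal in enumerate(plan.get("subgoals", []), start=1):
        lines.append(f"{number}. {subgoal}")

    lines += ["", "Findings:"]
    lines += [f"- {finding}" for finding in findings]
    return "\n".join(lines)
```

Putting `scope_class` on the first line is a design choice so the prompt's closure rule can refer to it directly.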

## Why this matters

Without this fix, **every overnight run on a broad goal terminates early** — exactly the problem #117/#118/#119 were supposed to solve. The architecture works; the prompt-level calibration is the last gate to "actually runs all night."

