askalf
diff --git a/.gitignore b/.gitignore
Lines changed: 1 addition & 0 deletions
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 19 additions & 0 deletions
diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 4 additions & 3 deletions
diff --git a/README.md b/README.md
Lines changed: 39 additions & 7 deletions
diff --git a/package.json b/package.json
Lines changed: 1 addition & 1 deletion
@@ -9,3 +9,4 @@ dist/
 coverage/
 .nyc_output/
 ~/.deepdive/
+e2e-report.md
@@ -6,6 +6,25 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 
 ## [Unreleased]
 
+## [0.2.0] — 2026-04-22
+
+Iterative research, parallel fetching, on-disk cache, structured output.
+
+### Added
+- `--deep[=N]` iterative research loop. After the first synthesis a "critic" LLM call reviews the draft, names the gaps, and proposes follow-up queries; the loop re-runs up to N more rounds (bare `--deep` defaults to 2). Critic can declare the answer complete to terminate early. `critique()` / `parseCritique()` are exported from `src/plan.ts` for programmatic use.
+- `--concurrency=N` parallel page fetches (default: 4). The agent now uses a worker-pool pattern (`src/concurrency.ts`) instead of a serial loop, cutting fetch phase wall-time roughly N× on a mix of fast and slow pages.
+- Per-URL on-disk cache at `~/.deepdive/cache/` (configurable via `DEEPDIVE_CACHE_DIR`). Atomic `.tmp` + `rename` writes. TTL default 1 hour (`--cache-ttl-ms=<ms>`). Disable with `--no-cache` or `DEEPDIVE_NO_CACHE=1`. Cache hits skip the browser entirely — an all-cached run never launches Chromium.
+- `--json` output mode. Prints `{question, plan, rounds, sources, answer, usage}` instead of markdown — pipeable into `jq` and other tools. `DEEPDIVE_JSON=1` env var also works.
+- `AgentConfig.browserFactory` — optional factory for injecting a mock `BrowserLike` in tests and custom deployments. The default factory returns a real `BrowserSession`, so existing callers see no behavior change.
+- `AgentResult.rounds` — per-round trace with queries, candidates found, fetches, kept-count, and the critic's verdict.
+- `AgentResult.usage.cacheHits` — how many fetches hit the cache this run.
+- New agent events: `round.start`, `critique.start`, `critique.done`. Existing `fetch.start` / `fetch.done` gained a `cached: boolean` field.
+- Tests: cache round-trip + TTL + atomicity, concurrency cap + ordering + abort, parseCritique, new CLI flag coverage, full agent loop integration test with a mock LLM HTTP server, mock search, and mock browser — 42 new assertions (96 total, up from ~53).
+
+### Changed
+- README — new flag table covering `--deep`, `--concurrency`, `--no-cache`, `--json`; examples for deep mode and JSON piping; library-mode snippet updated to wire cache + deepRounds. Also: demoted promotional "Claude Max / Pro" phrasing to "Claude Max" to match dario's README after the 2026-04-21 Anthropic Pro/CC incident.
+- `AgentConfig` is additive — new required fields `deepRounds` and `concurrency` (non-breaking: `resolveConfig` sets sensible defaults so library callers that go through it pick them up for free).
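Review note: the `--concurrency=N` entry describes a worker pool replacing a serial fetch loop. A minimal sketch of that pattern follows, for readers who want to see the shape. The name `runConcurrent` appears in CLAUDE.md, but the signature and body here are assumptions, not the contents of `src/concurrency.ts`.

```typescript
// Hypothetical sketch of a worker pool. The signature is assumed; the real
// src/concurrency.ts may differ.
export async function runConcurrent<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // shared cursor; safe because callbacks run on a single thread

  // Each worker keeps pulling the next unclaimed index until none remain.
  async function worker(): Promise<void> {
    while (true) {
      const index = next++;
      if (index >= items.length) return;
      // Writing by index preserves input order regardless of completion order.
      results[index] = await task(items[index], index);
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

In this sketch a rejected task rejects the whole call; whether the real agent aborts or records a per-URL failure is not stated in the changelog. The speedup depends on how page latencies are distributed, so "roughly N×" is best read as an upper bound.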
+
 ## [0.1.0] — 2026-04-21
 
 Initial scaffolding. One-shot research agent: plan → search → fetch → extract → synthesize → cited markdown.
 
@@ -7,16 +7,17 @@ This file is the project-local instructions for Claude Code and similar agents w
 A local research agent. CLI entry point `deepdive`. Given a question, it:
 1. Asks an LLM to decompose the question into 3–5 sub-queries.
 2. Runs each sub-query through a pluggable search adapter.
-3. Fetches each result page through a Playwright-driven headless Chromium.
+3. Fetches each result page through a Playwright-driven headless Chromium (parallelized, optionally cached to `~/.deepdive/cache/`).
 4. Extracts main content, caps per-source word count.
 5. Asks an LLM to synthesize a cited markdown answer.
+6. (Optional, `--deep`) A critic LLM reviews the draft, names the gaps, and the loop re-runs until the critic says done or N rounds elapse.
 
 Every LLM call goes to an Anthropic-compat endpoint. Default target is [dario](https://github.com/askalf/dario) at `http://localhost:3456`. Any Anthropic-compat URL works — Anthropic directly, a self-hosted LiteLLM, another proxy.
 
 ## Architecture principles
 
 - **One runtime dependency.** `playwright` is required because we actually need to render JS-heavy pages. Nothing else. No `axios`, `node-fetch`, `readability`, `jsdom`, `chalk`, `yargs`, `zod`. Node built-ins and hand-rolled code.
-- **Pure decision functions.** Anything with logic goes in a module that can be tested without a browser or an LLM: `parsePlan`, `parseArgs`, `resolveConfig`, `parsePositiveInt`, `extractContent`, `dedupeByUrl`, `parseDuckDuckGoHTML`, `buildSourceTable`, `renderAnswerMarkdown`, `buildSourcePacket`.
+- **Pure decision functions.** Anything with logic goes in a module that can be tested without a browser or an LLM: `parsePlan`, `parseCritique`, `parseArgs`, `resolveConfig`, `parsePositiveInt`, `parseNonNegativeInt`, `extractContent`, `dedupeByUrl`, `parseDuckDuckGoHTML`, `buildSourceTable`, `renderAnswerMarkdown`, `buildSourcePacket`, `cacheKey`, `runConcurrent`.
 - **I/O at the edges.** `cli.ts`, `browser.ts`, `llm.ts`, and the individual search adapters touch the network or disk. Everything else is synchronous over strings and objects.
 - **Events, not prints.** The agent emits structured events via `onEvent`; the CLI renders them. Do not `console.log` from inside `src/agent.ts` or any library module.
 - **Hand-rolled regex parsers are fine** for DDG HTML. If the parser breaks, fix the parser. Do not reach for `cheerio`.
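Review note: the changelog says the per-URL cache uses atomic `.tmp` + `rename` writes and a TTL. The sketch below shows one way those two properties could fit together using only Node built-ins, in keeping with the one-dependency rule. `cacheKey` is named in this file, but hashing the URL with SHA-256, the `CacheEntry` shape, and the `writeEntry` / `readEntry` helpers are all assumptions made for illustration.

```typescript
import { createHash } from "node:crypto";
import { mkdir, readFile, rename, writeFile } from "node:fs/promises";
import { join } from "node:path";

// Hypothetical entry shape; the real cache module may store different fields.
interface CacheEntry {
  url: string;
  fetchedAt: number; // epoch milliseconds
  content: string;
}

// Assumption: the key is a SHA-256 hex digest of the URL.
export function cacheKey(url: string): string {
  return createHash("sha256").update(url).digest("hex");
}

export async function writeEntry(dir: string, entry: CacheEntry): Promise<void> {
  await mkdir(dir, { recursive: true });
  const finalPath = join(dir, `${cacheKey(entry.url)}.json`);
  const tmpPath = `${finalPath}.${process.pid}.tmp`;
  await writeFile(tmpPath, JSON.stringify(entry), "utf8");
  // rename() replaces the target in one step on the same filesystem,
  // so readers see either the old file or the complete new one.
  await rename(tmpPath, finalPath);
}

export async function readEntry(
  dir: string,
  url: string,
  ttlMs: number,
  now: number = Date.now(),
): Promise<CacheEntry | undefined> {
  try {
    const raw = await readFile(join(dir, `${cacheKey(url)}.json`), "utf8");
    const entry = JSON.parse(raw) as CacheEntry;
    return now - entry.fetchedAt <= ttlMs ? entry : undefined;
  } catch {
    return undefined; // a missing or unreadable file counts as a miss
  }
}
```

Passing `now` as a parameter keeps the TTL check testable without a real clock, which fits the "pure decision functions" principle above.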
@@ -40,7 +41,7 @@ Every LLM call goes to an Anthropic-compat endpoint. Default target is [dario](h
 ## What not to do
 
 - Don't add a web UI. The product is a CLI; a UI is a separate product.
-- Don't add a research loop (read answer, decide if more searches needed) without putting it behind a `--deep` flag. v1 is one-shot on purpose.
+- Don't promote the research loop to the default path — `--deep` is opt-in for a reason (cost / latency predictability).
 - Don't reach for LangChain / LlamaIndex / anything that turns 300 lines into 3,000 and adds 40 deps.
 - Don't bundle SearXNG. Ship it as an adapter that points at a URL the user provides.
 
 
@@ -1,6 +1,6 @@
 <p align="center">
   <h1 align="center">deepdive</h1>
-  <p align="center"><strong>A local research agent. One command, cited answer.</strong><br>Decomposes your question into sub-queries, runs web searches, fetches pages through a real headless browser, and hands everything to an LLM that writes a cited markdown report. Every LLM call goes through your own router — default target is <a href="https://github.com/askalf/dario">dario</a> at <code>localhost:3456</code>, so synthesis runs on your Claude Max / Pro subscription, your own OpenAI key, or any local model. Any Anthropic-compat endpoint works.</p>
+  <p align="center"><strong>A local research agent. One command, cited answer.</strong><br>Decomposes your question into sub-queries, runs web searches, fetches pages through a real headless browser, and hands everything to an LLM that writes a cited markdown report. Every LLM call goes through your own router — default target is <a href="https://github.com/askalf/dario">dario</a> at <code>localhost:3456</code>, so synthesis runs on your Claude Max subscription, your own OpenAI key, or any local model. Any Anthropic-compat endpoint works.</p>
 </p>
 
 <p align="center"><em>Zero hosted dependencies. MIT. Independent, unofficial, third-party — see <a href="DISCLAIMER.md">DISCLAIMER.md</a>.</em></p>
@@ -12,7 +12,7 @@
 ```bash
 # 1. Have dario running (or any Anthropic-compat endpoint at a local URL).
 #    See: https://github.com/askalf/dario
-dario proxy   # http://localhost:3456, routes to Claude Max / OpenAI / etc.
+dario proxy   # http://localhost:3456, routes to Claude Max, OpenAI, etc.
 
 # 2. Install deepdive.
 npm install -g @askalf/deepdive
@@ -31,17 +31,18 @@ deepdive "how does claude's rate limiter work" --verbose --out=rate-limiter.md
 Under the hood:
 1. **Plan.** LLM decomposes your question into 3–5 searchable sub-queries.
 2. **Search.** DuckDuckGo HTML by default (no API key). Pluggable: `--search=searxng|brave|tavily` with your own endpoint or key.
-3. **Fetch.** Playwright-driven Chromium renders each result page (JS-rendered SPAs included).
+3. **Fetch.** Playwright-driven Chromium renders each result page (JS-rendered SPAs included). Parallelized via `--concurrency`, cached to `~/.deepdive/cache/` (1h TTL by default) so re-running is free.
 4. **Extract.** Boilerplate stripped, main content capped to a word budget.
 5. **Synthesize.** LLM writes the answer with inline `[N]` citations referencing the source list.
+6. **Critique (optional, `--deep`).** LLM reviews its own draft, names the gaps, proposes follow-up queries, loop re-runs until the critic says done or `--deep=N` rounds elapse.
 
 ---
 
 ## Why this exists
 
 Every hosted research tool (Perplexity, OpenAI Deep Research, Gemini Deep Research) sends your queries to someone else's server, charges per query, and gives you no say in which model synthesizes the answer or which sources get read. deepdive is the self-hosted alternative: your machine, your LLM subscription, your model choice, your search backend.
 
-Pair it with [dario](https://github.com/askalf/dario) and every research query routes through your Claude Max / Pro subscription instead of per-token API pricing — a single deep query can be 50k–200k tokens, which is exactly the workload subscription billing was built for.
+Pair it with [dario](https://github.com/askalf/dario) and every research query routes through your Claude Max subscription instead of per-token API pricing — a single deep query can be 50k–200k tokens, which is exactly the workload subscription billing was built for.
 
 ---
 
@@ -84,20 +85,47 @@ Run `deepdive --help` for the full list. The ones you'll actually use:
 | `--results-per-query=<n>` | `5` | Candidates pulled per sub-query |
 | `--max-words-per-source=<n>` | `2000` | Per-source content cap before synthesis |
 | `--timeout-ms=<ms>` | `30000` | Per-fetch timeout |
-| `--out=<path>` | — | Also write markdown to file |
+| `--deep[=<n>]` | off (bare `--deep` = 2) | Critic-driven iterative research: after the initial answer, LLM names gaps and proposes follow-up queries for up to N more rounds |
+| `--concurrency=<n>` | `4` | Parallel page fetches |
+| `--no-cache` | — | Skip the on-disk page cache (default: enabled, 1h TTL) |
+| `--cache-ttl-ms=<ms>` | `3600000` | Cache TTL in ms |
+| `--json` | — | Emit structured JSON (question, plan, rounds, sources, answer, usage) instead of markdown |
+| `--out=<path>` | — | Also write output (markdown or JSON) to file |
 | `--verbose`, `-v` | — | Stream progress events to stderr |
 
-All flags mirror to `DEEPDIVE_*` env vars (e.g. `DEEPDIVE_MODEL`, `DEEPDIVE_MAX_SOURCES`). CLI flags win over env vars.
+All flags mirror to `DEEPDIVE_*` env vars (e.g. `DEEPDIVE_MODEL`, `DEEPDIVE_MAX_SOURCES`, `DEEPDIVE_DEEP_ROUNDS`, `DEEPDIVE_CONCURRENCY`). CLI flags win over env vars.
+
+### Example: deep iterative research
+
+```bash
+deepdive "compare bun's TLS ClientHello to node's" --deep=3 --verbose --out=tls.md
+```
+
+With `--deep=3`, after the first synthesis the critic can run up to three more rounds of "look at the draft, find what's missing, search for it, re-synthesize." Each round can add up to `--max-sources` new pages, so plan the cap. The loop stops early as soon as the critic says the draft is complete.
+
+### Example: JSON output for piping into other tools
+
+```bash
+deepdive "latest CVE in openssl" --json | jq '.sources[] | .url'
+```
 
 ---
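Review note: the `--deep[=<n>]` row has three states: absent (off), bare (2 rounds), and explicit `=N`. A small sketch of that parsing follows. `parseDeepFlag` is a hypothetical helper, not the project's `parseArgs`, and treating `--deep=0` as "off" is an assumption.

```typescript
// Hypothetical helper; the real parseArgs in deepdive may be structured differently.
const DEFAULT_DEEP_ROUNDS = 2; // bare --deep, per the flag table

export function parseDeepFlag(argv: readonly string[]): number {
  let rounds = 0; // off unless the flag is present
  for (const arg of argv) {
    if (arg === "--deep") {
      rounds = DEFAULT_DEEP_ROUNDS;
    } else if (arg.startsWith("--deep=")) {
      const raw = arg.slice("--deep=".length);
      if (!/^\d+$/.test(raw)) {
        throw new Error(`--deep expects a non-negative integer, got "${raw}"`);
      }
      rounds = Number(raw); // assumption: 0 means the loop stays off
    }
  }
  return rounds;
}
```

If the flag is repeated, the last occurrence wins in this sketch; the README does not say how the real CLI resolves that case.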
 
 ## Library mode
 
 ```ts
-import { runAgent, resolveSearchAdapter, resolveConfig } from "@askalf/deepdive";
+import {
+  runAgent,
+  resolveSearchAdapter,
+  resolveConfig,
+  createCache,
+} from "@askalf/deepdive";
 
 const config = resolveConfig({}, process.env);
 const search = await resolveSearchAdapter(config.searchAdapter, process.env);
+const cache = config.cache.enabled
+  ? createCache({ dir: config.cache.dir, ttlMs: config.cache.ttlMs })
+  : undefined;
 
 const result = await runAgent("how does claude's rate limiter work", {
   llm: config.llm,
@@ -106,10 +134,14 @@ const result = await runAgent("how does claude's rate limiter work", {
   resultsPerQuery: config.resultsPerQuery,
   maxSources: config.maxSources,
   maxWordsPerSource: config.maxWordsPerSource,
+  deepRounds: 2,        // iterate up to 2 extra critic-driven rounds
+  concurrency: 4,       // parallel page fetches
+  cache,
   onEvent: (e) => console.error(e),
 });
 
 console.log(result.markdown);
+console.log("rounds:", result.usage.rounds, "sources:", result.usage.kept);
 ```
 
 ---
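Review note: the README describes the `--deep` loop as "critic reviews the draft, proposes follow-up queries, re-run until done or N rounds elapse". The control flow can be sketched with stubbed-out research and critic steps. Every type and function name below is hypothetical; in particular the `Critique` shape is an assumption about what `parseCritique()` returns.

```typescript
// Hypothetical types; the real critique() result may be shaped differently.
interface Critique {
  complete: boolean;
  followUps: string[];
}
type Critic = (draft: string) => Promise<Critique>;
type Researcher = (queries: string[], previousDraft?: string) => Promise<string>;

export async function deepLoop(
  initialQueries: string[],
  deepRounds: number,
  research: Researcher,
  critic: Critic,
): Promise<{ draft: string; extraRounds: number }> {
  // Round 0: the ordinary one-shot pipeline.
  let draft = await research(initialQueries);
  let extraRounds = 0;

  for (let round = 1; round <= deepRounds; round++) {
    const verdict = await critic(draft);
    // The critic can end the loop early, or simply have nothing left to ask.
    if (verdict.complete || verdict.followUps.length === 0) break;
    draft = await research(verdict.followUps, draft);
    extraRounds = round;
  }
  return { draft, extraRounds };
}
```

With `deepRounds` set to 0 the critic is never called, which matches the one-shot default. Each extra round costs one critic call plus a search, fetch, and synthesis pass, which is why the loop is opt-in.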
 
@@ -1,6 +1,6 @@
 {
   "name": "@askalf/deepdive",
-  "version": "0.1.0",
+  "version": "0.2.0",
   "description": "A local research agent. One command, cited answer. Routes every LLM call through your own proxy (dario, Anthropic-compat, OpenAI-compat). Headless browser + pluggable search + multi-provider LLM — zero hosted dependencies.",
   "type": "module",
   "bin": {