Commit 4a53c7e

v0.2.0 — --deep iterative loop, concurrent fetching, on-disk cache, --json output (#1)
Adds the four biggest things v0.1.0 was missing:

1. `--deep[=N]` critic-driven iterative research. After the first synthesis an LLM critic reviews the draft, names the gaps, and proposes follow-up queries; the loop re-runs up to N more rounds. Bare `--deep` = 2 rounds. Critic can declare done early.
2. `--concurrency=N` parallel page fetches (default 4) via a new worker-pool in src/concurrency.ts. Cuts the fetch phase wall-time ~N× on a mixed batch of fast and slow pages.
3. Per-URL on-disk page cache at ~/.deepdive/cache/ (1h default TTL, atomic .tmp + rename writes). Cache hits skip the browser entirely: an all-cached re-run never launches Chromium. Disable with --no-cache.
4. `--json` structured output mode: {question, plan, rounds, sources, answer, usage}. Pipeable into jq and other tools.

Bonus surface for library users: AgentConfig.browserFactory for injecting a mock BrowserLike, AgentResult.rounds with per-round traces + the critic's verdict, AgentResult.usage.cacheHits, round.start / critique.start / critique.done events, cached: boolean on fetch events.

Tests: 42 new assertions. Cache round-trip + TTL + atomicity (6), concurrency cap + ordering + abort (8), parseCritique (8), new CLI flags (8), new config options (7), full agent integration test with a mock LLM HTTP server + mock search + mock browser covering single-pass, deep loop, early critic termination, cache reuse, maxSources cap, and broken-fetch resilience (6). 96 total, all pass.

Co-authored-by: askalf <263217947+askalf@users.noreply.github.com>
1 parent 13f546b commit 4a53c7e
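
The critic-driven loop from item 1 of the commit message can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code in this commit: the `Critique`, `Round`, and `Steps` shapes and the `deepResearch` name are hypothetical, and the real `critique()` / `parseCritique()` exports may return something different.

```typescript
// Hypothetical shapes for illustration; the real types may differ.
interface Critique {
  done: boolean;        // critic declares the draft complete
  followUps: string[];  // follow-up queries proposed by the critic
}

interface Round {
  queries: string[];
  critique?: Critique;
}

// Injected steps so the loop itself needs no browser or LLM.
interface Steps {
  research(queries: string[]): Promise<string[]>;
  synthesize(question: string, sources: string[]): Promise<string>;
  critique(question: string, draft: string): Promise<Critique>;
}

async function deepResearch(
  question: string,
  initialQueries: string[],
  deepRounds: number, // extra rounds allowed after the first synthesis
  steps: Steps,
): Promise<{ answer: string; rounds: Round[] }> {
  const sources = await steps.research(initialQueries);
  let answer = await steps.synthesize(question, sources);
  let current: Round = { queries: initialQueries };
  const rounds: Round[] = [current];

  for (let extra = 0; extra < deepRounds; extra++) {
    const verdict = await steps.critique(question, answer);
    current.critique = verdict;
    // Early exit: the critic is satisfied or has nothing left to ask.
    if (verdict.done || verdict.followUps.length === 0) break;

    sources.push(...(await steps.research(verdict.followUps)));
    answer = await steps.synthesize(question, sources);
    current = { queries: verdict.followUps };
    rounds.push(current);
  }
  return { answer, rounds };
}
```

With `deepRounds` set to 0 the function never calls the critic, which mirrors the one-shot default; passing the steps in as plain functions is what lets a loop like this be exercised with mocks.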

18 files changed

Lines changed: 1435 additions & 100 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -9,3 +9,4 @@ dist/
 coverage/
 .nyc_output/
 ~/.deepdive/
+e2e-report.md

CHANGELOG.md

Lines changed: 19 additions & 0 deletions
@@ -6,6 +6,25 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),

 ## [Unreleased]

+## [0.2.0] — 2026-04-22
+
+Iterative research, parallel fetching, on-disk cache, structured output.
+
+### Added
+- `--deep[=N]` iterative research loop. After the first synthesis a "critic" LLM call reviews the draft, names the gaps, and proposes follow-up queries; the loop re-runs up to N more rounds (bare `--deep` defaults to 2). Critic can declare the answer complete to terminate early. `critique()` / `parseCritique()` are exported from `src/plan.ts` for programmatic use.
+- `--concurrency=N` parallel page fetches (default: 4). The agent now uses a worker-pool pattern (`src/concurrency.ts`) instead of a serial loop, cutting fetch phase wall-time roughly N× on a mix of fast and slow pages.
+- Per-URL on-disk cache at `~/.deepdive/cache/` (configurable via `DEEPDIVE_CACHE_DIR`). Atomic `.tmp` + `rename` writes. TTL default 1 hour (`--cache-ttl-ms=<ms>`). Disable with `--no-cache` or `DEEPDIVE_NO_CACHE=1`. Cache hits skip the browser entirely — an all-cached run never launches Chromium.
+- `--json` output mode. Prints `{question, plan, rounds, sources, answer, usage}` instead of markdown — pipeable into `jq` and other tools. `DEEPDIVE_JSON=1` env var also works.
+- `AgentConfig.browserFactory` — optional factory for injecting a mock `BrowserLike` in tests and custom deployments. The default factory returns a real `BrowserSession`, so existing callers see no behavior change.
+- `AgentResult.rounds` — per-round trace with queries, candidates found, fetches, kept-count, and the critic's verdict.
+- `AgentResult.usage.cacheHits` — how many fetches hit the cache this run.
+- New agent events: `round.start`, `critique.start`, `critique.done`. Existing `fetch.start` / `fetch.done` gained a `cached: boolean` field.
+- Tests: cache round-trip + TTL + atomicity, concurrency cap + ordering + abort, parseCritique, new CLI flag coverage, full agent loop integration test with a mock LLM HTTP server, mock search, and mock browser — 42 new assertions (96 total, up from ~53).
+
+### Changed
+- README — new flag table covering `--deep`, `--concurrency`, `--no-cache`, `--json`; examples for deep mode and JSON piping; library-mode snippet updated to wire cache + deepRounds. Also: demoted promotional "Claude Max / Pro" phrasing to "Claude Max" to match dario's README after the 2026-04-21 Anthropic Pro/CC incident.
+- `AgentConfig` is additive — new required fields `deepRounds` and `concurrency` (non-breaking: `resolveConfig` sets sensible defaults so library callers that go through it pick them up for free).
+
 ## [0.1.0] — 2026-04-21

 Initial scaffolding. One-shot research agent: plan → search → fetch → extract → synthesize → cited markdown.
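
The cache entry under "Added" combines two ideas: a TTL check on read and an atomic `.tmp` + `rename` write. A minimal sketch of both, assuming only Node built-ins, might look like the following. The `CacheEntry` shape, the SHA-256 filename scheme, and the `keyFor` / `writeEntry` / `readEntry` names are hypothetical and are not taken from the project's `createCache` or `cacheKey`.

```typescript
import { createHash } from "node:crypto";
import { mkdirSync, mkdtempSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical entry shape for illustration.
interface CacheEntry {
  url: string;
  fetchedAt: number; // epoch milliseconds
  content: string;
}

// Assumed key scheme: hex SHA-256 of the URL, so any URL maps to a safe filename.
function keyFor(url: string): string {
  return createHash("sha256").update(url).digest("hex");
}

function writeEntry(dir: string, entry: CacheEntry): void {
  mkdirSync(dir, { recursive: true });
  const finalPath = join(dir, `${keyFor(entry.url)}.json`);
  const tmpPath = `${finalPath}.${process.pid}.tmp`;
  // Write the full payload to a temp file first, then rename it into place.
  // A reader sees either the old file or the new one, never a partial write.
  writeFileSync(tmpPath, JSON.stringify(entry));
  renameSync(tmpPath, finalPath);
}

function readEntry(
  dir: string,
  url: string,
  ttlMs: number,
  now: number = Date.now(),
): CacheEntry | undefined {
  try {
    const raw = readFileSync(join(dir, `${keyFor(url)}.json`), "utf8");
    const entry = JSON.parse(raw) as CacheEntry;
    // Entries older than the TTL count as misses.
    return now - entry.fetchedAt <= ttlMs ? entry : undefined;
  } catch {
    return undefined; // a missing or corrupt file is also a miss
  }
}

// Usage: write one entry into a throwaway directory and read it back.
const dir = mkdtempSync(join(tmpdir(), "cache-sketch-"));
const ONE_HOUR_MS = 3_600_000;
writeEntry(dir, { url: "https://example.com/", fetchedAt: 1_000, content: "hello" });
const hit = readEntry(dir, "https://example.com/", ONE_HOUR_MS, 1_000 + ONE_HOUR_MS);
const expired = readEntry(dir, "https://example.com/", ONE_HOUR_MS, 1_001 + ONE_HOUR_MS);
```

Taking `now` as a parameter keeps the TTL check deterministic in tests, which is one way to cover expiry without sleeping.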

CLAUDE.md

Lines changed: 4 additions & 3 deletions
@@ -7,16 +7,17 @@ This file is the project-local instructions for Claude Code and similar agents w
 A local research agent. CLI entry point `deepdive`. Given a question, it:
 1. Asks an LLM to decompose the question into 3–5 sub-queries.
 2. Runs each sub-query through a pluggable search adapter.
-3. Fetches each result page through a Playwright-driven headless Chromium.
+3. Fetches each result page through a Playwright-driven headless Chromium (parallelized, optionally cached to `~/.deepdive/cache/`).
 4. Extracts main content, caps per-source word count.
 5. Asks an LLM to synthesize a cited markdown answer.
+6. (Optional, `--deep`) A critic LLM reviews the draft, names the gaps, and the loop re-runs until the critic says done or N rounds elapse.

 Every LLM call goes to an Anthropic-compat endpoint. Default target is [dario](https://github.com/askalf/dario) at `http://localhost:3456`. Any Anthropic-compat URL works — Anthropic directly, a self-hosted LiteLLM, another proxy.

 ## Architecture principles

 - **One runtime dependency.** `playwright` is required because we actually need to render JS-heavy pages. Nothing else. No `axios`, `node-fetch`, `readability`, `jsdom`, `chalk`, `yargs`, `zod`. Node built-ins and hand-rolled code.
-- **Pure decision functions.** Anything with logic goes in a module that can be tested without a browser or an LLM: `parsePlan`, `parseArgs`, `resolveConfig`, `parsePositiveInt`, `extractContent`, `dedupeByUrl`, `parseDuckDuckGoHTML`, `buildSourceTable`, `renderAnswerMarkdown`, `buildSourcePacket`.
+- **Pure decision functions.** Anything with logic goes in a module that can be tested without a browser or an LLM: `parsePlan`, `parseCritique`, `parseArgs`, `resolveConfig`, `parsePositiveInt`, `parseNonNegativeInt`, `extractContent`, `dedupeByUrl`, `parseDuckDuckGoHTML`, `buildSourceTable`, `renderAnswerMarkdown`, `buildSourcePacket`, `cacheKey`, `runConcurrent`.
 - **I/O at the edges.** `cli.ts`, `browser.ts`, `llm.ts`, and the individual search adapters touch the network or disk. Everything else is synchronous over strings and objects.
 - **Events, not prints.** The agent emits structured events via `onEvent`; the CLI renders them. Do not `console.log` from inside `src/agent.ts` or any library module.
 - **Hand-rolled regex parsers are fine** for DDG HTML. If the parser breaks, fix the parser. Do not reach for `cheerio`.
@@ -40,7 +41,7 @@ Every LLM call goes to an Anthropic-compat endpoint. Default target is [dario](h
 ## What not to do

 - Don't add a web UI. The product is a CLI; a UI is a separate product.
-- Don't add a research loop (read answer, decide if more searches needed) without putting it behind a `--deep` flag. v1 is one-shot on purpose.
+- Don't promote the research loop to the default path — `--deep` is opt-in for a reason (cost / latency predictability).
 - Don't reach for LangChain / LlamaIndex / anything that turns 300 lines into 3,000 and adds 40 deps.
 - Don't bundle SearXNG. Ship it as an adapter that points at a URL the user provides.
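
The parallelized fetch in step 3 relies on a worker pool, and `runConcurrent` now appears in the list of pure functions. One way such a pool could work is sketched below, assuming a fixed number of lanes that pull the next index from a shared counter. The `runPool` name and signature are hypothetical, the real `runConcurrent` may differ, and abort handling is left out.

```typescript
// Hypothetical worker pool: at most `limit` workers run at once,
// and results come back in input order regardless of finish order.
async function runPool<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // shared cursor; safe because JS runs one callback at a time

  async function lane(): Promise<void> {
    while (true) {
      const i = next++;
      if (i >= items.length) return;
      // Store by index so ordering does not depend on completion time.
      results[i] = await worker(items[i] as T, i);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results;
}
```

In this sketch a rejected worker rejects the whole pool; a fetch loop that has to survive broken pages would need the worker to catch its own errors and return a failure value instead.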

README.md

Lines changed: 39 additions & 7 deletions
@@ -1,6 +1,6 @@
 <p align="center">
 <h1 align="center">deepdive</h1>
-<p align="center"><strong>A local research agent. One command, cited answer.</strong><br>Decomposes your question into sub-queries, runs web searches, fetches pages through a real headless browser, and hands everything to an LLM that writes a cited markdown report. Every LLM call goes through your own router — default target is <a href="https://github.com/askalf/dario">dario</a> at <code>localhost:3456</code>, so synthesis runs on your Claude Max / Pro subscription, your own OpenAI key, or any local model. Any Anthropic-compat endpoint works.</p>
+<p align="center"><strong>A local research agent. One command, cited answer.</strong><br>Decomposes your question into sub-queries, runs web searches, fetches pages through a real headless browser, and hands everything to an LLM that writes a cited markdown report. Every LLM call goes through your own router — default target is <a href="https://github.com/askalf/dario">dario</a> at <code>localhost:3456</code>, so synthesis runs on your Claude Max subscription, your own OpenAI key, or any local model. Any Anthropic-compat endpoint works.</p>
 </p>

 <p align="center"><em>Zero hosted dependencies. MIT. Independent, unofficial, third-party — see <a href="DISCLAIMER.md">DISCLAIMER.md</a>.</em></p>
@@ -12,7 +12,7 @@
 ```bash
 # 1. Have dario running (or any Anthropic-compat endpoint at a local URL).
 # See: https://github.com/askalf/dario
-dario proxy # http://localhost:3456, routes to Claude Max / OpenAI / etc.
+dario proxy # http://localhost:3456, routes to Claude Max, OpenAI, etc.

 # 2. Install deepdive.
 npm install -g @askalf/deepdive
@@ -31,17 +31,18 @@ deepdive "how does claude's rate limiter work" --verbose --out=rate-limiter.md
 Under the hood:
 1. **Plan.** LLM decomposes your question into 3–5 searchable sub-queries.
 2. **Search.** DuckDuckGo HTML by default (no API key). Pluggable: `--search=searxng|brave|tavily` with your own endpoint or key.
-3. **Fetch.** Playwright-driven Chromium renders each result page (JS-rendered SPAs included).
+3. **Fetch.** Playwright-driven Chromium renders each result page (JS-rendered SPAs included). Parallelized via `--concurrency`, cached to `~/.deepdive/cache/` (1h TTL by default) so re-running is free.
 4. **Extract.** Boilerplate stripped, main content capped to a word budget.
 5. **Synthesize.** LLM writes the answer with inline `[N]` citations referencing the source list.
+6. **Critique (optional, `--deep`).** LLM reviews its own draft, names the gaps, proposes follow-up queries, loop re-runs until the critic says done or `--deep=N` rounds elapse.

 ---

 ## Why this exists

 Every hosted research tool (Perplexity, OpenAI Deep Research, Gemini Deep Research) sends your queries to someone else's server, charges per query, and gives you no say in which model synthesizes the answer or which sources get read. deepdive is the self-hosted alternative: your machine, your LLM subscription, your model choice, your search backend.

-Pair it with [dario](https://github.com/askalf/dario) and every research query routes through your Claude Max / Pro subscription instead of per-token API pricing — a single deep query can be 50k–200k tokens, which is exactly the workload subscription billing was built for.
+Pair it with [dario](https://github.com/askalf/dario) and every research query routes through your Claude Max subscription instead of per-token API pricing — a single deep query can be 50k–200k tokens, which is exactly the workload subscription billing was built for.

 ---

@@ -84,20 +85,47 @@ Run `deepdive --help` for the full list. The ones you'll actually use:
 | `--results-per-query=<n>` | `5` | Candidates pulled per sub-query |
 | `--max-words-per-source=<n>` | `2000` | Per-source content cap before synthesis |
 | `--timeout-ms=<ms>` | `30000` | Per-fetch timeout |
-| `--out=<path>` || Also write markdown to file |
+| `--deep[=<n>]` | off (bare `--deep` = 2) | Critic-driven iterative research: after the initial answer, LLM names gaps and proposes follow-up queries for up to N more rounds |
+| `--concurrency=<n>` | `4` | Parallel page fetches |
+| `--no-cache` || Skip the on-disk page cache (default: enabled, 1h TTL) |
+| `--cache-ttl-ms=<ms>` | `3600000` | Cache TTL in ms |
+| `--json` || Emit structured JSON (question, plan, rounds, sources, answer, usage) instead of markdown |
+| `--out=<path>` || Also write output (markdown or JSON) to file |
 | `--verbose`, `-v` || Stream progress events to stderr |

-All flags mirror to `DEEPDIVE_*` env vars (e.g. `DEEPDIVE_MODEL`, `DEEPDIVE_MAX_SOURCES`). CLI flags win over env vars.
+All flags mirror to `DEEPDIVE_*` env vars (e.g. `DEEPDIVE_MODEL`, `DEEPDIVE_MAX_SOURCES`, `DEEPDIVE_DEEP_ROUNDS`, `DEEPDIVE_CONCURRENCY`). CLI flags win over env vars.
+
+### Example: deep iterative research
+
+```bash
+deepdive "compare bun's TLS ClientHello to node's" --deep=3 --verbose --out=tls.md
+```
+
+With `--deep=3`, after the first synthesis the critic can run up to three more rounds of "look at the draft, find what's missing, search for it, re-synthesize." Each round can add up to `--max-sources` new pages, so plan the cap. The loop stops early as soon as the critic says the draft is complete.
+
+### Example: JSON output for piping into other tools
+
+```bash
+deepdive "latest CVE in openssl" --json | jq '.sources[] | .url'
+```

 ---

 ## Library mode

 ```ts
-import { runAgent, resolveSearchAdapter, resolveConfig } from "@askalf/deepdive";
+import {
+  runAgent,
+  resolveSearchAdapter,
+  resolveConfig,
+  createCache,
+} from "@askalf/deepdive";

 const config = resolveConfig({}, process.env);
 const search = await resolveSearchAdapter(config.searchAdapter, process.env);
+const cache = config.cache.enabled
+  ? createCache({ dir: config.cache.dir, ttlMs: config.cache.ttlMs })
+  : undefined;

 const result = await runAgent("how does claude's rate limiter work", {
   llm: config.llm,
@@ -106,10 +134,14 @@ const result = await runAgent("how does claude's rate limiter work", {
   resultsPerQuery: config.resultsPerQuery,
   maxSources: config.maxSources,
   maxWordsPerSource: config.maxWordsPerSource,
+  deepRounds: 2, // iterate up to 2 extra critic-driven rounds
+  concurrency: 4, // parallel page fetches
+  cache,
   onEvent: (e) => console.error(e),
 });

 console.log(result.markdown);
+console.log("rounds:", result.usage.rounds, "sources:", result.usage.kept);
 ```

 ---
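
The README's flag table and its "CLI flags win over env vars" rule imply a small resolution order for `--deep[=N]`: flag first, then `DEEPDIVE_DEEP_ROUNDS`, then off. A sketch of that order, as an illustration only, is shown below. The `resolveDeepRounds` and `parseRounds` helpers are hypothetical and are not the project's `parseArgs` or `resolveConfig`; treating 0 as a valid value and letting the last flag occurrence win are assumptions.

```typescript
// Hypothetical helpers for illustration; not the project's parseArgs or resolveConfig.
const BARE_DEEP_ROUNDS = 2; // bare `--deep` means 2 extra rounds

function parseRounds(raw: string, origin: string): number {
  // Assumption: 0 is accepted and means "no extra rounds".
  if (!/^\d+$/.test(raw)) {
    throw new Error(`${origin}: expected a non-negative integer, got "${raw}"`);
  }
  return Number(raw);
}

function resolveDeepRounds(
  argv: readonly string[],
  env: Readonly<Record<string, string | undefined>>,
): number {
  // 1. CLI flag, if present. Assumption: the last occurrence wins.
  for (const arg of [...argv].reverse()) {
    if (arg === "--deep") return BARE_DEEP_ROUNDS;
    if (arg.startsWith("--deep=")) {
      return parseRounds(arg.slice("--deep=".length), "--deep");
    }
  }
  // 2. Environment variable fallback.
  const fromEnv = env["DEEPDIVE_DEEP_ROUNDS"];
  if (fromEnv !== undefined && fromEnv !== "") {
    return parseRounds(fromEnv, "DEEPDIVE_DEEP_ROUNDS");
  }
  // 3. Default: deep mode is off.
  return 0;
}
```

For example, `resolveDeepRounds(["--deep=3"], { DEEPDIVE_DEEP_ROUNDS: "5" })` returns 3 because the flag is checked before the environment, while an empty `argv` with the same environment returns 5.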

package.json

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
{
22
"name": "@askalf/deepdive",
3-
"version": "0.1.0",
3+
"version": "0.2.0",
44
"description": "A local research agent. One command, cited answer. Routes every LLM call through your own proxy (dario, Anthropic-compat, OpenAI-compat). Headless browser + pluggable search + multi-provider LLM — zero hosted dependencies.",
55
"type": "module",
66
"bin": {
