v0.2.0 — --deep loop, concurrent fetch, on-disk cache, --json output by askalf · Pull Request #1 · askalf/deepdive

askalf · 2026-04-23T00:16:25Z

Summary

Biggest v0.1.0 gaps, closed:

- `--deep[=N]` critic-driven iterative research: after the first synthesis, an LLM critic reviews the draft, names the gaps, and proposes follow-up queries; the loop then re-runs up to N more rounds. Bare `--deep` = 2. The critic can declare the draft done early.
- `--concurrency=N` parallel fetches (default 4) via a new worker pool in `src/concurrency.ts`. Cuts fetch-phase wall time by roughly N× on a mixed batch of fast and slow pages.
- Per-URL on-disk cache at `~/.deepdive/cache/` (1h default TTL, atomic writes). Cache hits skip the browser entirely, so an all-cached run never launches Chromium. `--no-cache` disables it.
- `--json` output mode: `{question, plan, rounds, sources, answer, usage}`. Pipeable into `jq` and other tools.
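The worker-pool idea behind `--concurrency` can be sketched roughly as follows. This is not the code in `src/concurrency.ts`; `mapWithConcurrency` is a hypothetical name, and the sketch only assumes the three properties the test plan names: a cap on in-flight work, results returned in input order, and abort support.

```typescript
// Hypothetical helper: run `fn` over `items` with at most `limit` calls in flight,
// returning results in input order (not completion order).
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  fn: (item: T, index: number) => Promise<R>,
  signal?: AbortSignal,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError(`limit must be a positive integer, got ${limit}`);
  }
  const results = new Array<R>(items.length);
  let next = 0; // next unclaimed index; safe to share because JS runs callbacks one at a time

  // Each worker claims one index at a time, so at most `limit` fn() calls are pending.
  async function worker(): Promise<void> {
    while (next < items.length) {
      // Checked before claiming: an abort stops new work; in-flight items still finish.
      if (signal?.aborted) throw new Error("aborted");
      const i = next++;
      results[i] = await fn(items[i], i); // write by index to preserve input order
    }
  }

  const workerCount = Math.min(limit, items.length);
  // Rejects on the first failure or abort; other workers stop at their next check.
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Because a worker pulls the next index only after its current item settles, one slow page ties up a single slot instead of stalling the batch, which is where the saving on mixed fast/slow batches comes from.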

New surface (library users)

- `AgentConfig.browserFactory`: inject a mock `BrowserLike` in tests or custom deployments
- `AgentResult.rounds`: per-round trace with queries, candidates, fetches, kept count, and critic verdict
- `AgentResult.usage.cacheHits`
- `AgentConfig.cache` (optional)
- `AgentConfig.deepRounds`, `AgentConfig.concurrency`
- New events: `round.start`, `critique.start`, `critique.done`; `fetch.start`/`fetch.done` gained `cached: boolean`
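The surface above hints at how cache-hit-only runs avoid Chromium: the browser is created through the injected factory only on the first cache miss. A minimal sketch under assumptions: the `fetchText`/`close` methods on `BrowserLike`, the `CacheLike` interface, and the `LazyFetcher` class are hypothetical, while the `fetch.start`/`fetch.done` event names, the `cached` flag, and the `cacheHits` counter come from the PR.

```typescript
// Hypothetical shapes; deepdive's real BrowserLike and cache interfaces may differ.
interface BrowserLike {
  fetchText(url: string): Promise<string>;
  close(): Promise<void>;
}
interface CacheLike {
  get(url: string): Promise<string | undefined>;
  set(url: string, text: string): Promise<void>;
}
type FetchEvent = { type: "fetch.start" | "fetch.done"; url: string; cached: boolean };

class LazyFetcher {
  cacheHits = 0;
  private browser: Promise<BrowserLike> | null = null;
  private readonly browserFactory: () => Promise<BrowserLike>;
  private readonly cache: CacheLike | undefined;
  private readonly emit: (e: FetchEvent) => void;

  constructor(
    browserFactory: () => Promise<BrowserLike>,
    cache: CacheLike | undefined,
    emit: (e: FetchEvent) => void,
  ) {
    this.browserFactory = browserFactory;
    this.cache = cache;
    this.emit = emit;
  }

  async fetch(url: string): Promise<string> {
    const hit = this.cache ? await this.cache.get(url) : undefined;
    this.emit({ type: "fetch.start", url, cached: hit !== undefined });
    if (hit !== undefined) {
      this.cacheHits++;
      this.emit({ type: "fetch.done", url, cached: true });
      return hit; // the browser factory is never called on this path
    }
    // Launch on the first miss only; later misses (even concurrent ones) share this promise.
    if (!this.browser) this.browser = this.browserFactory();
    const text = await (await this.browser).fetchText(url);
    await this.cache?.set(url, text);
    this.emit({ type: "fetch.done", url, cached: false });
    return text;
  }

  async close(): Promise<void> {
    if (this.browser) await (await this.browser).close();
  }
}
```

In tests, the factory can return an in-memory stub, which is the kind of injection `browserFactory` exists for.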

Not yet

- PDF extraction: deferred to v0.3.0 (deserves its own scope)
- Streaming output during synthesis: also v0.3.0 material

Test plan

- `npm run build`: clean under `strict: true`
- `npm test`: 96 pass, 0 fail (up from ~53)
- 42 new assertions: cache round-trip/TTL/atomicity (6), concurrency cap/order/abort (8), parseCritique (8), new CLI flags (8), new config options (7), agent integration test (6: single-pass, deep loop, early-done, cache reuse, maxSources cap, broken-fetch resilience)
- Manual verification: a mock LLM HTTP server, mock search, and mock browser run the full agent pipeline through the deep loop in ~50ms
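The cache round-trip/TTL/atomicity cases correspond to a store along these lines. This is a minimal sketch assuming Node's `fs/promises`; the `DiskPageCache` name, the SHA-256 filename scheme, and the JSON entry format are assumptions, not deepdive's actual on-disk layout. Only the 1h default TTL and the write-to-temp-then-rename pattern come from the PR.

```typescript
import { createHash, randomBytes } from "node:crypto";
import { mkdir, readFile, rename, writeFile } from "node:fs/promises";
import { join } from "node:path";

// Hypothetical on-disk entry format.
interface CacheEntry {
  url: string;
  fetchedAt: number; // epoch milliseconds
  text: string;
}

class DiskPageCache {
  private readonly dir: string;
  private readonly ttlMs: number;

  constructor(dir: string, ttlMs: number = 60 * 60 * 1000 /* 1h default, as in the PR */) {
    this.dir = dir;
    this.ttlMs = ttlMs;
  }

  // Hashing the URL gives a filesystem-safe, fixed-length filename for any URL.
  private fileFor(url: string): string {
    return join(this.dir, createHash("sha256").update(url).digest("hex") + ".json");
  }

  async get(url: string, now: number = Date.now()): Promise<string | undefined> {
    const entry = await readFile(this.fileFor(url), "utf8")
      .then((raw) => JSON.parse(raw) as CacheEntry)
      .catch(() => undefined); // missing or corrupt file: treat as a miss
    if (entry === undefined || entry.url !== url) return undefined;
    if (now - entry.fetchedAt > this.ttlMs) return undefined; // expired
    return entry.text;
  }

  async set(url: string, text: string, now: number = Date.now()): Promise<void> {
    await mkdir(this.dir, { recursive: true });
    const target = this.fileFor(url);
    // Unique temp name in the same directory, so concurrent writers never share a .tmp file.
    const tmp = `${target}.${randomBytes(6).toString("hex")}.tmp`;
    const entry: CacheEntry = { url, fetchedAt: now, text };
    await writeFile(tmp, JSON.stringify(entry), "utf8");
    // A same-directory rename replaces the target atomically on POSIX filesystems:
    // readers see either the old entry or the new one, never a half-written file.
    await rename(tmp, target);
  }
}
```

Passing `now` explicitly keeps TTL checks deterministic in tests without touching the clock.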

Risk

`AgentConfig` gained required `deepRounds` and `concurrency` fields, so anyone constructing the config by hand in TypeScript will get a type error. Anyone going through `resolveConfig` picks up the defaults for free. `CHANGELOG.md` calls this out.
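The trade-off reads like this in types. A hypothetical sketch: `AgentConfigSketch` and `resolveConfigSketch` stand in for the real `AgentConfig` and `resolveConfig`, reduced to the fields in question, and the default of 0 deep rounds is an assumption (the PR only states that concurrency defaults to 4).

```typescript
// Hypothetical stand-in for AgentConfig, reduced to the fields in question.
interface AgentConfigSketch {
  question: string;
  deepRounds: number; // required: hand-built configs must now set it
  concurrency: number; // required as well
}

// What callers of the resolver may pass: the new fields are optional here.
type ConfigInput = { question: string } & Partial<Pick<AgentConfigSketch, "deepRounds" | "concurrency">>;

const DEFAULTS = { deepRounds: 0, concurrency: 4 }; // deepRounds: 0 is an assumption

function resolveConfigSketch(input: ConfigInput): AgentConfigSketch {
  return {
    question: input.question,
    deepRounds: input.deepRounds ?? DEFAULTS.deepRounds,
    concurrency: input.concurrency ?? DEFAULTS.concurrency,
  };
}

// A hand-built literal missing the new fields no longer compiles:
// const manual: AgentConfigSketch = { question: "q" }; // type error: deepRounds, concurrency missing
```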

…-json output

Adds the four biggest things v0.1.0 was missing:

1. `--deep[=N]` critic-driven iterative research. After the first synthesis an LLM critic reviews the draft, names the gaps, and proposes follow-up queries; the loop re-runs up to N more rounds. Bare `--deep` = 2 rounds. The critic can declare done early.
2. `--concurrency=N` parallel page fetches (default 4) via a new worker pool in src/concurrency.ts. Cuts the fetch-phase wall time ~N× on a mixed batch of fast and slow pages.
3. Per-URL on-disk page cache at ~/.deepdive/cache/ (1h default TTL, atomic .tmp + rename writes). Cache hits skip the browser entirely; an all-cached re-run never launches Chromium. Disable with --no-cache.
4. `--json` structured output mode: {question, plan, rounds, sources, answer, usage}. Pipeable into jq and other tools.

Bonus surface for library users: AgentConfig.browserFactory for injecting a mock BrowserLike, AgentResult.rounds with per-round traces plus the critic's verdict, AgentResult.usage.cacheHits, round.start / critique.start / critique.done events, and cached: boolean on fetch events.

Tests: 42 new assertions. Cache round-trip + TTL + atomicity (6), concurrency cap + ordering + abort (8), parseCritique (8), new CLI flags (8), new config options (7), and a full agent integration test with a mock LLM HTTP server + mock search + mock browser covering single-pass, deep loop, early critic termination, cache reuse, maxSources cap, and broken-fetch resilience (6). 96 total, all pass.
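Item 1's control flow, sketched under assumptions: `runDeepLoop`, `LoopDeps`, and the `Critique` shape are hypothetical (the PR names `parseCritique` but not its output type). The sketch also assumes N counts the rounds after the first pass and that the critic is skipped once the budget is spent. The `round.start`/`critique.start`/`critique.done` event names come from the PR.

```typescript
// Hypothetical shapes; the real parseCritique output and AgentResult.rounds may differ.
interface Critique {
  done: boolean;
  gaps: string[];
  followUpQueries: string[];
}
interface RoundTrace {
  round: number;
  queries: string[];
  critique?: Critique;
}
interface LoopDeps {
  research(queries: string[]): Promise<void>; // search, fetch, keep sources
  synthesize(): Promise<string>; // draft an answer from everything kept so far
  critique(draft: string): Promise<Critique>; // LLM critic verdict on the draft
}
type Emit = (event: string, data: Record<string, unknown>) => void;

// Round 0 is the first pass; assumes `deepRounds` counts the extra rounds after it.
async function runDeepLoop(
  initialQueries: string[],
  deepRounds: number,
  deps: LoopDeps,
  emit: Emit,
): Promise<{ answer: string; rounds: RoundTrace[] }> {
  const rounds: RoundTrace[] = [];
  let queries = initialQueries;
  let answer = "";
  for (let round = 0; round <= deepRounds; round++) {
    emit("round.start", { round, queries });
    await deps.research(queries);
    answer = await deps.synthesize();
    const trace: RoundTrace = { round, queries };
    rounds.push(trace);
    if (round === deepRounds) break; // out of budget: no point asking the critic
    emit("critique.start", { round });
    const verdict = await deps.critique(answer);
    trace.critique = verdict;
    emit("critique.done", { round, done: verdict.done });
    // Stop early when the critic is satisfied or has nothing left to search for.
    if (verdict.done || verdict.followUpQueries.length === 0) break;
    queries = verdict.followUpQueries;
  }
  return { answer, rounds };
}
```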

Clean sweep of the security scan. Each fix is defensive: none of these were exploitable in deepdive's context (search snippets aren't rendered as HTML, and URL lengths aren't attacker-controlled), but silencing CodeQL matters for the paste-able health signal, and the new implementations are genuinely better.

Alerts fixed:

- #4, #5, #6, #7 js/polynomial-redos: the end-anchored patterns `/\/+$/` and `/#.*$/` were replaced with non-regex string walks in a new src/url-util.ts (trimTrailingSlashes, stripHashFragment, dedupeKey). Provably linear-time, with a pathological-input test that asserts sub-100ms runtime on 100k trailing slashes.
- #3 js/incomplete-url-substring-sanitization: in the DDG redirect unwrap, `u.hostname.endsWith("duckduckgo.com")` would also match `evil-duckduckgo.com`. Tightened to `hostname === "duckduckgo.com" || hostname.endsWith(".duckduckgo.com")`.
- #2 js/double-escaping: decodeHtmlEntities chained 8 sequential `.replace()` calls, so `&amp;#39;` would double-decode: the first pass produced `&#39;`, and a later pass expanded that to `'`. Replaced with a single-pass tokenizer that resolves each `&...;` exactly once, so `&amp;#39;` now correctly stays the literal text `&#39;`. Also added support for `&apos;` and `&#xHEX;` forms while I was in there.
- #1 js/incomplete-multi-character-sanitization: stripTags matched `/<[^>]+>/g`, which leaves malformed partials like `<scrip` (no closing `>`) in the output. It now does a two-step strip: well-formed tags are replaced with a space, then any remaining `<` is also replaced with a space, so no tag-opener character can leak downstream.

Test coverage added:

- test/url-util.test.mjs: 10 assertions, including a linear-time assert
- test/html-sanitize.test.mjs: 10 assertions; the `<scrip` partial and the double-decode regression are both pinned.

116 tests total, all pass (up from 96).

Co-authored-by: askalf <263217947+askalf@users.noreply.github.com>
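The four fixes can be sketched as below. These are hypothetical re-implementations: the function names follow the commit message but the bodies are guesses, and `isDuckDuckGoHost` is an illustrative helper name. The entity decoder uses one global regex replace with a callback rather than a hand-written tokenizer; it has the same once-per-token property because `String.prototype.replace` never rescans its own output.

```typescript
// Walk back from the end instead of /\/+$/: one pass, no regex backtracking.
function trimTrailingSlashes(s: string): string {
  let end = s.length;
  while (end > 0 && s.charCodeAt(end - 1) === 47 /* "/" */) end--;
  return s.slice(0, end);
}

// Instead of /#.*$/: the fragment is everything from the first "#", found with one indexOf scan.
function stripHashFragment(s: string): string {
  const i = s.indexOf("#");
  return i === -1 ? s : s.slice(0, i);
}

// Exact host or a true subdomain; a bare endsWith("duckduckgo.com") would accept "evil-duckduckgo.com".
function isDuckDuckGoHost(hostname: string): boolean {
  const h = hostname.toLowerCase();
  return h === "duckduckgo.com" || h.endsWith(".duckduckgo.com");
}

const NAMED_ENTITIES = new Map<string, string>([
  ["amp", "&"], ["lt", "<"], ["gt", ">"], ["quot", '"'], ["apos", "'"], ["nbsp", "\u00a0"],
]);

// One global replace: each "&...;" token is resolved once and the output is never rescanned,
// so "&amp;#39;" yields the literal text "&#39;" instead of being decoded a second time.
function decodeHtmlEntities(s: string): string {
  return s.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match: string, body: string) => {
    if (body.charAt(0) === "#") {
      const hex = body.charAt(1) === "x" || body.charAt(1) === "X";
      const code = hex ? parseInt(body.slice(2), 16) : parseInt(body.slice(1), 10);
      // Out-of-range code points would make fromCodePoint throw; leave those tokens as written.
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    return NAMED_ENTITIES.get(body) ?? match; // unknown names stay as written
  });
}

// Step 1 replaces well-formed tags with a space; step 2 neutralizes any leftover "<",
// e.g. from an unterminated "<scrip", so no tag opener survives.
function stripTags(s: string): string {
  return s.replace(/<[^>]+>/g, " ").replace(/</g, " ");
}
```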

…never die (#84)

The v0.21.0 baseline bench (bench/results/, committed here) made the case: with DuckDuckGo rate-limiting the box's IP, 5/6 default-config questions died with zero sources (exit 3), while the one multi-backend question passed with 12 sources and 0.86 citation support. A default run dying on a single backend's throttle is the #1 reliability gap vs hosted competitors, which never return nothing.

- searchFallback now DEFAULTS to "wikipedia": keyless, reliable, and never shares the primary's failure mode. It only engages when a round's primary searches produced ZERO candidates, so healthy runs are byte-identical to before. "none"/"off" (or blank) disables it.
- Because the fallback engaging changes where the answer's sources come from, the notice now prints to stderr even WITHOUT --verbose, so a degraded run can never be mistaken for a normal one.

Live proof under the ongoing DDG rate limit: the comparison question that exited 3 minutes earlier now completes via the fallback (3 sources, 9/9 citations supported, $0.044), with the stderr notice.

682 tests; new coverage for default resolution, none/off/blank disable, and overrides.
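The fallback rule reduces to a small guard around a round's searches. A sketch under assumptions: `SearchFn`, `resolveFallback`, `runSearches`, and `searchWithFallback` are hypothetical names, and candidates are modeled as plain URL strings. The "wikipedia" default, the "none"/"off"/blank disable values, the zero-candidate trigger, and the always-on stderr notice are taken from the commit message.

```typescript
// Hypothetical: each search backend maps a query to candidate URLs.
type SearchFn = (query: string) => Promise<string[]>;

// Hypothetical resolution of the --search-fallback value.
function resolveFallback(raw: string | undefined): string | null {
  if (raw === undefined) return "wikipedia"; // not set: the new default
  const v = raw.trim().toLowerCase();
  return v === "" || v === "none" || v === "off" ? null : v;
}

async function runSearches(queries: string[], search: SearchFn): Promise<string[]> {
  const perQuery = await Promise.all(queries.map((q) => search(q)));
  const out: string[] = [];
  for (const urls of perQuery) out.push(...urls);
  return out;
}

async function searchWithFallback(
  queries: string[],
  primary: SearchFn,
  fallback: { name: string; search: SearchFn } | null,
  warn: (msg: string) => void = (msg) => console.error(msg),
): Promise<string[]> {
  const candidates = await runSearches(queries, primary);
  // Any candidates at all, or fallback disabled: return the primary's results untouched.
  if (candidates.length > 0 || fallback === null) return candidates;
  // Unconditional stderr notice (no --verbose check): the sources now come from another backend.
  warn(`search: primary backend returned 0 candidates; falling back to "${fallback.name}"`);
  return runSearches(queries, fallback.search);
}
```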

askalf merged commit 4a53c7e into master Apr 23, 2026
3 checks passed

askalf deleted the v0.2.0-deep-concurrent-cache branch April 23, 2026 00:47

askalf mentioned this pull request Apr 23, 2026

fix: address 7 CodeQL high-severity alerts #3

Merged

4 tasks

askalf mentioned this pull request Jun 12, 2026

feat(search): default --search-fallback=wikipedia — degrade visibly, never die #84

Merged

askalf mentioned this pull request Jun 15, 2026

synth prompt tuning PARKED: bench can't resolve prompt deltas while factual-lookup hedges nondeterministically (stock 1/3) #97

Closed

askalf merged 1 commit into master from v0.2.0-deep-concurrent-cache