docs: rewrite README — sovereignty + cost-arbitrage angle#2
Merged
Conversation
The old README opened with the mechanical pipeline (plan → search → fetch
→ extract → synthesize) and treated --deep and the dario pairing as
footnotes. Neither the headline feature of v0.2.0 nor the reason deepdive
exists at all were above the fold.
The rewrite:
- Opens with an outcome statement ("open-source Perplexity for your own
machine, running on your own Claude Max subscription") and names the
specific thing the critic loop gives you that hosted tools cap away.
- Adds a "The point" section that frames deepdive as the consumer
counterpart to dario's routing layer — same product, two pieces. The
deep-research workload is 50k-200k tokens per query, which is the exact
shape Claude Max was priced for and the exact shape per-token API
pricing makes painful.
- Adds a comparison table vs Perplexity / OpenAI Deep Research / Gemini
Deep Research across seven axes (who runs the loop, model choice,
search backend, who sees queries, depth cap, billing path, source).
- Adds a "--deep loop" section with an ASCII diagram and a concrete
cost breakdown so the feature is load-bearing rather than a caveat.
- Adds a realistic sample output so the thing feels real.
- Cuts the pipeline-description and architecture-principles sections
(moved to CLAUDE.md in earlier commits).
- Trims the flag table to "common flags" with a one-line "why" per flag
and defers exhaustive docs to --help.
…ming Revised angle: frame deepdive around what a hosted research tool takes from you and gives in exchange. Two pillars: 1. **Sovereignty.** Hero paragraph starts with "Your machine. Your LLM subscription. Your search backend. Your cited report." — four "your" things that all hosted tools quietly take away. "What you keep" section expands each one. Trust-and-transparency table gains an lsof -i note for verifiable claims. 2. **Cost-arbitrage.** "What you stop paying for" section with an explicit per-query / per-month cost table comparing: per-token Opus, per-token Sonnet, Perplexity Pro, OpenAI Deep Research (ChatGPT Plus), Gemini Deep Research (AI Advanced), and deepdive + dario + Claude Max. The argument — deep research is exactly the workload Max was priced for, so layering a hosted tool on top is paying twice for LLM calls you already bought — is stated directly. --deep section explains why the critic loop is the axis hosted tools cap on (unit economics) and why on your own subscription the only cap is the one you set. That reframes --deep as the load-bearing feature instead of a flag in a table. Dropped the old "vs Perplexity/OpenAI DR/Gemini DR" 7-column comparison table — it was doing the same work the two pillar sections now do, with less focus. Kept the sample output, the 60-second quickstart, the common-flags table, the search adapter matrix, and the caching note.
4 tasks
askalf
added a commit
that referenced
this pull request
Apr 23, 2026
Clean sweep of the security scan. Each fix is defensive — none of these were exploitable in deepdive's context (search snippets aren't HTML-rendered; URLs are not attacker-controlled sizes) — but silencing CodeQL matters for the paste-able health signal, and the new implementations are genuinely better. Alerts fixed: - #4, #5, #6, #7 js/polynomial-redos — `/\/+$/` and `/#.*$/` end-anchored patterns replaced with non-regex string walks in a new src/url-util.ts (trimTrailingSlashes, stripHashFragment, dedupeKey). Provably linear-time, with a pathological-input test that asserts sub-100ms runtime on 100k trailing slashes. - #3 js/incomplete-url-substring-sanitization — in the DDG redirect unwrap, `u.hostname.endsWith("duckduckgo.com")` would also match `evil-duckduckgo.com`. Tightened to `hostname === "duckduckgo.com" || hostname.endsWith(".duckduckgo.com")`. - #2 js/double-escaping — decodeHtmlEntities was chaining 8 sequential `.replace()` calls, so `&#39;` would double-decode: first pass produces `'`, second pass expands to `'`. Replaced with a single-pass tokenizer that resolves each `&...;` exactly once, so `&#39;` now correctly stays as the literal `'`. Also added support for `'` and `&#xHEX;` forms while I was in there. - #1 js/incomplete-multi-character-sanitization — stripTags matched `/<[^>]+>/g`, which leaves malformed partials like `<scrip` (no closing `>`) in the output. Now does a two-step strip: well-formed tags replaced with a space, then any remaining `<` is also replaced with a space, so no tag-opener character can ever leak downstream. Test coverage added: - test/url-util.test.mjs — 10 assertions including a linear-time assert - test/html-sanitize.test.mjs — 10 assertions; the `<scrip` partial and the double-decode regression are both pinned. 116 tests total, all pass (up from 96). Co-authored-by: askalf <263217947+askalf@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
The v0.1.0 README read like a CLI manual. `--deep` (the v0.2.0 headline feature) and the dario/Max economics (the reason this exists as a package and not a shell script) were footnotes.
Revised angle
Two pillars, both above the fold:
1. Sovereignty — "What you keep"
Hero paragraph opens with "Your machine. Your LLM subscription. Your search backend. Your cited report." Expands into four things every hosted research tool takes away:
2. Cost-arbitrage — "What you stop paying for"
Explicit per-query / per-month cost table:
Argument: deep-research workload is exactly the token shape Max was priced for. Paying a hosted tool on top is paying twice for LLM calls you already bought.
3. `--deep` reframed
New paragraph in the loop section: "the critic loop is the axis hosted tools cap on, per-query unit economics force them to ship fixed depth, on your own subscription the only cap is the one you set." Makes `--deep` load-bearing instead of just a flag.
Dropped
The old 7-column comparison table (Perplexity / OpenAI DR / Gemini DR vs deepdive across model choice, search backend, etc.). It did the same work the two pillar sections now do, less focused.
Test plan