feat: streaming synthesis #7

Merged: askalf merged 1 commit into master from feat/streaming-synth, Apr 23, 2026

@askalf commented Apr 23, 2026

Summary

A deep query's final synthesis is a single 30–60s LLM call. Until now the user stared at a frozen terminal for the whole duration, then got the full answer at once. Now tokens land on stdout as the model writes them.

When it's on

On by default when ALL of:

  • Not --json (JSON is buffered into the envelope)
  • Not --deep (intermediate rounds would print multiple full drafts back-to-back)
  • stdout.isTTY (piped output shouldn't interleave with progress events)
  • Not --no-stream (explicit opt-out)

Off otherwise. Setting DEEPDIVE_FORCE_STREAM=1 bypasses the TTY check (and only that check) for CI and testing.
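A minimal sketch of the enablement rule above. The function name deriveStreamEnabled and the StreamInputs shape are hypothetical, not the actual src/config.ts API. The one design point it encodes is that DEEPDIVE_FORCE_STREAM only stands in for the TTY check, so --json, --deep and --no-stream still win.

```typescript
// Hypothetical input shape; the real src/config.ts may differ.
interface StreamInputs {
  json: boolean; // --json
  deep: boolean; // --deep
  noStream: boolean; // --no-stream
  isTTY: boolean; // Boolean(process.stdout.isTTY) at the call site
  env: Record<string, string | undefined>;
}

function deriveStreamEnabled(o: StreamInputs): boolean {
  // Hard opt-outs: these win even when DEEPDIVE_FORCE_STREAM is set.
  if (o.json || o.deep || o.noStream) return false;
  // The env var only replaces the TTY requirement.
  const forced = o.env["DEEPDIVE_FORCE_STREAM"] === "1";
  return o.isTTY || forced;
}
```

At the call site, isTTY would come from Boolean(process.stdout.isTTY), since Node leaves the property undefined on non-TTY streams, and env from process.env.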

CLI output structure under streaming

  1. Write # {question}\n\n up front
  2. Stream answer tokens straight to stdout
  3. After synthesis, write the ## Sources\n... block

If --out is also set, the full markdown (re-rendered from the buffered result) goes to the file, so the file stays atomically complete.
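The three-step output order, plus the buffering that lets --out re-render a complete file, can be sketched as follows. renderStreaming, its parameters, and the plain write callback are illustrative assumptions (the real src/cli.ts presumably writes to process.stdout directly). The sources block is passed in pre-rendered so the sketch does not guess its format.

```typescript
// Hypothetical sink: appends text to the terminal (or a test buffer).
type Write = (s: string) => void;

async function renderStreaming(
  question: string,
  tokens: AsyncIterable<string>, // answer chunks as the model emits them
  renderSources: () => string, // returns the "## Sources\n..." block
  write: Write,
): Promise<string> {
  write(`# ${question}\n\n`); // 1. header up front
  let answer = "";
  for await (const chunk of tokens) {
    write(chunk); // 2. tokens go straight through
    answer += chunk; // buffered copy, reused for --out
  }
  write("\n\n" + renderSources()); // 3. sources after synthesis completes
  return answer;
}
```

Returning the buffered answer is what allows the --out file to be written in one piece after the stream ends, instead of mirroring partial tokens to disk.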

Implementation

  • src/llm-stream.ts: new callLLMStream with an async-generator SSE parser. Reuses the existing retry helper for the initial connect; mid-stream failures propagate (already-emitted tokens can't be taken back).
  • src/synthesize.ts: optional onToken param; streams when set, falls back to the buffered callLLM path otherwise.
  • src/agent.ts: new AgentConfig.onSynthesizeToken?: (chunk, round) => void hook.
  • src/cli.ts: picks streaming mode and writes the header and sources around the stream.
  • src/config.ts: streamEnabled derived once; auto-off for JSON/deep/env opt-out.
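A sketch of what a pure frame parser like parseBlocks has to do, following the SSE rules the tests name. Lines starting with ":" are comments, one leading space after the colon is stripped, multiple data: lines fold together with "\n", blocks with no data are skipped, [DONE] ends the stream, and a frame whose JSON does not parse is dropped rather than aborting the stream. The signature (already-split blocks in, payloads plus a done flag out) is an assumption, not the real export.

```typescript
// Hypothetical result shape for this sketch.
interface ParsedBlocks {
  events: unknown[]; // parsed JSON payloads, in order
  done: boolean; // true once a "data: [DONE]" block was seen
}

// Input: complete SSE blocks (blank-line separators removed), LF endings.
function parseBlocks(blocks: string[]): ParsedBlocks {
  const events: unknown[] = [];
  for (const block of blocks) {
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // blank or comment
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip ONE space
      if (field === "data") dataLines.push(value);
      // event:, id: and retry: fields are ignored in this sketch
    }
    if (dataLines.length === 0) continue; // empty or comment-only block
    const data = dataLines.join("\n"); // multi-line data folds with "\n"
    if (data === "[DONE]") return { events, done: true };
    try {
      events.push(JSON.parse(data));
    } catch {
      // malformed JSON: drop this frame and keep going
    }
  }
  return { events, done: false };
}
```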

Test plan

  • npm run build — clean under strict: true
  • npm test — 180 pass (up from 164), 0 fail
  • 16 new assertions:
      - parseBlocks (6): single-line, multi-line data folding, leading-space stripping, empty/comment blocks, [DONE] sentinel, malformed-JSON drop
      - parseSSE (4): multi-frame, chunk-boundary splits, trailing event without blank line, CRLF endings
      - callLLMStream integration (4): token order + text aggregation + usage parsing, 500→200 retry, 401 does not retry, non-text_delta events ignored
      - CLI/config (2): --no-stream flag, config derivation matrix
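The parseSSE cases (chunk-boundary splits, CRLF endings, a trailing event without a blank line) mostly come down to buffering. Below is a hypothetical splitSSE generator, not the project's parseSSE, that turns text chunks into complete blocks ready for a frame parser. It assumes the bytes were already decoded, e.g. with TextDecoder and { stream: true }. The subtle part is a "\r" at the end of a chunk: its "\n" may arrive in the next chunk, so it is held back rather than normalized into an extra newline that would fake a block boundary.

```typescript
// Sketch: re-chunk arbitrary text chunks into complete SSE blocks.
async function* splitSSE(chunks: AsyncIterable<string>): AsyncGenerator<string> {
  let buf = "";
  for await (const chunk of chunks) {
    buf += chunk;
    // Hold back a trailing "\r": it may be half of a CRLF split across chunks.
    const heldCR = buf.endsWith("\r");
    if (heldCR) buf = buf.slice(0, -1);
    buf = buf.replace(/\r\n?/g, "\n"); // CRLF and lone CR become LF
    let idx: number;
    while ((idx = buf.indexOf("\n\n")) !== -1) {
      yield buf.slice(0, idx); // one complete block, separator removed
      buf = buf.slice(idx + 2);
    }
    if (heldCR) buf += "\r";
  }
  // End of stream: flush a final event that lacked its blank line.
  const rest = buf.replace(/\r\n?/g, "\n").replace(/\n+$/, "");
  if (rest !== "") yield rest;
}
```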

… them

A deep query's final synthesis is a 30–60s single LLM call. Until now the
user stared at a frozen terminal for the whole duration, then got the
entire answer at once. Now the tokens stream to stdout as soon as they
arrive.

Turned on by default when ALL of:
  - not --json (JSON is buffered into the envelope)
  - not --deep (intermediate rounds would print multiple full drafts
    back-to-back)
  - stdout.isTTY (piped output shouldn't interleave with progress events)
  - not --no-stream (explicit opt-out)

Turned off otherwise. The env var DEEPDIVE_FORCE_STREAM=1 bypasses the TTY
check for CI-style testing.

CLI output structure under streaming:
  1. Write "# {question}\n\n" up front
  2. Stream answer tokens straight to stdout as they arrive
  3. After synthesis completes, write the "## Sources\n..." block
If --out is also set, the full markdown (re-rendered from the buffered
result) goes to the file so the file remains atomically complete.

Implementation:
  - src/llm-stream.ts: callLLMStream + parseSSE async generator + pure
    parseBlocks frame parser. Reuses the existing retry helper for the
    initial connect; mid-stream failures propagate.
  - src/synthesize.ts: added optional onToken param; passes through to
    streaming variant when set, falls back to buffered callLLM otherwise.
  - src/agent.ts: added AgentConfig.onSynthesizeToken — forwards chunks
    with the current round number.
  - src/cli.ts: picks streaming mode, writes header + sources around the
    stream.
  - src/config.ts: streamEnabled derived once and auto-off for json/deep/
    env opt-out.
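The retry boundary described for src/llm-stream.ts (retry the initial connect, never the stream) can be sketched like this. connectWithRetry, the Resp shape, and the backoff numbers are hypothetical; the existing retry helper's actual policy is not shown in this PR. The sketch only models HTTP status: 5xx is treated as transient, any 4xx such as 401 fails fast, and once a success status comes back the function returns and the caller owns the stream, so a mid-stream failure simply propagates.

```typescript
// Minimal response shape this sketch needs (assumption).
interface Resp {
  status: number;
}

async function connectWithRetry(
  connect: () => Promise<Resp>,
  maxAttempts = 3,
  delayMs = (attempt: number) => 200 * 2 ** attempt, // exponential backoff
): Promise<Resp> {
  for (let attempt = 0; ; attempt++) {
    const res = await connect();
    if (res.status < 400) return res; // connected: no retries past this point
    const retryable = res.status >= 500; // 5xx transient; 4xx (e.g. 401) final
    if (!retryable || attempt + 1 >= maxAttempts) {
      throw new Error(`connect failed: HTTP ${res.status}`);
    }
    await new Promise((resolve) => setTimeout(resolve, delayMs(attempt)));
  }
}
```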

Tests: 16 new assertions (180 total, up from 164 pre-branch, 96 before
the production-grade track started).
- parseBlocks (6): single-line, multi-line data folding (spec-compliant),
  leading-space stripping, empty/comment blocks, [DONE] sentinel,
  malformed-JSON drop
- parseSSE (4): multi-frame stream, chunk boundary splitting, trailing
  event without blank line, CRLF line endings
- callLLMStream integration (4): token order + full-text aggregation +
  usage parsing, 500-retry-then-succeed, 401-does-not-retry,
  non-text_delta events ignored gracefully
- CLI/config (2): --no-stream flag, streamEnabled derivation matrix
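The integration cases mention text_delta events and usage parsing, which suggests an Anthropic-style Messages stream; that is an assumption, since the PR does not name the provider. Under that assumption, a hypothetical foldEvents turns parsed frames into forwarded tokens, aggregated text, and token usage, ignoring every event that is not a text_delta.

```typescript
function isObj(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null;
}

function num(v: unknown): number | undefined {
  return typeof v === "number" ? v : undefined;
}

interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// Event shapes below are assumed (Anthropic-style), not taken from the PR.
function foldEvents(
  events: Iterable<unknown>,
  onToken: (chunk: string) => void,
): { text: string; usage: Usage } {
  let text = "";
  const usage: Usage = { inputTokens: 0, outputTokens: 0 };
  for (const ev of events) {
    if (!isObj(ev)) continue;
    if (ev.type === "content_block_delta") {
      const d = ev.delta;
      if (isObj(d) && d.type === "text_delta") {
        const t = d.text;
        if (typeof t === "string") {
          onToken(t); // forward in arrival order
          text += t; // and aggregate the full answer
        }
      }
      // other delta types (e.g. input_json_delta for tool use) are ignored
    } else if (ev.type === "message_start") {
      const m = ev.message;
      const u = isObj(m) ? m.usage : undefined;
      if (isObj(u)) {
        usage.inputTokens = num(u.input_tokens) ?? usage.inputTokens;
        usage.outputTokens = num(u.output_tokens) ?? usage.outputTokens;
      }
    } else if (ev.type === "message_delta") {
      const u = ev.usage;
      if (isObj(u)) usage.outputTokens = num(u.output_tokens) ?? usage.outputTokens;
    }
    // ping, content_block_start/stop, message_stop: nothing to do
  }
  return { text, usage };
}
```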
@askalf enabled auto-merge (squash) April 23, 2026 01:23
@askalf merged commit a942e41 into master Apr 23, 2026
4 checks passed
@askalf deleted the feat/streaming-synth branch April 23, 2026 01:24