
docs(atlas): a from-scratch design for remote terminals#1398

Draft
srid wants to merge 2 commits into master from docs/remote-terminals-from-scratch

@srid srid commented Jun 18, 2026

Member

A clean-slate design for kolu remote terminals, written after rejecting #1385 — the "kolu dials remotes" work — as a single 133-file PR. Rather than finish that design in safe layers, this note redesigns from the goal and dissolves #1385's bug classes by construction instead of patching them. A plan-of-record Atlas note (status: proposed), sibling of kaval-sessions — no code changes.

The core model

One thesis: the host is the sole source of truth for its own live terminals, and kolu persists only a dial-list (Set<hostId>), never a remote-terminal record. A remote tile is a local TerminalEndpoint bound to ssh — local vs remote is only the transport.
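A minimal sketch of what this model could look like in types. Only `TerminalEndpoint`, `hostId`, and the dial-list come from the note. `Location`, `LocalEndpoint`, `SavedSession`, and `describeTransport` are hypothetical names chosen for illustration, not the project's actual API.

```typescript
type HostId = string;

// "local" is a real value of the union, not the absence of a host.
type Location =
  | { kind: "local" }
  | { kind: "ssh"; hostId: HostId };

interface TerminalEndpoint {
  terminalId: string;
  location: Location; // non-optional by design
}

// Only local endpoints are storable; an ssh endpoint does not type-check here.
type LocalEndpoint = TerminalEndpoint & { location: { kind: "local" } };

// Hypothetical saved-session shape: remote state is just the list of hosts to dial.
interface SavedSession {
  localTerminals: LocalEndpoint[];
  dialList: HostId[]; // serialized form of Set<hostId>
}

// Exhaustive switch: adding a Location variant without handling it
// makes the `never` assignment below a compile error.
function describeTransport(endpoint: TerminalEndpoint): string {
  const loc = endpoint.location;
  switch (loc.kind) {
    case "local":
      return "local pty";
    case "ssh":
      return `ssh:${loc.hostId}`;
    default: {
      const unreachable: never = loc;
      return unreachable;
    }
  }
}
```

Under these assumptions, the only difference between a local and a remote tile is the variant of `location`; everything else about the endpoint is shared.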

  • Existence from one atomic pull — boot = restore = reconnect: dial → await kaval's atomic terminal.list (never the watcher's incremental mirror) → synchronously seed terminalMetadata → project.
  • Persist only a dial-list — nothing to prune; a remote terminal can never enter the saved session.
  • Endpoint first-class: location non-optional; a dropped host is a compile error.
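The boot sequence in the first bullet can be sketched as follows. This is an illustration under stated assumptions: `KavalClient`, `Dial`, `adoptHost`, and `boot` are hypothetical names, and `listTerminals()` stands in for kaval's atomic `terminal.list`. Only the ordering (dial, one atomic pull, synchronous seed, project) is taken from the note.

```typescript
type HostId = string;

interface TerminalInfo {
  id: string;
  cwd: string;
}
type TerminalMeta = TerminalInfo & { hostId: HostId };

// Hypothetical client surface; stands in for kaval's atomic terminal.list.
interface KavalClient {
  listTerminals(): Promise<TerminalInfo[]>;
}
type Dial = (hostId: HostId) => Promise<KavalClient>;

async function adoptHost(
  hostId: HostId,
  dial: Dial,
  terminalMetadata: Map<string, TerminalMeta>,
): Promise<void> {
  const client = await dial(hostId);
  const snapshot = await client.listTerminals(); // one atomic pull
  // Seed synchronously: there is no await between receiving the snapshot
  // and finishing the seed, so no observer sees a half-populated host.
  for (const t of snapshot) {
    terminalMetadata.set(`${hostId}/${t.id}`, { ...t, hostId });
  }
}

// boot = restore = reconnect: the same function serves all three.
async function boot(
  dialList: ReadonlySet<HostId>,
  dial: Dial,
): Promise<Map<string, TerminalMeta>> {
  const terminalMetadata = new Map<string, TerminalMeta>();
  await Promise.all(
    Array.from(dialList).map((hostId) =>
      adoptHost(hostId, dial, terminalMetadata).catch(() => {
        // A host that cannot be dialed contributes no terminals.
        // It stays in the dial-list, so nothing needs pruning.
      }),
    ),
  );
  return terminalMetadata; // the state that gets projected
}
```

Because nothing remote is read from a saved record, an unreachable host simply yields no tiles on this pass and is picked up again on the next dial.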

The honest correction (what the adversarial review caught)

"The first slice needs no host agent; the DAG just runs in kolu-server" is provably wrong — the git/PR/agent providers read kolu-server's own fs/os.homedir(), so in-server they inspect the wrong machine. Awareness is split: PTY-tap signals (cwd/title/foreground) run in-server; git/PR/agent run host-side — exactly why the watcher exists (renamed the "host agent", minus PTY-forwarding).

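The split described above can be written down as a small routing table. This is a sketch only: the `Signal` and `RunsOn` types and the `runsOn` and `signalsFor` names are hypothetical; the assignment of each signal to a side is the one stated in the note.

```typescript
type Signal = "cwd" | "title" | "foreground" | "git" | "pr" | "agent";
type RunsOn = "in-server" | "host-side";

// PTY-tap signals are derived from the terminal byte stream, which
// kolu-server already receives. The git/PR/agent providers read the
// filesystem and home directory, so they must run where the files are.
const runsOn: Record<Signal, RunsOn> = {
  cwd: "in-server",
  title: "in-server",
  foreground: "in-server",
  git: "host-side",
  pr: "host-side",
  agent: "host-side",
};

// List the signals a given side is responsible for.
function signalsFor(side: RunsOn): Signal[] {
  return (Object.keys(runsOn) as Signal[]).filter((s) => runsOn[s] === side);
}
```

Using `Record<Signal, RunsOn>` means a newly added signal has to be assigned to a side before the code compiles, so none can silently default to running in-server.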
Four bug classes, dissolved by construction

| #1385 bug class | Why it's now unreachable |
| --- | --- |
| cold-boot re-adoption race | existence = kaval's atomic terminal.list + sync seed; persist only a dial-list |
| env leak (restricted PATH) | spawn fully-specified; host login PATH captured host-side |
| remote-repo prune | prune-stat via endpointFor(record.hostId).fs, tolerant of a down host |
| restore-to-wrong-endpoint | location non-optional; restore = dial + adopt, never read-record-and-spawn |
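As an illustration of the remote-repo prune row, here is a minimal sketch of a prune pass that stats each record through its own host's filesystem and treats an unreachable host as "unknown" rather than "gone". The expression `endpointFor(record.hostId).fs` is from the note. `RepoRecord`, `EndpointFs.exists`, and `pruneRepos` are hypothetical names, and the keep-on-error policy is an assumption about what "tolerant of a down host" means.

```typescript
type HostId = string;

interface RepoRecord {
  path: string;
  hostId: HostId;
}

// Hypothetical filesystem surface exposed by an endpoint.
interface EndpointFs {
  exists(path: string): Promise<boolean>;
}
type EndpointFor = (hostId: HostId) => { fs: EndpointFs };

async function pruneRepos(
  records: RepoRecord[],
  endpointFor: EndpointFor,
): Promise<RepoRecord[]> {
  const kept: RepoRecord[] = [];
  for (const record of records) {
    let present: boolean;
    try {
      // Stat on the record's own host, never on kolu-server's local fs.
      present = await endpointFor(record.hostId).fs.exists(record.path);
    } catch {
      // Host is down or unreachable: existence is unknown, so keep the record.
      present = true;
    }
    if (present) kept.push(record);
  }
  return kept;
}
```

Only a definite "does not exist" answer from the owning host removes a record; a failed connection never does.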

Twelve risk-isolated slices (honest about cost, not "thin")

P1 location non-optional · P2 dial-list · P2b host-aware feed: cheap local-only bug-proofing, do first
P4a kaval-direct shell (ssh, no watcher) → P4w harden kolu-watcher + its Nix closure: the FAT cost centre, parallel to P4a
P3 boot via kaval's atomic list → P4c consume the watcher over a net-new loopback harness
P4d watcher over ssh → P5 health · P6 host picker · P7 Code-tab fs/git · P9 multiplex.

Loopback is ssh-free but NOT Nix-free. getHostSession always runs nix-store --realise before any link (even for localhost), and remote.ts hardwires it — so the loopback harness (P4c) is net-new seam work, and going remote (P4d) keeps real risk loopback can't prove: cross-arch .drv, ssh reconnect/recheck/ControlMaster, nix copy over ssh-ng. Caveat: P1–P4d map to commits on the still-open #1385, not merged trunk, so "merges independently / stays green" is a per-slice goal to re-validate.
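One way to picture the net-new seam is as plain dependency injection: the consumer receives the session factory as a parameter instead of importing a hardwired one. The sketch below is hypothetical throughout. `HostSession`, `GetHostSession`, `makeWatcherConsumer`, and `loopbackSession` are illustrative names, and the real `getHostSession` signature is not shown in the note.

```typescript
// Hypothetical session shape, for illustration only.
interface HostSession {
  hostId: string;
  send(line: string): Promise<string>;
}
type GetHostSession = (hostId: string) => Promise<HostSession>;

// The seam: the session factory is injected, so a harness can swap it.
function makeWatcherConsumer(getSession: GetHostSession) {
  return {
    async ping(hostId: string): Promise<string> {
      const session = await getSession(hostId);
      return session.send("ping");
    },
  };
}

// Loopback stand-in: no ssh involved. Per the note, a real loopback
// session would still realise the Nix closure first; this fake skips that,
// so it exercises only the consumption logic.
const loopbackSession: GetHostSession = async (hostId) => ({
  hostId,
  send: async (line) => `loopback:${line}`,
});
```

A harness built this way covers the consumer's logic only. The risks listed above (cross-arch .drv, ssh reconnect, nix copy over ssh-ng) remain outside what it can prove.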

Preview

Self-contained Atlas page (Code tab: docs/atlas/dist/remote-terminals-from-scratch.html):

https://htmlpreview.github.io/?https://github.com/juspay/kolu/blob/docs/remote-terminals-from-scratch/docs/atlas/dist/remote-terminals-from-scratch.html

Generated by Claude Code (model claude-opus-4-8).

srid added 2 commits June 17, 2026 23:28
A clean-slate design for kolu remote terminals, written after rejecting
PR #1385 (the "kolu dials remotes" work) as a single 133-file PR. Instead of
finishing that design in layers, it redesigns from the goal:

- the HOST is the sole source of truth for its own live terminals; kolu
  persists only a dial-list (Set<hostId>), never a remote-terminal record;
- endpoint/location is first-class and non-optional (local is a real value);
- awareness is split honestly — PTY-tap signals (cwd/title/foreground) run
  in-server off kaval's taps, but git/PR/agent awareness MUST run host-side
  (its providers read the host's fs/home), which is exactly why the watcher
  exists;
- the four #1385 bug classes (cold-boot re-adoption race, env leak,
  remote-repo prune, restore-to-wrong-endpoint) are dissolved by
  construction, not patched.

Ships as ten thin, independently-mergeable vertical slices; the first remote
PR puts a durable shell on the canvas with no host agent. Sibling of the
kaval-sessions plan-of-record; status: proposed.
…nest cost

Fold the design refinements debated on the PR, verified against the real
kolu-watcher code:

- "thin slices" → "risk-isolated slices" + an honest cost scorecard; the
  watcher (P4w) is the fat cost centre, not a thin slice.
- Split the old fat P4b into P4w (harden packages/kolu-watcher + its Nix
  closure; parallel with P4a — it imports nothing from the PTY endpoint),
  P4c (consume over a net-new loopback harness), P4d (over ssh).
- Honest loopback framing: the harness is ssh-free but NOT Nix-free
  (getHostSession always nix-realises, even for localhost); the swap point is
  net-new (remote.ts hardwires getHostSession); the consumption logic already
  ships against a mock. Going remote keeps real risk (cross-arch .drv,
  reconnect/recheck/ControlMaster, nix copy over ssh-ng) loopback can't prove.
- Existence comes from kaval's atomic terminal.list, never the watcher's
  incremental mirror — that removes the settle window's cause, not re-times it.
- P1 reverses #1385's shipped optional location (a change + migration).
- Caveat: P1–P4d are commits on the still-open PR #1385, not merged trunk.

Twelve phases now; DAG + claims adversarially verified.