P3: kolu dials remote hosts over ssh (kolu-watcher + N-endpoint registry)#1385
P3: kolu dials remote hosts over ssh (kolu-watcher + N-endpoint registry)#1385srid wants to merge 64 commits into
Conversation
…on on the terminal record
Foundation for P3 (kaval-sessions: kolu dials remotes). All additive/optional
so old sessions deserialize as local and no wire enum is touched:
- TerminalCreateInputSchema.hostId (target host; omitted = local)
- PtySpawnOpts.hostId (lifecycle resolves the owning endpoint from it)
- ServerPersistedTerminalFieldsSchema.location {hostId} (persisted so
kill/attach/restore route back to the same endpoint)
ENDPOINT_STATES / DaemonStatusSchema deliberately unchanged — ssh health will
surface as client-only projected states, keeping the wire enum (and drishti) untouched.
Completes the additive contract (Step 1). The 4 Code-tab fs/git stream inputs (gitStatus/gitDiff/fsListAll/fsReadFile in kolu-git/schemas) gain an optional hostId so the server can route a remote terminal's fs/git over its ssh endpoint; omitted ⇒ local ⇒ byte-identical for every existing client. Adds a hermetic round-trip test proving pre-P3 records/inputs still parse (no location ⇒ local) and P3 ones carry hostId end to end. Doc note disambiguates the client TerminalLocation projection from the new record-level location.hostId.
So kolu-watcher (P3) can run the SAME per-terminal provider DAG host-side without dragging in the full kolu-server (DB/publisher/surfaceCtx/browser surface). providers.ts already imported only ../log.ts + ./agentRecency.ts and is parameterized over ProviderChannels/ProviderHooks/ProviderRecord, so this is a move, not a rewrite: - New leaf package packages/dag (@kolu/terminal-dag): providers.ts + agentRecency.ts moved verbatim (git-tracked renames). - The one server coupling — the module-level pino `log` — is now INJECTED via ProviderHooks.log, so each host supplies its own logger (kolu-server keeps its serverId context; kolu-watcher will supply the watcher's). The DAG body is otherwise unchanged. - local.ts changes only its import path (./providers.ts → @kolu/terminal-dag) and passes `log` in makeHooks — the local path is byte-identical. - Wired into the workspace, the server's deps, and default.nix's src fileset. Verified: full-tree typecheck + biome clean; the moved agentRecency unit test passes in its new home. The DAG's end-to-end exercise lands in Step 4 (kolu-watcher in-process kaval + tmp-repo test), where startProviders runs against real taps with a proper lifecycle.
…metadata) The single typed surface a kolu-watcher serves over one ssh stdioLink and kolu-server's RemoteTerminalEndpoint mirrors. One flat defineSurface composing three concerns: - the ENTIRE pty-host surface, absorbed verbatim by spreading ptyHostSurface.spec (so the schemas can never drift from kaval's) — kolu-watcher forwards these verbs/taps to the host-local kaval as an ordinary client (the 'serve, don't raw-relay' decision: one serveOverStdio server, kaval stays a separate durable daemon); - fs/git as one-shot procedures (git.*/fs.*) + repo/file change streams — a 1:1 projection of the TerminalEndpointFs/Git interfaces onto the wire; - the terminalMetadata collection, value schema pinned to kolu-common's so the small-N mirror cannot drift. New leaf package packages/kolu-watcher (wired into the workspace + default.nix fileset). Hermetic contract test: composes without collision, metadata schema identity, absorbed verbs/streams present. typecheck + biome clean.
…terminal-dag kolu-watcher (P3) must serve the remote host's fs/git from the SAME kolu-git impl kolu-server uses locally — so factor LocalTerminalEndpoint's localFs/localGit and the GitResult→ORPCError unwrap into @kolu/terminal-dag as makeFsGit(log) + unwrapGit, consumed by both. local.ts now destructures makeFsGit(log) (behavior identical — same kolu-git calls); server/unwrapGit.ts re-exports so router.ts's import path is unchanged. typecheck + biome clean.
…tdio The host-resident heart of P3. kolu-server runs `ssh <host> kolu-watcher --stdio`; the watcher serves ONE watcherSurface over that link by composing: - kavalClient.ts — adopt-or-spawn the host-local kaval, return a TYPED ptyHostSurface client (the 'serve, don't relay' decision: the watcher is a kaval CLIENT it forwards to, not a frontDaemonOverStdio byte relay; it can't reExecAsDetachedDaemon — that's the watcher, not kaval — so it spawns the kaval binary nix bakes, with a version handshake). - server.ts — buildWatcherServer: every absorbed pty verb/tap FORWARDS to the kaval sub-client; fs/git served from the host filesystem via the shared makeFsGit; terminalMetadata produced by running kolu's provider DAG host-side (fresh per build) and published into the served collection kolu-server mirrors. Provider lifecycle mirrors LocalTerminalEndpoint (spawn starts the DAG, kill/ exit tears it down); terminals already alive on an adopted kaval are picked up. - streamBridge.ts — forward/bridge/tick adapters between kaval client streams, the DAG's channels, and the watcher's served streams (abort = expected teardown). - bin.ts — the --stdio entrypoint (serveOverStdio; pino logs to fd 2 since stdout is the wire); buildId.ts — the nix-baked staleKey/commit identity. Hermetic in-process test (server.test.ts): a FAKE kaval (stubbed ptyHostSurface, no PTY) + directLink proves spawn forwards, the DAG starts + mirrors metadata, terminalAttach forwards snapshot-first, and fs/git serve a real tmp git repo — no ssh, no nix, no spawned shell. (Run by CI; not locally — it exercises the DAG.) Full-tree typecheck + biome clean.
Nix-native from the get-go — kolu-server provisions the watcher closure over ssh
exactly like kaval:
- default.nix: a `kolu-watcher` runCommand wrapper (node --import tsx loader, so
SIGTERM reaches it) baking KOLU_WATCHER_BUILD_ID/COMMIT_HASH + KOLU_WATCHER_KAVAL_BIN
(the kaval it spawns when none is serving); git on PATH for kolu-git. `watcherSrc`
+ `watcherBuildId` (the package's own hashed src, kaval's narrow discipline).
- koluBin now bakes KOLU_WATCHER_BUILD_ID + KOLU_WATCHER_AGENT_DRVS_JSON and adds
openssh + nix to PATH — kolu-server itself now dials + provisions remotes.
- flake.nix: `watcherDrvBySystem` + `koluWatcherAgentDrvsJson` threaded into
koluBySystem (the kavalDrvBySystem analog; same IFD-free unsafeDiscardStringContext).
- buildId.closure.test.ts: the watcher's OWN closure guard — INVERSE of kaval's
(kolu app deps are allowed; the guard catches kolu-server/client sneaking in).
Trailing-slash allow-list entry catches @kolu/surface-{app,daemon-supervisor,
nix-host} too. reached == nix-hashed (no drift / dead code). Passes.
Verified: nix eval of .#kolu-watcher.drvPath + .#default.drvPath (no IFD cycle);
typecheck + biome clean. (Cross-arch BUILD runs in CI.)
…+ boot The kolu-server consumer side of P3 — kolu-server now dials remotes: - RemoteTerminalEndpoint (terminalEndpoint/remote.ts): the second TerminalEndpoint impl, over a HostSession<watcherContract>. PTY verbs/attach/kill forward to the watcher; fs/git one-shots forward + change-subs re-serve from its repoChange/ fileChange streams; a mirror bridge (the remote-process-monitor bridgeAgentToParent pattern — makeClientCursor re-pump + mirrorRemoteCollection) pumps the watcher's terminalMetadata into kolu-server's registry + browser surface, ADOPTING terminals the watcher already has. onState → publishDaemonStatus (mapped onto the existing wire enum, no widening → no drishti). Spawn policy reuses composeSpawnInput against the WATCHER's system.info, so a remote shell opens in the host's environment. - endpoint registry (terminalEndpoint/registry.ts): endpointFor(hostId) — local singleton for absent/'local' (byte-identical to pre-P3), one cached RemoteTerminalEndpoint per configured host. Reads the local singleton lazily so no #1005 TDZ regression. - host registry (hosts/registry.ts + watcherDrv.ts): KOLU_HOSTS_JSON → {hostId,host}; resolveDrvPath = resolveSystem → KOLU_WATCHER_AGENT_DRVS_JSON (kaval-tui pattern). - threading: createTerminal/killTerminal/killAllTerminals (terminals.ts), attach + create (router.ts), the 4 fs/git streams (surface.ts) all route through endpointFor(hostId). A sub-terminal inherits its parent's host. - boot (index.ts): dial every configured host at startup (publishes daemonStatus + adopts running terminals); a down host folds into the session reconnect loop, never blocks boot. Hermetic tests (all mocked, no ssh): remote.test.ts (fs/git forward, sync-shadow spawn, onState→daemonStatus mapping), registry.test.ts (dispatch + caching), hosts/registry.test.ts (config parse/validation). Full-tree typecheck + biome clean.
…scoped create The client side of P3 — remote terminals are now creatable + legible: - useDaemonStatus.ts: the singleton daemonStatus subscription drops its explicit keys so it tracks EVERY host the server publishes (local + each dialed remote). ClientDaemonState = DaemonState | 'provisioning' | 'unreachable' — two CLIENT-ONLY projections (the wire enum, and drishti, stay untouched); clientDaemonState(hostId) maps a remote host's ssh lifecycle onto them (dialing→provisioning, down→unreachable) while local passes through. Every accessor (downState/daemonWarming/refuseIfWarming/…) gains an optional hostId defaulting to LOCAL_HOST — so the whole-canvas DegradedCanvas gate stays local-only (a remote host down never blanks the canvas). - host-scoped create: handleCreate(cwd, initial, hostId) threads hostId into terminal.create + refuseIfWarming(hostId) (a remote terminal waits for ITS host); a sub-terminal inherits its parent's host. Threaded through the ActionContext seam. - host chip (TerminalMeta.tsx): a per-tile chip shown ONLY for a remote terminal (absence = local), naming the machine with a 7px health dot colored from that host's projected state (green connected · amber provisioning · red unreachable). The per-tile unreachable treatment, NOT a whole-canvas one. - command palette: a 'Connect to host…' leaf in the New terminal group — type an ssh target and a terminal opens there (the server dials any host verbatim; KOLU_HOSTS_JSON provides named aliases). Hermetic test: ClientDaemonState coverage + the remote projection + activeHostIds. Full-tree typecheck + biome clean. (Visual polish verified by the evidence step.)
- Unreleased changelog entry: kolu dials remote hosts over ssh (Connect to host, durable remote terminals, host chip, kolu-watcher). PR link backfilled at §3. - packages/kolu-watcher/README.md: why it's separate from kaval, the serve+forward composition, the nix provisioning story. - packages/dag/README.md: the extracted provider DAG + makeFsGit, why a package.
…R link
P3 (kolu dials remotes) flipped to shipped <PrLink pr={1385}/> in the phases
table + status block, documenting the kolu-watcher process, the option-(a)
serve+forward composition, the N-endpoint registry, and that the wire enum stayed
put (no drishti PR). Backfilled the changelog entry's pr={1385}. dist/ rebuilt
+ staged.
Add endpointForTerminal(id) to the endpoint registry and route killTerminal and attach through it, so the terminal->host resolution rule lives in one place instead of being re-spelled at each id-keyed call site. Agreed by the lowy ⇄ hickey lens debate (finding lowy-2, raised by lowy). Not pushed or merged.
Remove the unused remoteEndpointFor wrapper (zero callers; its docstring described a boot wiring that calls endpointFor directly) and the unsafe cast it carried. Agreed by the lowy ⇄ hickey lens debate (finding lowy-4, raised by lowy). Not pushed or merged.
… unit Replace onRemoteMeta's field-by-field copy of the 7 server-owned fields with a loop over SERVER_META_KEYS (the TerminalServerMetadata partition minus location, derived off the schema), so a new server/live field rides the mirror for free. Add a round-trip test pinning the key set to the schema keys minus location. Agreed by the lowy ⇄ hickey lens debate (finding hickey-1, raised by hickey). Not pushed or merged.
Replace the bare hostId === "local" literal in hostConfigFor with the imported LOCAL_HOST_ID constant, so hostConfigFor and endpointFor decide "is this the local host" on one shared value rather than two independent spellings. Agreed by the lowy ⇄ hickey lens debate (finding hickey-2, raised by hickey). Not pushed or merged.
Lift the InitialTerminalMetadata guarded if-chain into one applyInitialMetadata(meta, initial) in metadata.ts and call it from both LocalTerminalEndpoint.spawnPty and RemoteTerminalEndpoint.seedMeta, so a future initial field can't be seeded on one host and dropped on the other. parentId stays on the local path (not part of InitialTerminalMetadata). Agreed by the lowy ⇄ hickey lens debate (finding hickey-4, raised by hickey). Not pushed or merged.
…thClient Replace RemoteTerminalProxy.call and RemoteTerminalEndpoint.call — byte-identical acquire/try/finally/release wrappers — with one module-scope withClient(session, fn), so the ref-counted lease discipline lives in exactly one place. Agreed by the lowy ⇄ hickey lens debate (finding hickey-5, raised by hickey). Not pushed or merged.
Move the tap-pump primitive (carrying local.ts's per-event try/catch fence) into @kolu/terminal-dag and have both LocalTerminalEndpoint and kolu-watcher import it; delete the un-fenced copy in kolu-watcher/streamBridge.ts. The watcher's taps can no longer be silently frozen by one throwing event, and the two copies can't drift. Agreed by the lowy ⇄ hickey lens debate (finding hickey-6, raised by hickey). Not pushed or merged.
Add initialServerMeta(cwd, { pr }) to @kolu/terminal-dag, have kolu-watcher call it with pr: absent and kolu-server's createMetadata compose it with pr: pending. The new-terminal default record's field set now lives once; the deliberate local-vs-watcher PR-kind difference is one explicit argument instead of two forked literals.
Agreed by the lowy ⇄ hickey lens debate (finding hickey-7, raised by hickey). Not pushed or merged.
CI's ci::unit caught it (the in-process watcher test can't run locally — it spawns the DAG): the fake kaval returned id 'fake-terminal', but the watcher's terminalMetadata collection is keyed by TerminalIdSchema (uuid), so the keys-stream event-iterator validation rejected the non-uuid key. In real use kolu-server mints a uuid; the fixture now echoes crypto.randomUUID(). The watcher impl is sound — the other 3 watcher tests + ci::nix (both platforms) passed.
Address codex's six findings on the P3 remote-host endpoint. F1 (blocking, FIXED): remote PTY spawn used the local-host-only `composeSpawnInput`, which forwards THIS process's env (`cleanEnv()`) and treats `system.info` only as a fallback — so a remote shell could launch with the local machine's SHELL/HOME/PATH. Added `composeRemoteSpawnInput` that builds the env FRESH from the host's facts (`info.shell`/`info.home` authoritative, `info.path` → PATH), with nothing from this process's env leaking across. remote.ts now uses it. Locked by spawnInput.test.ts. F2 (blocking, FIXED): remote `attach` didn't gate on the proxy's `ready`, so a tile attaching off the sync shadow raced the (slow, post-provision) spawn and could hit a missing PTY. Now awaits `getTerminal(id)?.handle.ready` first, mirroring LocalTerminalEndpoint. Test added. F3 (blocking, FIXED): a kill landing while spawn was in flight left a live remote PTY with no local owner — `spawnAndWire` still marked ready and never killed the orphan. Now checks `getTerminal(id)` after spawn returns; if gone, marks the proxy failed, kills the remote PTY (logging kill failures), and returns without marking ready. Test added. F4 (blocking, FIXED): the client never threaded the terminal's host id into Code-tab fs/git stream inputs, so remote terminals fell back to the local endpoint. CodeTab now derives `hostId` from `meta.location.hostId` and passes it to gitStatus/gitDiff/fsListAll, and through BrowseFileDispatcher to fsReadFile. F5 (major, PARTIAL): the binary-preview `url` is served by the local file route off kolu-server's filesystem, so a remote file would 404 or serve an unrelated same-path local file. No watcher-proxying preview route exists yet, so fail CLOSED: `fsReadFile` now throws a typed NOT_IMPLEMENTED for a remote binary-previewable file (Code-tab toasts it) instead of minting a wrong-filesystem URL. Text reads already route over the watcher. Added `isRemoteHost` to registry.ts as the single local-vs-remote predicate. F6 (major, FIXED): `subscribeChange` awaited `session.acquire()` BEFORE the try in a floated task, so an unreachable-host rejection became an unhandled rejection that exits the server. Acquire is now inside the try with an `acquired` flag gating release. All addressed findings fixed except F5 (partial: safe degradation, full remote-preview route deferred). Server unit tests (remote + spawnInput) pass; typecheck clean across packages.
…ublish
/simplify pass (the two net-simplifications; the rest were lens-adjudicated dups,
intentional seeds, or premature micro-opts):
- forwardStream/tickStream now accept an undefined signal and guard internally,
removing the 8x `signal ?? new AbortController().signal` fallback from the watcher
surface stream sources.
- the terminalMetadata change-detection `{ ...meta }` copy moved into publishMeta
(the collection's concern) — dropped from the 4 publish call sites.
code-police no-silent-error-swallowing: bridge()'s cursor.next() catch broke out with no log, so a real failure read like a clean shutdown. Log at error with the host + err before exiting the pump.
⚖️ Lowy ⇄ Hickey lens debate✅ Consensus after 2 round(s) · lowy + hickey · base Independent findings: lowy=7, hickey=8 Applied (10)
Agreed — no change (5)
|
Codex ⇄ Claude debate✅ Consensus after 2 round(s) · codex reviewed at Round 1codex — approved: The branch has several serious remote-host correctness issues. The shape of the watcher/endpoint split is understandable, but the current implementation loses key local lifecycle guarantees and does not consistently thread host identity through the client/server boundary, so remote terminals can spawn with local machine assumptions, race their first attach, leak orphan PTYs, and serve Code-tab data from the wrong filesystem. Findings:
claude — Reviewed codex's six findings against the P3 remote-host endpoint. All six are valid; fixed five fully and one (F5) partially with a safe fail-closed degradation. F1: remote spawn used the local-only composeSpawnInput (which forwards this process's cleanEnv() and only falls back to system.info), so a remote shell could open with the local machine's SHELL/HOME/PATH; added composeRemoteSpawnInput in ptyHost/index.ts that builds env fresh from host facts (info.shell/home authoritative, info.path → PATH) with no local-env leak, and pointed remote.ts at it. F2: remote attach now awaits getTerminal(id)?.handle.ready before opening the watcher stream (mirrors local). F3: spawnAndWire now detects a kill-during-spawn (entry gone), marks the proxy failed, kills the orphaned remote PTY, and returns without marking ready. F4: CodeTab derives hostId from meta.location.hostId and threads it into gitStatus/gitDiff/fsListAll plus through BrowseFileDispatcher to fsReadFile. F5 (partial): no watcher-proxying preview route exists, so fsReadFile now throws a typed NOT_IMPLEMENTED for a remote binary-previewable file instead of minting a wrong-filesystem URL — codex's own fallback suggestion; full remote-preview byte-streaming route deferred. F6: subscribeChange's session.acquire() moved inside the try (with an acquired flag gating release) so an unreachable-host rejection in the floated task can no longer crash the server. Added focused unit tests for F1/F2/F3 (spawnInput.test.ts + remote.test.ts). Typecheck clean across all packages; server remote + spawnInput suites pass (13 tests). Two biome warnings on the touched server files are pre-existing at the base commit (unmodified import lines). Note: a pre-existing unrelated failure in adopt.test.ts (SavedTerminalSchema gained 'location' but the test sentinel doesn't list it) exists on the branch independent of this round and was left for the P3 step-1/step-9 owner.
commit: Round 2codex — approved: CLAUDE addressed the six findings from the prior round. The remote spawn path now uses host-authoritative spawn input, remote attach now waits for the sync shadow to resolve, the killed-during-spawn orphan PTY case is cleaned up, Code-tab fs/git calls carry host identity through to the server, remote binary previews now fail closed instead of serving a local-filesystem URL, and remote change subscriptions no longer float an unhandled acquire rejection. Findings:
codex on the rebuttal: F1-F4 and F6 were fixed as raised. For F5, I accept the scoped degradation: returning claude — (no author turn this round) (no author turn — the debate ended this round) |
👮 Code-policeRan with
|
The schema-key guard (the #1275 lossy-adoption class) correctly caught Step 1's additive location field: 'the sentinel covers EVERY persisted SavedTerminal key' asserts the adopt sentinel has every persisted key, and location was missing. Adding it also makes the round-trip test verify location is carried through adoption (it is — adoptedMeta spreads the whole record). CI's ci::unit caught it.
🧪 CI metrics — leased pool boxThe x86_64-linux lane ran on
(no per-recipe timing in Pool status (8 boxes)
Posted by |
EvidenceP3's headline behavior — kolu-server dials a remote host, spawns a terminal there, and mirrors its fs/git + metadata back into the browser like a local one — is proven here by CI + a hermetic end-to-end composition test, not by stubs. The one thing that genuinely needs live remote infrastructure (a real ssh dial of a second host, to screenshot the host chip on a remote tile) is called out honestly below as the natural follow-up capture. CI is green on both platforms (the comprehensive validation)
The visible surface (and why the live capture is a follow-up)The new on-screen affordances are the command palette's "Connect to host…" leaf (in the New terminal group) and the per-tile host chip (machine name + a green/amber/red health dot, rendered only for remote terminals — absence = local). Capturing the chip on a real remote tile requires a configured, reachable ssh host that kolu-server can provision |
Carry id + label on the HostProgress shape and have both the chip-button tooltip and the popover header read them from that one object, instead of assembling "${hostId} — ${label}" at three independent sites. A label-copy change now lands in one place. The popover takes the whole progress shape, dropping its separate hostId/state props.
Agreed by the lowy ⇄ hickey lens debate (finding hickey-5, raised by hickey). Not pushed or merged.
F1 (fixed): the chip's click-popover gate (hasLog) keyed only on lines.length, but markConnected() does NOT clear progressLines and mapDaemonStatus publishes progress for connected too — so a connected host stayed cursor-pointer/clickable and reopened its stale dial log, contradicting the deliberate "connected host drops progress UI" intent. Gated the trigger on (warming || failed) && lines.length (canOpen) — the same lifecycle gate the inline hints use — and added a createEffect that closes the popover the moment canOpen goes false (a dial completing while the popover is open no longer strands the old log). F2 (partial): kept the <button> element (deliberate — keyboard a11y + popover trigger + the data-testid=terminal-host-chip e2e contract), but when there's no actionable log the chip is now tabindex=-1 + aria-disabled, so keyboard/AT users aren't handed a focusable no-op. Disputed dropping the button or making the "No activity yet" fallback reachable: the log is intentionally lifecycle-gated, not shown on a steady connected host. Added useHostProgress gate tests (incl. connected-with-retained-progress reports neither warming nor failed) in useDaemonStatus.test.ts. typecheck + biome clean; just fmt run.
The /simplify pass over the dial-progress feature (base 574ff09). Four reviewer findings all pointed at the same theme — the prepared HostProgress shape should carry every display fact so call sites re-derive nothing: - carry `state`, `tone`, and `inLifecycle` on HostProgress (computed once in useHostProgress from the presentation row), so: - HostProgressButton drops its redundant `state` prop and stops re-deriving the dot tone with a second DAEMON_STATE_PRESENTATION lookup — it reads the one `progress` object for `data-host-state` (state), the dot (tone), and the show/open gate (inLifecycle); - the `warming || failed` disjunction, previously spelled at four call sites (the two inline hints + canOpen + showLatest), is named once as `inLifecycle`. - collapse the two copy-paste inline hint <Show> blocks in HostChip into one — they differed only by gate (warming vs failed) and the pulse; now one <Show when={inLifecycle && latest}> with `classList={{ "animate-pulse": warming }}`. - HostChip drops its own clientDaemonState call (the state now rides progress). Skipped (noted): producer-side tag strip in surface-nix-host (the deeper "typed origin" fix is well outside this diff + touches the drishti contract); deriving label/id (defensible convenience). No behaviour change — the chip's e2e contract (data-testid, data-host-state, hostId text) is untouched.
code-police `no-thin-wrapper-functions`: `hostProgress(hostId)` was exported
but its whole body was `daemonStatusFor(hostId)?.progress ?? []` with exactly
one call site (`useHostProgress`). Inlined that body at the call site (folding
its doc into a comment) and repointed `stripProgressTag`'s `{@link}` at
`useHostProgress`, the function that now applies it.
⚖️ Lowy ⇄ Hickey lens debate✅ Consensus after 1 round(s) · lowy + hickey · base Independent findings: lowy=2, hickey=3 — all agreed
The change surfaces a remote host's dial-progress on the host chip. The debate converged on collapsing the progress concept into one prepared shape and deriving every display fact from the presentation table, rather than re-deriving it at each leaf view.
Both lenses agreed per-finding; fixes committed locally and now pushed. The human reviews + merges. |
Codex ⇄ Claude debate✅ Consensus after 2 round(s) · codex reviewed at
Round 1codex — approved: The data plumbing is mostly coherent and the server-side progress propagation is covered, but the client trigger does not actually enforce the stated mid-lifecycle-only behavior. A connected remote can retain and reopen stale dial logs, and the chip is also left as a focusable no-op button when there is no log to show. I did not modify anything; Findings:
claude — Addressed both of codex's minor findings on the host dial-progress chip. F1 (fixed): the click-popover trigger gated only on lines.length, but markConnected() never clears progressLines and mapDaemonStatus publishes progress for the connected state too — so a connected host stayed clickable and could reopen its stale provisioning log, contradicting the deliberate "connected host drops progress UI" intent. Re-gated the trigger on (warming || failed) && lines.length (canOpen) — the same lifecycle gate the inline hints already use — and added a createEffect that closes the popover the instant canOpen goes false, so a dial that completes mid-popover no longer strands the log. F2 (partial): kept the (deliberate: keyboard a11y, popover trigger, and the data-testid=terminal-host-chip e2e contract) but set tabindex=-1 + aria-disabled when there's no actionable log, so keyboard/AT users aren't handed a focusable no-op; disputed dropping the button or making the "No activity yet" fallback reachable, since the log is intentionally lifecycle-gated. Added 4 useHostProgress gate tests (incl. connected-with-retained-progress reports neither warming nor failed). typecheck + biome clean across all packages; just fmt run; all 12 useDaemonStatus tests pass.
commit: Round 2codex — approved: CLAUDE's follow-up resolves both prior findings. The trigger now shares the same lifecycle gate as the inline progress hints and closes itself when that gate drops, so connected hosts cannot reopen stale retained logs. For the inactive-button concern, keeping the element as a button is acceptable because it is no longer keyboard-focusable and is exposed as disabled when it has no actionable log. Findings:
codex on the rebuttal: F1: Accepted. The new claude — (no author turn this round) (no author turn — the debate ended this round) |
👮 Code-policeRan with
Rules —
|
The simplify pass shortened showLatest's body to inLifecycle, which fits on one line under the repo's pinned biome — local npx biome (a different version) left it wrapped, so ci::fmt flagged it. Formatted via nix develop's biome.
🧪 CI metrics — leased pool boxThe x86_64-linux lane ran on
Pool status (8 boxes)
Posted by |
The dial-progress ring already reached the client, but it only rendered as a
9px, max-16ch-truncated one-liner on the title-bar host chip (plus a log hidden
behind a click) — and not at all on mobile, where TerminalMetaCompact has no
chip. Meanwhile the tile BODY — the xterm the user is actually staring at for
the ~minute of a cold dial — was a blank black rectangle: zero feedback, looks
hung. That's "where's the client-side progress?".
Add HostDialOverlay: a connecting screen painted over the not-yet-attached
xterm (last child of Terminal's relative wrapper, so DOM order layers it with
no explicit z-index near the resize handle). It reads the SAME useHostProgress
ring the chip does — "Connecting to {host}…" (or "Couldn't reach {host}" on
failure) plus the full, untruncated, auto-scrolling provisioning log — and
unmounts the instant the host leaves its dial lifecycle (inLifecycle false →
connected), so the live terminal shows straight through. No new server plumbing.
Because it lives in the body it also renders on mobile (fixing the no-chip gap),
and a failed dial now shows WHY in-place instead of a silent vanish. The chip
stays as compact title-bar status (dot + name + on-demand log popover); its
redundant, illegible inline hint is dropped now that the body is the prominent
progress surface.
The remote kolu-watcher runs the SAME provider DAG as kolu-server, whose `pr`
provider shells out to `gh pr view` (via KOLU_GH_BIN). The watcher wrapper set
git (for kolu-git) but not gh, so on every poll the provider threw "KOLU_GH_BIN
is not set", classifyGhError fell through to `unknown`, and it logged a level-50
error on the remote — e.g.
[host:pureintent remote] gh pr view: unknown error
err: KOLU_GH_BIN is not set. Run kolu through the Nix wrapper…
`gh pr view` resolves via the repo's git remote-tracking, so it must run
host-side (the remote has the repo), not server-side. Pin KOLU_GH_BIN on the
watcher wrapper exactly as kolu-server does — gh joins the watcher's closure, so
it's present after the cold-dial copy. Where the remote gh is authed PR status
now works; where it isn't, gh's stderr classifies as the KNOWN `not-authenticated`
(a graceful, lower-level unavailable with actionable UI), not an `unknown` error.
Remote gh auth itself (forwarding the local token) is a separate follow-up.
Merging master (#1384) made `ci::biome` fail on warnings, not just errors. Three P3 files carried warnings the old gate ignored: - server.test.ts, remote.ts: `watcherSurface` is used only as a type → import type - surface.ts: `ORPCError` was unused (only referenced in comments) → dropped, keeping the used `implement`.
Adversarial review of the connecting-screen overlay found two real defects: - The `z-20` overlay sat over the xterm with default pointer-events, so in the brief window where `inLifecycle` flips false and Solid's `<Show>` tears the node down asynchronously, it could swallow a keystroke meant for the now-live terminal. It's purely visual (header + log, nothing to click) → mark it `pointer-events-none`; input always reaches the xterm and the race is gone. - The autoscroll `createEffect` lived in DialScreen, outside the `<Show>`, so on the first render `logRef` was unbound and the first line never pinned. Split the body + effect into `DialBody`, which mounts only under the `<Show>` — its ref is bound before its effect first runs. Also corrected the stale file-header claim of "no explicit z-index" (it carries z-20, contained by Terminal's `isolate` wrapper).
🧪 CI metrics — leased pool boxThe x86_64-linux lane ran on
Pool status (8 boxes)
Posted by |
Session restore dropped the saved `location.hostId`: `handleCreate(cwd, meta)` was called without the 3rd `hostId` arg, so every remote terminal restored as a LOCAL shell — the durable remote kaval was abandoned and the "reconnect where you left off" promise died across a kolu restart. Three coordinated changes: - Boot (index.ts): dial not just the configured hosts but every host that owns a terminal in the SAVED SESSION, so a durable remote kaval's terminals adopt back (you rejoin them) before the client reaches restore — even for an ad-hoc host that isn't in KOLU_HOSTS_JSON. - Restore (useSessionRestore.ts): thread `t.location?.hostId` into handleCreate so a restored remote terminal spawns on the right endpoint. - Dedup guard: skip any saved terminal already live in the store. A durable daemon (local kaval across an update, or a re-dialed remote host) re-adopts by id; since adoption preserves the id and restore mints fresh ids, "already live by saved id" ⇒ adopted ⇒ skip — otherwise the boot-dial adoption would race restore and spawn a duplicate shell beside the adopted one.
The dock rows gave no sign of which terminals live on a remote host. Add a shared `RemotePip` (RowPips) — the machine name plus that host's health dot (green connected · amber provisioning · red unreachable), mirroring the tile title bar's HostChip — rendered on line 2 beside the PR pip in both the desktop dock and the mobile drawer. Renders nothing for local terminals.
…host (watcher 1.1) P3 remote-parity contract trio — three host-side concerns that previously always ran on kolu-server now follow a terminal to its owning host: - worktree: git.worktreeCreate/worktreeRemove gain an optional hostId; the router resolves the owning endpoint via endpointFor(hostId) and the remote endpoint forwards to the watcher's new git.worktreeCreate/worktreeRemove procedures, so a remote tile's worktree lands on the host the repo lives on. - file upload: pasteImage/uploadFile write through the terminal's endpoint (TerminalEndpointFs.writeFile) — local to kolu-server's scratch, remote to the host's per-terminal scratch via the watcher's new fs.writeFile — so the bracketed-paste path resolves where the PTY actually runs. - kaval identity: the remote endpoint queries the absorbed system.version once on connect and folds the remote kaval's identity + startedAt onto every published daemonStatus; the IdentityRail renders one kaval column per host (activeHostIds), and the dialog/update-chip/restart are gated to LOCAL_HOST (a remote kaval is pinned to its watcher's Nix build). WATCHER_CONTRACT_VERSION 1.0 -> 1.1 (additive). There is no runtime skew gate on this version today (the dial validates kaval's PTY_HOST_CONTRACT_VERSION, not this) — a stale watcher fails the new procedures per-call with a clear error, and a normal re-dial re-realises the watcher from the server's drv map, picking up 1.1. makeFsGit now returns BaseFs/BaseGit (the shared repo-keyed reads); each consumer adds its host-binding writeFile/worktree methods to complete the widened TerminalEndpointFs/TerminalEndpointGit interfaces.
…er's A remote terminal's shell sourced the user's rc (the prompt rendered) but its PATH was the kolu-watcher's restricted Nix PATH (node+git+gh): the watcher is launched over a NON-login ssh session, and kaval's system.info reported the watcher process's own PATH, which composeRemoteSpawnInput hands the spawned shell. So user-profile tools (zoxide, …) weren't found — `cd` fell through to command-not-found. The watcher now captures the user's LOGIN PATH once at startup — runs the login shell (`<shell> -l -c 'echo "$PATH"'`, 5s-bounded, falls back to the watcher PATH on any failure) so /etc/profile + the user's profile build the real PATH incl. the nix profile dirs — and serves THAT as system.info.path. Contained to the watcher; no change to local kaval or kaval-tui. NEEDS live-remote validation: dial a real host and confirm `echo $PATH` in the PTY carries ~/.nix-profile/bin and that zoxide resolves.
…he feed The activity feed silently dropped remote terminals on two fronts. 1. getRecentRepos pruned every entry whose repoRoot failed existsOnDisk against the LOCAL filesystem. A remote terminal's repoRoot lives on the remote host, so it was never on this machine's disk and vanished from the recent-repos feed on the first read. Entries now carry an optional `hostId` (absent ⇒ local); the prune only stat-checks LOCAL entries and keeps remote ones unconditionally (we can't cheaply stat a remote host's fs from here). 2. trackRecentRepo / trackRecentAgent were only wired from LocalTerminalEndpoint's provider DAG. RemoteTerminalEndpoint runs no DAG of its own (it lives in the watcher), so remote activity was never tracked. The metadata mirror (onRemoteMeta) now tracks the remote repo + agent on CHANGE — matching the local git/agent providers — stamped with the dialed hostId. The MRU dedup key for both feeds now folds in hostId so the same repoRoot (or agent command) on different hosts — or local vs remote — stays a distinct entry instead of collapsing onto one another. `hostId` is an additive optional field: existing state files deserialize with it absent (⇒ local), so no migration / schema-version bump is needed.
Adversarial review of the upload-to-remote path found real issues: - LEAK (blocker): the watcher never removed a terminal's scratch dir on exit (the local endpoint does via cleanupTerminalScratch). Added cleanupWatcherFile wired into stopProviderLayer, so pasted files don't accumulate on the host. - WORLD-READABLE (blocker): the watcher scratch root was a predictable shared path (.../kolu-watcher-scratch) created with default perms — on a multi-user host with a /tmp fallback, one user could read another's pasted secrets. Now per-USER (uid-scoped name) and every dir is mkdir'd mode 0700; same 0700 made explicit on kolu-server's scratch subdir. - TRAVERSAL (major): a belt-and-suspenders guard rejects a terminalId with a separator/`..` (already UUID-validated on the wire, but defend the fs too). - SIZE (major): a defense-in-depth size cap at the watcher's fs.writeFile. - DRIFT (major): sanitizeUploadName was DUPLICATED in kolu-server and the watcher (security-sensitive, could drift). De-duplicated into the one browser-safe shared module (kolu-common/upload, beside the size guards) — single source; both import it. Test moved to upload.test.ts.
🧪 CI metrics — leased pool boxThe x86_64-linux lane ran on
Pool status (8 boxes)
Posted by |
🔌 Remote-terminal feature parity (audit + fixes)Audited every "local feature" against remote terminals (7-agent sweep), then fixed all gaps. Each fix was adversarially reviewed; the review caught real bugs (incl. two security blockers) that are also fixed here.
Verification: CI green both platforms on Notes / follow-ups (out of scope here): remote |
A remote terminal's tile (and its connecting-screen overlay) didn't appear for the whole dial — several seconds on a cold dial. Cause: the client renders a tile only once that terminal has a `terminalMetadata` entry (terminalIds = meta.keys().filter(has-metadata)), but remote spawnPty published the terminal to the LIST and never seeded its metadata — that arrived only via the mirror, AFTER the dial connected. So the tile, and the progress UI that rides it, waited the entire dial. spawnPty now upserts the seed metadata (location.hostId + cwd) synchronously, so the tile renders at once and the connecting screen shows during the dial — same as a local terminal appears instantly. The mirror's onRemoteMeta enriches the same entry (git/agent/…) once connected. Covers fresh dials AND cold session restores (both go through spawnPty).
…ess too Warm boot-adoption now reaches tile + connecting-overlay parity with a fresh dial. A durable remote host's saved terminals are pre-registered as SHADOW tiles at boot — BEFORE the metadata mirror connects — so each adopted tile and its connecting screen render DURING the re-dial, not in the multi-second gap after connect. The gap was structural: a fresh/cold spawn seeds terminalMetadata synchronously (the client renders a tile only once a terminal has a metadata entry), but warm adoption arrived solely via the mirror's onRemoteMeta, which fires only after connect — so an adopted tile blinked in late, with no connecting screen during the dial. - RemoteTerminalEndpoint.preRegisterSavedTerminals: for each saved terminal (skipping any already adopted by a racing mirror), seed its metadata from the saved record (cwd + client-persisted fields) and register a RemoteTerminalProxy whose `ready` is deliberately left UNRESOLVED — a shadow. The seed is upserted to terminalMetadata so the tile + overlay paint immediately; attaches/writes block on `ready` until the live PTY is confirmed. - markReady-on-confirm: onRemoteMeta releases a pre-registered shadow's ready gate (and drops it from the set) the first time the mirror delivers its metadata — the entry already exists, so it takes the enrich path, not the adopt branch. - orphan reconciliation: the bridge reconciles shadows against the watcher's FIRST live snapshot (gated once per mirror connect via a new optional mirrorRemoteCollection onSnapshot hook). A shadow the watcher doesn't list is an orphan — its shell exited while kolu was down — so its dead tile is removed rather than left with a stuck connecting overlay. - index.ts boot dial groups the saved session's remote terminals by host and pre-registers them on each dialed RemoteTerminalEndpoint. onSnapshot is optional/back-compatible (the remote-process-monitor example consumer is unaffected).
🧪 CI metrics — leased pool boxThe x86_64-linux lane ran on
Pool status (8 boxes)
Posted by |
kolu-server can now dial a remote host and run terminals on it over ssh — the full P3 of the kaval-sessions plan.
kaval-tui --host(P2) reached a remote kaval from the shell; P3 brings that reach into the kolu app itself: open a terminal on another machine from the command palette's Connect to host…, with its files + git mirrored into the Code tab like a local one, the remote session durable across the link, and a per-tile host chip carrying the host's health.The shape
A remote host runs two processes side by side, dialed over one ssh connection:
watcherSurfaceover the ssh stdio: it absorbs the pty-host verbs/taps and forwards them to the host-local kaval (aserveOverStdioserver that is a client of kaval's socket — not a raw relay), plus serves fs/git + aterminalMetadatacollection.RemoteTerminalEndpointconsumes that surface, mirroring the watcher's metadata into the browser surface (theremote-process-monitorbridgeAgentToParentpattern) and adopting terminals the host already has.How it's built (commit by commit)
hostIdon create + the 4 fs/git stream inputs;location {hostId}on the terminal record. All optional → old sessions are local, the wire enum is untouched (no drishti).makeFsGitextracted into a package both kolu-server and kolu-watcher import — the local path stays byte-identical (only the logger is now injected).kavalClient(adopt-or-spawn the host kaval) · server (DAG host-side, forward pty, serve fs/git) ·bin· its own nix package + staleKey + closure-allow-list test + drv-map.TerminalEndpointimpl over aHostSession<watcherContract>;endpointFor(hostId)dispatch; the ssh-host registry (KOLU_HOSTS_JSON); boot dials each configured host.daemonStatus; client-onlyprovisioning/unreachableprojections; host-scoped create; the per-tile host chip; a Connect to host… palette leaf.Notes for review
The live ssh dial (provisioning a real remote, the browser updating from a remote host's mirror) is validated by the evidence step, not by stubbing — nothing is stubbed.
Try it locally
🤖 Generated with Claude Code
Generated by
/beon Claude Code (modelclaude-opus-4-8).