feat(runtimed): SSH remote runtimes #1334
Summary
Run notebook kernels on remote machines over SSH. The agent subprocess connects back to the daemon via a tunneled Unix socket — same protocol, same CRDT-driven execution, just over a network.
Architecture
LOCAL MACHINE REMOTE MACHINE
┌──────────────────────┐ ┌──────────────────────────┐
│ Desktop App (Tauri) │ │ runtimed agent │
│ ┌──────┐ ┌───────┐ │ │ │
│ │ WASM │ │ Relay │ │ │ Connects back to daemon │
│ └──────┘ └───┬───┘ │ │ via SSH tunnel (Unix │
└────────────────│─────┘ │ socket forwarded) │
│ Unix socket │ │
▼ │ Watches RuntimeStateDoc │
┌──────────────────────┐ SSH tunnel │ for queued executions │
│ runtimed (daemon) │◄═══════════════► │ │
│ │ (socket forward) │ Writes outputs back │
│ NotebookRoom │ │ via Automerge sync │
│ - RuntimeStateDoc │ │ │
│ - execution queue │ │ ┌───────────────────┐ │
│ - blob store (local)│ │ │ Python/Deno │ │
└──────────────────────┘ │ │ kernel process │ │
│ └───────────────────┘ │
└──────────────────────────┘
What's already in place
The agent subprocess architecture is fully shipped (#1333, #1431, #1433, #1449):
- Agent connects to daemon socket as a regular Automerge peer
- Execution is CRDT-driven (coordinator writes queue entries, agent watches)
- Agent restarts kernels internally via `RestartKernelRPC`
- Agent provenance (`agent_id` in RuntimeStateDoc + `current_agent_id` on room)
- Disconnection resilience — Automerge sync converges after reconnection
What SSH needs
- SSH tunnel for socket forwarding — forward the daemon's Unix socket to the remote machine so the agent can connect as if local.
- Remote agent deployment — copy or install the `runtimed` binary on the remote machine. Could use `scp` + `chmod`, or assume it's pre-installed.
- Agent-side env resolution — remote agents can't use the coordinator's env pool (different filesystem). The agent needs to resolve environments locally. The `uv:pyproject` path already works this way (`uv run` resolves at runtime). Other env sources need an `EnvSpec`-based protocol.
- Blob upload for remote kernels — see section below.
- Connection lifecycle — handle SSH disconnection gracefully. The agent keeps executing and syncs outputs when reconnected (the CRDT architecture already supports this).
Key simplification
The coordinator doesn't know or care if the agent is local or remote. It's just a peer on the socket. SSH is a transport concern, not an architecture change.
Blob upload: local and remote
The blob server runs HTTP for reads (`GET /blob/{hash}`). For writes (kernel producing large binary data like parquet), we need an upload path. Approaches to evaluate:
- Blob channel via daemon socket — use the existing `Handshake::Blob` channel on the Unix socket. Authenticated by socket access. For remote, relayed through the SSH tunnel.
- Agent-relayed upload — kernel sends data to its parent agent process, agent writes to blob store directly. For remote, agent relays back over the tunnel.
- Direct filesystem write — kernel writes content-addressed files to blob store path directly. Simplest for local. For remote, agent relays.
- Authenticated HTTP POST — only if we add authentication to the blob server (not the default unauthenticated GET-only server).
The right approach depends on the security model and whether we want kernels to have direct blob store access or go through an authenticated channel.
References
- feat(runtimed): process-isolated runtime agent with blob relay #1333 — Process-isolated runtime agent (shipped)
- feat(runtimed): agent as Unix socket peer with CRDT-driven execution #1431 — Agent as Unix socket peer
- Runtime agent sandboxing: contain kernel processes with OS-level isolation #1307 — OS-level kernel sandboxing