
feat(runtimed): SSH remote runtimes #1334

@rgbkrk


Summary

Run notebook kernels on remote machines over SSH. The agent subprocess connects back to the daemon via a tunneled Unix socket — same protocol, same CRDT-driven execution, just over a network.

Architecture

LOCAL MACHINE                              REMOTE MACHINE

┌──────────────────────┐                   ┌──────────────────────────┐
│  Desktop App (Tauri) │                   │  runtimed agent          │
│  ┌──────┐  ┌───────┐ │                   │                          │
│  │ WASM │  │ Relay │ │                   │  Connects back to daemon │
│  └──────┘  └───┬───┘ │                   │  via SSH tunnel (Unix    │
└────────────────│─────┘                   │  socket forwarded)       │
                 │ Unix socket             │                          │
                 ▼                         │  Watches RuntimeStateDoc │
┌──────────────────────┐  SSH tunnel       │  for queued executions   │
│  runtimed (daemon)   │◄═══════════════►  │                          │
│                      │  (socket forward) │  Writes outputs back     │
│  NotebookRoom        │                   │  via Automerge sync      │
│  - RuntimeStateDoc   │                   │                          │
│  - execution queue   │                   │  ┌───────────────────┐   │
│  - blob store (local)│                   │  │ Python/Deno       │   │
└──────────────────────┘                   │  │ kernel process    │   │
                                           │  └───────────────────┘   │
                                           └──────────────────────────┘

What's already in place

The agent subprocess architecture is fully shipped (#1333, #1431, #1433, #1449):

  • Agent connects to daemon socket as a regular Automerge peer
  • Execution is CRDT-driven (coordinator writes queue entries, agent watches)
  • Agent restarts kernels internally via RestartKernel RPC
  • Agent provenance (agent_id in RuntimeStateDoc + current_agent_id on room)
  • Disconnection resilience — Automerge sync converges after reconnection

What SSH needs

  1. SSH tunnel for socket forwarding: forward the daemon's Unix socket to the remote machine so the agent can connect as if it were local.

  2. Remote agent deployment: copy or install the runtimed binary on the remote machine. This could use scp + chmod, or assume the binary is pre-installed.

  3. Agent-side env resolution: remote agents can't use the coordinator's env pool because it lives on a different filesystem, so the agent needs to resolve environments locally. The uv:pyproject path already works this way (uv run resolves at runtime). Other env sources need an EnvSpec-based protocol.

  4. Blob upload for remote kernels: see the "Blob upload: local and remote" section.

  5. Connection lifecycle: handle SSH disconnection gracefully. The agent keeps executing and syncs outputs once reconnected (the CRDT architecture already supports this).
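
As a rough illustration of item 1, the sketch below builds an OpenSSH command line that exposes the daemon's Unix socket on the remote machine using `-R remote_socket:local_socket`. The host name and both socket paths are hypothetical placeholders, not paths runtimed actually uses, and the helper name `build_tunnel_argv` is made up for this example.

```python
import shlex


def build_tunnel_argv(host, local_socket, remote_socket, keepalive_secs=15):
    """Build an ssh argv that forwards a local Unix socket to a remote host.

    The host and socket paths are assumptions for illustration only.
    """
    return [
        "ssh",
        "-N",  # no remote command; forwarding only
        "-o", "ExitOnForwardFailure=yes",  # exit if the forward cannot be set up
        "-o", f"ServerAliveInterval={keepalive_secs}",  # notice dead connections
        # Remote socket path on the left, local daemon socket on the right.
        "-R", f"{remote_socket}:{local_socket}",
        host,
    ]


argv = build_tunnel_argv(
    host="user@remote.example",
    local_socket="/tmp/runtimed-local.sock",
    remote_socket="/tmp/runtimed-remote.sock",
)
print(shlex.join(argv))
```

A supervisor could run this argv as a subprocess and restart it when it exits, which is one possible way to approach item 5; the agent would then connect to the remote socket path exactly as it would to a local one.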

Key simplification

The coordinator doesn't know or care if the agent is local or remote. It's just a peer on the socket. SSH is a transport concern, not an architecture change.
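
To make the transport point concrete, here is a minimal sketch in which the connecting side takes only a socket path. Whether that path is the daemon's own socket or one forwarded over SSH, the connect code is the same. The echo server is a stand-in used only to make the example runnable; it is not the runtimed protocol, and `connect_to_daemon` is a hypothetical name.

```python
import os
import socket
import tempfile
import threading


def connect_to_daemon(socket_path):
    """Open a stream connection to a Unix socket, local or SSH-forwarded."""
    conn = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    conn.connect(socket_path)
    return conn


def demo():
    """Round-trip a message through a stand-in echo server on a temp socket."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "daemon.sock")
        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)
        server.listen(1)

        def serve():
            # Accept one peer and echo back whatever it sends first.
            peer, _ = server.accept()
            with peer:
                peer.sendall(peer.recv(64))

        worker = threading.Thread(target=serve)
        worker.start()
        with connect_to_daemon(path) as conn:
            conn.sendall(b"hello")
            reply = conn.recv(64)
        worker.join()
        server.close()
        return reply


reply = demo()
```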

Blob upload: local and remote

The blob server serves reads over HTTP (GET /blob/{hash}). For writes (for example, a kernel producing large binary data such as Parquet), we need an upload path. Approaches to evaluate:

  • Blob channel via daemon socket — use the existing Handshake::Blob channel on the Unix socket. Authenticated by socket access. For remote, relayed through the SSH tunnel.
  • Agent-relayed upload — kernel sends data to its parent agent process, agent writes to blob store directly. For remote, agent relays back over the tunnel.
  • Direct filesystem write — kernel writes content-addressed files to blob store path directly. Simplest for local. For remote, agent relays.
  • Authenticated HTTP POST — only if we add authentication to the blob server (not the default unauthenticated GET-only server).

The right approach depends on the security model and whether we want kernels to have direct blob store access or go through an authenticated channel.
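
For reference, the "direct filesystem write" option could look roughly like the sketch below: hash the bytes, write to a temporary file in the store directory, then rename into place so readers never see a partial blob. The SHA-256 hash and the flat one-file-per-hash layout are assumptions made for illustration; the issue does not specify the blob store's actual hash or directory structure, and `write_blob` is a hypothetical helper.

```python
import hashlib
import os
import tempfile


def write_blob(store_dir, data):
    """Write data under its content hash and return the hex digest.

    Assumes SHA-256 and a flat directory layout (both hypothetical).
    """
    digest = hashlib.sha256(data).hexdigest()
    final_path = os.path.join(store_dir, digest)
    if os.path.exists(final_path):
        return digest  # identical content is already stored

    # Write to a temp file in the same directory, then rename atomically.
    fd, tmp_path = tempfile.mkstemp(dir=store_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return digest


with tempfile.TemporaryDirectory() as store:
    first = write_blob(store, b"example parquet bytes")
    second = write_blob(store, b"example parquet bytes")
    stored = os.listdir(store)
```

Writing the same bytes twice yields the same digest and a single file, which is the property that makes relaying through an agent safe to retry after a dropped tunnel.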


Labels: daemon (runtimed daemon, kernel management, sync server), enhancement (New feature or request)
