fix(fs): cross-worker snapshot loading for session fork #172
Merged
Loading a heap + filesystem snapshot created by a *different* worker (the session-fork case: read source snapshots, seed a new worker) hit three issues. Fixed so that a worker can restore another worker's heap and fs from shared S3.

1. Isolate runtime. execute_module drove the V8 event loop on the ambient multi-thread runtime. deno_core's op driver schedules every pending async op via deno_unsync::spawn, which asserts a current-thread runtime; any op that stayed pending (e.g. an fs blob fetched from S3 on a cold cache) aborted the process. The isolate now always runs on a dedicated current-thread runtime.
2. S3 client affinity. The S3 client's connection pool / IO reactor lives on the runtime it was built on (the server's main runtime). Calls issued from the isolate's current-thread runtime never progressed. S3HeapStorage now captures that runtime handle and dispatches every call onto it.
3. Lazy in-op fs fetch. fs reads pull chunks on demand from inside the op, on the isolate runtime, which cannot await the blob backend's remote I/O. Added HeapStorage::warm/contains and FsStore::prefetch; build_fs_mount now warms the mounted tree's blobs into the node-local cache on the main runtime before the isolate runs, so in-op reads are pure local-cache hits. warm() skips already-local blobs cheaply, so same-worker sessions are unaffected.

Verified on a coordinator + 2 learners + MinIO: a learner restores another learner's heap and fs from S3; a fork accumulates its own changes across runs while the source is unchanged (copy-on-write isolation).

Note: a debug-only assertion in the deno_core fork's serialize_for_snapshotting fires when re-snapshotting a cross-loaded isolate. It is compiled out in release (the snapshot is valid, verified end-to-end on a release build), so production is unaffected; run a release binary for local fork testing.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
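A minimal, std-only sketch of the S3 client affinity fix (item 2 above): the client lives on one owner thread, standing in for the server's main runtime, and a cloneable handle ships every call to that thread and waits for the reply. `OwnerHandle`, `FakeS3Client`, and `S3HeapStorageSketch` are hypothetical names, and a blocking channel stands in for awaiting. In a tokio-based server the captured value would more plausibly be a `tokio::runtime::Handle`, with each call spawned onto it and the resulting `JoinHandle` awaited; that is an assumption about the implementation, not something the PR shows.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// A unit of work that must run on the thread that owns the client.
type Job = Box<dyn FnOnce(&mut FakeS3Client) + Send>;

/// Hypothetical stand-in for an S3 client whose I/O is bound to one runtime.
pub struct FakeS3Client {
    objects: HashMap<String, Vec<u8>>,
}

/// The captured "runtime handle": every call is shipped to the owner thread.
#[derive(Clone)]
pub struct OwnerHandle {
    jobs: Sender<Job>,
}

impl OwnerHandle {
    /// Start the owner thread (playing the server's main runtime).
    pub fn start() -> (OwnerHandle, JoinHandle<()>) {
        let (tx, rx) = mpsc::channel::<Job>();
        let owner = thread::spawn(move || {
            let mut client = FakeS3Client { objects: HashMap::new() };
            // Runs until every OwnerHandle (and thus every Sender) is dropped.
            for job in rx {
                job(&mut client);
            }
        });
        (OwnerHandle { jobs: tx }, owner)
    }

    /// Run `f` on the owner thread and block until its result comes back.
    pub fn dispatch<R, F>(&self, f: F) -> R
    where
        R: Send + 'static,
        F: FnOnce(&mut FakeS3Client) -> R + Send + 'static,
    {
        let (reply_tx, reply_rx) = mpsc::channel();
        self.jobs
            .send(Box::new(move |client: &mut FakeS3Client| {
                let _ = reply_tx.send(f(client));
            }))
            .expect("owner thread has shut down");
        reply_rx.recv().expect("owner thread dropped the reply")
    }
}

/// Storage facade usable from any thread; it never touches the client directly.
pub struct S3HeapStorageSketch {
    owner: OwnerHandle,
}

impl S3HeapStorageSketch {
    pub fn new(owner: OwnerHandle) -> Self {
        S3HeapStorageSketch { owner }
    }

    pub fn put(&self, key: &str, bytes: Vec<u8>) {
        let key = key.to_string();
        self.owner.dispatch(move |c| {
            c.objects.insert(key, bytes);
        });
    }

    pub fn get(&self, key: &str) -> Option<Vec<u8>> {
        let key = key.to_string();
        self.owner.dispatch(move |c| c.objects.get(&key).cloned())
    }
}
```

Because every call goes through `dispatch`, code running on any other thread (the isolate's runtime, in the PR's terms) reaches the client only via its owner, which is the property the fix relies on.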
…O) setup
PR #171 added a configurable S3 endpoint + path-style addressing so the S3 backend can target S3-compatible stores, but only AWS_ENDPOINT_URL was documented. Document AWS_S3_FORCE_PATH_STYLE (required by MinIO et al.), add a "Use an S3-compatible store" how-to, and correct the client-init description (no longer plain load_from_env).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
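Why the flag matters: with virtual-hosted addressing the bucket becomes part of the hostname, which a plain MinIO endpoint such as `http://localhost:9000` typically does not serve unless it is configured for virtual-host-style requests, while path-style keeps the bucket in the URL path. The sketch below derives object URLs both ways from the two documented variables. `ObjectUrlConfig`, its methods, and the accepted truthy values (`true`/`1`) are assumptions for illustration; the real backend presumably passes these settings to the AWS SDK client rather than formatting URLs itself.

```rust
/// Hypothetical config mirroring AWS_ENDPOINT_URL and AWS_S3_FORCE_PATH_STYLE.
pub struct ObjectUrlConfig {
    /// Custom endpoint, e.g. "http://localhost:9000" (trailing slash removed).
    pub endpoint: String,
    /// Whether to put the bucket in the path instead of the hostname.
    pub force_path_style: bool,
}

impl ObjectUrlConfig {
    /// Build from raw values (passed in so the logic is testable).
    /// Assumption: "true" or "1" (case-insensitive, trimmed) enable path-style.
    pub fn from_values(endpoint: &str, force_path_style: Option<&str>) -> Self {
        let force = matches!(
            force_path_style.map(|v| v.trim().to_ascii_lowercase()).as_deref(),
            Some("true") | Some("1")
        );
        ObjectUrlConfig {
            endpoint: endpoint.trim_end_matches('/').to_string(),
            force_path_style: force,
        }
    }

    /// Read the process environment; `None` means no custom endpoint is set
    /// (default AWS endpoint resolution is not modeled here).
    pub fn from_env() -> Option<Self> {
        let endpoint = std::env::var("AWS_ENDPOINT_URL").ok()?;
        let path_style = std::env::var("AWS_S3_FORCE_PATH_STYLE").ok();
        Some(Self::from_values(&endpoint, path_style.as_deref()))
    }

    /// Path-style:      <scheme>://<host>/<bucket>/<key>
    /// Virtual-hosted:  <scheme>://<bucket>.<host>/<key>
    pub fn object_url(&self, bucket: &str, key: &str) -> String {
        if self.force_path_style {
            return format!("{}/{}/{}", self.endpoint, bucket, key);
        }
        match self.endpoint.split_once("://") {
            Some((scheme, host)) => format!("{}://{}.{}/{}", scheme, bucket, host, key),
            None => format!("{}.{}/{}", bucket, self.endpoint, key),
        }
    }
}
```

With `AWS_S3_FORCE_PATH_STYLE=true`, a request for bucket `heaps` goes to `http://localhost:9000/heaps/<key>`; without it, the same request would target the hostname `heaps.localhost:9000`.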
…ss-worker fs
Reverts the per-execution current-thread runtime change. Running the isolate on its own runtime broke async ops that await resources bound to the server's main runtime: notably, mcp.callTool (child MCP clients) deadlocked (MCP Tool Calling E2E hung). The prefetch (build_fs_mount warms the mounted tree into the local cache on the main runtime) already makes in-op fs reads local, so no pending remote I/O happens inside the isolate and the original deno_unsync abort no longer triggers, without changing the isolate's runtime.

Verified: mcp.callTool round-trips again, and cross-worker fork still restores heap + fs (VAR=heap-7 FILE=fs-7) on a release build.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
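The prefetch described above can be sketched as: collect every chunk id the mounted tree references, then copy the ones that are not already node-local before the isolate starts, so in-op reads never leave the cache. This is a synchronous, std-only model; `RemoteBlobs`, `LocalCache`, `Tree`, and the free function `prefetch` are hypothetical stand-ins for the PR's `HeapStorage::warm`/`contains` and `FsStore::prefetch`, whose real signatures are not shown.

```rust
use std::collections::{BTreeSet, HashMap};

/// Content hash of a chunk (hypothetical: a plain string here).
pub type BlobId = String;

/// Remote blob store (S3/MinIO stand-in) that counts fetch attempts.
pub struct RemoteBlobs {
    blobs: HashMap<BlobId, Vec<u8>>,
    pub fetches: usize,
}

impl RemoteBlobs {
    pub fn new(blobs: HashMap<BlobId, Vec<u8>>) -> Self {
        RemoteBlobs { blobs, fetches: 0 }
    }

    fn fetch(&mut self, id: &str) -> Option<Vec<u8>> {
        self.fetches += 1;
        self.blobs.get(id).cloned()
    }
}

/// Node-local cache that in-op reads consult.
#[derive(Default)]
pub struct LocalCache {
    blobs: HashMap<BlobId, Vec<u8>>,
}

impl LocalCache {
    /// Cheap local membership check (the `contains` idea).
    pub fn contains(&self, id: &str) -> bool {
        self.blobs.contains_key(id)
    }

    /// Copy missing chunks from `remote`; already-local ids cost no remote call.
    /// Returns how many chunks were actually fetched.
    pub fn warm(&mut self, remote: &mut RemoteBlobs, ids: &BTreeSet<BlobId>) -> usize {
        let mut fetched = 0;
        for id in ids {
            if self.contains(id) {
                continue; // same-worker case: already local
            }
            if let Some(bytes) = remote.fetch(id) {
                self.blobs.insert(id.clone(), bytes);
                fetched += 1;
            }
        }
        fetched
    }

    /// In-op read: local only, never touches the remote store.
    pub fn read_local(&self, id: &str) -> Option<&[u8]> {
        self.blobs.get(id).map(|b| b.as_slice())
    }
}

/// A mounted tree: file path -> ordered chunk ids making up that file.
pub struct Tree {
    pub files: HashMap<String, Vec<BlobId>>,
}

/// The `prefetch` idea: gather every (deduplicated) chunk id, then warm them.
pub fn prefetch(tree: &Tree, cache: &mut LocalCache, remote: &mut RemoteBlobs) -> usize {
    let ids: BTreeSet<BlobId> = tree.files.values().flatten().cloned().collect();
    cache.warm(remote, &ids)
}
```

A second `prefetch` over the same tree fetches nothing, which mirrors why same-worker sessions are unaffected by the extra step.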
r33drichards added a commit to r33drichards/open-agents that referenced this pull request on Jun 16, 2026
Add a "Duplicate" action to mcp-js sessions that creates a new session seeded from the source's V8 heap AND content-addressed filesystem, so the fork starts with the source's accumulated state and then diverges copy-on-write.

- API: POST /api/sessions/[id]/fork reads the source's latest heap+fs snapshot ids from its running worker and creates a new session carrying them as a `forkSource` marker; optional `copyMessages` clones the source's latest chat history.
- Provisioning: buildMcpJsSandboxState seeds a forked worker from forkSource once (a no-op run mounting the source heap+fs), then clears the marker so later restores don't reset the fork. baseUrl is left empty initially so isSandboxActive doesn't skip provisioning.
- DB: sessions.parentSessionId (fork lineage) + forkSessionWithChat; the sidebar list now carries a lightweight sandboxType (from sandbox_state JSON).
- UI: a Duplicate dropdown (sandbox-only / with chat history), shown only for mcp-js sessions, wired through useSessions.duplicateSession.
- mcp-js client: run_js now sends/returns `fs`; McpJsState gains `forkSource`; McpJsSandbox.getState() now includes the `type` discriminant (it was dropping it, corrupting the persisted sandbox state).

Requires the mcp-js fork fixes (r33drichards/mcp-js#172) for cross-worker snapshot loading; verified end-to-end against a local cluster + MinIO (release binary): a fork inherits the source's heap+fs, accumulates its own changes across runs, and the source is unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
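The provisioning rule (seed from `forkSource` once, then clear the marker) and the copy-on-write result can be modeled together. This sketch is in Rust to match the other examples even though the open-agents side is TypeScript; `Store`, `Session`, `ForkSource`, `provision`, and `run_write` are hypothetical, snapshots are modeled as immutable path-to-blob-hash maps, and the hash is a toy `DefaultHasher` digest rather than whatever the real content-addressed store uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Path -> blob hash. A committed snapshot is never mutated.
pub type FsTree = HashMap<String, u64>;

/// Shared content-addressed store (S3 stand-in): blobs plus committed snapshots.
#[derive(Default)]
pub struct Store {
    blobs: HashMap<u64, Vec<u8>>,
    snapshots: Vec<FsTree>, // snapshot id = index
}

impl Store {
    /// Toy content hash: identical bytes map to the same blob, stored once.
    pub fn put_blob(&mut self, bytes: &[u8]) -> u64 {
        let mut h = DefaultHasher::new();
        bytes.hash(&mut h);
        let id = h.finish();
        self.blobs.entry(id).or_insert_with(|| bytes.to_vec());
        id
    }

    pub fn commit(&mut self, tree: FsTree) -> usize {
        self.snapshots.push(tree);
        self.snapshots.len() - 1
    }

    pub fn read(&self, snapshot: usize, path: &str) -> Option<&[u8]> {
        let blob = self.snapshots.get(snapshot)?.get(path)?;
        self.blobs.get(blob).map(|b| b.as_slice())
    }
}

/// Hypothetical stand-in for the `forkSource` marker: the source's fs snapshot id.
pub struct ForkSource {
    pub fs_snapshot: usize,
}

pub struct Session {
    pub fork_source: Option<ForkSource>,
    pub fs_snapshot: Option<usize>,
    pub seed_count: usize,
}

/// Seed from the marker at most once; `take()` clears it so later restores
/// keep the fork's own state instead of resetting to the source.
pub fn provision(session: &mut Session) {
    if let Some(src) = session.fork_source.take() {
        session.fs_snapshot = Some(src.fs_snapshot);
        session.seed_count += 1;
    }
}

/// One run that writes `path`: copy the current path map, point `path` at the
/// new blob, and commit a new snapshot. Older snapshots (the source's) are untouched.
pub fn run_write(store: &mut Store, session: &mut Session, path: &str, bytes: &[u8]) {
    let mut tree = match session.fs_snapshot {
        Some(id) => store.snapshots[id].clone(),
        None => FsTree::new(),
    };
    let blob = store.put_blob(bytes);
    tree.insert(path.to_string(), blob);
    session.fs_snapshot = Some(store.commit(tree));
}
```

Re-running `provision` after the first seed is a no-op, so the fork keeps its own snapshot, while the source's snapshot id still resolves to its original contents.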
Loading a heap + filesystem snapshot created by a different worker — the session-fork case (read a source session's snapshots, seed a fresh worker from them) — failed three ways. This makes it work: a worker can restore another worker's heap and fs from the shared S3 blob store.
Fixes
1. Isolate runtime. `execute_module` drove the V8 event loop on the ambient multi-thread runtime. deno_core's op driver schedules every pending async op via `deno_unsync::spawn`, which asserts a current-thread runtime; any op that stayed pending (e.g. an fs blob fetched from S3 on a cold cache) aborted the process. The isolate now always runs on a dedicated current-thread runtime.
2. S3 client runtime affinity. The S3 client's connection pool / IO reactor lives on the runtime it was built on (the server's main runtime). Calls issued from the isolate's current-thread runtime never progressed. `S3HeapStorage` now captures that runtime handle and dispatches every call onto it.
3. Lazy in-op fs fetch. fs reads pull chunks on demand from inside the op, on the isolate runtime, which cannot await the blob backend's remote I/O. Added `HeapStorage::warm`/`contains` and `FsStore::prefetch`; `build_fs_mount` warms the mounted tree's blobs into the node-local cache on the main runtime before the isolate runs, so in-op reads are pure local-cache hits. `warm()` skips already-local blobs cheaply, so same-worker sessions are unaffected.

Verification
Coordinator + 2 learners + MinIO. A learner restores another learner's heap and fs from S3; a fork accumulates its own changes across runs (`VAR=heap-7+mod FILE=fs-7+more` on re-run) while the source is unchanged (`VAR=heap-7 FILE=fs-7`), demonstrating copy-on-write isolation. 99 lib tests pass.

Note on debug builds
A debug-only assertion in the deno_core fork's `serialize_for_snapshotting` (`by_name.len() == handles.len()`) fires when re-snapshotting a cross-loaded isolate. It is compiled out in release, and the resulting snapshot is valid (verified end-to-end on a release build), so production is unaffected. Run a release binary for local fork testing, or relax that assertion in the fork for debug builds.

🤖 Generated with Claude Code