Replace bwrap+socat with vendored srt-launcher on Linux #272
Open
dylan-conway wants to merge 14 commits into
A single statically-linked binary that provides the Linux sandbox
primitives we currently get from three external programs:
  bwrap          -> srt-launcher run       namespace + filesystem isolation + exec
  apply-seccomp  -> srt-launcher run       unix-socket-block seccomp before exec
  socat          -> srt-launcher run       in-sandbox TCP<->Unix relays
                    srt-launcher relay     host-side bridge to an external proxy
                    srt-launcher connect   ssh ProxyCommand HTTP CONNECT helper
`run` does one unshare(USER|PID|NS[|NET]), forks PID 1, sets up the mount
namespace via the same pivot_root + bind-remount sequence bubblewrap uses,
forks the proxy relays (with PR_SET_DUMPABLE=0, so the seccomp'd worker
can't ptrace them), forks the worker, applies the baked-in BPF filter,
and execs. There is no nested PID namespace: PID 1 and the relays are our
own forks with DUMPABLE=0 from the start, which is what apply-seccomp's
nested layer existed to achieve when bwrap's init was the thing to hide.
The mount module is written against raw libc so it reads side-by-side
with bubblewrap's bind-mount.c / bubblewrap.c for review. The BPF filter
is the existing seccomp-unix-block.c output, embedded via include_bytes!.
Builds with `npm run build:launcher` (rustup + musl target + gcc +
libseccomp-dev for the build-time BPF generator). ~425 KB stripped per
arch.
apply-seccomp.c and vendor/seccomp/build.ts are removed; the BPF
generator (seccomp-unix-block.c) stays as a build-time dependency.
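Because the filter is embedded as raw bytes, its shape can be checked at compile time. A minimal sketch, assuming the blob is a flat array of classic-BPF `struct sock_filter` entries in host byte order (8 bytes each: `u16 code, u8 jt, u8 jf, u32 k`). A one-instruction allow-all program stands in for the real `seccomp-unix-block.c` output, and `BPF_BLOB`, `SockFilter`, and `decode` are hypothetical names, not the crate's:

```rust
/// Size of one classic-BPF `struct sock_filter`: u16 code, u8 jt, u8 jf, u32 k.
const SOCK_FILTER_SIZE: usize = 8;
/// Kernel cap on classic-BPF program length (BPF_MAXINSNS).
const BPF_MAXINSNS: usize = 4096;

/// Stand-in for the generated blob: a single `BPF_RET | BPF_K` instruction
/// returning SECCOMP_RET_ALLOW (0x7fff0000), encoded in host byte order.
const fn stand_in_blob() -> [u8; 8] {
    let code = 0x0006u16.to_ne_bytes();
    let k = 0x7fff_0000u32.to_ne_bytes();
    [code[0], code[1], 0, 0, k[0], k[1], k[2], k[3]]
}

// Hypothetical: the real crate would embed the generator output here with
// include_bytes!(...) instead of this stand-in.
const BPF_BLOB: [u8; 8] = stand_in_blob();

// Compile-time checks: non-empty, whole instructions only, within the kernel limit.
const _: () = assert!(BPF_BLOB.len() > 0);
const _: () = assert!(BPF_BLOB.len() % SOCK_FILTER_SIZE == 0);
const _: () = assert!(BPF_BLOB.len() / SOCK_FILTER_SIZE <= BPF_MAXINSNS);

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SockFilter {
    pub code: u16,
    pub jt: u8,
    pub jf: u8,
    pub k: u32,
}

/// Split the blob into instructions (the const asserts rule out a remainder).
pub fn decode(blob: &[u8]) -> Vec<SockFilter> {
    blob.chunks_exact(SOCK_FILTER_SIZE)
        .map(|c| SockFilter {
            code: u16::from_ne_bytes([c[0], c[1]]),
            jt: c[2],
            jf: c[3],
            k: u32::from_ne_bytes([c[4], c[5], c[6], c[7]]),
        })
        .collect()
}
```

With the stand-in swapped for the embedded file, the `const _: () = assert!(..)` lines would fail the build if the generator ever emitted a truncated or oversized program, rather than failing at `seccomp` load time.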
The Linux sandbox no longer depends on bubblewrap or socat. The
vendored srt-launcher binary (previous commit) covers all three roles:
namespace/filesystem isolation, the in-sandbox proxy relays, and the
unix-socket-block seccomp filter.
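The in-sandbox relay role can be illustrated with the standard library alone: accept on a TCP loopback port inside the sandbox's network namespace and pipe each connection to the bind-mounted proxy Unix socket. This is a sketch under stated assumptions, not the launcher's code: it uses `std::io::copy` on two threads per connection (a later commit indicates the real relay uses `splice`), and `serve` and `pipe` are hypothetical names.

```rust
use std::io;
use std::net::{Shutdown, TcpListener, TcpStream};
use std::os::unix::net::UnixStream;
use std::path::PathBuf;
use std::thread;

/// Sketch only: std::io::copy on two threads, not the launcher's splice relay.
/// Copies bytes both ways until EOF and propagates half-closes so
/// request/response protocols see end-of-stream.
fn pipe(tcp: TcpStream, unix: UnixStream) -> io::Result<()> {
    let mut tcp_rd = tcp.try_clone()?;
    let mut unix_wr = unix.try_clone()?;
    let upstream = thread::spawn(move || {
        let _ = io::copy(&mut tcp_rd, &mut unix_wr);
        let _ = unix_wr.shutdown(Shutdown::Write);
    });
    let (mut unix_rd, mut tcp_wr) = (unix, tcp);
    let _ = io::copy(&mut unix_rd, &mut tcp_wr);
    let _ = tcp_wr.shutdown(Shutdown::Write);
    let _ = upstream.join();
    Ok(())
}

/// Accept TCP clients and relay each one to a fresh connection on `target`.
pub fn serve(listener: TcpListener, target: PathBuf) -> io::Result<()> {
    for client in listener.incoming() {
        let client = client?;
        let target = target.clone();
        thread::spawn(move || {
            // If the proxy socket is unreachable, dropping `client` closes it.
            if let Ok(unix) = UnixStream::connect(&target) {
                let _ = pipe(client, unix);
            }
        });
    }
    Ok(())
}
```

Inside the sandbox, `listener` would be bound to the relay's loopback port (for example `127.0.0.1:3128`) and `target` to the bind-mounted socket path.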
Functional changes:
- The HTTP and SOCKS proxy servers listen on a Unix socket on Linux
(in a private mode-0700 mkdtemp directory). The in-sandbox relay
connects to that socket directly via a bind-mount; there is no
host-side TCP loopback hop or bridge process for the internal-proxy
case. macOS keeps its TCP listener.
- When network.httpProxyPort / socksProxyPort points at an external
proxy, `srt-launcher relay` runs on the host as a Unix-to-TCP bridge
to that port. The relay sets PR_SET_PDEATHSIG and signals readiness
on stdout, so cleanup is `kill('SIGKILL')` with no exit-event
monitoring.
- wrapCommandWithSandboxLinux emits a single
`srt-launcher run [opts] -- bash -c '<cmd>'` argv. No shell glue
between the sandbox layer and the user command.
- GIT_SSH_COMMAND uses `srt-launcher connect` as ssh's ProxyCommand
(replaces `socat - PROXY:...`). The launcher validates host/port
before building the CONNECT request line.
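The validate-then-format step in `srt-launcher connect` can be sketched as a pure function. The validation rules below (ASCII letters, digits, `-`, `.`, `_` for the host; a non-zero `u16` for the port) are assumptions chosen so that CR/LF can never reach the request; the launcher's actual rules aren't stated, and bracketed IPv6 literals are deliberately not handled. The output is the standard HTTP/1.1 authority-form `CONNECT host:port` line plus a `Host` header. `connect_request` and `TargetError` are hypothetical names.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum TargetError {
    BadHost,
    BadPort,
}

/// Conservative hostname check (assumed rules): 1..=253 bytes of ASCII
/// letters, digits, '-', '.', or '_'. Rejects CR, LF, spaces, ':' and '@'.
fn valid_host(host: &str) -> bool {
    (1..=253).contains(&host.len())
        && host
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_'))
}

/// Build an HTTP/1.1 CONNECT request for `host:port`, or refuse.
pub fn connect_request(host: &str, port: &str) -> Result<String, TargetError> {
    if !valid_host(host) {
        return Err(TargetError::BadHost);
    }
    // Parsing to u16 rejects letters, whitespace, CR/LF and out-of-range
    // values; the request uses the re-formatted number, not the raw string.
    let port: u16 = port.parse().map_err(|_| TargetError::BadPort)?;
    if port == 0 {
        return Err(TargetError::BadPort);
    }
    let authority = format!("{host}:{port}");
    Ok(format!("CONNECT {authority} HTTP/1.1\r\nHost: {authority}\r\n\r\n"))
}
```

ssh substitutes `%h` and `%p` into ProxyCommand, so both strings originate from whatever host the user (or a repository's remote URL) supplied; rejecting early keeps a hostile remote from smuggling extra header lines to the proxy.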
Config / API:
- `bwrapPath`, `socatPath`, and `seccomp` config fields are removed.
A `launcher: { path?, argv0? }` field covers the override and
multicall cases.
- `getProxyPort()` / `getSocksProxyPort()` return undefined on Linux
when the internal proxy is used (it has no TCP port).
- `cleanupBwrapMountPoints` -> `cleanupSandboxMountPoints` (the
mechanism is intrinsic to bind-mount deny, not to bwrap).
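The `launcher: { path?, argv0? }` override and the single-argv `run` form can be sketched together. This is a Rust illustration of the argv shape only (the real wiring is TypeScript), and it assumes the vendored binary is named `srt-launcher` inside `vendor/srt-launcher/{x64,arm64}/`; `LauncherConfig`, `resolve_launcher`, and `run_command` are hypothetical. `argv0` maps naturally onto `std::os::unix::process::CommandExt::arg0`, which sets `argv[0]` independently of the program path. Because the user command is a single argv element after `bash -c`, no shell quoting is involved.

```rust
use std::env::consts::ARCH;
use std::os::unix::process::CommandExt;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Mirrors the `launcher: { path?, argv0? }` config field (hypothetical Rust shape).
#[derive(Debug, Default)]
pub struct LauncherConfig {
    pub path: Option<PathBuf>,
    pub argv0: Option<String>,
}

/// Explicit override first, else the vendored per-arch binary.
pub fn resolve_launcher(cfg: &LauncherConfig, vendor_root: &Path) -> Option<PathBuf> {
    if let Some(p) = &cfg.path {
        return Some(p.clone());
    }
    let dir = match ARCH {
        "x86_64" => "x64",
        "aarch64" => "arm64",
        _ => return None, // no prebuilt binary for other architectures
    };
    // Assumed file name inside vendor/srt-launcher/<arch>/.
    Some(vendor_root.join(dir).join("srt-launcher"))
}

/// Build `srt-launcher run [opts] -- bash -c <cmd>` as one argv, no shell glue.
pub fn run_command(
    cfg: &LauncherConfig,
    vendor_root: &Path,
    opts: &[&str],
    user_cmd: &str,
) -> Option<Command> {
    let mut cmd = Command::new(resolve_launcher(cfg, vendor_root)?);
    if let Some(name) = &cfg.argv0 {
        // For multicall use: argv[0] differs from the program path.
        cmd.arg0(name);
    }
    cmd.arg("run").args(opts).arg("--").args(["bash", "-c", user_cmd]);
    Some(cmd)
}
```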
CI: build:seccomp -> build:launcher; install rustup + musl target +
musl-tools instead of bubblewrap + socat.
The bind-mount flag surface (--bind / --ro-bind / --tmpfs / --proc /
--dev) and the per-path mount semantics are unchanged from the bwrap
path, so generateFilesystemArgs() and the symlink/ancestor handling
that feed it are kept as-is.
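Assuming the flag surface keeps bwrap's arity (`--bind SRC DEST`, `--ro-bind SRC DEST`, `--tmpfs DEST`, `--proc DEST`, `--dev DEST`), parsing it into an ordered list of mount operations is straightforward. A minimal sketch; `FsOp` and `parse_mount_args` are hypothetical and are not the crate's `MountOp`/`BindKind`:

```rust
/// One mount operation, kept in command-line order (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum FsOp {
    Bind { src: String, dest: String, read_only: bool },
    Tmpfs { dest: String },
    Proc { dest: String },
    Dev { dest: String },
}

/// Parse bwrap-style mount flags. Order is preserved because later mounts
/// stack on top of earlier ones at the same or a parent path.
pub fn parse_mount_args(args: &[&str]) -> Result<Vec<FsOp>, String> {
    let mut ops = Vec::new();
    let mut it = args.iter().copied();
    while let Some(flag) = it.next() {
        // Pull the next operand, or report which one is missing.
        let mut operand = |what: &str| {
            it.next()
                .map(str::to_owned)
                .ok_or_else(|| format!("{flag}: missing {what}"))
        };
        let op = match flag {
            "--bind" | "--ro-bind" => FsOp::Bind {
                src: operand("SRC")?,
                dest: operand("DEST")?,
                read_only: flag == "--ro-bind",
            },
            "--tmpfs" => FsOp::Tmpfs { dest: operand("DEST")? },
            "--proc" => FsOp::Proc { dest: operand("DEST")? },
            "--dev" => FsOp::Dev { dest: operand("DEST")? },
            other => return Err(format!("unknown flag {other}")),
        };
        ops.push(op);
    }
    Ok(ops)
}
```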
…_char; macOS test gate
- mount.rs: in a userns, mounts inherited from the parent ns are MNT_LOCKED, and MS_REMOUNT returns EPERM even when only adding restrictions. Tolerate that for submounts (the kernel won't let the sandbox loosen them either, so skipping is no weaker than the host); EPERM on the bind root itself stays fatal. Surfaced on the GitHub runner via /proc/sys/fs/binfmt_misc; bwrap didn't hit this because the apt-packaged bwrap is setuid and ran as real root.
- mount.rs: the ttyname_r buffer uses libc::c_char (u8 on aarch64, i8 on x86_64) instead of i8.
- srt-launcher.test.ts: gate the vendored-binary-resolution test on Linux (the binary isn't built or shipped on macOS).
CLI surface: --unshare-pid, --new-session, --die-with-parent, and --chdir are removed. The first three were always passed and gate correctness properties (PID-1/reaper architecture, TIOCSTI defense, orphan prevention) that are not tunables; they're now applied unconditionally. --chdir was never emitted (the launcher captures and restores the spawn-time cwd itself).
unsafe reductions:
- die_errno! delegates to die! via io::Error::last_os_error; errno_str() and the raw __errno_location() reads are gone
- FORWARD_TARGET: static mut -> AtomicI32 (Relaxed)
- libc::isatty -> stdin().is_terminal()
- CStr::from_ptr on the ttyname buffer -> CStr::from_bytes_until_nul
- libc::setenv loop -> env::set_var
- ready-fd write+close -> File::from_raw_fd().write_all()
- mount() returns Result<(), i32> so callers don't re-read errno
Dead code:
- bitflags_lite! macro replaced with enum BindKind (the variants are never combined)
- parse_tcp_target's "localhost" arm (callers always pass 127.0.0.1)
- connect's --proxy default (the caller always passes it; it is now required)
- relay_fork return value, Clone derives on RelaySpec/MountOp
- relay_main trailing ExitCode::SUCCESS -> unreachable!()
Also: dedupe wait-status decode, merge geteuid/getegid into one unsafe block, use the io::Error from write_all instead of re-reading errno, and fix the outer-stub waitpid so it doesn't decode the status on an error return.
Net: -44 LOC, ~-13 unsafe blocks. clippy -D warnings clean on x64 and arm64.
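The deduplicated wait-status decode amounts to the Linux/glibc `<sys/wait.h>` bit layout. A std-only sketch; `WaitStatus`, `decode`, and `exit_code` are hypothetical names, and mapping a signal death to `128 + signo` is the shell convention, assumed here rather than confirmed for the launcher:

```rust
/// Decoded `waitpid` status (hypothetical type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum WaitStatus {
    Exited(i32),
    Signaled { sig: i32, core: bool },
    /// Stopped (low byte 0x7f) or continued (0xffff) reports; not terminal.
    Other(i32),
}

/// Decode a raw status. Only call this when waitpid actually returned a pid;
/// when it returns -1 the status value is not meaningful.
pub fn decode(status: i32) -> WaitStatus {
    let termsig = status & 0x7f;
    if termsig == 0 {
        // WIFEXITED / WEXITSTATUS
        WaitStatus::Exited((status >> 8) & 0xff)
    } else if termsig != 0x7f {
        // WIFSIGNALED / WTERMSIG / WCOREDUMP
        WaitStatus::Signaled { sig: termsig, core: status & 0x80 != 0 }
    } else {
        // WIFSTOPPED / WIFCONTINUED
        WaitStatus::Other(status)
    }
}

/// Map to a process exit code. 128 + signo is the shell convention
/// (assumed); treating non-terminal reports as 1 is an arbitrary choice.
pub fn exit_code(ws: WaitStatus) -> u8 {
    match ws {
        WaitStatus::Exited(code) => code as u8,
        WaitStatus::Signaled { sig, .. } => (128 + sig) as u8,
        WaitStatus::Other(_) => 1,
    }
}
```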
…x pidns
The relays now fork from the stub between the NET unshare and the NS|PID
unshare, landing them in {host pidns, host mountns, sandbox netns}:
  stub          unshare(USER?|NET) → lo up → fork relays → unshare(NS|PID)
   ├─ relay :3128   host pidns/mountns, sandbox netns; invisible to workload
   ├─ relay :1080   (same)
   ══════════════════ mount + PID ns boundary
   └─ PID 1         setup_filesystem → fork worker → reap (unchanged)
       └─ worker    seccomp → exec
The workload has no PID for the relays: kill, ptrace, and /proc
enumeration all fail with ESRCH/ENOENT. That is structurally stronger
than the previous DUMPABLE=0 protection, and it closes the `kill -9 -1`
self-DoS that DUMPABLE=0 doesn't prevent.
The relay also now opens an O_PATH fd to the bridge socket at startup
(before the workload exists) and connects via /proc/self/fd/N, pinning
the inode. A workload with rw access to the socket's directory (via a
bind mount of /tmp) can no longer redirect the relay to a different
host unix socket by swapping the path.
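The pinning technique can be reproduced with the standard library on Linux. A sketch, assuming Linux: `O_PATH` is hardcoded to its asm-generic value (`0o10000000`, which x86_64 and aarch64 use) only to avoid a `libc` dependency, and `PinnedSocket` is a hypothetical name. The fd is opened once, before anything untrusted runs; each connect then goes through the `/proc/self/fd/N` magic link, which resolves to the file the fd already refers to instead of re-walking the original path.

```rust
use std::fs::{File, OpenOptions};
use std::io;
use std::os::unix::fs::OpenOptionsExt;
use std::os::unix::io::AsRawFd;
use std::os::unix::net::UnixStream;
use std::path::Path;

/// O_PATH from asm-generic/fcntl.h (x86_64 and aarch64 use this value).
/// Hardcoded only to avoid a `libc` dependency in this sketch.
const O_PATH: i32 = 0o10000000;

/// Holds an O_PATH fd on the socket inode (hypothetical name).
pub struct PinnedSocket {
    fd: File,
}

impl PinnedSocket {
    /// Open the socket file without connecting to it; call this before the
    /// workload exists.
    pub fn open(path: &Path) -> io::Result<Self> {
        let fd = OpenOptions::new().read(true).custom_flags(O_PATH).open(path)?;
        Ok(Self { fd })
    }

    /// Connect via the fd's magic link, which still reaches the pinned socket
    /// even if the original path has since been renamed or replaced.
    pub fn connect(&self) -> io::Result<UnixStream> {
        UnixStream::connect(format!("/proc/self/fd/{}", self.fd.as_raw_fd()))
    }
}
```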
Relays drop all caps and set PR_SET_PDEATHSIG immediately after fork
(they no longer get the kernel's pidns-teardown SIGKILL); the stub
reaps them and, after PID 1 exits, explicitly sends SIGKILL to their
process group with kill(-pgid, SIGKILL).
The stub now also drops caps after forking PID 1 (it only waits and
forwards signals from there). DUMPABLE=0 is set on the stub after the
userns map writes (writing /proc/self/{uid,gid}_map needs the files to
be owned by us, which DUMPABLE=0 prevents) and inherited by PID 1;
relays set it themselves.
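Three processes (stub → PID 1 → worker) is the floor: unshare(NEWPID)
only applies to children, so entering the new PID namespace requires a
fork; and execve resets caught signals to their defaults, which the
kernel doesn't deliver to a namespace's init, so a PID 1 that is itself
the workload would ignore SIGTERM unless the workload installs a handler.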
mount.rs is unchanged; PID 1 still does all of setup_filesystem.
Tests added: relay survives `kill -9 -1` from inside the pidns; relay
ignores a workload-side socket-path swap.
Why, though? Why try and reinvent (even vibe code) niche security sandbox tooling when widely used, trusted, reviewed industry standards exist?
Reconcile Windows proxy port-range support with Linux unix-socket listen targets in sandbox-manager; keep WindowsConfigSchema, drop bwrapPath/socatPath/SeccompConfig.
…untinfo lookup
Resolve bind sources on host root before pivot; capture ttyname(1) before unshare; refuse to run setuid; die on root remount failure and on missing mountinfo entry; tolerate only EACCES on submounts.
…evel non-dir denyRead masking
…t; adjust ProxyCommand test
…inherited fds before exec
Re-arm PDEATHSIG and set TCP keepalive in host-relay children; retry read on EINTR in splice; skip kill(-pgid) for already-reaped relays; const-assert BPF blob size; cwd $HOME fallback + $PWD export.
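The EINTR fix follows the usual retry pattern: `read(2)` reports EINTR only when no data was transferred, so the call can simply be reissued. A minimal std-only sketch over `std::io::Read`, where EINTR surfaces as `io::ErrorKind::Interrupted`; for a raw `splice(2)` call the same loop would instead check for a `-1` return with `errno == EINTR`. `read_retrying` is a hypothetical helper name.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reissue a read that failed with EINTR. Per the Read
/// contract, an Err means no bytes were consumed, so retrying cannot
/// duplicate or lose data.
pub fn read_retrying<R: Read + ?Sized>(r: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    loop {
        match r.read(buf) {
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            other => return other,
        }
    }
}
```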
…vm_writev assertions
The Linux sandbox no longer depends on `bubblewrap` or `socat`. They're replaced by `srt-launcher`, a single statically-linked Rust binary vendored under `vendor/srt-launcher-rs/` and shipped prebuilt under `vendor/srt-launcher/{x64,arm64}/`.
Two commits:
- `vendor: add srt-launcher Rust crate`: the binary itself. `run` (namespace + mount + relay-fork + seccomp + exec), `relay` (host-side Unix→TCP bridge, for the external-proxy case only), `connect` (ssh ProxyCommand HTTP CONNECT helper). The mount module is written against raw `libc` so it reads side-by-side with bubblewrap's `bind-mount.c` / `bubblewrap.c` for review.
- `linux: replace bwrap+socat with vendored srt-launcher`: TS wiring, tests, CI, README.
Architecture
There is one namespace layer instead of two: PID 1 and the relays are forked from `srt-launcher` with `PR_SET_DUMPABLE=0` from the start, so the seccomp'd worker can't `ptrace` or write `/proc/N/mem` against them regardless of `kernel.yama.ptrace_scope`. The nested PID namespace from `apply-seccomp` existed to hide bwrap's init, which we don't have anymore.
Breaking config changes
- `bwrapPath`, `socatPath`, and `seccomp` config fields are removed. New: `launcher: { path?, argv0? }` for overriding the binary location or invoking it as a multicall sub-binary.
- `SandboxManager.getProxyPort()` / `getSocksProxyPort()` return `undefined` on Linux when the internal proxy is used (it listens on a Unix socket, not a TCP port). They still return the configured port when `network.httpProxyPort` / `socksProxyPort` is set.
Runtime dependencies
Before: `bubblewrap`, `socat`, `ripgrep`. After: `ripgrep`. `srt-launcher` is statically linked against musl; no host libc requirement.
Build dependencies (CI / `npm run build:launcher`)
`rustup` + `<arch>-unknown-linux-musl` target, `gcc` + `libseccomp-dev` (for the build-time BPF generator), `musl-tools`.
What didn't change
The bind-mount flag surface (`--bind` / `--ro-bind` / `--tmpfs` / `--proc` / `--dev`) and per-path mount semantics match bwrap's, so `generateFilesystemArgs()` and the symlink/ancestor/mandatory-deny handling that feed it are unchanged. The macOS path is untouched.