feat(linux): linux BPF-LSM mediation filter (closes exec + read bypasses)#26

Open
drewmchugh wants to merge 20 commits into kipz:develop from drewmchugh:am/linux-exec-filter-bpf-lsm

@drewmchugh drewmchugh commented Apr 30, 2026

Summary

Replaces the seccomp-unotify exec filter from #20 with a BPF-LSM mediation filter that moves the mediation decision into the kernel, closing the TOCTOU race that a sibling thread sharing the trapped task's memory could exploit. It also adds a file_open LSM hook so the agent cannot read the mediated binary's bytes at all. That closes the copy-the-binary, ld-linux trick, unprivileged-tmpfs, and shellcode bypasses in addition to the direct-path exec bypass that #20 was originally scoped to.

This is a fork-and-replace of #20 rather than a follow-on commit: the seccomp-unotify filter, its supervisor handler, and the shebang-chain walker are deleted; BPF-LSM is the sole enforcement path for mediated commands. Sessions where mediation is active hard-fail at startup if BPF-LSM isn't installable, surfacing the specific reason (missing kernel cmdline / caps).

Update (commits a5b6a85, 00063f4, 378f12c). The same BPF-LSM infrastructure now also enforces filesystem subtree deny-within-allow on Linux — matching macOS Seatbelt's (deny file-write* (subpath ...)) semantics for policy.add_deny_access and the required policy-group denies. The kernel side adds eight inode_* LSM hooks plus a dentry parent-chain walk on file_open; the userspace side routes the existing policy.add_deny_access and required policy-group denies into a new protected_roots BPF map. See the new Filesystem subtree deny section below.

How the kernel side decides

Two BPF-LSM hooks share a single (dev, ino) deny map and a cgroup-ancestor scope check:

  • bprm_check_security — fires after the kernel has resolved the binary the call will actually load (bprm->file); a -EACCES return atomically aborts the exec syscall, with no race against any user-memory pointer the agent controlled.
  • file_open — fires inside do_filp_open for every successful path resolution that yields a file descriptor; denying opens of mediated inodes prevents the agent from reading the binary's bytes through any path (cp / cat / mmap / splice / dynamic linker).

Per-session scoping by cgroup ancestry: the broker creates a per-session cgroup at session start, the agent joins it as its first post-fork action, and the BPF program walks the calling task's cgroup ancestor chain looking for the session cgroup id. Cgroup membership inherits on fork() and is unaffected by reparenting, so daemonized agent descendants stay scoped. Ancestry (not equality) closes a sub-cgroup-escape where an agent with CAP_SYS_ADMIN could mkdir a child cgroup of the session and move into it.
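
The difference between an ancestry check and an equality check can be modelled in plain userspace Rust. This is only a sketch of the idea, not the BPF program: the function names are hypothetical, and the ancestor chain is assumed to be handed in as a slice of cgroup ids ordered from the root down to the task's own cgroup.

```rust
/// Ancestry check: the task is in scope if the session cgroup appears
/// anywhere on its ancestor chain (root first, own cgroup last).
pub fn in_session_scope(ancestors: &[u64], session_cgroup_id: u64) -> bool {
    ancestors.iter().any(|&id| id == session_cgroup_id)
}

/// The weaker alternative: only the task's own cgroup is compared.
pub fn in_session_scope_by_equality(ancestors: &[u64], session_cgroup_id: u64) -> bool {
    ancestors.last() == Some(&session_cgroup_id)
}
```

A task that has moved into a child cgroup of the session has the session id one level up its chain, so the ancestry check still matches while the equality check would not.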

Bypass classes closed

| # | Bypass | Closed by |
| --- | --- | --- |
| 1 | Direct path: /usr/bin/gh | bprm_check_security denies the exec |
| 2 | TOCTOU on user-memory paths (incl. vfork-bomb) | decision is in-kernel against bprm->file, not user-memory pointers |
| 3 | Copy the binary then exec the copy | file_open denies the read of mediated bytes |
| 4 | Indirect load: /lib/ld-linux-x86-64.so.2 /usr/bin/gh | file_open denies the linker's open() |
| 5 | Unprivileged tmpfs + copy + exec | file_open denies the read at the source |
| 6 | Shellcode (read bytes, mmap PROT_EXEC, jump) | file_open denies the source read |
| 7 | Hardlink at non-deny-set path | identity is (dev, ino), hardlinks share the inode |
| 8 | Daemonize / reparent to break parent-chain identity | scope is cgroup membership, not parent chain |
| 9 | Sub-cgroup escape (privileged agent only) | scope check walks ancestor chain |
| 10 | Read of a path inside a protected subtree | extended file_open walks dentry parents against protected_roots |
| 11 | Write to a file inside a protected subtree | same file_open path (open-for-write) |
| 12 | mmap / splice of a protected file | same file_open (mmap setup opens the file) |
| 13 | unlink / rmdir / rename of a protected child | new inode_* hooks return -EACCES |
| 14 | Create file/dir/symlink/link inside protected | inode_create / inode_mkdir / inode_symlink / inode_link |
| 15 | chmod / chown / truncate / utimes on protected | inode_setattr |
| 16 | Access via a bind mount of a non-protected source over a protected target | userspace populator inserts both directly-listed inodes AND bind-mount source roots into protected_roots |
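
Row 7 relies on (dev, ino) identity rather than path identity. A minimal userspace sketch of why that closes the hardlink case, using only the standard library (the helper names are hypothetical and not taken from the PR):

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Returns the (device, inode) pair for a path, following symlinks.
pub fn file_identity(path: &Path) -> io::Result<(u64, u64)> {
    let meta = fs::metadata(path)?;
    Ok((meta.dev(), meta.ino()))
}

/// True when both paths name the same underlying inode on the same device.
pub fn same_inode(a: &Path, b: &Path) -> io::Result<bool> {
    Ok(file_identity(a)? == file_identity(b)?)
}
```

A hardlink compares equal to its target under this identity, while a copy gets a fresh inode. That is why the copy case has to be stopped earlier, at the read of the source bytes.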

The vfork-bomb residual that #20 left open went from 49/600 bypasses (measured under the seccomp-unotify filter on this branch, with cgroup scoping but no BPF) to 0/600 under BPF-LSM. The pthread variant remains at 0/600.

Deployment requirements

| Requirement | How |
| --- | --- |
| Active LSM stack | lsm=...,bpf in kernel cmdline. The host kernel needs CONFIG_BPF_LSM=y; verify via grep bpf_lsm_bprm_check_security /proc/kallsyms. |
| Broker capabilities | setcap cap_bpf,cap_sys_admin,cap_dac_override+ep /usr/bin/nono. CAP_BPF for the BPF program load; CAP_SYS_ADMIN for the cgroup-namespace privilege check on mkdir; CAP_DAC_OVERRIDE because cgroup v2 mkdir runs the VFS DAC check before the cgroup-namespace privilege check, and CAP_SYS_ADMIN does not subsume DAC. |
| Agent invariants | PR_SET_NO_NEW_PRIVS=1 set before agent execve (existing nono behaviour) so file caps don't apply. The broker also verifies CapEff: 0 on the agent post-execve and emits an explicit invariants log line at session start. |

If any requirement is unmet, the broker fails session start with an explicit error pointing at the specific fix (kernel cmdline / caps / config). There is no silent partial enforcement.
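
Two of these preconditions are plain text checks: the active LSM list and the agent's effective capability mask. A rough sketch of the parsing, assuming the caller has already read /sys/kernel/security/lsm and /proc/<pid>/status into strings (both function names are hypothetical, not the broker's actual code):

```rust
/// True when the comma-separated LSM list contains an entry named exactly "bpf".
pub fn lsm_list_has_bpf(lsm_list: &str) -> bool {
    lsm_list.trim().split(',').any(|name| name.trim() == "bpf")
}

/// Extracts the CapEff bitmask from the text of a /proc/<pid>/status file.
/// Returns None if the line is missing or is not valid hexadecimal.
pub fn effective_caps(status: &str) -> Option<u64> {
    status
        .lines()
        .find_map(|line| line.strip_prefix("CapEff:"))
        .and_then(|hex| u64::from_str_radix(hex.trim(), 16).ok())
}
```

Matching whole entries rather than substrings avoids treating an unrelated name that merely contains "bpf" as the BPF LSM.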

Filesystem subtree deny (extension)

The first BPF-LSM hook pair gates binaries by inode. The same infrastructure now extends to gating filesystem subtrees — the same problem macOS Seatbelt solves with (deny file-write* (subpath ...)) rules.

Why this exists

Three pre-existing Linux gaps motivated this:

  • validate_caps_against_protected_roots hard-rejected on Linux when a profile granted a parent of ~/.nono (e.g. --workdir=$HOME), because Landlock can't express deny-within-allow. macOS handled the same case via Seatbelt deny rules. The user-visible symptom (Jira WRK-2585): claude from $HOME on a workspace fails sandbox init with nono: Sandbox initialization failed: Refusing to grant '/home/bits' (source: Profile) because it overlaps protected nono state root '/home/bits/.nono'.
  • policy.add_deny_access (crates/nono-cli/src/profile/mod.rs:98) was a no-op on Linux — add_deny_access_rules short-circuited at if cfg!(target_os = "macos"). shadowfax's Linux profile declares add_deny_access: ["$HOME/.config/password-store/gpg"] and the deny was silently dropped.
  • validate_deny_overlaps rejected any session whose allow set covered a required policy-group deny (deny_credentials, deny_keychains_linux, …), forcing Linux profiles to enumerate their allows narrowly enough to dodge the overlap.

All three are now resolved by routing the deny-within-allow enforcement through BPF-LSM at the kernel level.

Kernel side (commit a5b6a85)

A new protected_roots BPF_MAP_TYPE_HASH (capacity 64, keyed by (dev, ino)). dentry_in_protected_subtree walks up to MAX_DENTRY_DEPTH=16 levels via BPF_CORE_READ on d_parent, verifier-friendly with the same #pragma unroll shape as the existing cgroup walker. check_file_open extended to also consult the new map; eight new SEC blocks — inode_unlink, inode_rmdir, inode_rename, inode_create, inode_mkdir, inode_symlink, inode_link, inode_setattr — cover the structural mutations that don't go through file_open. Two new audit reason codes (protected_open_deny, protected_mutate_deny) carry through the existing ringbuf and JSONL pipeline.
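
The bounded parent walk can be modelled in userspace to make its behaviour concrete. The sketch below is not the BPF code: it assumes a hypothetical parent_of table standing in for d_parent, and it assumes the walk checks the starting node first and reports "not protected" when the depth bound is exhausted, which the PR text does not specify.

```rust
use std::collections::{HashMap, HashSet};

/// (device, inode) identity, the key shape described for protected_roots.
pub type Ident = (u64, u64);

/// Depth bound named in the PR text.
pub const MAX_DENTRY_DEPTH: usize = 16;

/// Walks from `start` towards the root via `parent_of` and returns true if any
/// visited node, including `start`, is in `protected_roots`. The loop gives up
/// after MAX_DENTRY_DEPTH levels, as a bounded BPF loop would.
pub fn in_protected_subtree(
    start: Ident,
    parent_of: &HashMap<Ident, Ident>,
    protected_roots: &HashSet<Ident>,
) -> bool {
    let mut current = start;
    for _ in 0..MAX_DENTRY_DEPTH {
        if protected_roots.contains(&current) {
            return true;
        }
        match parent_of.get(&current) {
            // A filesystem root is its own parent, so stop there.
            Some(&parent) if parent != current => current = parent,
            _ => return false,
        }
    }
    false
}
```

In this model a node more than 15 levels below a protected root is not matched, which shows why the depth constant matters for deeply nested trees.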

bpf_d_path was considered and rejected: it returns mount-namespace-aware paths (the wrong granularity for the bind-mount case below), and several of the new hooks operate on dentry directly with no struct path to hand to it.

Userspace side (commit 00063f4)

  • protected_paths.rs: target_os = "macos" gate dropped from validate_requested_path_against_protected_roots. With allow_parent_of_protected: true set on the profile, both platforms admit the parent grant; the OS sandbox layer enforces the subtree at runtime (Seatbelt rules on macOS, BPF-LSM hooks on Linux). New bpf_lsm_protected_roots_for_session(state_roots, add_deny_access, policy_group_denies) merges all three deny sources, runs add_deny_access strings through policy::expand_path so $HOME/... and ~/... placeholders become concrete paths the populator can stat (without the expansion the literal string would never make it into the kernel map — bug caught during end-to-end validation).
  • policy.rs: validate_deny_overlaps no-ops on Linux when BPF-LSM is available — the kernel hooks return -EACCES for any in-subtree access regardless of what Landlock allows, so the silent-drop motivation no longer holds. Hosts without BPF-LSM (no lsm=...,bpf in cmdline, no cap_bpf) still hit the existing rejection.
  • bpf_lsm.rs: install_mediation_filter takes a protected_paths: &[PathBuf] slice. The populator stats each canonical path, inserts (dev, ino), then parses /proc/self/mountinfo to enumerate bind-mount source roots mounted at-or-under any protected path — needed because the dentry parent walk follows the source filesystem tree, not the mount tree (e.g. shadowfax's ~/.nono/sessions bind-mount). Both the directly-listed inode and the bind-mount source inode go into the same map; the BPF walker doesn't care which.
  • exec_strategy.rs: BPF-LSM now installs when EITHER the exec deny set OR the protected_roots set is non-empty (a profile with no mediation.commands but with add_deny_access or any policy-group deny still gets enforcement).
  • Plumbing: protected_paths flows through PreparedSandbox → ExecutionFlags → SupervisedRuntimeContext → SupervisorConfig → install_mediation_filter. Audit log dir falls back to ~/.nono/sessions when only protected_paths is active (no exec mediation session).
  • 12 new integration tests in crates/nono-cli/tests/bpf_lsm_protected_subtree.rs covering read / write / mmap / unlink / rmdir / rename / create / bind-mount / regression / ~/.nono / add_deny_access / opt-in session start.
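
The bind-mount step depends on reading /proc/self/mountinfo, where the root and mount point are the fourth and fifth space-separated fields and special characters are written as three-digit octal escapes. A minimal sketch of that parsing follows. The function names are hypothetical and this is not the PR's decode_mountinfo_octal or bind_mount_sources_under; it only lists candidate mounts and leaves the stat-and-insert step out.

```rust
use std::path::Path;

/// Decodes the \ooo octal escapes used in mountinfo fields, for example
/// \040 for space and \134 for backslash. Malformed escapes are kept as-is.
pub fn decode_octal_escapes(field: &str) -> String {
    let bytes = field.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'\\' && i + 3 < bytes.len() {
            let digits = &bytes[i + 1..i + 4];
            if digits.iter().all(|d| (b'0'..=b'7').contains(d)) {
                let value = digits
                    .iter()
                    .fold(0u32, |acc, d| acc * 8 + u32::from(*d - b'0'));
                if value <= 0xff {
                    out.push(value as u8);
                    i += 4;
                    continue;
                }
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}

/// Returns the decoded (root, mount_point) fields of one mountinfo line.
pub fn root_and_mount_point(line: &str) -> Option<(String, String)> {
    let mut fields = line.split(' ');
    let root = fields.nth(3)?;
    let mount_point = fields.next()?;
    Some((decode_octal_escapes(root), decode_octal_escapes(mount_point)))
}

/// Lists (root, mount_point) pairs whose mount point is at or under `protected`.
pub fn mounts_at_or_under(mountinfo: &str, protected: &Path) -> Vec<(String, String)> {
    mountinfo
        .lines()
        .filter_map(root_and_mount_point)
        .filter(|(_, mount_point)| Path::new(mount_point).starts_with(protected))
        .collect()
}
```

Path::starts_with compares whole components, so a sibling such as /home/u/.nono2 is not mistaken for something under /home/u/.nono.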

Documentation (commit 378f12c)

  • docs/linux-bpf-lsm-mediation.md: new ## Filesystem subtree deny section between Performance and Deployment requirements, with hook coverage table, identity model, bind-mount handling, audit, performance numbers, and profile schema. Summary section linked.
  • qa-profiles/04-allow-parent-of-protected.json: sample profile granting $HOME with the opt-in plus add_deny_access for $HOME/.config/qa-secret. Used for manual validation.

Performance (microbench)

  • 200-iteration cat /etc/passwd inside the sandbox: ~1879 ms (release nono).
  • Same loop native: ~254 ms.
  • Subtracting ~1.5 s of one-time nono setup and the ~254 ms native baseline leaves ~125 ms across the 200 iterations, or ~0.6 ms added per cat. Each cat does ~50 file opens (loader resolving libs, /etc/passwd itself), so at most ~12 µs per BPF-checked file_open in the busy path.
  • 5-run median of claude -p hello: native 1.85 s, sandboxed 4.6 s. The 2.75 s overhead is dominated by the existing nono setup costs that PR #26 already incurred; the new BPF code is in the noise.
  • Imperceptible for interactive agent workloads. Potentially measurable on tight build loops (e.g. make -j with thousands of forks per second), where the existing nono machinery already dominates the budget.

What changed vs #20

Deleted:

  • install_seccomp_exec_filter and its BPF program builder from sandbox/linux.rs
  • handle_exec_notification, classify_exec_path, ExecDecision, read_path_at, read_execve_argv_at, count_threads from supervisor_linux.rs
  • mediation/shebang.rs (the shebang-chain walker)
  • mediation/filter_audit.rs (the seccomp-era audit emitter — replaced by a library-side reader for the BPF ringbuf)
  • The seccomp poll branch and exec_notify_fd SCM_RIGHTS plumbing in exec_strategy.rs

Added (original BPF-LSM work):

  • crates/nono/src/bpf/mediation.bpf.c — two LSM programs sharing a deny map + scope check, plus a BPF_MAP_TYPE_RINGBUF for audit records
  • crates/nono/src/sandbox/bpf_lsm.rs — loader (install_mediation_filter, MediationFilterHandle, SessionCgroup)
  • crates/nono/src/sandbox/bpf_audit.rs — userspace ring-buffer reader that appends JSONL events to ~/.nono/sessions/audit.jsonl
  • crates/nono/tests/bpf_lsm_smoke.rs — load + attach unit tests
  • crates/nono-cli/tests/bpf_lsm_integration.rs — 16 end-to-end tests organised into exec mediation / read mediation / audit / composition groups, with self-skip if the test binary lacks the BPF caps
  • make test-integration — build → setcap → cargo-test target so the integration suite actually runs in CI

Added (filesystem subtree deny extension):

  • crates/nono/src/bpf/mediation.bpf.c extended — protected_roots map, dentry_in_protected_subtree, file_in_protected_subtree, extended check_file_open, eight new inode_* SEC blocks, two new audit reason codes
  • crates/nono/src/sandbox/bpf_lsm.rs extended — MAX_PROTECTED_ROOTS=64, eight new Link fields on MediationFilterHandle, attached_program_count(), protected_roots_map(), collect_protected_root_entries, bind_mount_sources_under, decode_mountinfo_octal, two new BpfLsmError variants
  • crates/nono-cli/src/protected_paths.rs extended — bpf_lsm_protected_roots_for_session, target_os = "macos" gate dropped
  • crates/nono-cli/src/policy.rs: validate_deny_overlaps no-ops on Linux when BPF-LSM is available
  • crates/nono-cli/src/sandbox_prepare.rs + the ExecutionFlags/SupervisedRuntimeContext/SupervisorConfig chain — thread protected_paths end-to-end
  • crates/nono-cli/tests/bpf_lsm_protected_subtree.rs — 12 new integration tests
  • qa-profiles/04-allow-parent-of-protected.json — sample profile

Schema change (audit.jsonl entries from BPF):

  • action_type: "allow_unmediated" / "deny" (no exec_filter_ prefix)
  • reason (only on deny): "exec_deny" / "open_deny" / "protected_open_deny" / "protected_mutate_deny"
  • interpreter_chain: dropped (kernel resolves shebangs internally; BPF fires for the actually-loaded binary)
  • exit_code: 126 on deny, absent on allow

The shim's own mediation::AuditEvent shape is unchanged; both event shapes interleave in the same JSONL file.
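
As a rough illustration of the field rules above, the sketch below renders only the BPF-specific fields of one JSONL line. The enum and function are hypothetical, real entries carry more fields than shown, and the actual code may well use a serialization library instead of hand-built strings.

```rust
/// The four deny reasons listed in the schema change.
#[derive(Clone, Copy)]
pub enum DenyReason {
    ExecDeny,
    OpenDeny,
    ProtectedOpenDeny,
    ProtectedMutateDeny,
}

impl DenyReason {
    pub fn as_str(self) -> &'static str {
        match self {
            DenyReason::ExecDeny => "exec_deny",
            DenyReason::OpenDeny => "open_deny",
            DenyReason::ProtectedOpenDeny => "protected_open_deny",
            DenyReason::ProtectedMutateDeny => "protected_mutate_deny",
        }
    }
}

/// Renders the BPF-specific fields of one audit line; `None` means allow.
/// `reason` and `exit_code` appear only on deny, as described above.
pub fn bpf_audit_fields(deny: Option<DenyReason>) -> String {
    match deny {
        None => r#"{"action_type":"allow_unmediated"}"#.to_string(),
        Some(reason) => format!(
            r#"{{"action_type":"deny","reason":"{}","exit_code":126}}"#,
            reason.as_str()
        ),
    }
}
```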

Validation

Performed end-to-end on the BPF-LSM workspace AMI (am/bpf-lsm-workspace-ami on dd-source — adds lsm=...,bpf to the kernel cmdline). Each check was re-run after every phase:

  • cargo clippy --workspace --all-targets --all-features -- -D warnings -D clippy::unwrap_used clean
  • cargo fmt --all -- --check clean
  • cargo test -p nono --lib — 628/628 pass
  • sudo -E cargo test -p nono --test bpf_lsm_smoke — 7/7 pass (5 original + 2 new for protected_roots)
  • make test-integration — 16/16 (original) pass, plus 12/12 new in bpf_lsm_protected_subtree
  • Plain cargo test -p nono-cli --test bpf_lsm_integration (no caps) — self-skip cleanly with informative messages
  • cargo test -p nono-cli --bins protected_paths — 9/9 unit tests pass (including the rewritten Linux opt-in cases)
  • vfork-bomb POC — 0/600 bypasses (was 49/600 with seccomp-unotify alone)
  • pthread TOCTOU POC — 0/600 bypasses (regression check)
  • sub-cgroup escape POC (privileged-agent threat model) — denied via ancestor walk
  • Real claude end-to-end against qa-profiles/04-allow-parent-of-protected.json from $HOME: session starts cleanly, all deny scenarios fire, audit records protected_open_deny and protected_mutate_deny events
  • Real claude end-to-end against the shadowfax production Linux profile (DataDog/shadowfax/deployments/linux/profiles/claude.json) extended with allow_parent_of_protected: true: session starts cleanly, claude -p "what is 2+2?" returns "4" (no productivity regression), all deny scenarios fire including ~/.config/password-store/gpg/key from add_deny_access

Test plan

  • Reviewer boots a workspace from am/bpf-lsm-workspace-ami and confirms cat /sys/kernel/security/lsm includes bpf
  • make build-release && sudo setcap cap_bpf,cap_sys_admin,cap_dac_override+ep target/release/nono
  • sudo -E cargo test -p nono --test bpf_lsm_smoke — expect 7 pass (5 original + 2 new)
  • make test-integration — expect 16 (original) + 12 (new bpf_lsm_protected_subtree) = 28 pass
  • Run a mediation-active session and confirm ~/.nono/sessions/audit.jsonl receives allow_unmediated and deny/open_deny entries with the expected schema
  • Run a session against qa-profiles/04-allow-parent-of-protected.json, verify protected_open_deny and protected_mutate_deny audit events fire when the agent reads/writes $HOME/.config/qa-secret
  • Run the kipz POCs (vfork + pthread, 600 attempts each); expect 0 bypasses

🤖 Generated with Claude Code

@drewmchugh drewmchugh force-pushed the am/linux-exec-filter-bpf-lsm branch 2 times, most recently from eddfce2 to 1c0993f Compare April 30, 2026 16:38
kipz and others added 14 commits May 6, 2026 12:37
Introduces a mediation framework that intercepts command execution
within sandboxed sessions, requiring admin approval for sensitive
operations. Includes audit trail logging, Unix socket control server,
and per-command sandbox shims.

Signed-off-by: James Carnegie <me@kipz.org>

Adds a macOS menu bar application that discovers active nono sessions
and provides a UI for reviewing and approving/denying mediated commands.

Signed-off-by: James Carnegie <me@kipz.org>
Signed-off-by: James Carnegie <me@kipz.org>

Commands like `gh` authenticate via macOS Keychain using mach-lookup IPC
to securityd. The per-command Seatbelt sandbox blocks these mach-lookups
by default, causing 401 auth failures when gh tries to retrieve its
stored GitHub token.

Add `keychain_access: bool` to CommandSandbox. When true, the per-command
sandbox grants read access to keychain DB files (login.keychain-db,
metadata.keychain-db), which triggers the existing mach-lookup deny skip
in the Seatbelt profile generator. This allows commands to authenticate
via keychain while keeping all other sandbox restrictions (filesystem,
network allowed_hosts) intact.

Also documents that the Approve action intentionally runs without a
per-command sandbox (None), because even with keychain file grants the
Seatbelt sandbox still blocks some keychain mach-lookups, so the grants
are insufficient for all auth paths.

Signed-off-by: Christine Le <christine.le@datadoghq.com>

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

After the explicit pass, `profile::expand_vars` now resolves any remaining
`$VAR` / `${VAR}` tokens (uppercase+underscore+digit names) from the
process environment. Unset vars are left literal, matching the existing
`$XDG_RUNTIME_DIR` fallback so downstream `add_sandbox_*` helpers log
"does not exist, skipping" rather than failing the session.
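
The token rules in this paragraph can be sketched as a small standalone function. This is an illustration under stated assumptions, not the body of `profile::expand_vars`: the name `expand_tokens` is hypothetical, and the environment is passed in as a lookup closure so the behaviour is easy to test.

```rust
/// Expands `$VAR` and `${VAR}` tokens using `lookup`. Names are limited to
/// uppercase ASCII letters, digits and underscores. Unset or malformed tokens
/// are copied through unchanged.
pub fn expand_tokens(input: &str, lookup: impl Fn(&str) -> Option<String>) -> String {
    fn is_name_char(c: char) -> bool {
        c.is_ascii_uppercase() || c.is_ascii_digit() || c == '_'
    }
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(pos) = rest.find('$') {
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        // Work out the variable name and how many bytes the whole token spans.
        let (name, token_len) = if let Some(braced) = after.strip_prefix('{') {
            match braced.find('}') {
                Some(end) => (&braced[..end], end + 3), // '$' + '{' + name + '}'
                None => ("", 1),                        // unclosed brace: keep the '$'
            }
        } else {
            let end = after.find(|c: char| !is_name_char(c)).unwrap_or(after.len());
            (&after[..end], end + 1)
        };
        let value = if !name.is_empty() && name.chars().all(is_name_char) {
            lookup(name)
        } else {
            None
        };
        match value {
            Some(v) => out.push_str(&v),
            None => out.push_str(&rest[pos..pos + token_len]),
        }
        rest = &rest[pos + token_len..];
    }
    out.push_str(rest);
    out
}
```

In real use the lookup would presumably be something like `|name| std::env::var(name).ok()`. Leaving unset tokens literal keeps the "does not exist, skipping" behaviour described above.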

Also switches per-command mediation sandbox paths from `expand_home`
to `expand_vars` so `$WORKDIR`, `$HOME`, XDG vars, and any launch-time
env var (e.g. a caller-provided `$GIT_ROOT`) resolve consistently in
both the main and per-command sandboxes. `SessionCtx` gains a `workdir`
field threaded from `execution_runtime` through `session::setup` and
`server::run` so per-command expansion uses the same workdir as the
rest of nono.

`expand_vars` already resolved $VAR / ${VAR} tokens in top-level
filesystem paths, policy.* paths, command_args, and per-command
sandbox paths. Extend the same expansion to the remaining
user-authored string fields in the profile:

- `mediation.commands[].intercept[].args_prefix` — the intercept
  matcher compared each entry literally against the incoming argv,
  so `"$USER"` used to be a dead string. Expanding at profile-load
  time lets authors write session-aware matchers (e.g. the macOS
  Keychain `security find-generic-password <user> ...` rule) without
  install-time `sed` substitutions over the profile file.
- `mediation.commands[].binary_path` — consumed verbatim by `PathBuf::from`,
  now expanded so profiles can point at user-specific binaries like
  `$HOME/.local/bin/tool`.
- `network.custom_credentials[].{tls_ca,tls_client_cert,tls_client_key}` —
  previously used the narrower `policy::expand_path` which only handled
  `~`, `$HOME`, `$TMPDIR`. Switched to the full `expand_str` so any
  configured env var (e.g. `$XDG_CONFIG_HOME`) resolves.

Introduces `profile::expand_str` as the string-returning core of the
expansion pipeline. `expand_vars` now delegates to it. Threads the
session `workdir` through `resolve_credentials`, `build_proxy_config_from_flags`,
`start_proxy_runtime`, and `resolve_command` so `$WORKDIR` resolves
consistently across all expansion sites.

Stamps each audit.jsonl entry with session_id, session_name, nono_pid,
sandboxed_pid, and command_pid so operators can correlate commands across
a session and trace the full process hierarchy.

Process hierarchy per log entry:
  nono_pid       — the nono supervisor (unsandboxed parent)
  sandboxed_pid  — the direct child process nono sandboxed (e.g. claude, codex)
  command_pid    — the shim process that ran the specific command (e.g. echo, git)

The session_id/session_name are pre-generated in execution_runtime before
mediation setup so audit.jsonl and the session record share the same values.
sandboxed_pid is resolved after fork via an Arc<OnceLock<u32>> latch shared
between the mediation server and the on_fork callback.

ShimRequest gains a pid field so the mediated path can record command_pid.
The shim's AuditEvent also gains command_pid for the audit-only datagram path.
All new AuditEvent fields use #[serde(default)] for backward compatibility
with older shims that do not send them.

Signed-off-by: Christine Le <christine.le@datadoghq.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Mediated passthrough previously buffered stdio: the shim read stdin with
a 50ms timeout into a UTF-8 String, the server spawned the real binary
with Stdio::piped() + wait_with_output(), and the parent's ChildStdin
dropped before the child could read. ssh/git over a binary pipe hit
SIGPIPE; every long-running mediated command (gh, kubectl, ...) silently
buffered output until exit.

The shim now sends its stdin/stdout/stderr fds via SCM_RIGHTS after the
JSON request. The server passes them straight to the real binary via
Stdio::from(...) for no-intercept passthrough (and the allow_commands
sub-branch and admin_passthrough), then wait()s. Capture/Respond/Approve
drop the passed fds and keep the existing buffered behaviour so they can
still inspect or transform the output.

The shim's stdin field is removed from ShimRequest, as is the
read_stdin_nonblocking helper. Shim and server are versioned together,
so no wire compatibility is needed.

Tests: added a streaming socketpair harness, a binary-roundtrip test
covering 0xFF bytes through stdin/stdout, and a Respond-path test that
verifies the dropped fds let the test side see EOF.

… command

Adds an optional CallerPolicy on each CommandEntry:

- agent_allowed: bool (default true) — whether the primary sandbox
  (no NONO_SANDBOX_CONTEXT) may invoke this command.
- allowed_parents: Option<Vec<String>> — restrict which mediated parents
  may invoke. None (field absent) accepts any parent (existing
  behaviour). Some([]) blocks every mediated parent. Some(["git"])
  permits only the listed parents.

The gate fires at the top of apply(), before the existing
allow_commands skip-intercepts branch, so a parent allowed by
allowed_parents still flows through unchanged. Rejected calls
return exit 126 with action_type "denied".

Defaults preserve full backward compatibility: a profile with no
caller_policy field on a command keeps current behaviour (agent +
any parent allowed).
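
The three-way meaning of allowed_parents is easy to get wrong, so here is a small model of the gate. The struct layout and the `permits` method are assumptions made for illustration; only the two field names and their defaults come from the description above.

```rust
/// Mirrors the two fields described above; the layout is an assumption.
pub struct CallerPolicy {
    pub agent_allowed: bool,
    pub allowed_parents: Option<Vec<String>>,
}

impl Default for CallerPolicy {
    /// Defaults keep existing behaviour: agent and any parent allowed.
    fn default() -> Self {
        Self { agent_allowed: true, allowed_parents: None }
    }
}

impl CallerPolicy {
    /// `parent` is None for a call from the primary sandbox and Some(name)
    /// for a call made by a mediated parent command.
    pub fn permits(&self, parent: Option<&str>) -> bool {
        match (parent, &self.allowed_parents) {
            (None, _) => self.agent_allowed,
            (Some(_), None) => true,
            (Some(name), Some(list)) => list.iter().any(|p| p.as_str() == name),
        }
    }
}
```

Under this model an absent list and an empty list are opposites: None accepts every mediated parent, while Some(vec![]) rejects all of them.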

Use case: ssh / ssh-keygen in shadowfax — set agent_allowed=false,
allowed_parents=["git"] so a malicious prompt cannot invoke them
directly to authenticate to attacker-controlled hosts (or sign
arbitrary data) using the user's keys via the per-command sandbox.
Signed-off-by: James Carnegie <me@kipz.org>

Sending three separate sendmsg(SCM_RIGHTS) calls for stdin/stdout/stderr
fails with EMSGSIZE (os error 40) on macOS when the socket receive buffer
already holds a large JSON request — as happens when git is invoked from
within gh's execution sandbox, which adds proxy and session env vars.

Replace the three individual sends with a single sendmsg carrying all
three fds in one SCM_RIGHTS control message, and update recv_three_fds
to receive them in a matching single recvmsg call.

This is the standard idiom for passing multiple fds and avoids the
macOS control-buffer constraint entirely.

Fixes: DataDog/shadowfax#95

The shim now sends its own cwd in `ShimRequest.cwd`, and the server
sets it on the spawned binary via `Command::current_dir`.

Without this, the spawned binary inherited the mediation server's
launch cwd. Tools that resolve config from cwd silently operated on
the wrong target — `git` in a worktree being the canonical case:
discovery would walk up from the server's cwd, find the wrong `.git`,
and report the wrong branch and toplevel.

The new field is `Option<String>` with `#[serde(default)]`:

- old shim → new server: missing field → legacy behaviour (server cwd)
- new shim → old server: extra field is ignored
- unreadable cwd: shim sends None → legacy behaviour
- non-directory cwd: server logs a warning and falls back to its own cwd

Adds `test_passthrough_uses_request_cwd`: drives `apply()` end-to-end
with a real `/bin/pwd` and asserts the spawned process prints the
caller's cwd, not the server's.

The mediation/* and nono-shim formatting drift accumulated across the
13-commit mediation series. Folding the fmt fixes into the originating commits
caused conflicts because subsequent mediation commits re-touched the same
lines, so capture the cumulative fmt result here instead.

Signed-off-by: James Carnegie <me@kipz.org>

kipz and others added 4 commits May 6, 2026 19:24
When nono creates an audit shim for a binary at session start, the shim
later runs `resolve_real_binary` which re-walks PATH to find the real
target. Intermediate shells (e.g. husky pre-commit hooks, lint-staged
workers) often munge PATH between session start and shim invocation,
stripping user toolchain dirs that contained the real binary. The walk
then returns nothing and the shim reports `nono-shim: <name>: command
not found` even though the real binary is still on disk at the path
nono saw at session start.

This change makes audit shim resolution deterministic by recording the
resolved absolute path at session start and consulting it first at exec
time:

- A new sibling dir `<session_dir>/shim-sources/` holds one sidecar
  per shim — `<name>` containing the absolute path of the binary
  the shim was created for. Both mediated commands and universal
  audit shims write a sidecar.
- `SessionHandle` exposes `shim_sources_dir` and the path is forwarded
  to mediated subprocesses via a new `NONO_SHIM_SOURCES_DIR` env var
  (alongside `NONO_SHIM_DIR`). The session-level dir is reused for
  filtered per-command sandboxes — sidecars are created once and
  shared, since the recorded paths do not change.
- `nono-shim::resolve_real_binary` now consults
  `NONO_SHIM_SOURCES_DIR/<name>` first and only falls back to the
  existing PATH walk if the sidecar is missing or its recorded path
  is no longer an executable file.
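
The lookup order described in the last bullet can be sketched as follows. This is a simplified model rather than the code in `nono-shim::resolve_real_binary`: the function name and signature are hypothetical, PATH is passed in as a string, and the sketch does not model skipping the shim directory itself during the walk.

```rust
use std::fs;
use std::os::unix::fs::PermissionsExt;
use std::path::{Path, PathBuf};

/// True when `path` is a regular file with at least one execute bit set.
fn is_executable_file(path: &Path) -> bool {
    fs::metadata(path)
        .map(|m| m.is_file() && (m.permissions().mode() & 0o111) != 0)
        .unwrap_or(false)
}

/// Consults the sidecar `<sources_dir>/<name>` first, then falls back to a
/// PATH walk if the sidecar is missing or points at a non-executable path.
pub fn resolve_binary(name: &str, sources_dir: Option<&Path>, path_var: &str) -> Option<PathBuf> {
    if let Some(dir) = sources_dir {
        if let Ok(recorded) = fs::read_to_string(dir.join(name)) {
            // Sidecar content is trimmed so a trailing newline is harmless.
            let candidate = PathBuf::from(recorded.trim());
            if is_executable_file(&candidate) {
                return Some(candidate);
            }
        }
    }
    path_var
        .split(':')
        .filter(|dir| !dir.is_empty())
        .map(|dir| Path::new(dir).join(name))
        .find(|candidate| is_executable_file(candidate))
}
```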

Tests:
- 8 new unit tests in nono-shim cover sidecar hits, missing dirs,
  trimming, deleted/non-executable targets, and the PATH-walk
  fallback.
- 2 new unit tests in mediation::session cover sidecar writes and
  overwrites.
- The existing `test_allow_commands_sets_nono_shim_dir_to_filtered_dir`
  test is extended to assert `NONO_SHIM_SOURCES_DIR` is forwarded
  unchanged into the per-command sandbox env.

fix(mediation): record audit-shim source paths to survive PATH munging

…ediation foundation

Transplants the BPF-LSM mediation and protected-subtree work from
drewmchugh:am/linux-exec-filter-bpf-lsm onto kipz/develop (v0.47.0).

Key changes:
- Add crates/nono/src/bpf/ (mediation.bpf.c, vmlinux.h) and the libbpf-rs
  loader (bpf_lsm.rs) + audit ring-buffer reader (bpf_audit.rs)
- Update nono/Cargo.toml + build.rs to compile the BPF program at build time
- Update sandbox/mod.rs to expose bpf_lsm + bpf_audit modules
- Add nono-cli bpf-lsm feature that propagates nono/bpf-lsm
- Wire protected_paths field through PreparedSandbox → ExecutionFlags →
  SupervisedRuntimeContext → exec_strategy::SupervisorConfig
- Add mediation_filter_state() helper in execution_runtime to extract
  deny_set + shim/audit dirs from SessionHandle
- Fix mediation/policy.rs Sandbox::apply() return type mismatch (pre-existing
  develop bug: Linux returns SeccompNetFallback not ())
- Add bpf_lsm BPF-LSM skip in policy::validate_deny_overlaps
- Add bpf_lsm_protected_roots_for_session() in protected_paths (reads
  filesystem.deny from new schema)
- Add integration tests: bpf_lsm_integration, bpf_lsm_protected_subtree
- Add smoke tests: bpf_lsm_smoke
- Add qa-profiles/04-allow-parent-of-protected.json sample profile
- Add docs/linux-bpf-lsm-mediation.md design doc

Signed-off-by: Andrew McHugh <andrew.mchugh@datadoghq.com>

…t wiring

- Add protected_paths: Vec::new() to PreparedSandbox test initializers in main.rs
- Fix clippy unwrap_used lint in mediation/mod.rs tests (unwrap → expect)
- Fix clippy loop-index lint in mediation/server.rs (for i in 0..n → enumerate)
- Run cargo fmt to fix whitespace in bpf_lsm.rs and mediation/policy.rs

Signed-off-by: Andrew McHugh <andrew.mchugh@datadoghq.com>

@drewmchugh drewmchugh force-pushed the am/linux-exec-filter-bpf-lsm branch 2 times, most recently from af1ff8b to 18ac8cb Compare May 7, 2026 14:01
@drewmchugh drewmchugh changed the title feat(linux): BPF-LSM mediation filter (closes exec + read bypasses) feat(linux): linux BPF-LSM mediation filter (closes exec + read bypasses) May 7, 2026
drewmchugh and others added 2 commits May 7, 2026 17:05
BPF audit records for non-mediated execs had empty command and path
fields. The audit_record struct carried only (dev, ino); userspace
resolved those to a path via inode_to_path, which is built from the
mediation deny set. Non-deny-set binaries were not in that table so
both fields were left empty.

Fix: add char filename[256] to struct audit_record and populate it in
check_exec via bpf_probe_read_kernel_str(bprm->filename). On the Rust
side, decode the field as a PathBuf fallback in handle_record when the
inode_to_path lookup returns nothing. The deny-set lookup still wins
when present (canonical path). As a side effect, the shim-suppression
check now correctly fires for shim-routed commands whose path_str was
previously always empty.
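
The lookup-then-fallback order on the Rust side might look roughly like this. It is a sketch under assumptions: `resolve_path` and `path_from_record` are hypothetical names, and the record's filename field is modelled as a NUL-padded byte buffer, as a char filename[256] would arrive from the ring buffer.

```rust
use std::collections::HashMap;
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt;
use std::path::PathBuf;

/// Decodes a NUL-terminated (or full-length) byte buffer into a path.
/// Returns None when the buffer starts with NUL, meaning no filename was set.
pub fn path_from_record(filename: &[u8]) -> Option<PathBuf> {
    let end = filename.iter().position(|&b| b == 0).unwrap_or(filename.len());
    if end == 0 {
        return None;
    }
    Some(PathBuf::from(OsStr::from_bytes(&filename[..end])))
}

/// The deny-set table wins when it has an entry (canonical path); otherwise
/// the filename carried in the record is used as a fallback.
pub fn resolve_path(
    ident: (u64, u64),
    inode_to_path: &HashMap<(u64, u64), PathBuf>,
    filename: &[u8],
) -> Option<PathBuf> {
    inode_to_path
        .get(&ident)
        .cloned()
        .or_else(|| path_from_record(filename))
}
```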

Signed-off-by: Andrew McHugh <andrew.mchugh@datadoghq.com>
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Andrew McHugh <andrew.mchugh@datadoghq.com>

The Approve action previously dropped the shim's stdin/stdout/stderr
fds and ran the real binary (or script) with null stdin and piped
stdout/stderr buffered into ShimResponse. This worked for the original
use cases — ddtool auth gitlab login, ddtool auth token X — because
none of them read stdin and their stdout is just session-info prints
that the shim relays to its caller's stdout transparently.

It breaks for any binary that follows the Docker credential-helper
protocol, where the caller writes the registry hostname to stdin and
expects JSON creds back on stdout. docker-credential-ddtool launched
through Approve was getting empty stdin and returning "no credentials
server URL", which surfaces in Bazel's rules_oci_bootstrap as:

    failed to run credential helper, stdout: no credentials server URL

This was first hit on Datadog Linux workspaces where the Docker
auth flow is multi-process (docker-credential-ddtool -> pass -> gpg)
and can't be cleanly held inside a per-command sandbox, forcing the
choice of Approve to escape the sandbox — exposing the gap.

Switch Approve to streaming mode: pass the shim's stdio fds into
exec_passthrough/exec_script via Stdio::from(...), call wait()
instead of wait_with_output(), return an empty ShimResponse stdout/
stderr (only exit code matters; the binary's output streamed
directly to the caller). Buffered mode is preserved for Respond
and Capture, which still need to inspect stdout (Capture issues a
nonce in place of it).

Behavior change for existing Approve usages: unchanged in practice.
None of them read stdin, so connecting stdin doesn't break anything.
They write to stdout, which previously was buffered into ShimResponse
and relayed back; now it streams directly with the same end result
for the user. The only observable delta is that nono's audit log
entries for Approve actions no longer carry stdout/stderr content.

Add stdio_fds parameter to exec_script. Capture's script path passes
None (preserves buffered behaviour for nonce issuance); Approve's
script path passes Some(stdio_fds) for streaming.

Verified: bzl build of an internal Datadog target inside a sandboxed
session now completes through the docker-credential-helper path.

Signed-off-by: Andrew McHugh <andrew.mchugh@datadoghq.com>

@kipz kipz force-pushed the develop branch 3 times, most recently from 86b464a to 4f89e43 Compare June 4, 2026 17:07