Summary
Research findings and recommendations for supporting alternative OCI-compliant container runtimes (gVisor/runsc, Docker SBX/Kata Containers) in AWF, while maintaining security guarantees for network isolation, volume mounts, and syscall filtering.
Current Architecture
AWF currently uses the default Docker runtime (runc) with extensive security hardening:
| Layer | Mechanism | Files |
| --- | --- | --- |
| Syscall filtering | Seccomp deny-by-default profile (~350 allowed syscalls) | containers/agent/seccomp-profile.json |
| Capabilities | Agent: cap_add: SYS_CHROOT, SYS_ADMIN (dropped before user code), cap_drop: NET_RAW, SYS_PTRACE, SYS_MODULE, SYS_RAWIO, MKNOD | src/services/agent-service.ts:97-105 |
| Network isolation | iptables DNAT via init container sharing agent network namespace | containers/agent/setup-iptables.sh, src/host-iptables-rules.ts |
| Filesystem isolation | chroot to /host with selective bind mounts (system dirs RO, workspace RW) | containers/agent/entrypoint.sh, src/services/agent-volumes.ts |
| Privilege escalation | no-new-privileges:true, UID/GID remapping, capability drop via capsh | src/services/agent-service.ts:111, containers/agent/entrypoint.sh:356-365 |
| Process limits | pids_limit: 1000, mem_limit: 6g | src/services/agent-service.ts:119-122 |
| AppArmor | Set to unconfined (required for procfs mount, safe because SYS_ADMIN dropped before user code) | src/services/agent-service.ts:113 |
Docker-Specific Dependencies
AWF is tightly coupled to Docker in these areas:
- Docker Compose v3+: Orchestrates all containers (docker compose up/down/logs/wait)
- Docker CLI commands: docker inspect, docker logs, docker network create/rm, docker rm -f
- Docker bridge networking: Fixed subnet 172.30.0.0/24 with custom bridge name fw-bridge
- Docker socket: Optional DinD support via /var/run/docker.sock mount
- network_mode: service:agent: iptables-init shares agent's network namespace
- Docker healthchecks: Service dependency ordering (Squid → Agent → iptables-init)
- security_opt: Seccomp profile, no-new-privileges, AppArmor configuration
- tmpfs overlays: Hide sensitive files (docker-compose.yml, MCP logs)
Runtime Analysis
gVisor (runsc)
gVisor interposes a user-space kernel (the "Sentry") between the container and the host kernel, intercepting all syscalls. This provides stronger isolation than seccomp alone.
Compatibility Assessment
| AWF Feature | gVisor Support | Impact |
| --- | --- | --- |
| iptables | ⚠️ Partial: only supports the featureset for Docker-in-gVisor. DNAT rules may not work. | CRITICAL: AWF's entire network security relies on iptables DNAT to Squid |
| chroot | ✅ Full support (syscall 161) | Compatible |
| Seccomp profiles | ⚠️ Redundant: gVisor already intercepts all syscalls. Seccomp applied to the Sentry, not the sandbox. | Need to verify AWF's seccomp profile doesn't conflict |
| Capabilities | ✅ capget/capset fully supported | Compatible |
| network_mode: service: | ⚠️ Requires all containers sharing a network namespace to use the same runtime | Must apply --runtime=runsc consistently |
| Bind mounts / volumes | ✅ Supported, but no block device filesystems (ext4 etc.) inside sandbox | AWF only uses bind mounts, so compatible |
| tmpfs | ✅ Fully supported | Compatible |
| procfs | ✅ Supported (gVisor provides its own /proc) | May need adjustment: AWF mounts fresh /host/proc |
| Healthchecks | ✅ Supported via Docker | Compatible |
| Resource limits | ⚠️ cgroups for accounting only, not enforcement within sandbox | mem_limit and pids_limit may not be enforced |
| AppArmor | ⚠️ Not applicable inside gVisor sandbox | Harmless: already unconfined |
| OCI image format | ✅ Fully compatible | No image changes needed |
Critical Issue: iptables in gVisor
From gVisor docs: "iptables are only partially supported. The general goal is to support the featureset necessary to be able to run Docker in gVisor, but not necessarily further."
AWF's iptables rules in setup-iptables.sh include:
- iptables -t nat -A OUTPUT ... -j DNAT --to-destination 172.30.0.10:3128 (redirect HTTP/HTTPS to Squid)
- iptables -A OUTPUT ... -j DROP (block dangerous ports)
- iptables -A OUTPUT ... -j LOG (audit logging)
- ip6tables rules (IPv6 blocking)
If gVisor's iptables doesn't support DNAT rules, the entire AWF network security model breaks. Traffic would bypass Squid and reach the internet directly.
Mitigation Strategies for iptables
- Run iptables-init with runc, agent with gVisor: Use --runtime=runsc only for the agent container. This is unworkable: the iptables-init container shares the agent's network namespace via network_mode: service:agent, so it would have to use the same runtime as the agent, yet it needs full iptables support.
- Host-level network isolation instead: Move ALL iptables rules to the host's DOCKER-USER chain (AWF already does this partially in src/host-iptables-rules.ts). The host runs native Linux, so iptables always works. This would make the in-container iptables-init redundant when using gVisor.
- Use gVisor's network passthrough mode: Configure --network=host for runsc so gVisor uses the host network stack. But this defeats gVisor's network isolation benefits.
- Use gVisor's netstack with proxy env vars only: Rely entirely on HTTP_PROXY/HTTPS_PROXY env vars (which AWF already sets) plus gVisor's netstack isolation, giving up iptables DNAT as a defense-in-depth layer. Acceptable only if gVisor's syscall interposition is considered sufficient to prevent proxy bypass.
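The constraint that rules out the first strategy (every container sharing a network namespace must use the same runtime) can be checked mechanically before compose is invoked. The following is a minimal sketch, not AWF code: `ServiceView` and `findRuntimeMismatches` are hypothetical names, and treating a missing `runtime` as runc is an assumption based on runc being the default Docker runtime.

```typescript
// Hypothetical, trimmed-down view of a compose service (not an AWF type).
interface ServiceView {
  runtime?: string;      // compose `runtime:` value; undefined is assumed to mean runc
  network_mode?: string; // e.g. 'service:agent'
}

// Returns one message per service that joins another service's network
// namespace while using a different runtime.
function findRuntimeMismatches(services: Record<string, ServiceView>): string[] {
  const problems: string[] = [];
  const prefix = 'service:';
  Object.keys(services).forEach((name) => {
    const svc = services[name];
    if (!svc) return;
    const mode = svc.network_mode ?? '';
    if (mode.indexOf(prefix) !== 0) return; // has its own network namespace
    const targetName = mode.slice(prefix.length);
    const target = services[targetName];
    if (!target) {
      problems.push(`${name}: network_mode refers to unknown service ${targetName}`);
      return;
    }
    const own = svc.runtime ?? 'runc';
    const other = target.runtime ?? 'runc';
    if (own !== other) {
      problems.push(`${name} (${own}) shares a network namespace with ${targetName} (${other})`);
    }
  });
  return problems;
}
```

Running such a check on the generated compose model would reject the mixed runc/runsc layout early, instead of leaving the failure to container start-up.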
Docker SBX / Kata Containers
Kata Containers runs each container inside a lightweight VM (using QEMU, Cloud Hypervisor, or Firecracker). This provides VM-level isolation with OCI compatibility.
Compatibility Assessment
| AWF Feature | Kata Support | Impact |
| --- | --- | --- |
| iptables | ✅ Full: runs a real Linux kernel inside the VM | Compatible |
| chroot | ✅ Full: real Linux kernel | Compatible |
| Seccomp | ✅ Applied inside the guest VM | Compatible |
| Capabilities | ✅ Full Linux capability model | Compatible |
| Network namespace sharing | ⚠️ Complex: each Kata container is a separate VM | CRITICAL: network_mode: service:agent won't work natively |
| Bind mounts | ⚠️ File sharing between host and VM uses virtio-fs or 9pfs, with performance overhead | Works but slower I/O |
| tmpfs | ✅ Supported | Compatible |
| Resource limits | ✅ Enforced at VM level | Better isolation than cgroups |
| OCI image format | ✅ Fully compatible | No image changes needed |
| Docker socket | ⚠️ Mounting host Docker socket into a VM is complex | DinD support may break |
Critical Issue: Network Namespace Sharing
AWF's iptables-init pattern (network_mode: service:agent) requires containers to share a network namespace. Kata Containers run each container in a separate VM, making namespace sharing fundamentally incompatible.
Mitigation: Move iptables setup into the agent entrypoint itself (remove the separate init container) or use Kata's sandbox concept where multiple containers share a single VM.
Recommended Architecture
Option A: Runtime Abstraction Layer (Recommended)
Add a --container-runtime CLI flag that selects a runtime profile:
awf --container-runtime gvisor|kata|runc|auto ...
Each profile adjusts the security model:
| Aspect | runc (default) | gvisor | kata |
| --- | --- | --- | --- |
| Syscall filtering | Seccomp profile | gVisor Sentry (seccomp optional) | Seccomp inside VM |
| Network isolation | iptables DNAT + Squid proxy | Host iptables + Squid proxy (no in-container iptables) | iptables inside VM + Squid proxy |
| iptables-init | Separate init container | Removed: host-level rules only | Merged into agent entrypoint |
| network_mode: service: | Used | Not used (unnecessary) | Not used (same VM sandbox) |
| Capability grants | SYS_CHROOT, SYS_ADMIN | Minimal (gVisor handles isolation) | SYS_CHROOT, SYS_ADMIN |
| Runtime flag | (none) | runtime: runsc in compose | runtime: kata in compose |
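The profile table above maps naturally onto a lookup structure. The sketch below is illustrative only: `RuntimeProfile`, `RUNTIME_PROFILES`, and `profileFor` are hypothetical names that do not exist in AWF. The empty capability list for gVisor is a placeholder for the table's "Minimal" entry, whose exact contents are still open.

```typescript
// Hypothetical types; the values mirror the profile table, nothing more.
type ContainerRuntime = 'runc' | 'gvisor' | 'kata';

interface RuntimeProfile {
  composeRuntime?: string; // value for the compose `runtime:` key; undefined = Docker default
  useSeccompProfile: boolean;
  iptablesSetup: 'init-container' | 'host-only' | 'agent-entrypoint';
  sharesAgentNetns: boolean; // whether network_mode: service:agent is used
  capAdd: string[];
}

const RUNTIME_PROFILES: Record<ContainerRuntime, RuntimeProfile> = {
  runc: {
    useSeccompProfile: true,
    iptablesSetup: 'init-container',
    sharesAgentNetns: true,
    capAdd: ['SYS_CHROOT', 'SYS_ADMIN'],
  },
  gvisor: {
    composeRuntime: 'runsc',
    useSeccompProfile: false, // the table lists seccomp as optional under the Sentry
    iptablesSetup: 'host-only',
    sharesAgentNetns: false,
    capAdd: [], // assumption: stands in for "Minimal"; the real set is undecided
  },
  kata: {
    composeRuntime: 'kata', // the registered runtime name depends on the Docker daemon config
    useSeccompProfile: true,
    iptablesSetup: 'agent-entrypoint',
    sharesAgentNetns: false,
    capAdd: ['SYS_CHROOT', 'SYS_ADMIN'],
  },
};

function profileFor(runtime: ContainerRuntime): RuntimeProfile {
  return RUNTIME_PROFILES[runtime];
}
```

Keeping the differences in data rather than in scattered conditionals would make it easier to review what each runtime changes about the security model.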
Implementation Changes Required
1. Docker Compose Generation (src/compose-generator.ts, src/services/agent-service.ts)
// Add runtime field to DockerService interface (src/types/docker.ts)
interface DockerService {
  // ... existing fields
  runtime?: string; // 'runsc', 'kata-runtime', etc.
}
// In compose generation, conditionally set:
if (config.containerRuntime === 'gvisor') {
  agentService.runtime = 'runsc';
  // Remove iptables-init service entirely
  // Remove network_mode: service:agent
  // Adjust security_opt (no seccomp; gVisor handles it)
}
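A self-contained version of the gVisor branch might look like the following. This is a sketch under assumptions: the service names `agent` and `iptables-init`, the trimmed `DockerService` shape, and the `applyGvisor` helper are illustrative rather than taken from src/compose-generator.ts.

```typescript
// Trimmed, hypothetical version of the DockerService interface.
interface DockerService {
  image: string;
  runtime?: string;
  network_mode?: string;
  security_opt?: string[];
}

type Services = Record<string, DockerService>;

// Returns a copy of the service map adjusted for gVisor:
// iptables-init is dropped, the agent runs under runsc, and the
// seccomp entry is removed from the agent's security_opt.
function applyGvisor(services: Services): Services {
  const result: Services = {};
  Object.keys(services).forEach((name) => {
    if (name === 'iptables-init') return; // host-level rules replace it
    const svc = services[name];
    if (svc) result[name] = { ...svc };
  });
  const agent = result['agent'];
  if (agent) {
    agent.runtime = 'runsc';
    agent.security_opt = (agent.security_opt ?? []).filter(
      (opt) => opt.indexOf('seccomp') !== 0,
    );
  }
  return result;
}
```

Returning a modified copy keeps the default runc path untouched, so the existing compose output would stay byte-for-byte the same when no alternative runtime is requested.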
2. Network Security Refactoring
Move iptables rules to host level (src/host-iptables-rules.ts):
- Current: Host rules in DOCKER-USER chain + container rules in iptables-init
- Proposed: Host rules handle ALL filtering for gVisor/Kata; container iptables-init only for runc
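The host-level DOCKER-USER chain rules already exist and work regardless of container runtime. They need to be expanded to cover the DNAT-to-Squid functionality currently handled by the init container. One caveat: the DNAT target is only valid in the nat table, while DOCKER-USER is a filter-table chain, so the redirect itself would have to be installed in a nat-table chain on the host (for example PREROUTING) alongside the existing DOCKER-USER filtering.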
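As a rough illustration of what a host-level redirect could look like, the sketch below builds iptables argument lists without executing anything. Several things here are assumptions rather than AWF behaviour: placing the rules in the nat table's PREROUTING chain, matching on the fw-bridge interface, redirecting ports 80 and 443, and the names `HostRedirectConfig` and `buildHostDnatRules`. The rules are untested and would need validation on a real host.

```typescript
// Hypothetical config; bridge name and Squid address come from the document.
interface HostRedirectConfig {
  bridge: string;    // e.g. 'fw-bridge'
  squidIp: string;   // e.g. '172.30.0.10'
  squidPort: number; // e.g. 3128
  ports: number[];   // destination ports to redirect
}

// Builds one iptables argument list per redirected port.
// Traffic originating from Squid itself is excluded to avoid a redirect loop.
function buildHostDnatRules(cfg: HostRedirectConfig): string[][] {
  return cfg.ports.map((port) => [
    '-t', 'nat', '-A', 'PREROUTING',
    '-i', cfg.bridge,
    '!', '-s', cfg.squidIp,
    '-p', 'tcp', '--dport', String(port),
    '-j', 'DNAT', '--to-destination', `${cfg.squidIp}:${cfg.squidPort}`,
  ]);
}

const exampleRules = buildHostDnatRules({
  bridge: 'fw-bridge',
  squidIp: '172.30.0.10',
  squidPort: 3128,
  ports: [80, 443],
});
```

Producing argument arrays rather than shell strings avoids quoting problems if the rules are later passed to a process-spawning helper, and makes the generated rules easy to assert on in unit tests.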
3. Volume Security
No changes needed — all runtimes support OCI bind mounts. For Kata:
- Bind mounts use virtio-fs (transparent to AWF)
- tmpfs overlays work inside the VM
- Performance may be lower for heavy I/O workloads
4. Image Compatibility
No image changes needed. All AWF images are standard OCI images:
- ubuntu:22.04 (agent)
- ubuntu/squid:latest (Squid)
- node:22-alpine (API proxy, CLI proxy)
All runtimes (runc, runsc, kata) consume OCI images identically.
5. CLI and Configuration
// src/cli-options.ts - new option
.option('--container-runtime <runtime>',
  'Container runtime to use (runc, gvisor, kata, auto)',
  'runc')
// src/types/runtime-options.ts
containerRuntime?: 'runc' | 'gvisor' | 'kata' | 'auto';
auto mode would detect available runtimes and select the most secure option.
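One way auto mode could resolve a runtime is sketched below. The function takes the list of runtime names registered with the Docker daemon as input (obtaining that list, for instance from docker info, is left out). The preference order follows the container escape risk ranking in this document. `resolveRuntime`, `DOCKER_RUNTIME_NAMES`, and the accepted alias names are assumptions for illustration.

```typescript
type ContainerRuntime = 'runc' | 'gvisor' | 'kata';
type RuntimeOption = ContainerRuntime | 'auto';

// Docker runtime names that are assumed to indicate each AWF runtime option.
const DOCKER_RUNTIME_NAMES: Record<ContainerRuntime, string[]> = {
  kata: ['kata', 'kata-runtime'],
  gvisor: ['runsc'],
  runc: ['runc'],
};

// Strongest isolation first.
const PREFERENCE: ContainerRuntime[] = ['kata', 'gvisor', 'runc'];

function resolveRuntime(option: RuntimeOption, installed: string[]): ContainerRuntime {
  const isInstalled = (rt: ContainerRuntime): boolean =>
    DOCKER_RUNTIME_NAMES[rt].some((name) => installed.indexOf(name) !== -1);

  if (option !== 'auto') {
    // An explicit choice is never silently downgraded.
    if (!isInstalled(option)) {
      throw new Error(`Container runtime '${option}' is not registered with Docker`);
    }
    return option;
  }
  for (const rt of PREFERENCE) {
    if (isInstalled(rt)) return rt;
  }
  return 'runc'; // Docker's default runtime
}
```

Failing loudly on an explicit but unavailable runtime seems safer than falling back, because a silent fallback to runc would weaken isolation without the user noticing.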
Security Analysis
Security Properties by Runtime
| Property | runc + AWF hardening | gVisor | Kata |
| --- | --- | --- | --- |
| Kernel exploit protection | ❌ Shares host kernel | ✅ User-space kernel (Sentry) | ✅ Separate guest kernel |
| Syscall filtering | Seccomp (350 allowed) | Sentry intercepts all (~277 implemented) | Seccomp + VM boundary |
| Network isolation | iptables + Squid L7 proxy | Netstack + Squid L7 proxy | VM network + iptables + Squid L7 proxy |
| Filesystem isolation | chroot + bind mounts | gVisor overlay FS + bind mounts | virtio-fs + bind mounts |
| Container escape risk | Medium (kernel shared) | Low (user-space kernel) | Very low (VM boundary) |
| Performance overhead | Baseline | ~10-30% syscall overhead | ~20-50% startup, I/O overhead |
Non-Negotiable Security Requirements
Regardless of runtime, these must be maintained:
- All HTTP/HTTPS traffic MUST route through Squid: domain ACL enforcement is the core security guarantee
- Proxy env vars (HTTP_PROXY, HTTPS_PROXY) MUST be set: for proxy-aware tools
- Dangerous ports MUST be blocked: SSH, SMTP, databases, Redis, etc.
- DNS MUST be restricted: only whitelisted DNS servers
- Sensitive paths MUST NOT be mounted: no /etc/shadow, no unwhitelisted home dirs
- Capabilities MUST be dropped before user code: no NET_ADMIN, SYS_ADMIN at user code time
- OCI image format: all images must work across all runtimes without modification
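Some of these requirements can be asserted statically on the generated agent service, independent of runtime. The sketch below covers only the proxy variables and the /etc/shadow mount. Traffic routing, port blocking, DNS restriction, and capability dropping are runtime behaviour and would still need integration tests under each runtime. `AgentServiceView` and `checkStaticInvariants` are hypothetical names.

```typescript
// Hypothetical view of the generated agent service.
interface AgentServiceView {
  environment: Record<string, string>;
  volumes: string[]; // compose short syntax: 'hostPath:containerPath[:mode]'
}

const REQUIRED_ENV = ['HTTP_PROXY', 'HTTPS_PROXY'];
const FORBIDDEN_HOST_PATHS = ['/etc/shadow'];

// Returns a list of violated requirements; an empty list means the
// static checks passed.
function checkStaticInvariants(agent: AgentServiceView): string[] {
  const violations: string[] = [];
  REQUIRED_ENV.forEach((name) => {
    if (!agent.environment[name]) violations.push(`${name} is not set`);
  });
  agent.volumes.forEach((volume) => {
    const hostPath = volume.split(':')[0];
    if (FORBIDDEN_HOST_PATHS.indexOf(hostPath) !== -1) {
      violations.push(`sensitive path mounted: ${hostPath}`);
    }
  });
  return violations;
}
```

Running the same check against the compose model produced for every runtime profile would catch a profile that accidentally drops a proxy variable or adds a sensitive mount.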
Implementation Phases
Phase 1: Refactor iptables to support host-only mode
- Move DNAT rules from iptables-init container to host DOCKER-USER chain
- Keep iptables-init as optional (for backward compatibility with runc)
- This unblocks gVisor support without requiring gVisor-specific iptables
Phase 2: Add --container-runtime flag
- Add CLI option and config file support
- Add runtime: field to Docker Compose generation
- Conditionally skip iptables-init for non-runc runtimes
Phase 3: Runtime-specific security profiles
- gVisor: Simplified seccomp (or none), rely on Sentry
- Kata: Full seccomp inside VM, adjust resource limits
- Validation: Ensure all security tests pass with each runtime
Phase 4: CI/CD integration
- Add smoke tests for each supported runtime
- GitHub Actions runners with gVisor/Kata pre-installed
- Performance benchmarking across runtimes
Open Questions
- gVisor iptables DNAT support: Need to empirically test whether iptables -t nat -A OUTPUT -p tcp --dport 443 -j DNAT --to-destination 172.30.0.10:3128 works inside a gVisor sandbox. The docs say "partial support" but don't enumerate supported features.
- gVisor + Docker Compose runtime: field: The legacy Compose file format v3 has no runtime: field; the field was introduced in file format 2.3 and is also part of the current Compose Specification. Need to verify that the Compose version AWF targets accepts it, or fall back to docker run --runtime=runsc directly instead of Compose.
- Kata + network namespace sharing: Can Kata's "sandbox" concept (multiple containers in one VM) replace network_mode: service:agent? Need to test.
- Performance impact: What's the real-world performance overhead for typical AWF workloads (npm install, git clone, curl) under each runtime?
- Docker SBX: "Docker SBX" appears to refer to Docker's sandbox mode using gVisor internally. Need to clarify whether this is a distinct product or just Docker + gVisor.
References
- containers/agent/setup-iptables.sh, src/host-iptables-rules.ts
- src/compose-generator.ts, src/services/agent-service.ts
- containers/agent/seccomp-profile.json, containers/agent/entrypoint.sh