Skip to content
90 changes: 90 additions & 0 deletions docs/adr/31614-auto-detect-arc-dind-docker-host-path-prefix.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
# ADR-31614: Auto-Detect ARC/DinD and Emit AWF `--docker-host-path-prefix` at Runtime

**Date**: 2026-05-12
**Status**: Draft
**Deciders**: pelikhan, Copilot

---

## Part 1 — Narrative (Human-Friendly)

### Context

AWF-backed workflows running on Actions Runner Controller (ARC) with the Docker-in-Docker (DinD) sidecar pattern have a split filesystem: the runner container and the Docker daemon container do not share `/tmp` and other staging paths, so bind-mounts that AWF constructs from the runner's perspective fail inside the daemon. AWF supports a `--docker-host-path-prefix` flag to bridge this split (`v0.25.43`+), but until now `gh-aw`'s compiler did not emit it. Users on ARC DinD had to author per-workflow `sandbox.agent.args` hacks or pass the flag manually, even though the topology is detectable at runtime from a single environment variable (`DOCKER_HOST`). The fix needs to be safe for non-DinD runners and for pinned AWF versions that do not understand the flag.

### Decision

We will inject a generated bash probe into every AWF command that inspects `DOCKER_HOST` at runtime, matches it against `^tcp://(localhost|127\.0\.0\.1)(:[0-9]+)?$`, and conditionally appends `--docker-host-path-prefix /tmp/gh-aw` to the `awf` invocation when the topology looks like ARC DinD. The probe is gated by a new version constant `AWFDockerHostPathPrefixMinVersion = v0.25.43` and the helper `awfSupportsDockerHostPathPrefix(...)`, so workflows pinned to older AWF versions never receive the unsupported flag. The bash regex, the hardcoded prefix path, and the shell variable name are centralized as package-level constants in `pkg/workflow/awf_helpers.go` to keep the generated script deterministic and grep-able.

### Alternatives Considered

#### Alternative 1: Require users to opt in via workflow frontmatter

We could add a YAML field such as `runtime.arc-dind: true` that explicitly tells the compiler to emit `--docker-host-path-prefix`. This would keep the generated script smaller for the common GitHub-hosted-runner case, but it pushes platform topology knowledge onto every workflow author, contradicts the goal of "first-class" ARC DinD support, and means a misconfigured runner produces a confusing failure at execution time rather than a self-correcting probe. We rejected this because the topology is observable from `DOCKER_HOST` with no false positives expected for typical GitHub-hosted runners (which do not export a localhost TCP `DOCKER_HOST`).

#### Alternative 2: Detect DinD at compile time from runner labels

The compiler could inspect `runs-on:` and emit the flag only when the workflow targets an ARC runner. This would avoid generating runtime shell on hosted runners entirely. However, ARC runner labels are arbitrary user strings (`self-hosted`, `arc-runners`, custom pool names), there is no reliable way to distinguish "ARC with DinD sidecar" from "ARC with rootless Docker" or "self-hosted bare metal" at compile time, and the detection would need to be re-implemented every time a new ARC topology emerged. The runtime probe relies on a single facts-based check (`DOCKER_HOST` shape) that is invariant across ARC versions.

#### Alternative 3: Push the detection into AWF itself

AWF could implement the same `DOCKER_HOST` regex internally and self-apply the path prefix when needed, removing the responsibility from `gh-aw`. This is the cleanest long-term design but requires an AWF release, coordinated version pinning across the ecosystem, and a fallback path in `gh-aw` for older AWF versions anyway. Doing the detection in `gh-aw` ships the fix immediately and is reversible: once a future AWF version self-detects, the compiler can stop emitting the probe behind a higher version gate.

### Consequences

#### Positive
- AWF-backed workflows on ARC DinD now work out of the box without user-authored `sandbox.agent.args` or manual `--docker-host-path-prefix` flags.
- The probe is a no-op on GitHub-hosted runners and any environment where `DOCKER_HOST` is unset or points at a Unix socket, so existing users see no behavioral change.
- The version gate prevents breakage for workflows pinned to AWF `< v0.25.43`, which would otherwise fail with an unknown-flag error.
- Detection logic lives next to the AWF command builder where reviewers can audit it together with related flag emission (`--allow-host-ports`, `--cli-proxy`, etc.).

#### Negative
- Every generated AWF command on a supported version now includes ~4 extra lines of shell, even on runners that will never match the regex. This adds noise to the compiled `.lock.yml` output.
- The bash regex must stay synchronized with whatever `DOCKER_HOST` shapes ARC DinD actually exports; if a future ARC release uses a non-localhost TCP host or a Unix socket forwarded over a different scheme, the probe will silently miss it.
- The path prefix `/tmp/gh-aw` is hardcoded; workflows that customize the runner temp directory will not see it overridden by this probe (though no such customization is currently supported in `gh-aw`).

#### Neutral
- A new minimum-version constant (`AWFDockerHostPathPrefixMinVersion`) joins the existing family of AWF version gates (`AWFCliProxyMinVersion`, `AWFAllowHostPortsMinVersion`), continuing the established pattern.
- Lock files generated against `latest` or `v0.25.43+` will now contain the probe; downstream consumers comparing pre/post-change lock diffs should expect this new block.
- The probe's three-line structure (init → conditional → flag assignment) is asserted by a test (`TestFirewallArgsInCopilotEngine`) that pins the ordering, making accidental refactors detectable.

---

## Part 2 — Normative Specification (RFC 2119)

> The key words **MUST**, **MUST NOT**, **REQUIRED**, **SHALL**, **SHALL NOT**, **SHOULD**, **SHOULD NOT**, **RECOMMENDED**, **MAY**, and **OPTIONAL** in this section are to be interpreted as described in [RFC 2119](https://www.rfc-editor.org/rfc/rfc2119).

### Runtime ARC/DinD Probe Emission

1. When `awfSupportsDockerHostPathPrefix(firewallConfig)` returns true, `BuildAWFCommand` **MUST** prepend a bash probe block to the generated AWF command that initializes the variable `GH_AW_DOCKER_HOST_PATH_PREFIX_ARGS` to the empty string, tests `DOCKER_HOST` against the regex `^tcp://(localhost|127\.0\.0\.1)(:[0-9]+)?$`, and on match assigns the variable to `"--docker-host-path-prefix /tmp/gh-aw"`.
2. The generated `awf` invocation **MUST** reference the probe variable via `${GH_AW_DOCKER_HOST_PATH_PREFIX_ARGS}` (unquoted) so the flag expands to zero arguments when the probe does not match.
3. The probe block **MUST** appear before the `awf` invocation in the generated script so that the variable is defined by the time the command is executed.
4. The probe's three textual fragments (variable initialization, `if [[ ... =~ ... ]]; then`, flag assignment) **MUST** appear in that order in the generated output; reordering breaks the `TestFirewallArgsInCopilotEngine` conformance test.
5. The bash regex used in the probe **MUST** remain compatible with bash `[[ ... =~ ... ]]` matching and **MUST NOT** be rewritten to a different shell's regex dialect (e.g. POSIX BRE).

### Version Gating

1. The constant `AWFDockerHostPathPrefixMinVersion` **MUST** be the single source of truth for the minimum AWF version that supports `--docker-host-path-prefix`.
2. When `firewallConfig` is `nil` or its `Version` field is empty, `awfSupportsDockerHostPathPrefix` **MUST** fall back to `constants.DefaultFirewallVersion` and gate accordingly.
3. When the configured AWF version is the literal string `"latest"` (case-insensitive), `awfSupportsDockerHostPathPrefix` **MUST** return `true`.
4. When the configured AWF version is less than `AWFDockerHostPathPrefixMinVersion` per semver comparison, `BuildAWFCommand` **MUST NOT** emit the probe block, the variable reference, or the `--docker-host-path-prefix` flag.
5. New version-gated AWF flags **SHOULD** follow the same three-part pattern: (a) a `AWF<Flag>MinVersion` constant in `pkg/constants/version_constants.go`, (b) a `awfSupports<Flag>(firewallConfig)` helper in `pkg/workflow/awf_helpers.go`, (c) conditional emission inside `BuildAWFCommand` or `BuildAWFArgs`.

### Hardcoded Constants

1. The shell variable name `GH_AW_DOCKER_HOST_PATH_PREFIX_ARGS`, the regex `^tcp://(localhost|127\.0\.0\.1)(:[0-9]+)?$`, and the flag value `--docker-host-path-prefix /tmp/gh-aw` **MUST** be defined as package-level constants in `pkg/workflow/awf_helpers.go` and **MUST NOT** be duplicated as string literals elsewhere in the codebase.
2. The path prefix `/tmp/gh-aw` **MUST** match the workspace root that `gh-aw` uses for staging on the runner; changing one **MUST** be accompanied by changing the other.

### Test Coverage

1. Tests **MUST** verify that the probe block is emitted (in the correct order) for the default AWF version and for `latest`.
2. Tests **MUST** verify that the probe block is not emitted when the configured AWF version is below `AWFDockerHostPathPrefixMinVersion` (e.g. `v0.25.42`).
3. Tests **MUST** verify the exact regex string and flag assignment text so that accidental rewording of the generated probe is caught at compile time.

### Conformance

An implementation is considered conformant with this ADR if it satisfies all **MUST** and **MUST NOT** requirements above. Failure to meet any **MUST** or **MUST NOT** requirement constitutes non-conformance.

---

*ADR created by [adr-writer agent]. Review and finalize before changing status from Draft to Accepted.*
5 changes: 5 additions & 0 deletions pkg/constants/version_constants.go
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,11 @@ const AWFCliProxyMinVersion Version = "v0.25.17"
// --allow-host-ports or the run will fail at startup with an unknown flag error.
const AWFAllowHostPortsMinVersion Version = "v0.25.24"

// AWFDockerHostPathPrefixMinVersion is the minimum AWF version that supports the
// --docker-host-path-prefix flag used for ARC/DinD split runner/daemon filesystems.
// Workflows pinning an older AWF version must not emit this flag.
const AWFDockerHostPathPrefixMinVersion Version = "v0.25.43"

// CopilotNoAskUserMinVersion is the minimum Copilot CLI version that supports the --no-ask-user
// flag, which enables fully autonomous agentic runs by suppressing interactive prompts.
// Workflows using an older Copilot CLI version must not emit --no-ask-user or the run will fail.
Expand Down
65 changes: 61 additions & 4 deletions pkg/workflow/awf_helpers.go
Original file line number Diff line number Diff line change
Expand Up @@ -34,6 +34,15 @@ import (

var awfHelpersLog = logger.New("workflow:awf_helpers")

const (
awfArcDindPrefixArgsVarName = "GH_AW_DOCKER_HOST_PATH_PREFIX_ARGS"
// Bash regex used in [[ ... =~ ... ]] to detect localhost TCP Docker hosts.
// Keep this in bash-compatible syntax (escaped dots) because it is emitted directly
// into generated shell scripts.
awfArcDindDockerHostRegex = `^tcp://(localhost|127\.0\.0\.1)(:[0-9]+)?$`
awfArcDindHostPathPrefixFlag = "--docker-host-path-prefix /tmp/gh-aw"
)

// AWFCommandConfig contains configuration for building AWF commands.
// This struct centralizes all the parameters needed to construct an AWF-wrapped command.
type AWFCommandConfig struct {
Expand Down Expand Up @@ -88,6 +97,24 @@ func BuildAWFCommand(config AWFCommandConfig) string {
// and --mount "${RUNNER_TEMP}/...") are appended raw below so that shell variable
// expansion is not suppressed by single-quoting.
awfArgs := BuildAWFArgs(config)
firewallConfig := getFirewallConfig(config.WorkflowData)

// Auto-detect ARC/DinD split daemon topology at runtime and emit
// --docker-host-path-prefix when supported by the selected AWF version.
// This avoids requiring workflow-authored sandbox.agent.args for standard ARC DinD setups.
arcDindPrefixProbe := ""
arcDindPrefixArgsRef := ""
if awfSupportsDockerHostPathPrefix(firewallConfig) {
arcDindPrefixProbe = fmt.Sprintf(`%s=""
if [[ "${DOCKER_HOST:-}" =~ %s ]]; then
%s="%s"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] When awfSupportsDockerHostPathPrefix returns false, arcDindPrefixProbe is set to "" and arcDindPrefixArgsRef is also "". Each format string has %s\n%s\n — two consecutive %s slots that will both expand to empty strings, producing a stray blank line in the generated script on older AWF versions.

A test that captures the full command output for the old-version path and asserts there are no consecutive blank lines (or that the probe variable name does not appear at all) would pin this behavior explicitly and catch any future regression.

fi`,
awfArcDindPrefixArgsVarName,
awfArcDindDockerHostRegex,
awfArcDindPrefixArgsVarName,
awfArcDindHostPathPrefixFlag)
arcDindPrefixArgsRef = fmt.Sprintf("${%s}", awfArcDindPrefixArgsVarName)
}
Comment on lines +115 to +117

// Build the expandable args string for args that need shell variable expansion.
// These MUST be appended as raw (unescaped) strings because single-quoting would

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/zoom-out] The arcDindPrefixProbe injection is copy-pasted into all four fmt.Sprintf branches (lines ~193, ~212, ~229, ~245). These branches already differ only in which of pathSetup / preCreateLog / configFileSetup they include — this change makes each branch one line longer in two places, amplifying the existing duplication.

Consider extracting the command assembly into a small helper that accepts the optional preamble lines as a slice and joins them, so the ARC/DinD probe (and any future injections) only needs to be added once:

func assemblePipefailCommand(preambles []string, probe, awfCmd, expandableArgs, arcDindRef, awfArgs, wrapped, logFile string) string {
    body := strings.Join(append(preambles, probe), "\n")
    return fmt.Sprintf("set -o pipefail\n%s\n# shellcheck disable=SC1003\n%s %s %s %s \\\n  -- %s 2>&1 | tee -a %s", body, awfCmd, expandableArgs, arcDindRef, awfArgs, wrapped, logFile)
}

This would make future version-gated injections trivial and keep the logic in one place.

Expand Down Expand Up @@ -166,14 +193,17 @@ func BuildAWFCommand(config AWFCommandConfig) string {
%s
%s
%s
%s
# shellcheck disable=SC1003
%s %s %s \
%s %s %s %s \
-- %s 2>&1 | tee -a %s`,
config.PathSetup,
preCreateLog,
configFileSetup,
arcDindPrefixProbe,
awfCommand,
expandableArgs,
arcDindPrefixArgsRef,
shellJoinArgs(awfArgs),
shellWrappedCommand,
shellEscapeArg(config.LogFile))
Expand All @@ -182,39 +212,48 @@ func BuildAWFCommand(config AWFCommandConfig) string {
command = fmt.Sprintf(`set -o pipefail
%s
%s
%s
# shellcheck disable=SC1003
%s %s %s \
%s %s %s %s \
-- %s 2>&1 | tee -a %s`,
config.PathSetup,
preCreateLog,
arcDindPrefixProbe,
awfCommand,
expandableArgs,
arcDindPrefixArgsRef,
shellJoinArgs(awfArgs),
shellWrappedCommand,
shellEscapeArg(config.LogFile))
} else if configFileSetup != "" {
command = fmt.Sprintf(`set -o pipefail
%s
%s
%s
# shellcheck disable=SC1003
%s %s %s \
%s %s %s %s \
-- %s 2>&1 | tee -a %s`,
preCreateLog,
configFileSetup,
arcDindPrefixProbe,
awfCommand,
expandableArgs,
arcDindPrefixArgsRef,
shellJoinArgs(awfArgs),
shellWrappedCommand,
shellEscapeArg(config.LogFile))
} else {
command = fmt.Sprintf(`set -o pipefail
%s
%s
# shellcheck disable=SC1003
%s %s %s \
%s %s %s %s \
-- %s 2>&1 | tee -a %s`,
preCreateLog,
arcDindPrefixProbe,
awfCommand,
expandableArgs,
arcDindPrefixArgsRef,
shellJoinArgs(awfArgs),
shellWrappedCommand,
shellEscapeArg(config.LogFile))
Expand Down Expand Up @@ -684,3 +723,21 @@ func awfSupportsAllowHostPorts(firewallConfig *FirewallConfig) bool {
minVersion := string(constants.AWFAllowHostPortsMinVersion)
return semverutil.Compare(versionStr, minVersion) >= 0
}

// awfSupportsDockerHostPathPrefix returns true when the effective AWF version supports
// --docker-host-path-prefix.
func awfSupportsDockerHostPathPrefix(firewallConfig *FirewallConfig) bool {
var versionStr string
if firewallConfig != nil && firewallConfig.Version != "" {
versionStr = firewallConfig.Version
} else {
versionStr = string(constants.DefaultFirewallVersion)
}

if strings.EqualFold(versionStr, "latest") {
return true
}

minVersion := string(constants.AWFDockerHostPathPrefixMinVersion)
return semverutil.Compare(versionStr, minVersion) >= 0
}
42 changes: 42 additions & 0 deletions pkg/workflow/awf_helpers_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -1173,6 +1173,48 @@ func TestAWFSupportsAllowHostPorts(t *testing.T) {
}
}

// TestAWFSupportsDockerHostPathPrefix tests the awfSupportsDockerHostPathPrefix version gate.
func TestAWFSupportsDockerHostPathPrefix(t *testing.T) {
tests := []struct {
name string
firewallConfig *FirewallConfig
want bool
}{
{
name: "nil firewall config returns true (uses default version)",
firewallConfig: nil,
want: true,
},
{
name: "empty version returns true (uses default version)",
firewallConfig: &FirewallConfig{},
want: true,
},
{
name: "latest returns true",
firewallConfig: &FirewallConfig{Version: "latest"},
want: true,
},
{
name: "v0.25.43 supports --docker-host-path-prefix (exact minimum version)",
firewallConfig: &FirewallConfig{Version: "v0.25.43"},
want: true,
},
{
name: "v0.25.42 does not support --docker-host-path-prefix",
firewallConfig: &FirewallConfig{Version: "v0.25.42"},
want: false,
},
}

for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := awfSupportsDockerHostPathPrefix(tt.firewallConfig)
assert.Equal(t, tt.want, got, "awfSupportsDockerHostPathPrefix result")
})
}
}

// TestGetGeminiAPITarget tests the GetGeminiAPITarget helper that resolves the effective
// Gemini API target from GEMINI_API_BASE_URL in engine.env or the default endpoint.
func TestGetGeminiAPITarget(t *testing.T) {
Expand Down
34 changes: 34 additions & 0 deletions pkg/workflow/firewall_args_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,17 @@ func TestFirewallArgsInCopilotEngine(t *testing.T) {
t.Error("Expected command to contain '--log-level'")
}

initSnippet := `GH_AW_DOCKER_HOST_PATH_PREFIX_ARGS=""`
conditionSnippet := `if [[ "${DOCKER_HOST:-}" =~ ^tcp://(localhost|127\.0\.0\.1)(:[0-9]+)?$ ]]; then`

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The conditionSnippet string duplicates the regex literal from the awfArcDindDockerHostRegex package-level constant in awf_helpers.go. If the regex ever changes (e.g., to add IPv6 support), this test will pass with the old string while the real probe uses the new one.

Reference the constant directly so the test always stays in sync:

conditionSnippet := fmt.Sprintf(`if [[ "${DOCKER_HOST:-}" =~ %s ]]; then`, awfArcDindDockerHostRegex)

flagAssignmentSnippet := `GH_AW_DOCKER_HOST_PATH_PREFIX_ARGS="--docker-host-path-prefix /tmp/gh-aw"`

initIdx := strings.Index(stepContent, initSnippet)
conditionIdx := strings.Index(stepContent, conditionSnippet)
flagIdx := strings.Index(stepContent, flagAssignmentSnippet)
if initIdx == -1 || conditionIdx == -1 || flagIdx == -1 || !(initIdx < conditionIdx && conditionIdx < flagIdx) {
t.Error("Expected command to initialize ARC/DinD probe variable, then evaluate DOCKER_HOST condition, then assign docker-host-path-prefix flag")
}

// Verify that --log-dir is included in copilot args for log collection
if !strings.Contains(stepContent, "--log-dir /tmp/gh-aw/sandbox/agent/logs/") {
t.Error("Expected copilot command to contain '--log-dir /tmp/gh-aw/sandbox/agent/logs/' for log collection in firewall mode")
Expand Down Expand Up @@ -211,6 +222,29 @@ func TestFirewallArgsInCopilotEngine(t *testing.T) {
}
})

t.Run("skips docker-host-path-prefix probe when AWF version is too old", func(t *testing.T) {
workflowData := &WorkflowData{
Name: "test-workflow",
EngineConfig: &EngineConfig{
ID: "copilot",
},
NetworkPermissions: &NetworkPermissions{
Firewall: &FirewallConfig{
Enabled: true,
Version: "v0.25.42",
},
},
}

engine := NewCopilotEngine()
steps := engine.GetExecutionSteps(workflowData, "test.log")
stepContent := requireCopilotExecutionStep(t, steps)

if strings.Contains(stepContent, "--docker-host-path-prefix /tmp/gh-aw") {
t.Error("Expected command to skip --docker-host-path-prefix for unsupported AWF versions")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The "skips docker-host-path-prefix probe" test only asserts that --docker-host-path-prefix is absent from the generated command. It doesn't assert that the probe variable initialisation (GH_AW_DOCKER_HOST_PATH_PREFIX_ARGS="") is also absent.

On an old AWF version the probe block should be entirely omitted — not just the flag value. Adding:

if strings.Contains(stepContent, awfArcDindPrefixArgsVarName) {
    t.Error("Expected no probe variable for unsupported AWF versions")
}

would give the test full coverage of the negative path and guard against accidental partial injection.

}
})

t.Run("AWF command includes ssl-bump flag when enabled", func(t *testing.T) {
workflowData := &WorkflowData{
Name: "test-workflow",
Expand Down