Skip to content

feat(firewall): opt-in ~ regex marker for egress path rules#400

Merged
schmitthub merged 26 commits into
mainfrom
feat/firewall-path-end-anchor
Jun 24, 2026
Merged

feat(firewall): opt-in ~ regex marker for egress path rules#400
schmitthub merged 26 commits into
mainfrom
feat/firewall-path-end-anchor

Conversation

@schmitthub

@schmitthub schmitthub commented Jun 23, 2026

Copy link
Copy Markdown
Owner

Summary

Closes #399.

Egress path rules were compiled to an open-ended Envoy prefix match, so an allow rule for /repos/x also admitted /repos/x-evil — a real bypass on hosts where the path embeds an untrusted name (/repos/<user>, /u/<name>).

This adds an opt-in regex mode signalled by a leading ~ on a path rule:

  • path: /repos/x → literal prefix (unchanged — no migration)
  • path: ~/repos/(x|y)/? → Envoy safe_regex (RE2), full-string matched → exact, anchored, alternation

~ rides inside the existing path string end-to-end — no proto, schema, or CLI-flag change. Methods are untouched (still a concrete list).

Validation (single seam)

firewall.ValidateRule is the one guard on every rule-update path (CLI add, yaml bootstrap, firewall refresh). It fails the whole operation on an invalid path:

  • a literal must start with / (no auto-prepend — a security setting must not guess)
  • a ~ regex must compile via Go regexp (Go's engine is RE2, so compile-compat with Envoy is exact) and anchor at the path root (/… or ^/…)

Generation (pathSpecifier) emits safe_regex for ~ paths (deprecated google_re2 engine field omitted, matching the existing :method matcher), prefix otherwise.

Safety

  • No ReDoSsafe_regex is RE2 (linear-time, no backtracking).
  • No injection — the path is the user's complete pattern; nothing is interpolated into it.

CLI sharp edge

Regex paths must be quoted (the shell expands ~/ and treats (/|/? as metacharacters):

clawker firewall add api.github.com --path '~/repos/(myorg|other)/?' --action allow

Tests

  • New TestValidateRule_PathAnchoring (literal/regex accept + reject branches: missing leading /, won't compile, not anchored, bare ~).
  • Extended the single comprehensive Envoy golden (comprehensiveRules) with a ~ regex allow (alternation) + a ~ regex deny; re-blessed comprehensive + comprehensive_mtls (no per-feature golden).

Docs

PathRule.path (schema + proto), docs/firewall.mdx (new "Exact and pattern matching" section, end-user terms), firewall add --path help, internal/cmd/firewall + controlplane/firewall CLAUDE.md, clawker-support plugin (firewall-security.md + SKILL.md, version bump), CLI reference regenerated.

Host-proxy egress lockstep (security)

The host proxy /open/url browser-exfil gate is a second egress-rule evaluator that must reach the same verdict as Envoy on the same egress-rules.yaml. Two fixes land here so it stays in lockstep with the new ~ regex semantics:

  • ~ marker honored (97e90471) — the gate matched paths prefix-only, so every ~ regex allow over-blocked (and a ~ regex deny under-blocked) versus Envoy. It now compiles ~ paths as full-string RE2, identical to safe_regex.
  • Dot-segment directory canonicalization (6b717e48) — canonicalizePath restored a directory's trailing slash only when the path literally ended in /, while Envoy's normalize_path (RFC 3986 remove_dot_segments) keeps it for any path resolving to a directory. So /blog/. and /blog/x/.. canonicalized to /blog here but /blog/ at Envoy, and an end-anchored rule like ~/blog$ was enforced on the data path yet bypassed on the host-browser channel (the request opened a real host browser tab). Now restored whenever the path resolves to a directory (ends in /, /., or /..). Percent-decoding is deliberately kept — the browser + origin decode the path, so decoding here is what actually gets fetched and is the fail-closed direction; encoded paths cannot evade a deny.

Red-first unit tests prove both directions (deny-bypass + over-block) with a new end-anchored fixture host.

Firewall rule-help docs

bff64516 sharpens domain-wildcard (example.com exact vs .example.com subdomain+apex, exact-beats-wildcard, anti-shadowing) and regex-path guidance across the clawker-support skill, and adds a terse cheat-sheet to the injected agent prompt.

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings June 23, 2026 00:14
@mintlify

mintlify Bot commented Jun 23, 2026

Copy link
Copy Markdown

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
schmitthub 🟢 Ready View Preview Jun 23, 2026, 12:16 AM

💡 Tip: Enable Workflows to automatically generate PRs for you.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an opt-in ~ marker for egress path_rules.path to switch from Envoy prefix matching to RE2 safe_regex full-string matching, closing the /repos/x/repos/x-evil bypass without migrating existing configs.

Changes:

  • Validate path_rules.path as either a literal prefix starting with /, or a ~-prefixed RE2 regex anchored at the path root.
  • Generate Envoy route matches as safe_regex for ~ paths and prefix otherwise.
  • Update CLI help, docs, schema/proto comments, and Envoy golden tests to document and cover the new mode.

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
internal/docs/configdoc_test.go Updates schema-comment expectations for the path field description.
internal/config/schema.go Expands PathRule.Path schema description to document ~ regex mode.
internal/cmd/firewall/CLAUDE.md Updates command reference text for --path semantics.
internal/cmd/firewall/add.go Updates CLI help and examples to include ~ regex paths + quoting guidance.
docs/firewall.mdx Documents exact/pattern matching behavior and the bypass motivation.
docs/configuration.mdx Updates generated YAML schema comment for path_rules.path.
docs/cli-reference/clawker_firewall_add.md Regenerated CLI reference reflecting updated --path help/examples.
controlplane/firewall/testdata/envoy/comprehensive.envoy.golden Updates Envoy golden output to include regex path rules.
controlplane/firewall/testdata/envoy/comprehensive_mtls.envoy.golden Updates mTLS Envoy golden output to include regex path rules.
controlplane/firewall/rules_store.go Adds ~ marker constant and path validation logic in ValidateRule.
controlplane/firewall/rules_store_test.go Adds unit tests covering literal vs regex path validation.
controlplane/firewall/envoy_http.go Emits safe_regex RouteMatch when path is ~-prefixed.
controlplane/firewall/envoy_config_test.go Extends comprehensive fixture to cover regex allow/deny path rules.
controlplane/firewall/CLAUDE.md Updates internal design notes to describe regex path rule behavior.
claude-plugin/clawker-support/skills/clawker-support/SKILL.md Updates support skill docs to mention ~ exact path matching.
claude-plugin/clawker-support/skills/clawker-support/reference/firewall-security.md Updates security guidance to recommend ~-regex anchoring.
claude-plugin/clawker-support/.claude-plugin/plugin.json Bumps support plugin version.
api/admin/v1/admin.proto Documents PathRule.path semantics for literal vs ~ regex.
api/admin/v1/admin.pb.go Regenerated proto Go output with updated field comments.
Files not reviewed (1)
  • api/admin/v1/admin.pb.go: Generated file

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread controlplane/firewall/rules_store.go
Comment thread controlplane/firewall/envoy_http.go Outdated
Comment thread controlplane/firewall/envoy_http.go Outdated
Comment thread controlplane/firewall/envoy_http.go Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 21 out of 22 changed files in this pull request and generated 3 comments.

Files not reviewed (1)
  • api/admin/v1/admin.pb.go: Generated file

Comment thread controlplane/firewall/rules_store_test.go
Comment thread docs/firewall.mdx
Comment thread docs/firewall.mdx

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 23 out of 24 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • api/admin/v1/admin.pb.go: Generated file

Comment thread internal/hostproxy/egress_check.go Outdated

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 29 out of 30 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • api/admin/v1/admin.pb.go: Generated file

Comment thread internal/hostproxy/egress_check.go
schmitthub and others added 6 commits June 23, 2026 09:09
Egress path rules compiled to an open-ended Envoy prefix match, so an
allow rule for /repos/x also admitted /repos/x-evil — a bypass on hosts
where the path embeds an untrusted name (/repos/<user>, /u/<name>).

A path prefixed with ~ is now compiled to an Envoy safe_regex (RE2,
full-string) for exact, anchored, alternation matching; paths without ~
keep the literal-prefix behavior unchanged (no migration). Validation at
the write front-door (ValidateRule — the single seam for CLI add, yaml
bootstrap, and firewall refresh) fails the whole rule-update on an
invalid path: a literal must start with /, a ~ regex must compile (Go
regexp is RE2, exact compile-compat with Envoy) and anchor at the path
root. The ~ rides in the existing path string — no proto, schema, or
CLI-flag change.

Closes #399

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The host proxy's /open/url browser-exfil gate independently evaluates the
same egress-rules.yaml as Envoy and must stay in lockstep with it, but its
checkPathRules did literal longest-prefix matching only. A ~-marked regex
path rule (added for Envoy in this PR) can never prefix a real URL path, so
it silently no-op'd there: a denylist regex Envoy enforces would let the
host browser open a path Envoy blocks, and an allowlist regex would
over-block paths Envoy permits.

checkPathRules now matches through pathRuleMatches, which RE2 full-string
matches a ~-marked path (Go regexp is RE2, identical to Envoy safe_regex)
and prefix-matches the rest, keeping len(pr.Path) as the longest-wins
tiebreak to mirror Envoy's longest-first route sort. The marker const is an
intentional local copy, like the existing egressRule/pathRule mirrors,
since host proxy is a leaf package. Adds regex coverage to the egress-check
table, including the deny-with-default-allow exfil-direction case.

Also syncs the now-stale httpAllowRoute/httpDenyRoute/routeMatch doc
comments to describe a literal prefix or RE2 regex (via pathSpecifier).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e_path

The /open/url egress gate's canonicalizePath restored a directory's
trailing slash only when the path literally ended in "/", while Envoy's
normalize_path (RFC 3986 remove_dot_segments) keeps it for any path that
resolves to a directory. So "/blog/." and "/blog/x/.." canonicalized to
"/blog" here but "/blog/" at Envoy, and an end-anchored rule like
"~/blog$" was enforced on the firewall data path yet bypassed on the
host-browser channel.

Restore the trailing slash whenever the path resolves to a directory
(original ends in "/", "/." or "/..") via a new endsInDirectory helper,
so both egress evaluators reach the same verdict. Percent-decoding is
deliberately kept: the host browser and origin decode the path, so
decoding here is what actually gets fetched and is the fail-closed
direction for this channel — encoded paths cannot evade a deny.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Domain matching help was weak on the exact-host-vs-wildcard distinction:
a bare domain matches only that exact host, while a leading-dot entry
(.example.com) matches subdomains plus the apex. Spell that out — plus
exact-beats-wildcard, deny-wins-in-tier, and label-boundary anti-shadowing
— in the support skill, and add the "I allowed example.com but
api.example.com is blocked" gotcha.

Sharpen the path-rule explanation: literal prefixes are open-ended
(/api/ also admits /api-evil), the "~" marker makes a path a full-string
RE2 regex that closes that bypass, regex paths must be single-quoted on
the CLI, methods are a concrete enum (never regex), and path rules apply
to http/https/ws/wss only. Add a terse version of all this to the
injected agent prompt so the in-container agent gets it right with low
context cost, pointing at the skill for depth.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Reject literal (non-regex) path rules containing characters that cannot
appear in a URL path — almost always a regex written without the leading
~ marker, which would otherwise commit a dead DENY (silent bypass). Regex
paths stay exempt; their RE2 compile check remains the load-bearing gate
against an unrecoverable Envoy config-load failure.

Document the overlap-precedence contract (longest rule string wins, ties
to declaration order; a regex's length is its char count, not its match
breadth) across the firewall docs, support skill, and agent prompt, and
align the host-proxy checkPathRules doc comment with the firewall's
stable longest-first route sort.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…y agent test

- Changelog "what's new" teaser rendered edge-to-edge on wide terminals
  (glamour WithWordWrap(0) leaned on terminal soft-wrap); RenderMarkdown now
  word-wraps at min(terminal width, 80).
- Remove all URLs from CHANGELOG.md: the teaser can't make them clickable and
  they wrap badly mid-token. Describe features instead; drop the dev-facing
  note from the header (changelog is user-facing).
- Fix flaky TestDriveRegister_* races: publishRegisterFailure emits the
  registered event then the untrusted event onto one FIFO subscriber queue;
  tests that waited only for the registered event then asserted the untrusted
  count raced the in-flight second publish (reproduced at -count=300 -race).
  Gate require.Eventually on both events before asserting.
- Allow api.github.com /repos/charmbracelet/ so the agent can read charm
  upstream metadata for the dependency-currency work.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The /open/url egress gate read egress-rules.yaml per request but treated
every read failure as a soft policy deny (Warn + 403), so a missing,
unreadable, unparseable, or bad-regex rules file silently degraded to a
permissive skip instead of failing closed the way Envoy rejects a bad
config.

Split enforcement into two tiers: a genuine policy deny (URL not in the
allow list) stays Warn + 403 "blocked by egress policy", while a broken
rules file is errEgressRulesInvalid — an infrastructure fault → Error +
500 "egress rules unavailable", never a silent skip. A single uncompilable
~ regex anywhere now rejects the whole file, via a shared compilePathRegex
used by both match-time and load-time so the two cannot drift from Envoy's
safe_regex form.

Add a staged startup-readiness gate to the daemon. Because the firewall
boots after the host proxy in container bootstrap (and the manager gates
that bootstrap on the host proxy's /health), the server binds first, then
ensureEgressRulesReady runs off the main goroutine as three sequential
loops — firewall container running → Envoy health → rules file readable —
each gating the next so a later stage never races ahead of its prerequisite.
Per-stage budgets derive from the firewall's own bringup/health timeouts in
internal/consts. On exhaustion the daemon shuts down loud; the container
watcher only starts after the gate passes so its clean auto-exit cannot mask
a readiness failure. NewDaemon fails closed when the firewall is enabled but
the rules path cannot be resolved.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 35 out of 36 changed files in this pull request and generated 2 comments.

Files not reviewed (1)
  • api/admin/v1/admin.pb.go: Generated file

Comment thread internal/hostproxy/egress_check.go Outdated
Comment thread internal/hostproxy/egress_check.go

@github-advanced-security github-advanced-security AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

golangci-lint found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Comment thread internal/auth/auth_material.go Fixed
Comment thread internal/consts/consts.go Fixed
Comment thread internal/consts/consts.go Fixed
Comment thread internal/consts/consts.go Fixed
Comment thread internal/consts/consts.go Fixed
Comment thread internal/hostproxy/internals/cmd/clawker-socket-server/main.go Fixed
Comment thread internal/hostproxy/internals/cmd/clawker-socket-server/main.go Fixed
Comment thread internal/hostproxy/internals/cmd/clawker-socket-server/main.go Fixed
Comment thread internal/hostproxy/internals/cmd/clawker-socket-server/main.go Fixed
Comment thread internal/hostproxy/internals/cmd/clawker-socket-server/main.go Fixed
schmitthub and others added 6 commits June 23, 2026 16:53
Bring the branch into full golangci-lint-full compliance (errcheck
check-blank, gocognit, funlen, gochecknoglobals, goconst, gosec, govet
shadow, wrapcheck, revive import-shadowing, thelper, unparam, mnd,
nonamedreturns, ireturn, noctx, dogsled, godoclint, containedctx,
contextcheck, staticcheck). Fixes are in code only — no //nolint, no
.golangci.yml exclusions — and preserve behavior.

controlplane/agent:
- exec.go: extract Run/runStep helpers for funlen+gocognit; gitconfig
  filter script var -> function; split misplaced KillAfterGrace/runStep
  doc comments.
- init_steps.go/boot_steps.go: drop package-level step + plan vars;
  InitPlan/BootPlan become constructor functions building from locals.
- dialer.go: tryEstablish returns a concrete establishedStream struct
  (no bare interface return); registryOutcome -> RegistryOutcome;
  extract establishWithRetry/DriveRegister helpers; lowercase handshake
  error string.

api/clawkerd/v1/mocks: FakeSessionStream no longer stores a Context;
uses ctxFn/done/ctxErr; Recv wraps the ctx error.

internal/hostproxy: handle resp.Body.Close/io.Copy errors; wrap
ctx.Err/regexp.Compile; egressHealthTimeout const; schemeHTTP +
errEgressRulesUnavailable consts; rename shadowing params; httptest
NewRequestWithContext; 0o600 test fixtures.

internal/cmd/container/shared: extract CreateContainer helpers for
gocognit; reclaim is gated on an explicit failed flag set only at error
returns, so a panic does not reclaim (matches the original contract);
fix CreateContainerOptions doc; drop unused setupHostProxy ctx;
flatten container_start host-proxy nesting + fix err shadow.

Test/support files: t.Helper() in table closures, shared test consts,
drop always-constant params, check cleanup errors.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
t.Fatalf("expected nil when all stages pass, got %v", err)
}
want := []string{"firewall", "envoy"}
if len(order) != 2 || order[0] != want[0] || order[1] != want[1] {
schmitthub and others added 5 commits June 23, 2026 18:20
A gRPC client stream must hold its own context (Context() takes no args),
so a fake of it storing one is correct and necessary — the prior
ctxFn/done/ctxErr closure split was an obfuscated way to hold the same
context purely to sidestep containedctx, and it read as garbage.

Store a single ctx context.Context field; Recv/Send select on
f.ctx.Done() and report f.ctx.Err() inline. Drop the dead nil-ctx branch
(no caller passes nil; it only existed to fabricate a context.Background,
which tripped contextcheck) — callers pass context.Background() for a
never-cancelling stream. Lint stays at 0.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@schmitthub schmitthub merged commit 0dadf49 into main Jun 24, 2026
14 of 15 checks passed
@schmitthub schmitthub deleted the feat/firewall-path-end-anchor branch June 24, 2026 01:35
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 24, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Firewall paths need start and end of line matches

3 participants