📊 Current CI/CD Pipeline Status
The repository has a mature and layered CI/CD setup with 72 total workflow files (45 standard YAML + 27 Markdown-based agentic workflows). PR quality gates are generally healthy, with one notable broken gate.
Overall health: 🟡 Good — strong coverage breadth, but meaningful gaps in depth.
✅ Existing Quality Gates
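The following checks run automatically on every PR targeting main:
lint.yml (ESLint; tsc --noEmit)
test-integration.yml
build.yml
test-coverage.yml
test-integration-suite.yml
test-chroot.yml
test-examples.yml
test-action.yml
pr-title.yml
codeql.yml
dependency-audit.yml
link-check.yml (*.md only)
security-guard.lock.yml
build-test.lock.yml
smoke-*.lock.yml
Scheduled / ongoing:
Secret digger runs (Claude, Codex, Copilot; every 1h)
Daily token usage analyzers
Weekly performance benchmarks (performance-monitor.yml)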
🔍 Identified Gaps
🔴 High Priority
1. 9 Integration Tests Not Included in Any CI Workflow
The following test files exist in tests/integration/ but are not referenced by any workflow job's --testPathPatterns:
api-proxy-observability.test.ts
api-proxy-rate-limit.test.ts
api-target-allowlist.test.ts
chroot-capsh-chain.test.ts
chroot-copilot-home.test.ts
gh-host-injection.test.ts
ghes-auto-populate.test.ts
host-tcp-services.test.ts
workdir-tmpfs-hiding.test.ts
These are never executed in CI, meaning regressions in API proxy observability, token rate limiting, GitHub host injection prevention, GHES support, host TCP service access, tmpfs workdir hiding, and capsh privilege-drop chain can be merged undetected.
2. Dependency Vulnerability Audit Consistently Failing on PRs
Recent PR workflow data shows Dependency Vulnerability Audit with a 0% pass rate (2 failures out of 2 recent PR runs). This security gate is broken, meaning npm vulnerabilities could merge undetected. This is a critical regression.
3. Coverage Thresholds Are Critically Low for a Security Tool
Current enforced thresholds: 38% statements, 30% branches, 35% functions. For a security-critical infrastructure tool (network firewall), these are far below industry standards. Crucially:
cli.ts (main entry point): 0% coverage
docker-manager.ts (orchestration): 18% coverage, 4% function coverage
Any refactoring or logic change in these files gets little or no test signal.
🟡 Medium Priority
4. No Shell Script Static Analysis (ShellCheck)
The repository contains 20+ shell scripts in containers/agent/, containers/squid/, scripts/ci/, and examples/. None of these are linted by any CI check. Shell scripts are high-risk surface area (entrypoint.sh, setup-iptables.sh, cleanup.sh) and bugs here have security implications.
5. No Dockerfile Linting (Hadolint)
Three Dockerfiles exist (containers/agent/, containers/squid/, containers/api-proxy/). No static analysis checks them for best practices, layer ordering, or known antipatterns (e.g., apt-get without --no-install-recommends, missing HEALTHCHECK, unpinned base images).
6. Performance Benchmarks Not PR-Gated
performance-monitor.yml runs weekly but not on PRs. A PR that significantly increases container startup time, domain resolution latency, or proxy overhead will not be caught before merge. The benchmark infrastructure already exists — it just isn't triggered on PRs.
7. Smoke Tests Require Manual Triggers for 3 of 5 Agents
smoke-claude: requires ❤️ reaction
smoke-codex: requires 🎉 reaction
smoke-copilot: requires 👀 reaction
smoke-chroot: path-based (runs automatically)
smoke-services: automatic (runs on all PRs)
Real agent smoke tests for the three major engines don't run automatically on every PR. If a PR breaks agent invocation or credential injection, it may be merged before anyone adds a reaction.
8. Documentation Link Check Not Triggered on Code-Only PRs
link-check.yml only runs when *.md files change. A PR that renames a file, deletes a section, or changes an anchor can silently break doc links if no markdown is modified.
🟢 Low Priority
9. Secret Digger Workflows Failing Consistently
The hourly secret-digger-claude and secret-digger-copilot workflows have a 100% failure rate in recent runs. While this is not a PR gate, these runs are intended to proactively detect leaked credentials. They are currently providing no value.
10. No SBOM (Software Bill of Materials) Generation
No workflow generates a CycloneDX or SPDX SBOM for the published container images or npm package. This is increasingly required for supply chain security compliance (SLSA, NTIA guidelines).
11. No License Compliance Check
With 100+ transitive npm dependencies, no workflow validates that all dependency licenses are compatible with the project's license. Such a check is common in open-source security tools.
12. No PR Size Gate
Very large PRs (1000+ line changes) are not flagged or blocked. This makes review quality harder to maintain for a security-critical project.
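A size gate of this kind can be very small. The following is a minimal sketch, assuming the changed-line counts are already available from the PR metadata; the function name, the labels, and the default limit of 1000 lines (taken from the figure above) are illustrative, not part of any existing workflow.

```typescript
// Hypothetical helper: classify a PR by its total changed lines.
// The 1000-line default mirrors the "1000+ line changes" figure above.
type SizeLabel = "normal" | "oversized";

function prSizeLabel(additions: number, deletions: number, limit = 1000): SizeLabel {
  return additions + deletions >= limit ? "oversized" : "normal";
}

// Example: a workflow step could flag (rather than block) oversized PRs.
const label = prSizeLabel(820, 240);
if (label === "oversized") {
  console.log("PR exceeds the size limit; consider splitting it for review.");
}
```

Flagging instead of blocking keeps the gate advisory, which may be preferable while the limit is being tuned.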
📋 Actionable Recommendations
1. Add Missing Integration Tests to CI (High | Low complexity)
Add the 9 uncovered test patterns to the appropriate workflow jobs in test-integration-suite.yml and test-chroot.yml:
# In test-api-proxy job, expand testPathPatterns:
--testPathPatterns="(api-proxy|api-proxy-observability|api-proxy-rate-limit|api-target-allowlist)"
# Add new job or expand test-chroot for:
--testPathPatterns="(chroot-capsh-chain|chroot-copilot-home)"
# Add new "Security Features" job for:
--testPathPatterns="(gh-host-injection|ghes-auto-populate|host-tcp-services|workdir-tmpfs-hiding)"
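To keep new test files from being orphaned again, a small guard could compare the files in tests/integration/ against the patterns used by the workflows. The sketch below is one possible shape, not an existing script: the function name is hypothetical, it takes file names and workflow text as plain inputs, and it assumes each --testPathPatterns value is a regular expression tested against the file name.

```typescript
// Hypothetical guard: list test files that no workflow's
// --testPathPatterns value matches. Patterns are treated as regular
// expressions tested against the file name (an assumption about the setup).
function findUncoveredTests(testFiles: string[], workflowTexts: string[]): string[] {
  const patterns: RegExp[] = [];
  for (const text of workflowTexts) {
    // Fresh regex per text so lastIndex starts at 0.
    const flag = /--testPathPatterns[= ]["']?([^"'\s]+)["']?/g;
    let match: RegExpExecArray | null;
    while ((match = flag.exec(text)) !== null) {
      patterns.push(new RegExp(match[1]));
    }
  }
  return testFiles.filter((file) => !patterns.some((p) => p.test(file)));
}

// Usage with made-up inputs; in CI these would be read from
// tests/integration/ and the workflow files.
const uncovered = findUncoveredTests(
  ["example-covered.test.ts", "gh-host-injection.test.ts"],
  ['run: npx jest --testPathPatterns="(example-covered|another-example)"'],
);
console.log(uncovered); // a non-empty list would fail the job
```

Because the patterns are unanchored regular expressions, a short pattern also matches longer file names that contain it, so the guard is only as strict as the patterns it reads.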
2. Fix or Isolate the Failing Dependency Audit (High | Low complexity)
Investigate whether the audit failures are high/critical CVEs in prod dependencies or false positives in dev dependencies. Options:
Fix the actual vulnerabilities, or
Use the --production flag (--omit=dev on current npm versions) to scope the audit to runtime dependencies only, or
Add --ignore-scripts + specific advisory exceptions via .nsprc
3. Raise Coverage Thresholds Incrementally (High | Medium complexity)
Increase thresholds by 5 percentage points per quarter and prioritize cli.ts and docker-manager.ts. Immediate target: 50% statements, 45% branches. Long-term target for a security tool: ≥70% statements.
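The quarterly ratchet can be written down so that each bump is mechanical. This is a minimal sketch: the starting values are the currently enforced thresholds quoted above, the resulting object follows the shape of Jest's coverageThreshold option, and the ratchet helper, its 5-point step, and the use of 70 as a cap for all three metrics are assumptions made for illustration.

```typescript
// Thresholds as percentages; starting values are the currently
// enforced ones cited in this assessment.
type Thresholds = { statements: number; branches: number; functions: number };

const current: Thresholds = { statements: 38, branches: 30, functions: 35 };

// Hypothetical helper: raise every threshold by `step` points, never
// beyond `cap`. Applying 70 to all three metrics is an assumption;
// the long-term target above is stated for statements only.
function ratchet(t: Thresholds, step = 5, cap = 70): Thresholds {
  const bump = (value: number): number => Math.min(cap, value + step);
  return {
    statements: bump(t.statements),
    branches: bump(t.branches),
    functions: bump(t.functions),
  };
}

// Shape of Jest's `coverageThreshold` option, ready to paste into the config.
const coverageThreshold = { global: ratchet(current) };
console.log(JSON.stringify(coverageThreshold));
```

Running the helper once per quarter and committing the result keeps the enforced floor moving without anyone picking numbers by hand.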
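4. Add ShellCheck for Shell Scripts (Medium | Low complexity)
Add to lint.yml alongside ESLint.
5. Add Hadolint for Dockerfile Linting (Medium | Low complexity)
Add to build.yml or a new lint-containers.yml.
6. Add PR-Triggered Performance Regression Check (Medium | Medium complexity)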
Add a lightweight subset of benchmarks to PRs (e.g., container startup time only). Use a threshold of +20% regression as a warning, not a hard failure, to avoid flakiness.
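The warning rule described here reduces to a single comparison against a stored baseline. The sketch below assumes both timings are in milliseconds and come from the existing benchmark run; the function and field names are hypothetical, and the 20% default is the threshold suggested above.

```typescript
// Result of comparing one benchmark value against its baseline.
interface RegressionResult {
  changePct: number;        // positive means slower than baseline
  verdict: "ok" | "warn";   // "warn" never fails the job by itself
}

// Hypothetical check: warn when the current timing is more than
// `warnAbovePct` percent slower than the baseline.
function checkRegression(baselineMs: number, currentMs: number, warnAbovePct = 20): RegressionResult {
  if (baselineMs <= 0) {
    throw new Error("baseline must be a positive duration");
  }
  const changePct = ((currentMs - baselineMs) / baselineMs) * 100;
  return { changePct, verdict: changePct > warnAbovePct ? "warn" : "ok" };
}

// Example: container startup went from 1000 ms to 1250 ms.
const startup = checkRegression(1000, 1250);
if (startup.verdict === "warn") {
  console.log(`Startup time regressed by ${startup.changePct.toFixed(1)}%`);
}
```

Returning a verdict instead of throwing lets the workflow surface the result as a PR comment or annotation, in line with treating regressions as warnings.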
7. Make Smoke Tests Automatic on PR (Medium | Low complexity)
Remove the reaction requirements from smoke-claude.md, smoke-codex.md, and smoke-copilot.md or change them to run automatically on PRs from maintainers (using roles: maintainer). The current reaction gate adds human latency to catching agent regressions.
8. Investigate and Fix Secret Digger Failures (Low | Low complexity)
Run agenticworkflows-audit on a recent Secret Digger run to identify whether failures are credential, model, or logic issues. These scans provide real security value when working.
📈 Metrics Summary
| Metric | Value |
| --- | --- |
| Total workflow files | 72 (45 YAML + 27 Markdown agentic) |
| Workflows running on PRs | ~19 distinct checks |
| Recent PR success rate (overall) | ~95% (Dependency Audit failing) |
| Unit test statement coverage | ~38% |
| Unit test branch coverage | ~32% |
| Integration tests in CI | 25 / 34 (73%) |
| Integration tests missing from CI | 9 (27%) |
| Shell scripts with no linting | ~20+ |
| Dockerfiles with no linting | 3 |
| Secret Digger failure rate | 100% (Claude + Copilot) |
Assessment generated by CI/CD Pipelines and Integration Tests Gap Assessment workflow — run ID 24032352554.