[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1698

2026-04-06T12:56:11Z

github-actions[bot]
bot Apr 6, 2026

📊 Current CI/CD Pipeline Status

The repository has a mature and layered CI/CD setup with 72 total workflow files (45 standard YAML + 27 Markdown-based agentic workflows). PR quality gates are generally healthy, with one notable broken gate.

Overall health: 🟡 Good — strong coverage breadth, but meaningful gaps in depth.

✅ Existing Quality Gates

The following checks run automatically on every PR targeting main:

Check	Workflow	Trigger
ESLint (TypeScript)	`lint.yml`	All PRs
Markdownlint	`lint.yml`	All PRs
TypeScript type-check (`tsc --noEmit`)	`test-integration.yml`	All PRs
Build Verification (Node 20 + 22 matrix)	`build.yml`	All PRs
Unit Tests + Coverage comparison	`test-coverage.yml`	All PRs (non-md)
Integration Tests (5 job suites)	`test-integration-suite.yml`	All PRs
Chroot Integration Tests (4 job suites)	`test-chroot.yml`	All PRs
Examples functional tests	`test-examples.yml`	All PRs (non-md)
Setup Action tests	`test-action.yml`	All PRs (non-md)
PR title semantic validation	`pr-title.yml`	All PRs
CodeQL (JS/TS + Actions)	`codeql.yml`	All PRs
Dependency vulnerability audit (npm)	`dependency-audit.yml`	All PRs (non-md)
Documentation link check	`link-check.yml`	PRs touching `*.md` only
AI Security Guard review	`security-guard.lock.yml`	All PRs
AI Build Test Suite	`build-test.lock.yml`	All PRs
Smoke tests (Claude/Codex/Copilot/Chroot/Services)	`smoke-*.lock.yml`	PRs (some reaction-gated)

Scheduled / ongoing:

Weekly performance benchmarks (performance-monitor.yml)
Daily dependency security monitor (agentic)
Weekly CLI flag consistency checker (agentic)
Secret digger runs (Claude, Codex, Copilot — every 1h)
Daily token usage analyzers

🔍 Identified Gaps

🔴 High Priority

1. 9 Integration Tests Not Included in Any CI Workflow

The following test files exist in tests/integration/ but are not referenced by any workflow job's --testPathPatterns:

api-proxy-observability.test.ts
api-proxy-rate-limit.test.ts
api-target-allowlist.test.ts
chroot-capsh-chain.test.ts
chroot-copilot-home.test.ts
gh-host-injection.test.ts
ghes-auto-populate.test.ts
host-tcp-services.test.ts
workdir-tmpfs-hiding.test.ts

None of these files ever run in CI, so regressions in API proxy observability, token rate limiting, API target allowlisting, GitHub host injection prevention, GHES support, host TCP service access, tmpfs workdir hiding, the capsh privilege-drop chain, and chroot Copilot home handling can be merged undetected.
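A drift check along these lines could catch the gap automatically. This is a minimal sketch, not the repository's tooling: `findUncoveredTests` is a hypothetical helper, and the pattern list is assumed rather than read from the real workflow files.

```typescript
// Hypothetical helper: report test files that no CI pattern would select.
// Jest treats each --testPathPatterns value as a regular expression that is
// matched against the test file path.
function findUncoveredTests(testFiles: string[], ciPatterns: string[]): string[] {
  const regexes = ciPatterns.map((pattern) => new RegExp(pattern));
  return testFiles.filter((file) => !regexes.some((re) => re.test(file)));
}

// Three file names taken from the list above.
const sampleTestFiles = [
  "tests/integration/api-proxy-rate-limit.test.ts",
  "tests/integration/gh-host-injection.test.ts",
  "tests/integration/host-tcp-services.test.ts",
];

// Assumed patterns for illustration only; the real jobs may use different ones.
const sampleCiPatterns = ["api-proxy", "chroot-"];

const uncoveredTests = findUncoveredTests(sampleTestFiles, sampleCiPatterns);
console.log(uncoveredTests);
```

Because the patterns are regular expressions, an unanchored `api-proxy` in this sketch also selects `api-proxy-rate-limit.test.ts`, so only the other two files are reported. Whether the real jobs anchor their patterns is not shown in this assessment and is worth confirming before the patterns are expanded.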

2. Dependency Vulnerability Audit Consistently Failing on PRs

Recent PR workflow data shows Dependency Vulnerability Audit with a 0% pass rate (2 failures out of 2 recent PR runs). This security gate is broken, meaning npm vulnerabilities could merge undetected. This is a critical regression.

3. Coverage Thresholds Are Critically Low for a Security Tool

Current enforced thresholds: 38% statements, 30% branches, 35% functions. For a security-critical infrastructure tool (network firewall), these are far below industry standards. Crucially:

cli.ts (main entry point): 0% coverage
docker-manager.ts (orchestration): 18% coverage, 4% function coverage

Any refactoring or logic change in these files gets little or no test signal.

🟡 Medium Priority

4. No Shell Script Static Analysis (ShellCheck)

The repository contains 20+ shell scripts in containers/agent/, containers/squid/, scripts/ci/, and examples/. None of these are linted by any CI check. Shell scripts are high-risk surface area (entrypoint.sh, setup-iptables.sh, cleanup.sh) and bugs here have security implications.

5. No Dockerfile Linting (Hadolint)

Three Dockerfiles exist (containers/agent/, containers/squid/, containers/api-proxy/). No static analysis checks them for best practices, layer ordering, or known antipatterns (e.g., apt-get without --no-install-recommends, missing HEALTHCHECK, unpinned base images).

6. Performance Benchmarks Not PR-Gated

performance-monitor.yml runs weekly but not on PRs. A PR that significantly increases container startup time, domain resolution latency, or proxy overhead will not be caught before merge. The benchmark infrastructure already exists — it just isn't triggered on PRs.

7. Smoke Tests Require Manual Triggers for 3 of 5 Agents

smoke-claude: requires ❤️ reaction
smoke-codex: requires 🎉 reaction
smoke-copilot: requires 👀 reaction
smoke-chroot: path-based (runs automatically)
smoke-services: automatic (runs on all PRs)

Real agent smoke tests for the three major engines don't run automatically on every PR. If a PR breaks agent invocation or credential injection, it may be merged before anyone adds a reaction.

8. Documentation Link Check Not Triggered on Code-Only PRs

link-check.yml only runs when *.md files change. A PR that renames a file, deletes a section, or changes an anchor can silently break doc links if no markdown is modified.

🟢 Low Priority

9. Secret Digger Workflows Failing Consistently

The hourly secret-digger-claude and secret-digger-copilot workflows have a 100% failure rate in recent runs. While this is not a PR gate, these runs are intended to proactively detect leaked credentials. They are currently providing no value.

10. No SBOM (Software Bill of Materials) Generation

No workflow generates a CycloneDX or SPDX SBOM for the published container images or npm package. This is increasingly required for supply chain security compliance (SLSA, NTIA guidelines).

11. No License Compliance Check

With 100+ transitive npm dependencies, no workflow validates that all dependency licenses are compatible with the project's license. Such checks are common in open-source security tools.

12. No PR Size Gate

Very large PRs (1000+ line changes) are not flagged or blocked. This makes review quality harder to maintain for a security-critical project.
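A size gate can be as small as summing the changed lines per file and comparing against the limit. The sketch below assumes a `ChangedFile` shape with `additions` and `deletions` counts; the `prSizeLabel` helper and both label names are hypothetical, and only the 1000-line limit comes from the text above.

```typescript
// Assumed per-file change counts; this shape is illustrative.
interface ChangedFile {
  additions: number;
  deletions: number;
}

// Hypothetical helper: classify a PR by total changed lines.
function prSizeLabel(
  changedFiles: ChangedFile[],
  limit: number = 1000,
): "within-limit" | "oversized" {
  const totalChanged = changedFiles.reduce(
    (sum, file) => sum + file.additions + file.deletions,
    0,
  );
  return totalChanged >= limit ? "oversized" : "within-limit";
}

// Made-up example: 600 + 450 = 1050 changed lines.
console.log(prSizeLabel([{ additions: 600, deletions: 450 }])); // "oversized"
```

Flagging with a label rather than blocking the merge keeps the gate advisory, which may suit PRs that are large for legitimate reasons such as generated lock files.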

📋 Actionable Recommendations

1. Add Missing Integration Tests to CI (High | Low complexity)

Add the 9 uncovered test patterns to the appropriate workflow jobs in test-integration-suite.yml and test-chroot.yml:

# In test-api-proxy job, expand testPathPatterns:
--testPathPatterns="(api-proxy|api-proxy-observability|api-proxy-rate-limit|api-target-allowlist)"

# Add new job or expand test-chroot for:
--testPathPatterns="(chroot-capsh-chain|chroot-copilot-home)"

# Add new "Security Features" job for:
--testPathPatterns="(gh-host-injection|ghes-auto-populate|host-tcp-services|workdir-tmpfs-hiding)"

2. Fix or Isolate the Failing Dependency Audit (High | Low complexity)

Investigate whether the audit failures are high/critical CVEs in prod dependencies or false positives in dev dependencies. Options:

Fix the actual vulnerabilities, or
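Use --omit=dev (the replacement for the deprecated --production flag) to scope the audit to runtime dependencies only, or
Add specific advisory exceptions via .nsprc (an exception file read by wrappers such as better-npm-audit; plain npm audit does not read it)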
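To decide which of these options applies, it helps to separate blocking findings from lower-severity noise. The sketch below assumes the severity counts found under `metadata.vulnerabilities` in `npm audit --json` output (npm 7 and later); `countAtOrAbove` is a hypothetical helper and the sample numbers are made up.

```typescript
type Severity = "info" | "low" | "moderate" | "high" | "critical";

// Severity counts as reported under metadata.vulnerabilities by `npm audit --json`.
type VulnerabilityCounts = Record<Severity, number>;

const SEVERITY_ORDER: Severity[] = ["info", "low", "moderate", "high", "critical"];

// Hypothetical helper: count findings at or above the chosen severity.
function countAtOrAbove(counts: VulnerabilityCounts, minimum: Severity): number {
  const start = SEVERITY_ORDER.indexOf(minimum);
  return SEVERITY_ORDER.slice(start).reduce((sum, level) => sum + counts[level], 0);
}

// Made-up counts for illustration.
const sampleCounts: VulnerabilityCounts = {
  info: 0,
  low: 3,
  moderate: 2,
  high: 1,
  critical: 0,
};

const blockingFindings = countAtOrAbove(sampleCounts, "high");
console.log(blockingFindings > 0 ? "fail the gate" : "pass with warnings");
```

If no custom logic is needed, `npm audit --audit-level=high` gives a similar effect from the CLI: the command exits non-zero only when a finding is rated high or critical.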

3. Raise Coverage Thresholds Incrementally (High | Medium complexity)

Increase thresholds by 5 percentage points per quarter, prioritizing cli.ts and docker-manager.ts. First step: 43% statements, 35% branches, 40% functions. Near-term target: 50% statements, 45% branches. Long-term target for a security tool: ≥70% statements.

Add to jest.config.js:

coverageThreshold: {
  global: { branches: 35, functions: 40, lines: 43, statements: 43 }  // immediate step
}
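The quarterly ratchet can be expressed as a small function so each step is computed rather than hand-edited. This is a sketch under stated assumptions: `nextThresholds` is a hypothetical helper, the starting values are the currently enforced thresholds quoted in this assessment, and the 70% goals for branches and functions are assumed (the assessment only names a long-term target for statements).

```typescript
interface Thresholds {
  statements: number;
  branches: number;
  functions: number;
}

// Hypothetical helper: raise each threshold by `step` points, capped at the
// goal and never lowered below the current value.
function nextThresholds(current: Thresholds, goal: Thresholds, step: number = 5): Thresholds {
  const bump = (now: number, target: number): number =>
    Math.max(now, Math.min(now + step, target));
  return {
    statements: bump(current.statements, goal.statements),
    branches: bump(current.branches, goal.branches),
    functions: bump(current.functions, goal.functions),
  };
}

// Currently enforced thresholds, as stated in the assessment.
const enforcedThresholds: Thresholds = { statements: 38, branches: 30, functions: 35 };

// Long-term goal; the branches and functions values are assumptions.
const longTermGoal: Thresholds = { statements: 70, branches: 70, functions: 70 };

const firstStep = nextThresholds(enforcedThresholds, longTermGoal);
console.log(firstStep); // { statements: 43, branches: 35, functions: 40 }
```

The first computed step matches the values in the jest.config.js snippet above, and repeated calls stop rising once each metric reaches its goal.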

4. Add ShellCheck for Shell Scripts (Medium | Low complexity)

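- name: ShellCheck
  uses: ludeeus/action-shellcheck@2.0.0  # pinned release tag instead of the moving master branch
  with:
    # No scandir: the action scans the whole repository by default, which covers
    # containers/, scripts/ci/, examples/, and install.sh in one pass.
    ignore_paths: node_modules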

Add to lint.yml alongside ESLint.

5. Add Hadolint for Dockerfile Linting (Medium | Low complexity)

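- uses: hadolint/hadolint-action@v3.1.0
  with:
    dockerfile: containers/agent/Dockerfile
- uses: hadolint/hadolint-action@v3.1.0
  with:
    dockerfile: containers/squid/Dockerfile
# Third Dockerfile named in gap 5; file name assumed to follow the same layout
- uses: hadolint/hadolint-action@v3.1.0
  with:
    dockerfile: containers/api-proxy/Dockerfile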

Add to build.yml or a new lint-containers.yml.

6. Add PR-Triggered Performance Regression Check (Medium | Medium complexity)

Add a lightweight subset of benchmarks to PRs (e.g., container startup time only). Use a threshold of +20% regression as a warning, not a hard failure, to avoid flakiness.
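The warning rule reduces to a relative-change comparison against a stored baseline. In the sketch below, `checkStartupRegression` and its field names are hypothetical, the timings are made up, and only the 20% warning threshold comes from the recommendation above.

```typescript
type Verdict = "ok" | "warn";

interface RegressionResult {
  verdict: Verdict;
  changePct: number; // relative change, rounded to whole percent
}

// Hypothetical helper: compare a PR measurement with the baseline and warn
// (never fail) when the slowdown exceeds warnRatio.
function checkStartupRegression(
  baselineMs: number,
  currentMs: number,
  warnRatio: number = 0.2,
): RegressionResult {
  if (baselineMs <= 0) {
    throw new Error("baseline must be positive");
  }
  const change = (currentMs - baselineMs) / baselineMs;
  return {
    verdict: change > warnRatio ? "warn" : "ok",
    changePct: Math.round(change * 100),
  };
}

// Made-up timings: startup went from 2000 ms to 2500 ms, a 25% slowdown.
const startupCheck = checkStartupRegression(2000, 2500);
console.log(startupCheck); // { verdict: "warn", changePct: 25 }
```

Returning a verdict instead of throwing keeps the check advisory, so a noisy runner produces a visible warning on the PR rather than a red build.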

7. Make Smoke Tests Automatic on PR (Medium | Low complexity)

Remove the reaction requirements from smoke-claude.md, smoke-codex.md, and smoke-copilot.md or change them to run automatically on PRs from maintainers (using roles: maintainer). The current reaction gate adds human latency to catching agent regressions.

8. Investigate and Fix Secret Digger Failures (Low | Low complexity)

Run agenticworkflows-audit on a recent Secret Digger run to identify whether failures are credential, model, or logic issues. These scans provide real security value when working.

📈 Metrics Summary

Metric	Value
Total workflow files	72 (45 YAML + 27 Markdown agentic)
Workflows running on PRs	~19 distinct checks
Recent PR success rate (overall)	~95% (Dependency Audit failing)
Unit test statement coverage	~38%
Unit test branch coverage	~32%
Integration tests in CI	25 / 34 (74%)
Integration tests missing from CI	9 (26%)
Shell scripts with no linting	~20+
Dockerfiles with no linting	3
Secret Digger failure rate	100% (Claude + Copilot)

Assessment generated by CI/CD Pipelines and Integration Tests Gap Assessment workflow — run ID 24032352554.

expires on Apr 13, 2026, 12:56 PM UTC