[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1672

2026-04-04T22:21:51Z

github-actions[bot]
bot Apr 4, 2026

This assessment analyzes the current state of CI/CD pipelines and integration tests in the gh-aw-firewall repository as of April 2026, identifying gaps and providing actionable recommendations to improve PR quality measurement.

📊 Current CI/CD Pipeline Status

The repository has a well-structured and mature CI/CD setup with 57 total GitHub Actions workflows (including agentic workflows). The pipeline covers traditional quality gates (build, lint, type-check, test) as well as novel AI-driven checks (Security Guard, Build Test Suite, Smoke tests).

Recent PR run health (last batch of completed PR runs):

Workflow	Conclusion
Build Verification	✅ success
Lint	✅ success
TypeScript Type Check	✅ success
Test Coverage	✅ success
Integration Tests	✅ success
Chroot Integration Tests	✅ success
CodeQL	✅ success
Security Guard (AI)	✅ success
Smoke Tests (Copilot/Claude/Codex/Services)	✅ success / ❌ occasional
Dependency Vulnerability Audit	❌ failure (active vulnerability)
PR Title Check	❌ occasional semantic violation
Examples Test	✅ success
Test Setup Action	✅ success
Build Test Suite (AI, 8 ecosystems)	✅ success

✅ Existing Quality Gates

Build & Compilation

Build Verification (build.yml) — TypeScript build on Node 20 & 22 matrix, ESLint, build artifact verification, api-proxy unit tests
TypeScript Type Check (test-integration.yml) — Strict tsc --noEmit via tsconfig.check.json
Lint (lint.yml) — ESLint (TypeScript) + Markdownlint on Markdown files

Testing

Test Coverage (test-coverage.yml) — Unit tests with Jest, coverage comparison against base branch, PR comment with delta, fails on regression (thresholds: 38% statements, 30% branches, 35% functions, 38% lines)
Integration Tests (test-integration-suite.yml) — 5 parallel jobs: Domain, Network, Protocol/Security, Container/Ops, API Proxy — covering 34 integration test files
Chroot Integration Tests (test-chroot.yml) — Multi-language chroot tests (Node, Python, Go, Java, .NET)
Examples Test (test-examples.yml) — Executes real examples/*.sh scripts end-to-end
Test Setup Action (test-action.yml) — Verifies the GitHub Action installer works at latest and specific versions

Security

CodeQL (codeql.yml) — Static analysis for JS/TypeScript and Actions with security-extended,security-and-quality queries
Dependency Vulnerability Audit (dependency-audit.yml) — npm audit for root and docs-site packages, SARIF uploaded to Security tab, fails on high/critical
Security Guard (agentic, security-guard.md) — AI-powered PR security review (Claude) checking iptables, Squid, container hardening, and domain patterns
Secret Digger (agentic, runs hourly) — Proactive secret scanning

PR Hygiene

PR Title Check (pr-title.yml) — Conventional commits format enforcement (feat/fix/docs/...)
Link Check (link-check.yml) — Dead link detection in Markdown (runs on MD changes)
CODEOWNERS — All PRs require @Mossaka review

Documentation

Documentation Preview (docs-preview.yml) — Astro Starlight docs site build check on docs changes
Deploy Docs (deploy-docs.yml) — Deploys documentation on push to main

AI/Agentic Checks

Build Test Suite (agentic) — Runs AWF in 8 language ecosystems (Bun, C++, Deno, .NET, Go, Java, Node, Rust)
Smoke Tests (agentic) — Runs real AI agents (Copilot, Claude, Codex) through the firewall; triggered by emoji reactions or schedule
Smoke Services (agentic) — Tests network services through the firewall
Smoke Chroot (agentic) — Tests chroot environment via PR path triggers

Scheduled/Periodic

Performance Monitor (weekly) — Benchmarks container startup, command execution times, with regression detection
Dependency Security Monitor (daily, agentic) — Broader dependency monitoring
Daily Security Review (daily, agentic) — Threat modeling and evidence-based security review
Dependabot — Weekly PRs for npm (root + docs-site), Docker (agent + squid), and GitHub Actions

🔍 Identified Gaps

🔴 High Priority

1. Unit Test Coverage Is Critically Low on Core Modules

The two most critical files have severely inadequate unit test coverage:

cli.ts: 0% coverage (0/69 statements) — the main CLI entry point with signal handling, argument parsing, and main flow orchestration
docker-manager.ts: 18% coverage (45/250 statements, 4% function coverage) — the core orchestration layer managing all Docker lifecycle
types.ts: Not tracked in coverage
Overall: 38.39% statements — thresholds are set low to avoid failing CI rather than reflecting adequate coverage

The risk: bugs in container lifecycle management (cleanup, timeout handling, volume mounts) can only be caught by the slower, environment-dependent integration tests.
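As a rough illustration of how the signal-handling path could be brought under unit test, the sketch below routes handler registration through a small interface so a test can deliver signals without terminating the test runner. The names `SignalSource`, `installSignalHandlers`, and `FakeProcess` are hypothetical; the actual structure of cli.ts is not shown in this assessment.

```typescript
// Hypothetical sketch: none of these names come from the actual cli.ts.
type SignalName = "SIGINT" | "SIGTERM";

// The subset of Node's `process` API that the handlers need.
interface SignalSource {
  once(signal: SignalName, handler: () => void): unknown;
}

// Registers one-shot handlers that run `cleanup` at most once overall,
// even if both signals arrive.
function installSignalHandlers(source: SignalSource, cleanup: () => void): void {
  let cleanedUp = false;
  const runOnce = (): void => {
    if (cleanedUp) return;
    cleanedUp = true;
    cleanup();
  };
  source.once("SIGINT", runOnce);
  source.once("SIGTERM", runOnce);
}

// Test double standing in for `process`.
class FakeProcess implements SignalSource {
  private handlers = new Map<SignalName, Array<() => void>>();

  once(signal: SignalName, handler: () => void): this {
    const list = this.handlers.get(signal) ?? [];
    list.push(handler);
    this.handlers.set(signal, list);
    return this;
  }

  // Delivers a signal: runs and removes the registered one-shot handlers.
  emit(signal: SignalName): void {
    const list = this.handlers.get(signal) ?? [];
    this.handlers.delete(signal);
    for (const handler of list) handler();
  }
}
```

In a unit test, a `FakeProcess` would be passed in place of `process`, a counting cleanup callback registered, and both signals emitted; the callback should have run exactly once.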

2. No Container Image Vulnerability Scanning on PRs

Docker images (containers/squid/, containers/agent/, containers/api-proxy/) are never scanned for OS-level CVEs during PR checks. The only image security measure is SBOM generation at release time (anchore/sbom-action); no Trivy or Grype scan runs that would catch a CVE introduced via a base image change (FROM ubuntu:22.04, FROM ubuntu/squid:latest).

Given this is a security firewall product, a compromised base image would directly undermine the product's value proposition.

3. Performance Benchmarks Not Run on PRs

The performance-monitor.yml workflow only runs on a weekly schedule and workflow_dispatch. Performance regressions in container startup time, command execution latency, or iptables setup time would not be detected until after merging.

Container startup is the product's critical path — a 2x regression directly degrades user experience.

4. Coverage Thresholds Are Not Meaningful Quality Gates

The configured thresholds (38/30/35/38%) are baseline-anchored to the current low state, not aspirational targets. The test-coverage.yml workflow can fail on regression (exit 1 when coverage drops), but it does not enforce minimum standards like "80% coverage for new code" or "no function with 0% coverage in production-critical paths."
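To make the difference between a regression check and a minimum-standard gate concrete, here is a minimal sketch that combines both. `FileCoverage`, `GateResult`, `evaluateCoverageGate`, and the floor values are assumptions for illustration; this is not the logic used by test-coverage.yml.

```typescript
// Hypothetical sketch; not the logic used by test-coverage.yml.
interface FileCoverage {
  file: string;
  statementsPct: number; // 0..100
}

interface GateResult {
  passed: boolean;
  reasons: string[];
}

// Fails when overall coverage regresses against the base branch, or when any
// file listed in `criticalFloors` falls below its required minimum.
function evaluateCoverageGate(
  basePct: number,
  headPct: number,
  files: FileCoverage[],
  criticalFloors: Record<string, number>,
): GateResult {
  const reasons: string[] = [];
  if (headPct < basePct) {
    reasons.push(`overall coverage dropped from ${basePct}% to ${headPct}%`);
  }
  for (const entry of files) {
    const floor = criticalFloors[entry.file];
    if (floor !== undefined && entry.statementsPct < floor) {
      reasons.push(`${entry.file} is at ${entry.statementsPct}%, below floor ${floor}%`);
    }
  }
  return { passed: reasons.length === 0, reasons };
}
```

With the figures reported here, a PR that leaves overall coverage at 38.39% would pass a pure regression check, but it would fail this gate if cli.ts (0%) were given any non-zero floor.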

🟡 Medium Priority

5. Smoke Tests Are Not Automatically Required on Every PR

The smoke tests (Copilot, Claude, Codex, Services) are triggered by emoji reactions (👀 ❤️ 🎉 🚀) or schedule. This means:

A PR changing containers/agent/entrypoint.sh could merge without any real agent smoke test running
Only smoke-chroot.md has a proper paths: trigger (limited to src/** and containers/** changes)
The build-test.md agentic workflow runs on all PRs but tests external repos, not the core firewall behavior

6. No Mutation Testing

Unit test coverage is only 38%, and coverage alone does not show whether the tests would catch a bug. Nothing in the current setup measures test quality. Mutation testing (e.g., via Stryker) would reveal whether existing tests actually detect faults or merely execute code paths.
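The gap between executing code and verifying it can be shown with a hand-written mutant. The sketch below is illustrative only: `DomainCheck` and both suites are hypothetical, and it does not use Stryker's API. It simply mimics the kind of operator flip a mutation tool might apply.

```typescript
// Hypothetical example; not taken from the repository and not Stryker's API.
type DomainCheck = (host: string, allowed: string) => boolean;

// Original: exact match or subdomain of the allowed domain.
const originalCheck: DomainCheck = (host, allowed) =>
  host === allowed || host.endsWith("." + allowed);

// Mutant: the same function with `||` flipped to `&&`.
const mutantCheck: DomainCheck = (host, allowed) =>
  host === allowed && host.endsWith("." + allowed);

// Weak suite: executes the function (full line coverage) but asserts nothing.
function weakSuite(check: DomainCheck): boolean {
  check("api.example.com", "example.com");
  return true;
}

// Strong suite: asserts on behaviour, so the mutant fails it.
function strongSuite(check: DomainCheck): boolean {
  return (
    check("example.com", "example.com") === true &&
    check("api.example.com", "example.com") === true &&
    check("evil-example.com", "example.com") === false
  );
}
```

Both suites give the original function full coverage, yet only the strong suite fails when run against the mutant. A mutation score reports how often that happens across many generated mutants.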

7. No Automated License Compliance Check

There is no workflow scanning dependency licenses (e.g., license-checker, licensee) to ensure compatibility with the project's MIT license. A GPL or AGPL dependency introduced via npm install could go unnoticed.

8. No Commitlint Enforcement in CI

commitlint is configured (commitlint.config.js, @commitlint/config-conventional in devDeps, Husky as pre-commit), but no CI workflow enforces commit message format. If a developer bypasses Husky (e.g., git commit --no-verify) or commits via the GitHub UI, malformed commit messages go unchecked.

9. SBOM and Container Scanning Only Happen at Release

Software Bill of Materials generation (anchore/sbom-action) and container image publishing only happen in release.yml. A security vulnerability introduced in a container change may not be discovered until the next release cycle — potentially many PRs later.

10. No PR Size / Diff Complexity Gate

There is no check discouraging very large PRs. Large PRs with 1000+ line diffs are harder to review and more likely to contain subtle bugs. An automated check (even advisory) would improve review quality.
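An advisory size check could be as small as the following sketch. The `FileChange` shape, the ignored lockfile, and the function names are assumptions; the 1000-line limit echoes the figure mentioned above.

```typescript
// Hypothetical advisory check; names and the ignore list are assumptions.
interface FileChange {
  path: string;
  additions: number;
  deletions: number;
}

// Generated files that should not count toward review size (assumed list).
const IGNORED_SUFFIXES = ["package-lock.json"];

// Sums added and deleted lines across files a human is expected to review.
function countReviewableLines(changes: FileChange[]): number {
  return changes
    .filter((change) => !IGNORED_SUFFIXES.some((suffix) => change.path.endsWith(suffix)))
    .reduce((sum, change) => sum + change.additions + change.deletions, 0);
}

// Returns an advisory message for large PRs, or null when the PR is small.
function prSizeAdvisory(changes: FileChange[], limit = 1000): string | null {
  const lines = countReviewableLines(changes);
  return lines >= limit ? `PR touches ${lines} lines; consider splitting it.` : null;
}
```

Returning a message rather than failing keeps the check advisory: a workflow could post it as a PR comment without blocking the merge.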

🟢 Low Priority

11. Redundant Lint Execution

ESLint runs in both build.yml and lint.yml on the same triggering conditions. This wastes ~2 minutes of runner time per PR without adding coverage.

12. No Spell Checking for Documentation

While markdownlint checks formatting, there is no spell checker (e.g., cspell, typos) for documentation. Documentation typos currently only get caught by human review.

13. Docs Preview Does Not Generate a Shareable URL

docs-preview.yml builds the docs but does not deploy to a temporary preview URL (e.g., Cloudflare Pages, Netlify, or GitHub Pages preview). Reviewers cannot visually verify documentation changes without checking out the branch locally.

14. No API/CLI Flag Compatibility Check

There is a weekly cli-flag-consistency-checker agentic workflow but no automated check that detects breaking changes to the CLI's public flag interface across PRs. A renamed or removed flag could silently break users.
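A per-PR compatibility check could compare the flag set on the base branch with the flag set on the PR head. The sketch below assumes the two lists have already been extracted (for example from help output); `FlagDiff`, `diffCliFlags`, `isBreakingChange`, and the flag names in the usage note are hypothetical.

```typescript
// Hypothetical sketch; how the flag lists are obtained is left open.
interface FlagDiff {
  removed: string[];
  added: string[];
}

// Compares two flag lists. A rename shows up as one removal plus one addition.
function diffCliFlags(baseFlags: string[], headFlags: string[]): FlagDiff {
  const base = new Set(baseFlags);
  const head = new Set(headFlags);
  return {
    removed: Array.from(base).filter((flag) => !head.has(flag)).sort(),
    added: Array.from(head).filter((flag) => !base.has(flag)).sort(),
  };
}

// Removing a flag breaks existing callers; adding one does not.
function isBreakingChange(diff: FlagDiff): boolean {
  return diff.removed.length > 0;
}
```

For instance, comparing `["--keep", "--old-name"]` against `["--keep", "--new-name"]` reports `--old-name` as removed and `--new-name` as added, which this sketch treats as breaking.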

15. Integration Tests Are Not Scoped by `paths:` Filters

The integration test suite (test-integration-suite.yml) runs on all PRs regardless of what changed. A documentation-only PR triggers the same expensive Docker build + 5-job integration test suite. Adding paths: filtering would reduce noise and CI costs.

📋 Actionable Recommendations

[High-1] Add Docker Image CVE Scanning on PRs

Problem: OS-level vulnerabilities in base images (ubuntu:22.04, ubuntu/squid:latest) are never caught during PR review.

Solution: Add a Trivy scan step to build.yml after the container builds:

- name: Build containers
  run: |
    docker build -t local/awf-squid containers/squid/
    docker build -t local/awf-agent containers/agent/

- name: Scan agent container for CVEs
  uses: aquasecurity/trivy-action@915b19bbe73b92a6cf82a1bc12b087c9a19a5fe # pin to a full 40-character commit SHA; this value is 39 characters and appears truncated
  with:
    image-ref: local/awf-agent
    format: sarif
    output: trivy-agent.sarif
    severity: HIGH,CRITICAL
    exit-code: '1'

- name: Upload Trivy results
  if: always() # the scan step exits non-zero on findings; upload the SARIF regardless
  uses: github/codeql-action/upload-sarif@...
  with:
    sarif_file: trivy-agent.sarif

Complexity: Low | Impact: High (critical for a security product)

[High-2] Increase Test Coverage Thresholds Incrementally

Problem: 38% coverage thresholds don't prevent untested code from shipping. cli.ts at 0% means signal handling and cleanup code has no automated regression protection.

Solution:

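Raise thresholds in jest.config.js by 5 percentage points each quarter toward 60% statements and 50% branches. Starting from 38% and 30%, that pace takes four to five quarters; reaching the targets within 6 months would need steps of roughly 10 to 11 points per quarter
Add per-file threshold overrides for files that are already well tested (100% for logger.ts and squid-config.ts) to prevent regression there
Add targeted tests for cli.ts signal handling, emitting signals with Node's process.emit() inside Jest tests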
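The ramp and the per-file overrides could be expressed along these lines. The object shape follows Jest's documented `coverageThreshold` option; the `./src/...` paths, the `Thresholds` type, and the `nextThresholds` helper are assumptions, since only the file names appear in this assessment.

```typescript
// Sketch only: paths and helper names are assumptions.
type Thresholds = { statements: number; branches: number; functions: number; lines: number };

// Current global thresholds as reported in this assessment.
const currentGlobal: Thresholds = { statements: 38, branches: 30, functions: 35, lines: 38 };

// Raises each metric that has a target by `step` percentage points, capped at
// the target. Metrics without a target are left unchanged.
function nextThresholds(current: Thresholds, goal: Partial<Thresholds>, step: number): Thresholds {
  const bump = (value: number, limit: number | undefined): number =>
    limit === undefined ? value : Math.max(value, Math.min(limit, value + step));
  return {
    statements: bump(current.statements, goal.statements),
    branches: bump(current.branches, goal.branches),
    functions: bump(current.functions, goal.functions),
    lines: bump(current.lines, goal.lines),
  };
}

// Value for Jest's `coverageThreshold` option after the first quarterly step.
const coverageThreshold = {
  global: nextThresholds(currentGlobal, { statements: 60, branches: 50 }, 5),
  "./src/logger.ts": { statements: 100 },
  "./src/squid-config.ts": { statements: 100 },
};
```

Keeping the ramp in a helper makes each quarterly bump a one-line change and keeps the cap at the stated targets explicit.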

Complexity: Medium | Impact: High

[High-3] Add Performance Regression Check to PR Pipeline

Problem: Startup latency regressions are only detected weekly.

Solution: Add a lightweight smoke benchmark to the build workflow:

- name: Quick startup benchmark
  run: |
    # Quick mode: fail if measured startup exceeds the 15 s (15000 ms) threshold
    npx tsx scripts/ci/benchmark-performance.ts --quick --fail-threshold 15000

This would run --quick mode (1 iteration, basic checks) rather than the full weekly 5-iteration suite.
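The pass/fail decision itself is simple. The sketch below uses the median so that it works whether quick mode takes one sample or several; `medianMs` and `exceedsThreshold` are hypothetical names, and this is not the code in scripts/ci/benchmark-performance.ts.

```typescript
// Hypothetical sketch; not the repository's benchmark script.

// Median of a non-empty list of durations in milliseconds.
function medianMs(samples: number[]): number {
  if (samples.length === 0) {
    throw new Error("at least one sample is required");
  }
  const sorted = samples.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// True when the median startup time is above the allowed threshold.
function exceedsThreshold(samples: number[], thresholdMs: number): boolean {
  return medianMs(samples) > thresholdMs;
}
```

Using the median rather than the mean keeps a single slow run on a noisy CI runner from failing the check when several samples are taken.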

Complexity: Low | Impact: High

[Medium-4] Require Smoke Tests on `containers/` and `src/` Path Changes

Problem: Core code changes can merge without a real end-to-end firewall test.

Solution: Change smoke test triggers from reaction-only to also include path-based triggers:

on:
  pull_request:
    types: [opened, synchronize, reopened]
    paths:
      - 'src/**'
      - 'containers/**'
      - 'package.json'
  reaction: "eyes"

This mirrors the existing smoke-chroot.md pattern. At minimum, smoke-copilot.md should auto-trigger on container/src changes.

Complexity: Low | Impact: High

[Medium-5] Add License Compliance Check

Problem: Incompatible dependency licenses could be introduced silently.

Solution: Add to dependency-audit.yml:

- name: Check dependency licenses
  run: npx license-checker --onlyAllow 'MIT;ISC;BSD-2-Clause;BSD-3-Clause;Apache-2.0;CC0-1.0;CC-BY-3.0;CC-BY-4.0;Python-2.0;Unlicense;0BSD' --excludePrivatePackages

Complexity: Low | Impact: Medium

[Medium-6] Enforce Commitlint in CI

Problem: Husky is bypassed on direct pushes or GitHub UI edits.

Solution: Add a lightweight commitlint job to build.yml:

commitlint:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@...
      with:
        fetch-depth: 0
    - run: npm ci
    - run: npx commitlint --from ${{ github.event.pull_request.base.sha }} --to ${{ github.event.pull_request.head.sha }}

Complexity: Low | Impact: Medium
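For readers unfamiliar with what the commitlint job would enforce, the header format can be approximated as follows. This is only a rough sketch: `isConventionalHeader` is a hypothetical helper, and commitlint itself applies additional rules (length and casing, among others) that are not modelled here.

```typescript
// Types allowed by @commitlint/config-conventional's default type-enum rule.
const CONVENTIONAL_TYPES = [
  "build", "chore", "ci", "docs", "feat", "fix",
  "perf", "refactor", "revert", "style", "test",
];

// Matches "type(scope)!: subject"; the scope and the "!" marker are optional.
const HEADER_PATTERN = new RegExp(
  `^(${CONVENTIONAL_TYPES.join("|")})(\\([^()]+\\))?!?: \\S.*$`,
);

// Checks only the first line of the commit message.
function isConventionalHeader(message: string): boolean {
  const header = message.split("\n")[0];
  return HEADER_PATTERN.test(header);
}
```

Under this approximation, `feat(cli): add flag` and `fix!: drop option` are accepted, while `Update README` and `feat:missing space` are rejected.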

[Medium-7] Add Container SBOM to PR Security Tab

Problem: SBOM is only generated at release. Vulnerabilities introduced in PR container changes are invisible.

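Solution: Generate a lightweight SBOM during the build CI job and attach it to the run as a workflow artifact. SARIF carries scan findings rather than the SBOM itself, so only results from a scan of the SBOM (e.g., with Grype) would go to the Security tab. Even a CycloneDX SBOM without a full scan gives a diff-able artifact.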

Complexity: Medium | Impact: Medium

[Low-8] Remove Duplicate ESLint Execution

Problem: ESLint runs in both build.yml and lint.yml.

Solution: Delete the Run linter step from build.yml and rely on lint.yml to gate. A needs: dependency is not an option here, because needs: can only reference jobs in the same workflow file.

Complexity: Low | Impact: Low (saves ~2 min per PR)

[Low-9] Add Docs Preview Deployment

Problem: Reviewers cannot visually verify documentation changes.

Solution: Use Cloudflare Pages or GitHub Pages preview deployment:

- name: Deploy preview
  uses: cloudflare/pages-action@...
  with:
    projectName: awf-docs-preview
    directory: docs-site/dist
    gitHubToken: ${{ secrets.GITHUB_TOKEN }}

Complexity: Medium | Impact: Low-Medium

[Low-10] Add Integration Test Path Filtering

Problem: Docs-only PRs trigger full Docker-based integration test suite (~45 min per job).

Solution: Add paths-ignore to test-integration-suite.yml:

on:
  pull_request:
    branches: [main]
    paths-ignore:
      - '**/*.md'
      - 'docs/**'
      - 'docs-site/**'
      - '.github/workflows/release.yml'

Complexity: Low | Impact: Low (CI cost reduction)
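The effect of the paths-ignore filter can be reasoned about with a small model: the workflow is skipped only when every changed file matches an ignore pattern. The sketch below is a simplified stand-in for GitHub's glob matching that handles only the four patterns in the snippet above; `isIgnoredPath` and `shouldRunIntegrationSuite` are hypothetical names.

```typescript
// Simplified stand-in for GitHub's filter matching; it covers only the four
// patterns listed in the paths-ignore snippet.
function isIgnoredPath(path: string): boolean {
  return (
    path.endsWith(".md") ||                      // '**/*.md'
    path.startsWith("docs/") ||                  // 'docs/**'
    path.startsWith("docs-site/") ||             // 'docs-site/**'
    path === ".github/workflows/release.yml"     // exact file
  );
}

// The suite runs if at least one changed file is NOT ignored.
function shouldRunIntegrationSuite(changedFiles: string[]): boolean {
  return changedFiles.some((file) => !isIgnoredPath(file));
}
```

A PR touching only README.md and docs-site files would skip the suite, while a PR that also edits a file under src/ would still run it in full.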

📈 Metrics Summary

Metric	Value
Total GitHub Actions workflows	57
Workflows triggered on PRs	~16 (including agentic)
Unit test coverage (statements)	38.39%
Unit test coverage (branches)	31.78%
`cli.ts` coverage	0% ⚠️
`docker-manager.ts` coverage	18% ⚠️
Total unit tests	135
Total integration test files	34
Node versions tested	20, 22
Language ecosystems smoke-tested	8 (Bun, C++, Deno, .NET, Go, Java, Node, Rust)
Dependabot configured	✅ (npm, Docker, Actions — weekly)
CODEOWNERS enforced	✅ (all files → `@Mossaka`)
Container image CVE scanning on PRs	❌ Missing
Performance regression on PRs	❌ Missing
License compliance check	❌ Missing
Commitlint in CI	❌ Missing
Recent PR workflow success rate	~85% (Dependency Audit failing on active vuln)

Assessment generated by CI/CD Gaps Assessment workflow — run ID 23988703189

AI generated by CI/CD Pipelines and Integration Tests Gap Assessment

expires on Apr 11, 2026, 10:21 PM UTC
