
S40: repo release hygiene — doc reorg + CI rewrite (0 gates dropped) + soak gate + dev-setup #101

Workflow file for this run

name: CI
# ExpressGateway CI — every per-PR/push BLOCKING gate, in one workflow.
#
# S40 consolidation: the old `ci.yml` (fast checks + build/test) and
# `prod-readiness-gates.yml` (the D-gates that need a real CI environment) were
# two separate blocking workflows that both fired on every PR/push to main. They
# are merged here into one coherent CI workflow, grouped into sections. NO gate
# was dropped — see audit/release/s40-gate-map.md for the before->after mapping
# (every old job -> its new home) and the required-status-check rename list.
#
# Shared setup (toolchain + cache + system-deps) is single-sourced via the
# ./.github/actions/rust-setup composite (R12). Weekly informational scans live
# in scheduled.yml; tag-triggered build/publish + the soak release gate live in
# release.yml.
on:
  push:
    branches: [main]
  pull_request:
  workflow_dispatch:

env:
  CARGO_TERM_COLOR: always
  RUSTFLAGS: -D warnings
  # MSRV — matches rust-toolchain.toml (moved 1.85 -> 1.88 at S31 for quiche
  # 0.29.1 + tokio-quiche 0.19, which hard-require Rust 1.88).
  RUST_MSRV: "1.88"

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

jobs:
  # =================================================================
  # SECTION 1 — fast checks (format, compile, lint, doc + panic guards)
  # =================================================================
  fmt:
    name: Format
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: ./.github/actions/rust-setup
        with:
          components: rustfmt
          cache: 'false'
      - run: cargo fmt --all -- --check

  check:
    name: Check
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: ./.github/actions/rust-setup
      # --all-features is REQUIRED: the R8 memory integration tests in tests/
      # read lb-l7's `#[cfg(feature = "test-gauges")]` gauges, forwarded by the
      # root crate's `test-gauges` feature (off by default). The canonical
      # session gate has always built with --all-features.
      - run: cargo check --workspace --all-targets --all-features

  clippy:
    name: Clippy
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: ./.github/actions/rust-setup
        with:
          components: clippy
      - run: cargo clippy --workspace --all-targets --all-features -- -D warnings

  panic-freedom:
    name: Panic Freedom Audit
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - name: Verify panic-freedom deny lints are present in every library crate
        run: |
          MISSING=""
          for lib in crates/*/src/lib.rs; do
            # Match the deny attribute even when it spans multiple lines
            # (e.g. `#![deny(\n clippy::unwrap_used, ...\n)]`); -z makes grep
            # treat the whole file as a single record.
            if ! grep -Pzoq '#!\[deny\([^)]*clippy::unwrap_used' "$lib" 2>/dev/null; then
              # %0A is the workflow-command encoding of a newline; a literal
              # "\n" would be printed verbatim in the ::error:: annotation.
              MISSING="$MISSING%0A  $lib"
            fi
          done
          if [ -n "$MISSING" ]; then
            echo "::error::Crates missing panic-freedom deny lints:$MISSING"
            exit 1
          fi
          echo "All library crates have panic-freedom deny lints."

  doc-lint:
    # Guards operator-facing docs against drift (tier-1 stale patterns) AND
    # verifies every `Status: Verified-Fixed(<sha>)` audit claim resolves to a
    # SHA whose diff actually closes the recommendation (tier-2 audit-of-audit).
    name: Doc Lint
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
        with:
          # Tier-2 walks `git show` / `git ls-tree` against historical SHAs
          # cited in Verified-Fixed claims — needs full history.
          fetch-depth: 0
      - run: bash scripts/ci/doc-lint.sh

  # =================================================================
  # SECTION 2 — build & test (suite, MSRV, fuzz smoke, release codegen)
  # =================================================================
  test:
    name: Test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: ./.github/actions/rust-setup
      # Mirror the canonical session gate (--all-features = test-gauges) +
      # --no-fail-fast. The heavy real-wire e2e binaries (grpc_h3_e2e, ws_*)
      # self-serialize via an in-file `static SUITE_SERIAL` tokio Mutex.
      #
      # CF-FCAP1-FLAKE / CF-SATURATION-1: fcap1_h2_over_cap_upload_yields_413
      # must push a 66 MiB body past the 64 MiB cap to assert 413. On the hosted
      # runner, sharing CPU with its ~14 sibling tests starves that upload, so it
      # is isolated and run ALONE below (full CPU, uncontended). This is
      # serialization, not weakening: same test, same assert_eq!(status, 413).
      - name: cargo test (suite minus the saturation-isolated fcap1)
        run: cargo test --workspace --all-features --no-fail-fast -- --skip fcap1_h2_over_cap_upload_yields_413
        timeout-minutes: 45
      - name: cargo test (fcap1 over-cap, isolated / serial, retry on env-flake)
        # fcap1 must transfer 66 MiB past the const 64 MiB cap; on a hosted
        # runner the achievable rate varies (~0.3-1.1 MiB/s), so even an
        # isolated attempt can be starved by a noisy neighbour. The gateway
        # deadlines are raised to 300 s (see the test) and the test gets up to
        # 3 attempts: a genuine cap regression fails ALL three (the assertion is
        # unchanged); only a slow-transfer env-flake is retried. CF-SATURATION-1.
        run: |
          for attempt in 1 2 3; do
            echo "::group::fcap1 attempt $attempt"
            if cargo test -p lb-integration-tests --test h2h1_md_streaming_verify \
                --all-features -- --exact fcap1_h2_over_cap_upload_yields_413 --nocapture; then
              echo "::endgroup::"; echo "fcap1 passed on attempt $attempt"; exit 0
            fi
            echo "::endgroup::"; echo "fcap1 attempt $attempt failed (env throughput?); retrying"
          done
          echo "::error::fcap1 failed all 3 attempts — this is a real cap-enforcement failure, not an env flake"
          exit 1
        timeout-minutes: 25

  msrv:
    # `cargo check --all-targets --all-features` on the pinned MSRV is the
    # canonical "compiles on 1.88" gate; full-codegen runs in release-build.
    name: MSRV (1.88)
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: ./.github/actions/rust-setup
        with:
          toolchain: ${{ env.RUST_MSRV }}
      - run: cargo check --workspace --all-targets --all-features

  fuzz-smoke:
    name: Fuzz Smoke Test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: ./.github/actions/rust-setup
        with:
          toolchain: nightly
      - name: Install cargo-fuzz
        run: cargo install cargo-fuzz --locked
      - name: Smoke test all fuzz targets (10 seconds each)
        run: |
          cd fuzz
          targets=$(cargo +nightly fuzz list)
          if [ -z "$targets" ]; then
            echo "::error::No fuzz targets discovered in fuzz/Cargo.toml"
            exit 1
          fi
          for target in $targets; do
            echo "::group::Fuzzing $target"
            cargo +nightly fuzz run "$target" -- -max_total_time=10 2>&1
            echo "::endgroup::"
          done

  release-build:
    name: Release Build
    runs-on: ubuntu-latest
    needs: [check, clippy, test, fmt, panic-freedom, doc-lint]
    steps:
      - uses: actions/checkout@v6
      - uses: ./.github/actions/rust-setup
      - run: cargo build --workspace --release
        timeout-minutes: 30

  # =================================================================
  # SECTION 3 — security & dependency gates
  # =================================================================
  audit:
    name: Security Audit
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      # cargo-audit needs a recent rustc to build, but rust-toolchain.toml pins
      # 1.88, so it is installed with `cargo +stable install` to always build
      # on the latest stable toolchain.
      - uses: ./.github/actions/rust-setup
      - name: Install cargo-audit
        run: cargo +stable install cargo-audit --locked
      # SEC-2-07: fail on ANY RUSTSEC advisory. Explicit ignores must live in
      # .cargo/audit.toml with a justification + link. The SAME strict audit runs
      # weekly in scheduled.yml to catch advisories published against unchanged
      # deps between PRs.
      - run: cargo audit -D warnings

  cargo-deny:
    # advisories / licenses / bans / sources. cargo-deny is a standalone prebuilt
    # binary (no toolchain needed). The explicit subcommand list keeps a future
    # cargo-deny "check" default from silently shrinking the gate. This is the
    # SINGLE cargo-deny gate (the duplicate was removed at S34).
    name: cargo-deny (licenses/advisories/bans/sources)
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - name: Install cargo-deny (prebuilt binary)
        run: |
          VER=0.19.6
          curl -fsSL "https://github.com/EmbarkStudios/cargo-deny/releases/download/${VER}/cargo-deny-${VER}-x86_64-unknown-linux-musl.tar.gz" \
            | tar -xz --strip-components=1 -C /usr/local/bin --wildcards '*/cargo-deny'
          cargo-deny --version
      - run: cargo-deny check licenses advisories bans sources

  # =================================================================
  # SECTION 4 — coverage & protocol conformance
  # =================================================================
  coverage:
    # Hot-path coverage, PER-MODULE >= 80% (audit/coverage-scope.md charter
    # metric, NOT a whole-package aggregate). Runs the FULL workspace suite under
    # instrumentation so the integration tests exercise the hot paths, then
    # enforces >= 80% per hot-path module via scripts/ci/coverage-check.sh
    # (lb-l4-xdp/src/loader.rs carved out by name — XDP load needs root,
    # smoke-validated by the xdp-smoke job). A new hot-path module under 80% -> RED.
    name: Coverage (per-module hot-path >= 80%)
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: ./.github/actions/rust-setup
        with:
          toolchain: ${{ env.RUST_MSRV }}
          components: llvm-tools-preview
          system-deps: 'true'
      - name: Free runner disk (instrumented --workspace build is ~28 GB)
        run: |
          sudo rm -rf /usr/share/dotnet /opt/ghc /usr/local/.ghcup \
            /usr/local/lib/android /opt/hostedtoolcache/CodeQL \
            /usr/share/swift 2>/dev/null || true
          df -h /
      - name: Install cargo-llvm-cov + nextest
        run: |
          curl -fsSL https://github.com/taiki-e/cargo-llvm-cov/releases/latest/download/cargo-llvm-cov-x86_64-unknown-linux-gnu.tar.gz | tar -xz -C ~/.cargo/bin
          curl -fsSL https://get.nexte.st/latest/linux | tar -xz -C ~/.cargo/bin
      - name: Coverage (full workspace suite, --all-features)
        # --ignore-run-fail: this is the COVERAGE gate, not the pass/fail gate
        # (that is the `test` job). Measure what the suite covers even if a test
        # flakes; coverage-check.sh below is the actual verdict.
        run: |
          cargo llvm-cov nextest \
            --workspace --all-features --ignore-run-fail \
            --lcov --output-path coverage.lcov
      - name: Enforce per-module hot-path threshold (charter metric)
        run: bash scripts/ci/coverage-check.sh coverage.lcov
      - uses: actions/upload-artifact@v7
        if: always()
        with:
          name: coverage-lcov
          path: coverage.lcov

  conformance:
    # HTTP/2 + HTTP/3 conformance against the REAL gateway: an h1s listener
    # (TLS, ALPN h2 + http/1.1) on TCP :8443 -> h2spec; a quic listener
    # (H3-terminate, quiche::h3) on UDP :8444 -> h3spec. Tool versions are PINNED
    # so the h3spec waiver list (CF-QUICHE-UPGRADE) stays exact.
    name: Conformance (h2spec --strict + h3spec)
    runs-on: ubuntu-latest
    env:
      H2SPEC_VER: "v2.6.0"
      H3SPEC_VER: "v0.1.13"
    steps:
      - uses: actions/checkout@v6
      - uses: ./.github/actions/rust-setup
        with:
          toolchain: ${{ env.RUST_MSRV }}
          system-deps: 'true'
      - run: cargo build -p lb
      - name: Stub backend + test cert/key + QUIC retry secret
        run: |
          python3 -m http.server 3000 >/tmp/backend.log 2>&1 &
          echo $! > backend.pid
          # Self-signed cert (DNS + IP SANs); h2spec uses -k and h3spec uses -n
          # so cert validation is skipped (protocol conformance, not PKI) — but
          # the cert must still load into rustls + BoringSSL.
          openssl req -x509 -newkey rsa:2048 -keyout key.pem -out cert.pem \
            -days 1 -nodes -subj "/CN=localhost" \
            -addext "subjectAltName=DNS:localhost,IP:127.0.0.1"
          head -c 32 /dev/urandom > retry.bin
      - name: Write conformance config (lb-config schema)
        run: |
          cat > conformance.toml <<TOML
          [runtime]
          drain_timeout_ms = 5000
          readiness_settle_ms = 100
          # H1+H2 over TLS (ALPN h2 then http/1.1) -> h2spec target (TCP).
          [[listeners]]
          address = "127.0.0.1:8443"
          protocol = "h1s"
          [listeners.tls]
          cert_path = "$PWD/cert.pem"
          key_path = "$PWD/key.pem"
          [[listeners.backends]]
          address = "127.0.0.1:3000"
          protocol = "h1"
          weight = 1
          # HTTP/3 over QUIC, H3-terminate (quiche::h3) -> h3spec target (UDP).
          [[listeners]]
          address = "127.0.0.1:8444"
          protocol = "quic"
          [listeners.quic]
          cert_path = "$PWD/cert.pem"
          key_path = "$PWD/key.pem"
          retry_secret_path = "$PWD/retry.bin"
          [[listeners.backends]]
          address = "127.0.0.1:3000"
          protocol = "h1"
          weight = 1
          [observability]
          metrics_bind = "127.0.0.1:9090"
          TOML
          cat conformance.toml
      - name: Boot gateway + wait for BOTH listeners
        run: |
          ./target/debug/expressgateway conformance.toml >/tmp/gw.log 2>&1 &
          echo $! > gw.pid
          up=0
          for i in $(seq 1 40); do
            if ss -ltn | grep -q ':8443' && ss -lun | grep -q ':8444'; then up=1; break; fi
            sleep 1
          done
          if [ "$up" -ne 1 ]; then
            echo "::error::gateway listeners did not come up"; cat /tmp/gw.log; exit 1
          fi
          echo "both listeners up; gateway log:"; tail -5 /tmp/gw.log
      - name: h2spec --strict (HTTP/2 conformance, TCP :8443)
        run: |
          curl -fsSL "https://github.com/summerwind/h2spec/releases/download/${H2SPEC_VER}/h2spec_linux_amd64.tar.gz" | tar -xz
          # -t: connect over TLS; -k: skip cert validation (protocol
          # conformance, not PKI); --strict: also run h2spec's strict test
          # cases, so ANY non-conformant case fails the step. The gateway
          # passes 147/147.
          ./h2spec -t -k -h 127.0.0.1 -p 8443 --strict
      - name: h3spec (HTTP/3 conformance, UDP :8444) via named-waiver gate
        run: |
          curl -fsSL -o h3spec "https://github.com/kazu-yamamoto/h3spec/releases/download/${H3SPEC_VER}/h3spec-linux-x86_64"
          chmod +x h3spec
          # The wrapper runs h3spec -n and PASSES iff every failure is one of the
          # 12 individually-named, documented quiche-0.29 limitations
          # (CF-QUICHE-UPGRADE). A NEW/un-waived failure turns this RED.
          bash scripts/ci/h3spec-check.sh ./h3spec 127.0.0.1 8444
      - name: Stop gateway + backend
        if: always()
        run: |
          kill "$(cat gw.pid)" 2>/dev/null || true
          kill "$(cat backend.pid)" 2>/dev/null || true

  chaos-attacks:
    # The chaos ATTACK suite (the half that does not need a soak host): Rapid
    # Reset, CONTINUATION flood, HPACK bomb, slowloris. The 4-hour soak (D-3b)
    # is the release-soak gate (scripts/release-soak.sh), not a PR gate.
    name: Chaos Attack Suite
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: ./.github/actions/rust-setup
        with:
          toolchain: ${{ env.RUST_MSRV }}
          system-deps: 'true'
      - name: Install cargo-nextest
        run: curl -fsSL https://get.nexte.st/latest/linux | tar -xz -C ~/.cargo/bin
      - run: cargo build -p lb
      - name: Run chaos attack suite
        # Round-8 wired Smuggle/Slowloris/SlowPost detectors + the
        # CONTINUATION/Rapid-Reset caps. --all-features keeps the lb-l7/lb-h2
        # test surface identical to the session gate.
        run: |
          cargo nextest run --all-features --package lb-h2 --package lb-l7 \
            -E 'test(/chaos|rapid_reset|continuation|hpack|slowloris/)'

  # =================================================================
  # SECTION 5 — container image & XDP datapath
  # =================================================================
  image-scan:
    # Real docker build + RUN+SERVE smoke + trivy scan. Builds the image ONCE
    # and (a) proves it BOOTS and SERVES L7 traffic via docker-smoke.sh — a real
    # HTTP/1.1 request through the live container must return the backend's
    # 200 + body — and only then (b) Trivy-scans the same image. `docker build`
    # exit-0 alone was never proof the container works (a pre-S35 image had a
    # wrong CMD that could not boot — caught here now).
    name: Container Image (build + serve smoke + trivy)
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - name: Build image
        run: |
          docker build -f docker/Dockerfile \
            --build-arg GIT_SHA="${GITHUB_SHA::8}" \
            --build-arg BUILD_DATE="$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
            --build-arg VERSION="ci-${GITHUB_SHA::8}" \
            -t expressgateway:ci .
      - name: Smoke — RUN the container and serve a real request through it
        run: IMAGE=expressgateway:ci bash scripts/ci/docker-smoke.sh
      - name: Trivy image scan
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: expressgateway:ci
          format: table
          exit-code: "1" # fail the job on findings
          severity: HIGH,CRITICAL
          ignore-unfixed: true

  xdp-smoke:
    # XDP verifier-accept smoke (RUNNER-KERNEL ONLY). Loads the committed XDP ELF
    # (crates/lb-l4-xdp/src/lb_xdp.bin) into the verifier of the runner's OWN
    # kernel (GitHub hosted = Linux 6.x) and asserts ACCEPT. A true, meaningful
    # claim: "the shipped XDP object builds and the in-kernel verifier on a
    # current kernel accepts it." It does NOT cover the full 5.15/6.1/6.6 matrix
    # (that needs nested virt the hosted runners lack — F-ESC-1, self-hosted).
    name: XDP Verifier Smoke (runner kernel)
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: ./.github/actions/rust-setup
        with:
          toolchain: ${{ env.RUST_MSRV }}
          system-deps: 'true'
      - name: Print runner kernel (for the record — smoke is THIS kernel only)
        run: uname -r
      - name: Build the real-load verifier test (validates the embedded ELF too)
        run: cargo test -p lb-l4-xdp --test round8_verifier_baseline_70 --no-run
      - name: Load the committed XDP ELF and assert verifier ACCEPT
        # The committed ELF (built by scripts/build-xdp.sh) uses aya LEGACY map
        # definitions, which libbpf v1.0+ tools (bpftool, ip) REFUSE to load. aya
        # is the only loader that can load it — and it is the gateway's OWN
        # loader. round8_verifier_baseline_70 does a genuine BPF_PROG_LOAD via
        # XdpLoader::load_from_bytes_pinned + kernel_load on the RUNNING kernel
        # (no NIC/attach) and asserts real kernel facts (prog_id, tag,
        # verified_insns > 0). Needs CAP_BPF + bpffs, hence sudo + --ignored.
        run: |
          set -euo pipefail
          sudo -E env "PATH=$PATH" cargo test -p lb-l4-xdp \
            --test round8_verifier_baseline_70 -- --ignored --nocapture
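
The `coverage` job hands its verdict to scripts/ci/coverage-check.sh, whose contents are not shown on this page. To illustrate what a per-module (rather than whole-package) threshold means, here is a hypothetical bash sketch, not the repository's script. It reads an LCOV report, computes line coverage for each `SF:` record from its `LF:` (lines found) and `LH:` (lines hit) counters, and fails if any file that matches a hot-path pattern falls below the threshold. The function name `check_per_module`, the hot-path regex and the exclusion regex are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a per-module LCOV threshold check; NOT the repo's
# scripts/ci/coverage-check.sh. Hot-path and exclusion patterns are assumptions.
set -euo pipefail

# check_per_module LCOV_FILE THRESHOLD HOT_PATH_REGEX [EXCLUDE_REGEX]
# Prints "ok|FAIL <pct>% <file>" for every hot-path source file in the LCOV
# report and returns 1 if any of them is below THRESHOLD percent line coverage.
check_per_module() {
  local lcov="$1" threshold="$2"
  # Regexes are passed through the environment rather than `awk -v`, so
  # backslashes such as `\.` reach awk's regex engine unmodified.
  HOT_RE="$3" EXCLUDE_RE="${4:-}" awk -v t="$threshold" '
    BEGIN { hot = ENVIRON["HOT_RE"]; ex = ENVIRON["EXCLUDE_RE"]; bad = 0 }
    /^SF:/ { file = substr($0, 4); lf = 0; lh = 0; next }   # start of a file record
    /^LF:/ { lf = substr($0, 4) + 0; next }                  # instrumented lines
    /^LH:/ { lh = substr($0, 4) + 0; next }                  # lines hit
    /^end_of_record/ {
      if (file ~ hot && (ex == "" || file !~ ex) && lf > 0) {
        pct = 100 * lh / lf
        if (pct < t) { status = "FAIL"; bad = 1 } else { status = "ok" }
        printf "%s %.1f%% %s\n", status, pct, file
      }
      next
    }
    END { exit bad }
  ' "$lcov"
}
```

For example, `check_per_module coverage.lcov 80 '/crates/lb-' 'lb-l4-xdp/src/loader\.rs$'` would mirror the loader.rs carve-out described in the job comment, assuming hot-path modules live under `crates/lb-*`. Checking each file separately means one well-covered large module cannot mask a poorly covered small one, which an aggregate percentage would allow.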
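
The `conformance` job's h3spec step passes only if every failing case is one of the individually named quiche-0.29 waivers. scripts/ci/h3spec-check.sh is not shown here, and parsing h3spec's own output is out of scope, so the hypothetical function below assumes the failing case names have already been extracted one per line. It sketches only the set-difference rule: report every failure missing from the waiver list, and fail if there is any.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a named-waiver gate; NOT scripts/ci/h3spec-check.sh.
# Assumes failing h3spec case names were already extracted, one per line.
set -euo pipefail

# waiver_gate FAILURES_FILE WAIVERS_FILE
# Prints every failure that is not listed verbatim in WAIVERS_FILE and
# returns 1 if there is at least one; otherwise reports success and returns 0.
waiver_gate() {
  local failures="$1" waivers="$2" unwaived name
  # Drop blank lines, then keep failures that do NOT (-v) exactly match (-x)
  # any fixed-string (-F) waiver. grep exits 1 when nothing is left, which is
  # the success case, hence `|| true`.
  unwaived=$(grep -v '^$' "$failures" | grep -Fxv -f "$waivers" || true)
  if [ -n "$unwaived" ]; then
    while IFS= read -r name; do
      printf 'UNWAIVED: %s\n' "$name"
    done <<< "$unwaived"
    return 1
  fi
  printf 'all %s failure(s) are waived\n' "$(grep -c . "$failures" || true)"
}
```

Matching whole lines as fixed strings (`grep -F -x`) means a new failure is never absorbed by a waiver that merely shares a prefix with it, which is consistent with pinning the h3spec version so the waiver list stays exact.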