A practical orientation for a new engineer making a change: the build/test/gate dev loop, three worked "how do I add X" maps into the real code, and where the project's prior reasoning (the audit trail) lives.
This page is the connective tissue. It does not restate the canonical setup and rules — read those once and keep them open:
- `../../CONTRIBUTING.md`: the rules (panic-free library invariant, the audit-trail expectation, PR expectations, commit hygiene).
- `DEV-SETUP.md`: clone → build → test → run-the-gates → run-a-soak, the toolchain, system dependencies, and box sizing per task.
- `../architecture.md`: the crate map and dependency graph (which crate owns what).
- `overview.md`: the request-path narrative and the "parsing is delegated" model.
Build the binary (it is named `expressgateway` and takes the config path as a
positional argument; there is no `--config` flag), then run it:

```sh
cargo build --release -p lb --bin expressgateway
./target/release/expressgateway config/default.toml
```

Run the tests: the default suite while iterating, the full session gate before you push.

```sh
cargo test --workspace                                 # fast inner loop
cargo test --workspace --all-features --no-fail-fast   # the session gate (mirrors CI)
```

`--all-features` enables `test-gauges`, which the bounded-memory (R8) integration
tests read; it is off by default. `--no-fail-fast` gives the full failure set.
On a low-RAM box, cap parallelism (`CARGO_BUILD_JOBS=4`) or the `--all-features`
compile can OOM; see `DEV-SETUP.md` for box sizing and a shared
`CARGO_TARGET_DIR`.
CI is a set of thin wrappers you can run locally before pushing. Each gate, what it checks, and how to run it:
| Gate | Run it locally | What it checks |
|---|---|---|
| Format | `cargo fmt --all -- --check` | rustfmt is clean |
| Lint + panic-freedom | `cargo clippy --workspace --all-targets --all-features -- -D warnings` | no warnings; the `#![deny(...)]` panic-free lints (no `unwrap`/`expect`/`panic!`/indexing) hold |
| Tests | `cargo test --workspace --all-features --no-fail-fast` | the workspace suite + required-tests manifest |
| Doc-lint | `bash scripts/ci/doc-lint.sh` | operator docs are free of stale patterns; the "audit-of-audit" Verified-Fixed claims resolve |
| Coverage | `bash scripts/ci/coverage-check.sh <lcov>` | per-module hot-path line coverage ≥ 80% |
| h3spec | `bash scripts/ci/h3spec-check.sh <h3spec> <host> <port>` | HTTP/3 conformance against the named-waiver list |
| Container smoke | `IMAGE=expressgateway:ci bash scripts/ci/docker-smoke.sh` | the image builds, boots, and serves a request |
Two gates run in CI without a local wrapper script:
- h2spec (HTTP/2 conformance) runs in the `Conformance` job of `.github/workflows/ci.yml`: it stands up a TLS listener (ALPN `h2`), downloads a pinned `h2spec`, and runs it `--strict` against `127.0.0.1:8443`. To reproduce locally, run a `h1s` listener and point `h2spec -t -k -h 127.0.0.1 -p <port> --strict` at it.
- Panic-freedom is enforced three ways: the `#![deny(...)]` block at the top of every `crates/*/src/lib.rs`, a dedicated CI job, and an `awk` grep in `scripts/halting-gate.sh` that scans `crates/` for panicking constructs outside `#[cfg(test)]`. The release profile is `panic = "abort"`, so a stray panic is a hard outage, which is why the libraries forbid it by construction. Background: `../decisions/ADR-0010-panic-free-enforcement.md`.
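To illustrate the kind of code the panic-free lints steer you toward, here is a minimal sketch of two helpers written without `unwrap`, `expect`, `panic!`, or slice indexing. The names `parse_port`, `first_byte`, and `ParseError` are hypothetical and not taken from the codebase; the authoritative lint list is the `#![deny(...)]` block in each crate's `lib.rs`.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; not a type from the codebase.
#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    MissingPort,
    InvalidPort,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::MissingPort => write!(f, "address has no ':port' suffix"),
            ParseError::InvalidPort => write!(f, "port is not a valid u16"),
        }
    }
}

impl std::error::Error for ParseError {}

/// Extracts the port from a `host:port` string without any panicking construct.
pub fn parse_port(addr: &str) -> Result<u16, ParseError> {
    // `rsplit_once` yields an Option, so there is no index that can go out of bounds.
    let (_host, port) = addr.rsplit_once(':').ok_or(ParseError::MissingPort)?;
    // `map_err` plus an explicit Result replaces `.unwrap()` on the parse.
    port.parse::<u16>().map_err(|_| ParseError::InvalidPort)
}

/// Returns the first byte of a buffer, if there is one.
pub fn first_byte(buf: &[u8]) -> Option<u8> {
    // `buf[0]` would panic on an empty slice; `first()` returns None instead.
    buf.first().copied()
}
```

Every fallible step surfaces as a `Result` or `Option` that the caller must handle, which is the property that matters when the release profile aborts on panic.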
The soak and the multi-kernel XDP matrix are not run on hosted CI because they need
dedicated hardware. The release soak gate (`scripts/release-soak.sh`) provisions
its own box; see `DEV-SETUP.md`, "Release soak gate".
These are orientation maps into the real code, not copy-paste tutorials. Each points at the files you will touch and the test that proves the change.
The 9-cell HTTP matrix is a Bridge per front×back pair. The shape:
- The neutral types and the trait live in `crates/lb-l7/src/lib.rs`: `BridgeRequest` / `BridgeResponse` (protocol-neutral: method, URI, a `Vec<(String, String)>` header list that may carry `:`-pseudo-headers, a `Bytes` body, and trailers), the `Bridge` trait (`bridge_request` / `bridge_response` / `source_protocol` / `dest_protocol`), and the `create_bridge(source, dest)` factory that maps each `Protocol` pair to its cell.
- Each cell is one file, `crates/lb-l7/src/h{1,2,3}_to_h{1,2,3}.rs`, implementing `Bridge` for `H?ToH?Bridge`. The request and response transforms are where pseudo-header insertion/removal, scheme handling, and trailer threading happen.
- Every cell calls the shared `check_header_count` (the `MAX_HEADERS = 256` cap) in both directions; keep that invariant when you edit one.
To modify a cell, edit the matching `bridge_request` / `bridge_response`. To
add a behavior across all cells, prefer the shared helper in `lib.rs` so you
fix it once (the codebase has been bitten by single-cell fixes that missed the
other eight). The end-to-end proof is the matching integration test
`tests/bridging_h{1,2,3}_h{1,2,3}.rs`, which drives the public `lb_l7` API and
asserts header-cap, method/path/body preservation, and trailer handling. Add or
extend the test for your case and confirm with the full suite, not just the one
file.
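The shape of a cell that leans on a shared helper can be sketched as follows. This is an illustration under assumptions, not the code in `lb-l7`: `NeutralRequest` is a simplified stand-in for `BridgeRequest` (no body or trailers), `BridgeError` and `h2_to_h1_request` are hypothetical names, and the signature of `check_header_count` is assumed. Only the `MAX_HEADERS = 256` cap comes from the text above.

```rust
/// The header cap named in the text above.
pub const MAX_HEADERS: usize = 256;

/// Simplified stand-in for the protocol-neutral request (assumption: the real
/// type also carries a body and trailers, and its field names may differ).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NeutralRequest {
    pub method: String,
    pub uri: String,
    pub headers: Vec<(String, String)>,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum BridgeError {
    TooManyHeaders { count: usize, max: usize },
}

/// Shared check that every cell calls (assumed signature).
pub fn check_header_count(headers: &[(String, String)]) -> Result<(), BridgeError> {
    if headers.len() > MAX_HEADERS {
        return Err(BridgeError::TooManyHeaders {
            count: headers.len(),
            max: MAX_HEADERS,
        });
    }
    Ok(())
}

/// Sketch of an H2 -> H1 request transform: enforce the cap, drop
/// `:`-pseudo-headers, and carry `:authority` over as `host`.
pub fn h2_to_h1_request(req: &NeutralRequest) -> Result<NeutralRequest, BridgeError> {
    check_header_count(&req.headers)?;

    let mut headers = Vec::with_capacity(req.headers.len());
    let mut authority = None;
    for (name, value) in &req.headers {
        if name.as_str() == ":authority" {
            authority = Some(value.clone());
        } else if !name.starts_with(':') {
            headers.push((name.clone(), value.clone()));
        }
    }
    // Only synthesize `host` when the request did not already carry one.
    if let Some(host) = authority {
        if !headers.iter().any(|(n, _)| n.eq_ignore_ascii_case("host")) {
            headers.insert(0, ("host".to_string(), host));
        }
    }

    Ok(NeutralRequest {
        method: req.method.clone(),
        uri: req.uri.clone(),
        headers,
    })
}
```

Because the cap lives in one function, tightening or changing it touches a single place, and a response transform in the same cell would call the same helper before doing its own work.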
The backend (upstream) protocol is a small enum plumbed from config to a pool:
- The enum is `UpstreamProto` (`crates/lb-l7/src/upstream.rs`); the binary maps config tokens to it in `parse_upstream_proto` (`crates/lb/src/main.rs`). Today `"tcp"`/`"h1"` → `Http1`, `"h2"` → `Http2`, `"h3"` → `Http3`, with an unknown token failing fast at startup with a clear message.
- The connection itself comes from a pool in `lb-io`: `TcpPool` (`crates/lb-io/src/pool.rs`) for H1/raw-TCP backends, `Http2Pool` (`http2_pool.rs`) for the hyper h2 client, and `QuicUpstreamPool` (`quic_pool.rs`) for the quiche client. Name resolution is `dns.rs`.
A new backend protocol therefore means: extend `UpstreamProto`, accept the token
in `parse_upstream_proto` and in `lb_config::validate_config` (so the schema
and the binary agree), add or reuse a pool in `lb-io`, and select that pool on the
upstream leg of the relevant bridge cells. The backend-protocol token is part of
`BackendConfig` in `crates/lb-config/src/lib.rs`.
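The token-to-enum step can be sketched like this. The variant names and the token mapping follow the text above; the function signature, the `String` error type, and the `pool_name` helper are assumptions made for illustration and may not match `crates/lb/src/main.rs`.

```rust
/// Mirrors the variants named above; the real enum may differ in detail.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UpstreamProto {
    Http1,
    Http2,
    Http3,
}

/// Maps a config token to a protocol, rejecting unknown tokens with a
/// message that lists the accepted ones (assumed signature and error type).
pub fn parse_upstream_proto(token: &str) -> Result<UpstreamProto, String> {
    match token {
        "tcp" | "h1" => Ok(UpstreamProto::Http1),
        "h2" => Ok(UpstreamProto::Http2),
        "h3" => Ok(UpstreamProto::Http3),
        other => Err(format!(
            "unknown backend protocol {other:?}; expected one of \"tcp\", \"h1\", \"h2\", \"h3\""
        )),
    }
}

/// Hypothetical helper: names the `lb-io` pool that serves each protocol.
/// The match has no wildcard arm, so adding a variant is a compile error
/// here until the new protocol is given a pool.
pub fn pool_name(proto: UpstreamProto) -> &'static str {
    match proto {
        UpstreamProto::Http1 => "TcpPool",
        UpstreamProto::Http2 => "Http2Pool",
        UpstreamProto::Http3 => "QuicUpstreamPool",
    }
}
```

Keeping every `match` on the enum exhaustive lets the compiler list the sites a new variant has to reach.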
This one is honest groundwork: the algorithms exist, but the selection knob does not. It is a well-bounded contributor task, and the seam is clear.
What is already there:
- All eleven algorithms are implemented in `crates/lb-balancer/src/*.rs`, and the enum that names them, `LbPolicy`, lives in `crates/lb-core/src/policy.rs`.
- The live data path is hard-wired: the L7/TCP listeners build round-robin in the binary (`RoundRobinUpstreams::new` and `RoundRobin::new` in `crates/lb/src/main.rs`), and QUIC Mode A passthrough uses Maglev-by-CID in `crates/lb-quic/src/passthrough.rs`.
What is missing (the work):
- A config key. There is no `policy` field on `ListenerConfig` / `BackendConfig` in `crates/lb-config/src/lib.rs`, and the schema is `#[serde(deny_unknown_fields)]`, so an operator literally cannot add `policy = "p2c"` today; it would be rejected at parse. Adding the field (and its validation) is step one.
- Selecting the picker. `LbPolicy` is currently referenced nowhere outside `lb-core`. The binary would map the configured policy to the matching `lb-balancer` picker where it builds `RoundRobinUpstreams` / `RoundRobin` today, instead of always constructing round-robin.
- Two caveats to handle honestly. The `ewma` policy needs a per-request backend-latency feed that the request path does not record yet (the setter is `#[cfg(test)]`-only), and the per-backend `weight` is parsed but ignored by the live pickers, so a weighted policy must actually consume it.
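The selection seam described above could look roughly like the sketch below. Everything in it is hypothetical: `Policy` stands in for a two-variant subset of `LbPolicy`, the `Picker` trait, `Backend` struct, and the tokens `"round_robin"` / `"weighted"` are invented for illustration, and the weighted picker uses a simple expanded schedule rather than whatever `lb-balancer` implements.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical two-variant subset; the real `LbPolicy` names eleven algorithms.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Policy {
    RoundRobin,
    Weighted,
}

/// Hypothetical backend record carrying the per-backend weight.
pub struct Backend {
    pub addr: String,
    pub weight: u32,
}

/// Hypothetical picker interface: returns the index of the chosen backend.
pub trait Picker {
    fn pick(&self) -> Option<usize>;
}

struct RoundRobinPicker {
    len: usize,
    next: AtomicUsize,
}

impl Picker for RoundRobinPicker {
    fn pick(&self) -> Option<usize> {
        if self.len == 0 {
            return None;
        }
        Some(self.next.fetch_add(1, Ordering::Relaxed) % self.len)
    }
}

/// Weighted round-robin over an expanded schedule: backend `i` appears
/// `weight` times, so the weight is actually consumed.
struct WeightedPicker {
    schedule: Vec<usize>,
    next: AtomicUsize,
}

impl Picker for WeightedPicker {
    fn pick(&self) -> Option<usize> {
        if self.schedule.is_empty() {
            return None;
        }
        let slot = self.next.fetch_add(1, Ordering::Relaxed) % self.schedule.len();
        self.schedule.get(slot).copied()
    }
}

/// Maps a config token to a policy (token spellings are assumptions).
pub fn parse_policy(token: &str) -> Result<Policy, String> {
    match token {
        "round_robin" => Ok(Policy::RoundRobin),
        "weighted" => Ok(Policy::Weighted),
        other => Err(format!("unknown load-balancing policy {other:?}")),
    }
}

/// Builds the picker for the configured policy instead of always
/// constructing round-robin.
pub fn build_picker(policy: Policy, backends: &[Backend]) -> Box<dyn Picker> {
    match policy {
        Policy::RoundRobin => Box::new(RoundRobinPicker {
            len: backends.len(),
            next: AtomicUsize::new(0),
        }),
        Policy::Weighted => {
            let schedule: Vec<usize> = backends
                .iter()
                .enumerate()
                .flat_map(|(i, b)| std::iter::repeat(i).take(b.weight as usize))
                .collect();
            Box::new(WeightedPicker {
                schedule,
                next: AtomicUsize::new(0),
            })
        }
    }
}
```

The point of the sketch is the seam, not the algorithms: one function owns the policy-to-picker mapping, so the listener setup code only needs to call it with the configured value.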
Document the result in `../features.md` (the canonical home for
load-balancing reality) and update
`../known-limitations.md` when it lands.
audit/ is the program's permanent evidence trail — prior reasoning is recorded
there, not lost. When a comment or report cites a finding, that is where to read
the full story.
- `../../audit/` by area: `security/` (the S38 security audit: inventory, findings, cross-reviews), `perf/` (the S39 perf baseline + burn-in), `soak/` (per-session soak reports + verdicts), `reliability/` (operability reviews), `decisions/` (audit-level design decisions), and `release/` (CI/doc inventories and session reports). The honest deferred-features list is `../../audit/deferred.md`; the executive summary is `../../audit/FINAL_REPORT.md`.
- Finding IDs. You will see stable IDs in code comments and reports: `F-…` for a finding from an audit pass (e.g. `F-RES-1`) and `CF-…` for a carried finding tracked across sessions (e.g. `CF-S27-2`). To find the reasoning behind one, `grep -r <id> audit/`. When you make a non-obvious correctness claim in a PR, cite the evidence under `audit/` (or add it); an ungrounded claim is treated as a defect.
- ADRs. The architecture decision records are in `../decisions/` (ADR-0001…0010 plus the eBPF toolchain split and the quinn→quiche migration).
- The release gate's closed lists. `../../manifest/` holds `required-tests.txt` and `required-artifacts.txt`; `scripts/halting-gate.sh` greps the test output against them and they are sha256-locked, so a required test cannot silently disappear.
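The idea behind a closed required-tests list can be sketched in a few lines. This is not the logic of `scripts/halting-gate.sh`: the manifest format (one test name per line) is an assumption, the function name `missing_required` is hypothetical, and the sha256 lock is left out. The output format matched is the standard `test <name> ... ok` line printed by `cargo test`.

```rust
use std::collections::BTreeSet;

/// Returns the required test names that do not appear as passing in the
/// test output. Assumes one test name per manifest line.
pub fn missing_required(manifest: &str, test_output: &str) -> Vec<String> {
    // Collect the names of tests reported as `test <name> ... ok`.
    let passed: BTreeSet<&str> = test_output
        .lines()
        .filter_map(|line| {
            let rest = line.trim().strip_prefix("test ")?;
            rest.strip_suffix(" ... ok")
        })
        .collect();

    // Anything required but not in the passed set is a gate failure.
    manifest
        .lines()
        .map(str::trim)
        .filter(|name| !name.is_empty())
        .filter(|name| !passed.contains(name))
        .map(str::to_owned)
        .collect()
}
```

A test that is deleted, renamed, or failing shows up in the returned list, which is what keeps a required test from disappearing unnoticed.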
- Rust 1.88 is the MSRV and the pinned channel (`rust-toolchain.toml`). It is a hard requirement (quiche 0.29.1 + tokio-quiche 0.19). Do not downgrade.
- BoringSSL build dependencies. quiche links BoringSSL (via cmake) and bindgen needs libclang, so a from-source build needs `cmake`, `clang`, `libclang-dev`, `llvm`, `pkg-config`, and `iproute2`. The exact package list, the eBPF/nightly toolchains (only needed to rebuild the committed XDP ELF), and the systemd unit are in `DEV-SETUP.md`.