This engine controls real building equipment. A wrong result is not a failed test — it is a physical hazard (a stuck valve, a frozen coil, an overpressured loop). Testing here is where the rubber meets the road. It is a first-class deliverable on every PR, never an afterthought, and thin coverage is a blocking review defect — the same severity as a compile error.
This document is the contract every change is held to. It is enforced socially in review and
mechanically in CI (the development -> main release gate runs the full suite — see
CI: where tests run).
Every behavioral change ships tests in all four categories that apply to it. "It compiles and the happy path works" is not coverage.
Do not test the value you expect to work. Test the value that breaks the implementation. Enumerate the input domain and hit every boundary and degenerate case:
- Numeric boundaries: `0`, `-0.0`, `±1`, `i64::MIN`/`i64::MAX`, `i32` limits (CDL Integer is 32-bit), values past 2^53 (where `f64` silently loses integer precision; see the lesson below), subnormals, `f64::MIN_POSITIVE`.
- Non-finite floats: `NaN`, `+inf`, `-inf`, both as inputs and as results. Decide and test the intended behavior (propagate? reject with a `DomainError`? saturate?).
- Sign and rounding: truncation-toward-zero vs floor, divisor-signed `mod` vs dividend-signed `rem`, `sign(0) == 0`, division by zero, `sqrt` of a negative.
- Empty / degenerate structure: empty arrays, single-element arrays, zero connections, in-degree 0 vs 1 vs >1, self-loops, duplicate ids, missing required fields.
- Malformed input must produce a typed diagnostic, never a panic. Assert the specific `DiagCode` / error variant, not merely "an error occurred." Parsers and the resolver are total functions over arbitrary bytes: fuzz-grade hostility is the expectation.
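
As an illustration of the sign-and-rounding and division-by-zero cases above, here is a minimal sketch of a divisor-signed modulo with boundary tests. The `floor_mod` function and this `DomainError` enum are hypothetical stand-ins written for this example, not the engine's actual API.

```rust
/// Hypothetical error type for this sketch; the engine's real `DomainError` may differ.
#[derive(Debug, PartialEq, Eq)]
pub enum DomainError {
    DivisionByZero,
}

/// Divisor-signed modulo: a nonzero result takes the sign of `b`.
/// Rust's `%` is dividend-signed, so the two differ when the signs differ.
pub fn floor_mod(a: i64, b: i64) -> Result<i64, DomainError> {
    if b == 0 {
        return Err(DomainError::DivisionByZero);
    }
    // `wrapping_rem` avoids the overflow panic of `i64::MIN % -1` (the result is 0).
    let r = a.wrapping_rem(b);
    // When adjusted, `r` and `b` have opposite signs, so `r + b` cannot overflow.
    if r != 0 && ((r < 0) != (b < 0)) {
        Ok(r + b)
    } else {
        Ok(r)
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn mod_is_divisor_signed() {
        assert_eq!(floor_mod(-7, 3), Ok(2)); // `-7 % 3` is -1
        assert_eq!(floor_mod(7, -3), Ok(-2)); // `7 % -3` is 1
    }

    #[test]
    fn mod_survives_integer_limits() {
        assert_eq!(floor_mod(i64::MIN, -1), Ok(0));
        assert_eq!(floor_mod(i64::MAX, i64::MIN), Ok(-1));
    }

    #[test]
    fn mod_by_zero_is_a_typed_error() {
        assert!(matches!(floor_mod(1, 0), Err(DomainError::DivisionByZero)));
    }
}
```

The happy-path test (`floor_mod(7, 3)`) would pass with a plain `%`; only the mixed-sign and limit cases distinguish the two implementations.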
Why this is non-negotiable: the C1 lesson. In M1-PR-1 the expression evaluator passed 22 tests and all CI gates while silently corrupting integer comparisons above 2^53 (relational ops routed `i64` through `f64`). No happy-path test caught it; an adversarial reviewer found it by reasoning about the type domain. The fix was to compare `i64` exactly, and to add a test at the boundary. Green CI is not evidence of correctness when the test that would fail was never written. Edge-case tests are how we write that test on purpose.
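
The failure mode can be reproduced in a few lines. This is a sketch of the bug class, not the evaluator's actual code; `lossy_less_than` and `exact_less_than` are names invented for the example.

```rust
/// The bug class: routing `i64` through `f64` before comparing.
pub fn lossy_less_than(a: i64, b: i64) -> bool {
    (a as f64) < (b as f64)
}

/// The fix: compare the integers directly.
pub fn exact_less_than(a: i64, b: i64) -> bool {
    a < b
}

#[cfg(test)]
mod tests {
    use super::*;

    const TWO_POW_53: i64 = 1 << 53;

    #[test]
    fn int_compare_is_exact_above_2_pow_53() {
        let (a, b) = (TWO_POW_53, TWO_POW_53 + 1);
        // Both values round to the same f64, so the lossy path cannot order them.
        assert_eq!(a as f64, b as f64);
        assert!(!lossy_less_than(a, b));
        assert!(exact_less_than(a, b));
    }
}
```

Any pair of small integers passes through both functions identically, which is why a suite of ordinary values stays green.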
The engine's core contract is bit-identical determinism (CDL §7.16). A golden test turns that contract into an executable tripwire: a checked-in fixture, a checked-in expected output, and a comparison that fails if even one bit differs.
- Input fixtures live beside the crate (e.g. `crates/oce-cxf/tests/fixtures/*.jsonld`).
- Expected outputs are checked in next to them (e.g. a serialized `ModelGraph`, a trace, a diagnostics list). They are reviewed artifacts: a diff to a golden file is a deliberate, scrutinized change, not noise.
- Compare floats by bits, never with `==` or an epsilon. Use `Value::bit_eq` (which compares `f64` via `to_bits`, so a `NaN` equals a bit-identical `NaN` and `+0.0 != -0.0`, as determinism demands). An epsilon comparison would mask exactly the drift a golden test exists to catch.
- No snapshot magic. Goldens are explicit files compared by explicit code, reviewable and obvious. If a golden needs regenerating, do it deliberately and explain the diff in the PR.
Worked examples — the bar each upcoming PR must clear (none of these have landed yet; they are the targets this standard sets, not existing artifacts):
- PR-5 (resolver): golden `ModelGraph` lowered from `minimal_loop.jsonld` + a `decl_order` determinism golden + malformed-input edge cases each asserting their `DiagCode`.
- PR-6 (engine loop): golden converging trace for the feedback loop (`2.0 → … → 4.0`), bit-exact at every tick.
- PR-9 (arrays): round-trip goldens, with preserved vs flattened forms compared bit-for-bit.
CDL has a normative reference (the Modelica Buildings library / OpenModelica). Where a block or expression has a reference result, cross-check against it rather than against our own re-derived expectation — otherwise we are grading our own homework.
- Oracle vectors will live in the `oce-conformance` crate, whose planned role is to be the home for reference traces and the CDL §7.7.2 expression-semantics vectors (R10.x). (Today that crate is deferred: `compare()` is not yet implemented and it holds no vectors. This standard is what it gets built out to satisfy.)
- Record the oracle's provenance (which tool, which version) alongside the vector so a future mismatch is debuggable.
- When no oracle exists for a construct, say so in the test and fall back to a hand-derived golden with the derivation documented.
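
One possible shape for a vector that carries its provenance is sketched below. Everything here is an assumption made for illustration: the `OracleVector` struct, its field names, and `check_against_oracle` do not exist in `oce-conformance`, and the tool name and version in the test are placeholders, not real provenance.

```rust
/// Hypothetical shape for one oracle vector; all field names are assumptions.
#[derive(Debug, Clone)]
pub struct OracleVector {
    pub tool: &'static str,
    pub tool_version: &'static str,
    pub expression: &'static str,
    /// Expected result stored as raw bits so the comparison is exact.
    pub expected_bits: u64,
}

/// Compares by bits and, on mismatch, names the oracle that produced the vector.
pub fn check_against_oracle(vector: &OracleVector, actual: f64) -> Result<(), String> {
    if actual.to_bits() == vector.expected_bits {
        return Ok(());
    }
    Err(format!(
        "`{}`: got bits {:#018x}, oracle {} {} expects {:#018x}",
        vector.expression,
        actual.to_bits(),
        vector.tool,
        vector.tool_version,
        vector.expected_bits
    ))
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn mismatch_message_carries_provenance() {
        let vector = OracleVector {
            tool: "placeholder-tool", // placeholder, not a real oracle record
            tool_version: "0.0.0",
            expression: "1.0 + 2.0",
            expected_bits: 3.0_f64.to_bits(),
        };
        assert_eq!(check_against_oracle(&vector, 1.0 + 2.0), Ok(()));
        let off_by_one_ulp = f64::from_bits(3.0_f64.to_bits() + 1);
        let message = check_against_oracle(&vector, off_by_one_ulp).unwrap_err();
        assert!(message.contains("placeholder-tool 0.0.0"));
    }
}
```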
Determinism is a testable property, not an assumption. For anything that ingests, orders, or executes:
- Re-run and byte-compare. Import/resolve the same fixture twice and assert the two `ModelGraph`s are byte-identical (PR-11 will do this across the conformance corpus). This catches `HashMap`-iteration-order leaks: ordering must derive from declaration/array order, never hash order.
- Tick determinism. Run the same model for N ticks twice; the traces must be bit-equal.
- Diagnostic determinism. Diagnostics are sorted deterministically (e.g. by `ConnectorId`); assert the order, not just the set.
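
The re-run and ordering checks above can be sketched with a toy model. The `resolve` function here is a deliberately tiny stand-in that serializes declarations in the order given, and `Diagnostic` with its `connector_id` field is a hypothetical type; neither reflects the real resolver.

```rust
/// Hypothetical diagnostic; `connector_id` stands in for `ConnectorId`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Diagnostic {
    pub connector_id: u32,
    pub code: &'static str,
}

/// Toy "resolve": serializes declarations in the order they were given.
pub fn resolve(decls: &[(&str, f64)]) -> Vec<u8> {
    let mut out = Vec::new();
    for (name, value) in decls {
        out.extend_from_slice(name.as_bytes());
        out.push(b'=');
        out.extend_from_slice(&value.to_bits().to_le_bytes());
        out.push(b'\n');
    }
    out
}

/// Sorts by a total key (id, then code) so equal ids cannot reorder between runs.
pub fn sorted_diagnostics(mut diags: Vec<Diagnostic>) -> Vec<Diagnostic> {
    diags.sort_by(|a, b| (a.connector_id, a.code).cmp(&(b.connector_id, b.code)));
    diags
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn resolve_is_byte_identical_across_two_imports() {
        let decls = [("gain", 2.0), ("offset", -0.0)];
        assert_eq!(resolve(&decls), resolve(&decls));
    }

    #[test]
    fn diagnostics_are_asserted_in_order() {
        let sorted = sorted_diagnostics(vec![
            Diagnostic { connector_id: 2, code: "E2" },
            Diagnostic { connector_id: 1, code: "E9" },
            Diagnostic { connector_id: 1, code: "E1" },
        ]);
        let order: Vec<_> = sorted.iter().map(|d| (d.connector_id, d.code)).collect();
        assert_eq!(order, vec![(1, "E1"), (1, "E9"), (2, "E2")]);
    }
}
```

Asserting on the ordered `Vec` rather than on a set is the point: a set comparison would still pass if the sort key were incomplete.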
A PR is not done until, for every unit of behavior it adds or changes:
- Edge cases enumerated and tested (boundaries, non-finite, sign, empty, malformed→typed-error).
- At least one golden where there is a meaningful output artifact (graph, trace, diagnostics).
- An oracle cross-check where a reference result exists.
- A determinism check where the code ingests, orders, or executes.
- Every error path asserts a specific typed variant / `DiagCode`, and no input causes a panic (parsers/resolvers are total over arbitrary bytes).
Reviewers reject PRs that add behavior with only happy-path tests. "I couldn't think of an edge case" is itself a finding to resolve, not a pass.
- Test location: inline `#[cfg(test)] mod tests` for unit tests; `crates/<crate>/tests/` for integration tests; `crates/<crate>/tests/fixtures/` for input + golden files.
- Float comparison: `Value::bit_eq` (or `f64::to_bits`), never `==` or `(a-b).abs() < ε` in an engine assertion.
- Error assertions: match the exact variant (`assert!(matches!(err, CxfError::Json(_)))`), not `is_err()`.
- No time/randomness in tests: deterministic inputs only; no wall-clock, no RNG.
- Name tests for the property, not the function: `mod_is_divisor_signed`, `resolve_is_byte_identical_across_two_imports`, `int_compare_is_exact_above_2_pow_53`.
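
A small sketch of these conventions applied together: a parser that is total over arbitrary bytes, with tests named for the property and asserting the exact variant. `ParseError` and `parse_integer` are hypothetical and written for this example; they are not the engine's `CxfError` or any existing parser.

```rust
use std::num::IntErrorKind;

/// Hypothetical error type for this sketch (not the engine's `CxfError`).
#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    NotUtf8,
    NotAnInteger,
    OutOfRange,
}

/// Parses a 32-bit integer from arbitrary bytes. Every failure maps to a
/// typed variant, and no input panics.
pub fn parse_integer(bytes: &[u8]) -> Result<i32, ParseError> {
    let text = std::str::from_utf8(bytes).map_err(|_| ParseError::NotUtf8)?;
    text.parse::<i32>().map_err(|e| match e.kind() {
        IntErrorKind::PosOverflow | IntErrorKind::NegOverflow => ParseError::OutOfRange,
        _ => ParseError::NotAnInteger,
    })
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn integer_limits_round_trip() {
        assert_eq!(parse_integer(b"2147483647"), Ok(i32::MAX));
        assert_eq!(parse_integer(b"-2147483648"), Ok(i32::MIN));
    }

    #[test]
    fn one_past_the_limit_is_out_of_range() {
        assert!(matches!(parse_integer(b"2147483648"), Err(ParseError::OutOfRange)));
    }

    #[test]
    fn hostile_bytes_yield_typed_errors() {
        assert!(matches!(parse_integer(b"\xff\xfe"), Err(ParseError::NotUtf8)));
        assert!(matches!(parse_integer(b""), Err(ParseError::NotAnInteger)));
    }
}
```

An `is_err()` assertion would pass all three failure tests even if the variants were swapped, which is why the convention asks for the exact match.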
CI is dev-light / release-heavy (keep per-change PRs fast; save the heavy suite for releases):
| Gate | Trigger | Runs tests? |
|---|---|---|
| `ci.yml` (light) | PRs into development | No engine tests: fmt, clippy -D warnings, build, rustdoc, file-size, no-secret, workspace-wide default-no-db, cargo-machete, stale crate-status header lint, gate-fixture smoke (+ cargo-deny on manifest change). |
| `release-gate.yml` (heavy) | PRs development -> main, daily cron against development, manual dispatch | Yes: full nextest, release-codegen nextest, doctests, two armed per-crate public-api surface snapshots (`oce-api` and `oce-store`), plus a re-run of the light gates (including stale crate-status header lint) and an unconditional cargo-deny. |
| `advisories.yml` | Daily cron, manual dispatch | No: advisory/yanked scan only (`cargo deny check advisories`, `yanked = "deny"`, `ignore = []`). |
Runner: cargo-nextest (pinned 0.9.133).
```sh
cargo nextest run --workspace                            # unit + integration tests (the whole workspace)
cargo nextest run --profile ci                           # reproduce the release gate's profile locally
cargo nextest run --profile ci --cargo-profile release   # release-codegen panic-freedom pass
cargo test --workspace --doc                             # doctests: nextest CANNOT run these (separate step)
OCE_PUBLIC_API_NIGHTLY=nightly-2026-05-01 cargo test -p oce-api --test public_api --locked
OCE_PUBLIC_API_NIGHTLY=nightly-2026-05-01 cargo test -p oce-store --test public_api --locked
```
nextest does not run doctests (a stable-Rust limitation). A complete local/CI test pass is therefore always two commands: `cargo nextest run` and `cargo test --doc`.
The git hooks (pre-commit, pre-push) deliberately do not run tests — they stay fast.
Run the suite on demand when you touch behavior; the release gate and daily development-tip gate are
the enforcement points.
Profiles live in `.config/nextest.toml`: `default` for a fast local
fail-fast loop, `ci` for the gate (no fail-fast, no retries, slow-test guard). The release gate
passes `--no-tests=fail`, so a run that discovers zero tests hard-fails, catching the
regression where tests silently stop compiling or being discovered.
The armed public-API baselines are per crate. `oce-api` pins the embeddable facade, and `oce-store`
pins the re-exported storage port surface that the facade baseline sees only as an opaque
`pub use oce_api::oce_store` line. The release gate runs them as separate `cargo nextest` steps so a
renamed, ignored, or deleted test in either crate discovers zero tests and fails that crate's gate.