Skip to content

Unit checks: data-independent branch coverage (Tier 1 enumeration + Tier 2 path explorer) #134

@MImmesberger

Description

@MImmesberger

Context

Follow-up to GEP-10 (units & dimensionality). The environment-level unit check
fail_if_environment_units_are_inconsistent dry-runs each @policy_function /
@param_function body with pint scalars (Quantity(1.0, unit)) and compares the
inferred unit against the declaration. See src/ttsim/interface_dag_elements/unit_checks.py
(_verify_one_body).

Today the dry-run explores only two paths: all boolean inputs True, and all boolean
inputs False (unit_checks.py:671). Because units are data-independent (data only
selects which branch runs, never the units), the right goal is to verify the unit contract
on every reachable syntactic path — independent of any input data. This is strictly
stronger than, and preferred over, asserting units against the test-suite's input data
(which couples coverage to whichever inputs happen to exist).

Gaps in the current two-run

  • (a) Multiple independent booleans / multiple guarded returns. The two-run covers only
    (all-True, all-False); combinations in between are unchecked.
  • (b) Numeric-driven if/ternary. The representative 1.0 picks one arm; the other arm
    is never executed, so its unit contract is unverified.

Proposed work

Tier 1 — generalize the two-run to 2^N boolean enumeration

Enumerate boolean-input combinations (itertools.product) instead of just (all-True,
all-False). Closes gap (a). ~10 lines, cheap, cacheable per policy date, no false positives.

Tier 2 — path-exploring re-execution (concolic-lite)

Force every reachable path, including numeric-driven branches. Wrap representative values in
a proxy whose __bool__ / __gt__ / __lt__ / __eq__ return an oracle-controlled
bool, while the wrapped pint Quantity still carries its unit for the arithmetic. A
controller records the decision sequence and, across repeated runs, flips the last undriven
decision (DFS over the path tree) until all feasible paths are explored. Runs = number of
distinct paths, not 2^N-of-everything. (Essentially CrossHair without the SMT solver — we
don't need feasibility pruning, since we want to explore arms real data could never reach;
the unit contract must hold there too.) Closes gap (b).

Composes with the existing conservative rule: the if exempt: return 0.0 else: return <eur>
pattern stays benign because a dimensionless inference is already treated as
fallback-don't-flag (unit_checks.py:680-683); forcing both arms just adds the eur-arm check.

Caveats / non-goals

  • Vectorized-op wall is unchanged. Bodies calling piecewise_polynomial, join,
    lookup tables, or raw xnp ops throw under pint and fall back to the declaration
    (unit_checks.py:676-679). Neither tier can execute those. Only a fully static AST
    checker could cross that wall, and only by hand-encoding unit rules per op — out of scope
    (large build + permanent maintenance).
  • Path explosion is theoretically exponential. Policy bodies are shallow in practice,
    but Tier 2 must cap path count and log() any truncation rather than silently stopping.
  • The literal-zero arm is unit-polymorphic by design; keep it as fallback, not a flag.

Suggested approach

Land Tier 1 first (nearly free, closes the most-cared-about gap). Treat Tier 2 as the real
target for true per-line, input-independent coverage.

Refs: GEP-10 (gettsim docs/geps/gep-10.md), parked design notes from the GEP-10 review session.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions