Context
Follow-up to GEP-10 (units & dimensionality). The environment-level unit check
fail_if_environment_units_are_inconsistent dry-runs each @policy_function /
@param_function body with pint scalars (Quantity(1.0, unit)) and compares the
inferred unit against the declaration. See src/ttsim/interface_dag_elements/unit_checks.py
(_verify_one_body).
Today the dry-run explores only two paths: all boolean inputs True, and all boolean
inputs False (unit_checks.py:671). Because units are data-independent (data only
selects which branch runs, never the units), the right goal is to verify the unit contract
on every reachable syntactic path — independent of any input data. This is strictly
stronger than, and preferred over, asserting units against the test-suite's input data
(which couples coverage to whichever inputs happen to exist).
Gaps in the current two-run
- (a) Multiple independent booleans / multiple guarded returns. The two-run covers only
(all-True, all-False); combinations in between are unchecked.
- (b) Numeric-driven
if/ternary. The representative 1.0 picks one arm; the other arm
is never executed, so its unit contract is unverified.
Proposed work
Tier 1 — generalize the two-run to 2^N boolean enumeration
Enumerate boolean-input combinations (itertools.product) instead of just (all-True,
all-False). Closes gap (a). ~10 lines, cheap, cacheable per policy date, no false positives.
Tier 2 — path-exploring re-execution (concolic-lite)
Force every reachable path, including numeric-driven branches. Wrap representative values in
a proxy whose __bool__ / __gt__ / __lt__ / __eq__ return an oracle-controlled
bool, while the wrapped pint Quantity still carries its unit for the arithmetic. A
controller records the decision sequence and, across repeated runs, flips the last undriven
decision (DFS over the path tree) until all feasible paths are explored. Runs = number of
distinct paths, not 2^N-of-everything. (Essentially CrossHair without the SMT solver — we
don't need feasibility pruning, since we want to explore arms real data could never reach;
the unit contract must hold there too.) Closes gap (b).
Composes with the existing conservative rule: the if exempt: return 0.0 else: return <eur>
pattern stays benign because a dimensionless inference is already treated as
fallback-don't-flag (unit_checks.py:680-683); forcing both arms just adds the eur-arm check.
Caveats / non-goals
- Vectorized-op wall is unchanged. Bodies calling
piecewise_polynomial, join,
lookup tables, or raw xnp ops throw under pint and fall back to the declaration
(unit_checks.py:676-679). Neither tier can execute those. Only a fully static AST
checker could cross that wall, and only by hand-encoding unit rules per op — out of scope
(large build + permanent maintenance).
- Path explosion is theoretically exponential. Policy bodies are shallow in practice,
but Tier 2 must cap path count and log() any truncation rather than silently stopping.
- The literal-zero arm is unit-polymorphic by design; keep it as fallback, not a flag.
Suggested approach
Land Tier 1 first (nearly free, closes the most-cared-about gap). Treat Tier 2 as the real
target for true per-line, input-independent coverage.
Refs: GEP-10 (gettsim docs/geps/gep-10.md), parked design notes from the GEP-10 review session.
Context
Follow-up to GEP-10 (units & dimensionality). The environment-level unit check
fail_if_environment_units_are_inconsistentdry-runs each@policy_function/@param_functionbody with pint scalars (Quantity(1.0, unit)) and compares theinferred unit against the declaration. See
src/ttsim/interface_dag_elements/unit_checks.py(
_verify_one_body).Today the dry-run explores only two paths: all boolean inputs
True, and all booleaninputs
False(unit_checks.py:671). Because units are data-independent (data onlyselects which branch runs, never the units), the right goal is to verify the unit contract
on every reachable syntactic path — independent of any input data. This is strictly
stronger than, and preferred over, asserting units against the test-suite's input data
(which couples coverage to whichever inputs happen to exist).
Gaps in the current two-run
(all-True, all-False); combinations in between are unchecked.
if/ternary. The representative1.0picks one arm; the other armis never executed, so its unit contract is unverified.
Proposed work
Tier 1 — generalize the two-run to 2^N boolean enumeration
Enumerate boolean-input combinations (
itertools.product) instead of just (all-True,all-False). Closes gap (a). ~10 lines, cheap, cacheable per policy date, no false positives.
Tier 2 — path-exploring re-execution (concolic-lite)
Force every reachable path, including numeric-driven branches. Wrap representative values in
a proxy whose
__bool__/__gt__/__lt__/__eq__return an oracle-controlledbool, while the wrapped pint
Quantitystill carries its unit for the arithmetic. Acontroller records the decision sequence and, across repeated runs, flips the last undriven
decision (DFS over the path tree) until all feasible paths are explored. Runs = number of
distinct paths, not 2^N-of-everything. (Essentially CrossHair without the SMT solver — we
don't need feasibility pruning, since we want to explore arms real data could never reach;
the unit contract must hold there too.) Closes gap (b).
Composes with the existing conservative rule: the
if exempt: return 0.0 else: return <eur>pattern stays benign because a dimensionless inference is already treated as
fallback-don't-flag (
unit_checks.py:680-683); forcing both arms just adds the eur-arm check.Caveats / non-goals
piecewise_polynomial,join,lookup tables, or raw
xnpops throw under pint and fall back to the declaration(
unit_checks.py:676-679). Neither tier can execute those. Only a fully static ASTchecker could cross that wall, and only by hand-encoding unit rules per op — out of scope
(large build + permanent maintenance).
but Tier 2 must cap path count and
log()any truncation rather than silently stopping.Suggested approach
Land Tier 1 first (nearly free, closes the most-cared-about gap). Treat Tier 2 as the real
target for true per-line, input-independent coverage.
Refs: GEP-10 (gettsim docs/geps/gep-10.md), parked design notes from the GEP-10 review session.