GEP 10: Units and Dimensionality (draft) #1193

Draft
MImmesberger wants to merge 25 commits into main from gep-10-units-and-dimensionality

MImmesberger (Member) commented Jun 3, 2026

Draft of GEP 10 — Units and Dimensionality for community review on Zulip.

Adopts pint to add, in the spirit of GEP 9:

  • data-independent dimensionality checks across the DAG in two layers: per-function bodies run on synthetic Quantity objects, plus an edge-consistency pass at build time, with optional boundary assertions on user-supplied pint-tagged inputs;
  • automatic currency resolution (DM↔EUR): historical values are stored in their legal currency and converted to the run currency at build time, removing the manual conversions behind #1174 (Add RV Kenngrößen);
  • a pint-backed reimplementation of the time-conversion arithmetic, keeping the GEP-1 suffix automation and law-to-code naming unchanged.

pint runs only at environment-build time and at the input boundary; it never wraps a live array, so the JAX/NumPy runtime and the GEP-9 type vocabulary are untouched.
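
The first layer of the checks listed above can be sketched without pint. The following is a minimal stdlib-only illustration, not the ttsim implementation: `Dim`, `check_body`, `steuer_m`, and `kaputt_m` are hypothetical names, and only two dimensions (currency and time) are modelled. It shows why the check is data-independent: the function body runs on dimensions, never on arrays.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dim:
    """Synthetic quantity: exponents of the currency and time dimensions only."""

    currency: int = 0
    time: int = 0

    def __add__(self, other):
        # Addition and subtraction are only defined between identical dimensions.
        if _as_dim(other) != self:
            raise TypeError(f"cannot combine {other} with {self}")
        return self

    __sub__ = __add__

    def __mul__(self, other):
        other = _as_dim(other)
        return Dim(self.currency + other.currency, self.time + other.time)

    __rmul__ = __mul__

    def __truediv__(self, other):
        other = _as_dim(other)
        return Dim(self.currency - other.currency, self.time - other.time)


def _as_dim(value):
    # Bare Python numbers (literals in a function body) count as dimensionless.
    return value if isinstance(value, Dim) else Dim()


CURRENCY_FLOW = Dim(currency=1, time=-1)
DIMENSIONLESS = Dim()


def check_body(func, input_dims, declared):
    """Run `func` on dimensions instead of data and compare with the declaration."""
    result = func(*input_dims)
    if result != declared:
        raise TypeError(f"{func.__name__}: got {result}, declared {declared}")
    return result


def steuer_m(einkommen_m, satz):  # hypothetical policy function
    return einkommen_m * satz


def kaputt_m(einkommen_m, satz):  # adds a rate to a currency flow: a unit error
    return einkommen_m + satz
```

Calling `check_body(steuer_m, (CURRENCY_FLOW, DIMENSIONLESS), CURRENCY_FLOW)` passes, while the same call with `kaputt_m` raises `TypeError` before any data is loaded.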

Declaration model

Every node carries a unit= token from a closed vocabulary:

  • Kind tokens, period-abstracted so the name suffix / reference_period supplies the period: CURRENCY_FLOW, CURRENCY_STOCK, SHARE_FLOW, YEARS, HOURS_FLOW, SQUARE_METERS, HECTARES, CURRENCY_PER_SQUARE_METER_FLOW. null is the dimensionless declaration; counts are dimensionless (no [count] dimension).
  • Currency as a union. register_currency derives concrete declaration tokens per currency (DM_FLOW, EURO_STOCK, …). Parameters must name their legal currency; columns and functions may only use the agnostic CURRENCY_* tokens — that is what makes them provably currency-agnostic. For dimensionality checks every concrete token resolves to its agnostic counterpart; the concrete token additionally drives the build-time numeric conversion. A DM→EUR changeover is written as a per-entry unit override in the parameter's history.
  • Function-like parameters (piecewise_*, lookup tables, phase-in/out) declare input_unit: / output_unit: per axis, which resolves which of a schedule's numbers a currency conversion rescales.
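
A rough sketch of the closed vocabulary and the concrete/agnostic split, using the token spellings from this description. Everything except the token names and the name `register_currency` is an assumption: the signature, the `resolve_for_check` helper, and the `node_kind` strings are hypothetical and only illustrate the rules stated above.

```python
CORE_TOKENS = frozenset({
    "CURRENCY_FLOW", "CURRENCY_STOCK", "SHARE_FLOW", "YEARS", "HOURS_FLOW",
    "SQUARE_METERS", "HECTARES", "CURRENCY_PER_SQUARE_METER_FLOW",
})
CONCRETE_TO_AGNOSTIC: dict[str, str] = {}


def register_currency(name: str) -> dict[str, str]:
    """Derive one concrete token per agnostic CURRENCY_* token (assumed signature)."""
    derived = {
        token.replace("CURRENCY", name, 1): token
        for token in CORE_TOKENS
        if token.startswith("CURRENCY")
    }
    CONCRETE_TO_AGNOSTIC.update(derived)
    return derived


def resolve_for_check(token: str, node_kind: str) -> str:
    """Validate a declaration and return the token used for dimensionality checks."""
    if token in CONCRETE_TO_AGNOSTIC:
        # Only parameters may name a legal currency.
        if node_kind != "parameter":
            raise ValueError(f"{node_kind} may only use agnostic tokens, got {token}")
        return CONCRETE_TO_AGNOSTIC[token]
    if token in CORE_TOKENS:
        if node_kind == "parameter" and token.startswith("CURRENCY"):
            raise ValueError(f"parameter must name its legal currency, got {token}")
        return token
    # Closed vocabulary: anything else is rejected.
    raise ValueError(f"unknown unit token: {token}")


register_currency("DM")
register_currency("EURO")
```

With this in place, `resolve_for_check("DM_FLOW", "parameter")` yields `"CURRENCY_FLOW"`, and a function declaring `DM_FLOW` is rejected, which is the property that keeps columns and functions currency-agnostic.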

Status — implementation complete

The full implementation is pushed and CI-green; only review + Zulip acceptance remain. The text is deliberately ahead of the merge: no ttsim PR merges before the gettsim rollout (#1192) is ready.

Reading order for reviewers

Conceptual spec is docs/geps/gep-10.md — §Usage for the daily-driver view, §Currency for the union model. To see it run: ttsim-dev/ttsim#125 tests/test_currency_knob.py (the DM/EUR mechanic, in mettsim's castar/silver_penny) → ttsim-dev/ttsim#126 mettsim YAMLs (annotations at scale) → then the machinery bottom-up #122-#126. The DM/EUR examples in the doc are illustrative until the gettsim rollout (#1192) lands.

Tracking issues:

🤖 Generated with Claude Code

Draft GEP introducing pint-based unit handling to ttsim/gettsim:
data-independent dimensionality checks, DM/EUR currency resolution,
and a pint-backed reimplementation of the time-conversion arithmetic
(keeping the GEP-1 suffix automation and naming). Status: Draft.

Tracking issues: ttsim #117-#121, gettsim #1190-#1192.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
read-the-docs-community Bot commented Jun 3, 2026

codecov Bot commented Jun 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

MImmesberger and others added 24 commits June 5, 2026 16:38
…abulary

Record the design decisions from the ttsim implementation (#122-#126):

- Drop the [count] dimension: counting quantities are dimensionless
  (SI/pint convention). Move it to Alternatives with the accepted
  trade-off (missing per-capita scaling is no longer a unit error).
- Dimensionless quantities declare no unit: unit=None / unit: null.
  The spellings "dimensionless"/"count" are rejected — the unit
  vocabulary is closed at the token level (reject anything TTSIM
  does not know about).
- Document the boolean exemption + two-run dry-run strategy and
  per-leaf units for heterogeneous dict parameters.
- Align details with the implementation: source_currency lives at the
  parameter level; missing units fail at build (not decoration);
  schema copy migrates with the YAMLs in #1192.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Supersedes the pint-string declaration surface: unit= takes one member of
a closed Unit enum (CURRENCY_FLOW, CURRENCY_STOCK, SHARE_FLOW, YEARS, ...).
Flow tokens are completed by the name suffix or reference_period under a
strict-coincidence rule (disagreement is an error, never precedence).
null never combines with reference_period; the old pint-string scheme and
its neighbouring design corners move to Alternatives.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
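
The strict-coincidence rule described in the commit above can be illustrated with a small stdlib-only sketch. The suffix table, the function name `resolve_period`, and the period spellings are assumptions made for illustration; only the rule itself (disagreement is an error, never precedence) comes from the commit message.

```python
# Assumed subset of GEP-1 time suffixes; the real table may differ.
SUFFIX_TO_PERIOD = {"_y": "year", "_q": "quarter", "_m": "month", "_w": "week", "_d": "day"}


def resolve_period(name, reference_period=None):
    """Complete a flow token from the name suffix or reference_period."""
    from_name = next(
        (period for suffix, period in SUFFIX_TO_PERIOD.items() if name.endswith(suffix)),
        None,
    )
    if from_name and reference_period and from_name != reference_period:
        # Neither source takes precedence: a disagreement is always an error.
        raise ValueError(
            f"{name}: suffix says {from_name}, reference_period says {reference_period}"
        )
    period = from_name or reference_period
    if period is None:
        raise ValueError(f"{name}: flow token has no period source")
    return period
```

For example, `resolve_period("betrag_m")` and `resolve_period("satz", "year")` both succeed, whereas `resolve_period("betrag_m", "year")` raises.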
Align the GEP text with the implemented design from grilling session #3:

- register_currency derives concrete declaration tokens (DM_*, EURO_*);
  the agnostic CURRENCY_* tokens denote the union of registered currencies.
- Parameters must be concrete, columns/functions must stay agnostic;
  the separate source_currency: key is gone (moved to Alternatives with
  the reasons it fell).
- Function-like parameter types declare input_unit:/output_unit: — one
  token per axis — with per-axis conversion semantics (bounds x f_in,
  intercepts x f_out, order-j coefficients x f_out/f_in^j).
- Per-dated-entry unit overrides express a changeover within one
  parameter's history; updates_previous cannot cross one.
- Derived nodes inherit the agnostic counterpart of a concrete source
  token; per-package schema enumeration; mettsim's 2020 currency reform
  cited as the end-to-end proof.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
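
The per-axis conversion semantics in the commit above (bounds x f_in, intercepts x f_out, order-j coefficients x f_out/f_in^j) follow from substituting x' = f_in * x and y' = f_out * y into a polynomial piece. A minimal sketch that checks the rule numerically; the piece layout and the names `rescale_piece` and `evaluate` are hypothetical, and the sample schedule is invented for illustration:

```python
import math


def evaluate(lower, intercept, coeffs, x):
    """Evaluate one polynomial piece; coeffs[j - 1] is the order-j coefficient."""
    dx = x - lower
    return intercept + sum(c * dx**j for j, c in enumerate(coeffs, start=1))


def rescale_piece(lower, intercept, coeffs, f_in, f_out):
    """Apply the per-axis rule: bounds x f_in, intercept x f_out, c_j x f_out / f_in**j."""
    return (
        lower * f_in,
        intercept * f_out,
        [c * f_out / f_in**j for j, c in enumerate(coeffs, start=1)],
    )


# DM -> EUR on both axes; 1 EUR = 1.95583 DM is the official fixed rate.
f = 1 / 1.95583
piece_dm = (10_000.0, 500.0, [0.2, 1e-6])  # invented example schedule
piece_eur = rescale_piece(*piece_dm, f_in=f, f_out=f)

x_dm = 30_000.0
y_dm = evaluate(*piece_dm, x_dm)
y_eur = evaluate(*piece_eur, x_dm * f)
# Converting the schedule and then evaluating equals evaluating and then converting.
assert math.isclose(y_eur, y_dm * f)
```

Note that with f_in = f_out the linear coefficient is unchanged (it is a pure rate), while the quadratic coefficient is divided by f.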
- Abstract leads with what the GEP adds (incl. automatic conversions),
  not the problems; drop the build-time/boundary aside.
- Terminology: simplify the unit-token entry and define 'core tokens';
  split agnostic vs concrete currency tokens into independent entries;
  drop the 'flow' and 'dimensionless declaration' entries; plainer language.
- Scope: drop the confusing 'time-agnostic' phrasing.
- Currency section: rename away from 'knob'; state accurately that
  parameters are converted while input data is expected in the run
  currency (checked, not converted).
- Errors: de-jargon; rewrite the 'time suffix on a complete token' case.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lead the usage section with parameters/input columns as the source of
units; cast a policy function's unit= as a checked restatement of the
inferred result (GEP-9 style), with the period suffix as the author's
real contribution. Keeps mandatory unit= on every node.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Abstract now states the mechanism (units declared on params/inputs,
  pint at build time baking factors, runtime untouched), which also
  removes the overlap with Motivation's problem statement.
- Detailed Description: drop 'union semantics' and 'currency knob'
  wording; clarify that the run currency converts parameters while
  input data is taken to already be in it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per review decision: a pint-tagged input column is converted to the run
currency at the boundary (a DM-tagged column can feed a Euro run); bare
columns are still assumed to be in the run currency. Add an explicit
'Units at the boundary' subsection stating the input contract (optional,
converted) and that outputs are bare arrays, with pint-labelled outputs
recorded as future work.

NOTE: ttsim's Layer-2 boundary currently asserts (pint-120); it must be
updated to convert to match this design before the stack merges.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The wired boundary converts a tagged column's currency to the run currency
(period/area preserved); checking the tag against the column's declared
unit would need that unit at the boundary, which the interface DAG cannot
supply without a cycle, so it is recorded as future work.
The boundary checks a tag's period against the column's GEP-1 time suffix
(read off the name, no declared unit needed) and converts currency only.
Correct the earlier claim that full validation is blocked by a DAG cycle:
only the spec env depends on the data, so the declared *dimension* check is
deferred by choice, not impossible.
…_FLOW

- Dimensionless quantities declare DIMENSIONLESS (was unit: null); the
  per-period form is DIMENSIONLESS_FLOW (renamed from SHARE_FLOW). No
  DIMENSIONLESS_STOCK — only currency forces the explicit _STOCK/_FLOW split.
- Real gettsim example for DIMENSIONLESS_FLOW: zugangsfaktor_veränderung_pro_jahr.
- Boundary: currency conversion + strict period guard; null moved to Alternatives.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… `null`)

The DIMENSIONLESS rename had not reached the Detailed Description: six
spots still spelled a dimensionless declaration as `null` (the overview,
the scalar/dict/axis declaration rules, dict leaves), contradicting the
terminology section and the shipped implementation. Reword them to
`DIMENSIONLESS`, and drop the "declares no unit at all" phrasing that
mixed the old null framing with the new token. `reference_period: null`
and the Alternatives section (which documents `null` as the rejected
design) are left as-is.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drop the _STOCK suffix (a currency stock is the bare CURRENCY token); remove
the "kind" column and HECTARES from the vocabulary; use uppercase EUR. Scalars
take their period from a name suffix, with reference_period reserved for
integer-keyed dict leaves and schedule axes; document the dict union-over-dates
mapping and a rename example. ANY/ALL aggregations are exempt, not dimensionless.
Replace the literal-tagging idiom with the verify_units=False body opt-out plus
DIMENSIONLESS-ordinal guidance. Add the parameter-rename migration note.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reframe the branch-coverage paragraph around data-independent coverage:
units depend only on which branch exists, not on which data reaches it, so
the dry-run explores every syntactic branch via a proxy + path explorer
rather than dry-running booleans twice. Covers multi-condition guards,
multiple guarded returns, and numeric-driven branches; the literal-zero
guard arm stays unit-polymorphic (ttsim #134).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… missing-only

Reflects the ttsim change (#117/#119/#121 stack): drop the structural
exemption for identifiers and boolean nodes — both are dimensionless
quantities and declare DIMENSIONLESS like any other node. Only
group-creation functions and framework date nodes stay exempt, so
UNSET_UNIT has a single meaning (no declaration = error). ANY/ALL
aggregations auto-assign DIMENSIONLESS.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Group ids are identifiers (dimensionless); the @group_creation_function
decorator exposes no unit=, so it is auto-assigned DIMENSIONLESS rather
than exempted. Framework date nodes get their unit from the framework.
Every active node now has a unit; only the source differs.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two operands of an addition, subtraction, or ordering comparison must be in
equivalent units. At run time the assembled DAG computes on bare arrays with no
pint, so mixing a monthly and a yearly flow (or a stock and a flow) is unit-blind
and silently wrong; pint's build-time auto-conversion of same-dimension operands
would otherwise mask it during the dry-run. Document this alongside an explicit
list of what the dry-run can and cannot catch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
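
To illustrate the rule in the commit above, here is a sketch of a dry-run quantity that refuses to add, subtract, or order operands unless their units are identical, instead of converting same-dimension operands the way pint does. `StrictQuantity` and the unit strings are hypothetical; the actual check is built on pint.

```python
class StrictQuantity:
    """Dry-run operand that carries a unit label and no numeric value."""

    def __init__(self, unit):
        self.unit = unit

    def _same_unit(self, other, op):
        if not isinstance(other, StrictQuantity) or other.unit != self.unit:
            other_unit = getattr(other, "unit", "bare number")
            raise TypeError(f"'{op}' needs equivalent units: {self.unit} vs {other_unit}")
        return StrictQuantity(self.unit)

    def __add__(self, other):
        return self._same_unit(other, "+")

    def __sub__(self, other):
        return self._same_unit(other, "-")

    def __lt__(self, other):
        self._same_unit(other, "<")
        return True  # the truth value is arbitrary in a dry run

    def __gt__(self, other):
        self._same_unit(other, ">")
        return True


monthly = StrictQuantity("EUR/month")
yearly = StrictQuantity("EUR/year")
```

`monthly + monthly` succeeds, while `monthly + yearly` raises `TypeError`. A converting implementation would accept the second expression during the dry run, and the bare-array runtime would then add the two numbers without any rescaling.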
Drop the dimensionless-result and ==/verify_units bullets: the dimensionless
fallback is already in the branch-coverage paragraph, the forgotten-*count
trade-off is in the [count] Alternatives, and verify_units is covered under
Literals. Keep wrong-magnitudes and un-dry-runnable operations, which are
stated nowhere else.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Restore the full "cannot catch" enumeration (wrong magnitudes, un-dry-runnable
operations, dimensionless-but-wrong results incl. forgotten per-capita scaling,
and ==/!= plus verify_units opt-outs) so the limitations live in one bulleted
list rather than scattered across the branch-coverage paragraph, Literals, and
the [count] Alternatives.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A body the dry-run cannot evaluate (piecewise polynomial, lookup table, join,
raw xnp) is no longer silently trusted: the author must mark it
verify_units=False, so every un-verified body is an explicit, greppable choice.
Drop the forgotten-per-capita-scaling caveat (it follows naturally from counts
being dimensionless) and the redundant verify_units bullet.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the terse branch-coverage paragraph with the underlying logic: run the
body with units in place of numbers, read the unit off the return, and re-run
to steer every reachable branch (path explorer). A small branchy betrag_m
example shows the three reachable paths and why the count is reachable-paths,
not 2^n.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
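
The path-explorer idea from the commit above can be sketched as follows. This is an illustrative reconstruction, not the ttsim code: `Explorer`, `Condition`, `Proxy`, `explore`, and the body of `betrag_m` are all hypothetical, and the sketch assumes loop-free bodies. Each replay forces the outcome of every `if` it meets; every outcome that defaulted to `True` queues a sibling replay with `False` at that position, so only reachable paths are visited.

```python
class Explorer:
    """Decides every `if` in one replay: scripted outcomes first, then True."""

    def __init__(self, script):
        self.script = list(script)
        self.taken = []

    def decide(self):
        i = len(self.taken)
        outcome = self.script[i] if i < len(self.script) else True
        self.taken.append(outcome)
        return outcome


class Condition:
    """Result of comparing two proxies; its truth value is steered by the explorer."""

    def __init__(self, explorer):
        self.explorer = explorer

    def __bool__(self):
        return self.explorer.decide()


class Proxy:
    """Carries a unit label in place of a number."""

    def __init__(self, unit, explorer):
        self.unit, self.explorer = unit, explorer

    def _check(self, other):
        if not isinstance(other, Proxy) or other.unit != self.unit:
            raise TypeError(f"unit mismatch: {self.unit} vs {other!r}")

    def __lt__(self, other):
        self._check(other)
        return Condition(self.explorer)

    __gt__ = __lt__

    def __sub__(self, other):
        self._check(other)
        return Proxy(self.unit, self.explorer)

    def __mul__(self, number):
        # This sketch only supports bare-number factors, which keep the unit.
        if isinstance(number, Proxy):
            raise NotImplementedError("proxy * proxy is out of scope here")
        return Proxy(self.unit, self.explorer)


def explore(body, input_units):
    """Return {branch outcomes: unit of the returned value} for every reachable path."""
    pending, results = [[]], {}
    while pending:
        script = pending.pop()
        explorer = Explorer(script)
        result = body(*[Proxy(unit, explorer) for unit in input_units])
        results[tuple(explorer.taken)] = result.unit
        # Each decision beyond the script defaulted to True; queue its False sibling.
        for i in range(len(script), len(explorer.taken)):
            pending.append(explorer.taken[:i] + [False])
    return results


def betrag_m(einkommen_m, grenze_m):  # hypothetical branchy body
    if einkommen_m < grenze_m:
        return einkommen_m * 0.5
    if einkommen_m > grenze_m * 2:
        return grenze_m
    return einkommen_m - grenze_m


paths = explore(betrag_m, ["EUR/month", "EUR/month"])
```

Here `paths` has three entries, keyed `(True,)`, `(False, True)`, and `(False, False)`, rather than the four combinations of two conditions, because the second `if` is never reached once the first one returns.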
Fold repeated explanations (build-time-only, agnostic/concrete tokens,
verify_units opt-out, counting-dimensionless) to a single home each;
shorten "Usage and Impact" to a summary that links into the reference
sections; add a period-source matrix and a Layer-1/Layer-2 table; note
that a boundary tag built from an unknown token is rejected. No content
removed; ~5500 -> ~4600 words.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>