GEP 10: Units and Dimensionality (draft)#1193
Draft
MImmesberger wants to merge 25 commits into
Draft
Conversation
Draft GEP introducing pint-based unit handling to ttsim/gettsim: data-independent dimensionality checks, DM/EUR currency resolution, and a pint-backed reimplementation of the time-conversion arithmetic (keeping the GEP-1 suffix automation and naming). Status: Draft. Tracking issues: ttsim #117-#121, gettsim #1190-#1192. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Documentation build overview
15 files changed ·
|
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
…abulary Record the design decisions from the ttsim implementation (#122–#126): - Drop the [count] dimension: counting quantities are dimensionless (SI/pint convention). Move it to Alternatives with the accepted trade-off (missing per-capita scaling is no longer a unit error). - Dimensionless quantities declare no unit: unit=None / unit: null. The spellings "dimensionless"/"count" are rejected — the unit vocabulary is closed at the token level (reject anything TTSIM does not know about). - Document the boolean exemption + two-run dry-run strategy and per-leaf units for heterogeneous dict parameters. - Align details with the implementation: source_currency lives at the parameter level; missing units fail at build (not decoration); schema copy migrates with the YAMLs in #1192. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Supersedes the pint-string declaration surface: unit= takes one member of a closed Unit enum (CURRENCY_FLOW, CURRENCY_STOCK, SHARE_FLOW, YEARS, ...). Flow tokens are completed by the name suffix or reference_period under a strict-coincidence rule (disagreement is an error, never precedence). null never combines with reference_period; the old pint-string scheme and its neighbouring design corners move to Alternatives. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Align the GEP text with the implemented design from grilling session #3: - register_currency derives concrete declaration tokens (DM_*, EURO_*); the agnostic CURRENCY_* tokens denote the union of registered currencies. - Parameters must be concrete, columns/functions must stay agnostic; the separate source_currency: key is gone (moved to Alternatives with the reasons it fell). - Function-like parameter types declare input_unit:/output_unit: — one token per axis — with per-axis conversion semantics (bounds x f_in, intercepts x f_out, order-j coefficients x f_out/f_in^j). - Per-dated-entry unit overrides express a changeover within one parameter's history; updates_previous cannot cross one. - Derived nodes inherit the agnostic counterpart of a concrete source token; per-package schema enumeration; mettsim's 2020 currency reform cited as the end-to-end proof. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Abstract leads with what the GEP adds (incl. automatic conversions), not the problems; drop the build-time/boundary aside. - Terminology: simplify the unit-token entry and define 'core tokens'; split agnostic vs concrete currency tokens into independent entries; drop the 'flow' and 'dimensionless declaration' entries; plainer language. - Scope: drop the confusing 'time-agnostic' phrasing. - Currency section: rename away from 'knob'; state accurately that parameters are converted while input data is expected in the run currency (checked, not converted). - Errors: de-jargon; rewrite the 'time suffix on a complete token' case. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Lead the usage section with parameters/input columns as the source of units; cast a policy function's unit= as a checked restatement of the inferred result (GEP-9 style), with the period suffix as the author's real contribution. Keeps mandatory unit= on every node. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Abstract now states the mechanism (units declared on params/inputs, pint at build time baking factors, runtime untouched), which also removes the overlap with Motivation's problem statement. - Detailed Description: drop 'union semantics' and 'currency knob' wording; clarify that the run currency converts parameters while input data is taken to already be in it. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Per review decision: a pint-tagged input column is converted to the run currency at the boundary (a DM-tagged column can feed a Euro run); bare columns are still assumed to be in the run currency. Add an explicit 'Units at the boundary' subsection stating the input contract (optional, converted) and that outputs are bare arrays, with pint-labelled outputs recorded as future work. NOTE: ttsim's Layer-2 boundary currently asserts (pint-120); it must be updated to convert to match this design before the stack merges. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The wired boundary converts a tagged column's currency to the run currency (period/area preserved); checking the tag against the column's declared unit would need that unit at the boundary, which the interface DAG cannot supply without a cycle, so it is recorded as future work.
The boundary checks a tag's period against the column's GEP-1 time suffix (read off the name, no declared unit needed) and converts currency only. Correct the earlier claim that full validation is blocked by a DAG cycle: only the spec env depends on the data, so the declared *dimension* check is deferred by choice, not impossible.
…_FLOW - Dimensionless quantities declare DIMENSIONLESS (was unit: null); the per-period form is DIMENSIONLESS_FLOW (renamed from SHARE_FLOW). No DIMENSIONLESS_STOCK — only currency forces the explicit _STOCK/_FLOW split. - Real gettsim example for DIMENSIONLESS_FLOW: zugangsfaktor_veränderung_pro_jahr. - Boundary: currency conversion + strict period guard; null moved to Alternatives. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… `null`) The DIMENSIONLESS rename had not reached the Detailed Description: six spots still spelled a dimensionless declaration as `null` (the overview, the scalar/dict/axis declaration rules, dict leaves), contradicting the terminology section and the shipped implementation. Reword them to `DIMENSIONLESS`, and drop the "declares no unit at all" phrasing that mixed the old null framing with the new token. `reference_period: null` and the Alternatives section (which documents `null` as the rejected design) are left as-is. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Drop the _STOCK suffix (a currency stock is the bare CURRENCY token); remove the "kind" column and HECTARES from the vocabulary; use uppercase EUR. Scalars take their period from a name suffix, with reference_period reserved for integer-keyed dict leaves and schedule axes; document the dict union-over-dates mapping and a rename example. ANY/ALL aggregations are exempt, not dimensionless. Replace the literal-tagging idiom with the verify_units=False body opt-out plus DIMENSIONLESS-ordinal guidance. Add the parameter-rename migration note. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reframe the branch-coverage paragraph around data-independent coverage: units depend only on which branch exists, not on which data reaches it, so the dry-run explores every syntactic branch via a proxy + path explorer rather than dry-running booleans twice. Covers multi-condition guards, multiple guarded returns, and numeric-driven branches; the literal-zero guard arm stays unit-polymorphic (ttsim #134). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… missing-only Reflects the ttsim change (#117/#119/#121 stack): drop the structural exemption for identifiers and boolean nodes — both are dimensionless quantities and declare DIMENSIONLESS like any other node. Only group-creation functions and framework date nodes stay exempt, so UNSET_UNIT has a single meaning (no declaration = error). ANY/ALL aggregations auto-assign DIMENSIONLESS. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Group ids are identifiers (dimensionless); the @group_creation_function decorator exposes no unit=, so it is auto-assigned DIMENSIONLESS rather than exempted. Framework date nodes get their unit from the framework. Every active node now has a unit; only the source differs. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two operands of an addition, subtraction, or ordering comparison must be in equivalent units. At run time the assembled DAG computes on bare arrays with no pint, so mixing a monthly and a yearly flow (or a stock and a flow) is unit-blind and silently wrong; pint's build-time auto-conversion of same-dimension operands would otherwise mask it during the dry-run. Document this alongside an explicit list of what the dry-run can and cannot catch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Drop the dimensionless-result and ==/verify_units bullets: the dimensionless fallback is already in the branch-coverage paragraph, the forgotten-*count trade-off is in the [count] Alternatives, and verify_units is covered under Literals. Keep wrong-magnitudes and un-dry-runnable operations, which are stated nowhere else. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Restore the full "cannot catch" enumeration (wrong magnitudes, un-dry-runnable operations, dimensionless-but-wrong results incl. forgotten per-capita scaling, and ==/!= plus verify_units opt-outs) so the limitations live in one bulleted list rather than scattered across the branch-coverage paragraph, Literals, and the [count] Alternatives. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A body the dry-run cannot evaluate (piecewise polynomial, lookup table, join, raw xnp) is no longer silently trusted: the author must mark it verify_units=False, so every un-verified body is an explicit, greppable choice. Drop the forgotten-per-capita-scaling caveat (it follows naturally from counts being dimensionless) and the redundant verify_units bullet. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the terse branch-coverage paragraph with the underlying logic: run the body with units in place of numbers, read the unit off the return, and re-run to steer every reachable branch (path explorer). A small branchy betrag_m example shows the three reachable paths and why the count is reachable-paths, not 2^n. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Fold repeated explanations (build-time-only, agnostic/concrete tokens, verify_units opt-out, counting-dimensionless) to a single home each; shorten "Usage and Impact" to a summary that links into the reference sections; add a period-source matrix and a Layer-1/Layer-2 table; note that a boundary tag built from an unknown token is rejected. No content removed; ~5500 -> ~4600 words. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Draft of GEP 10 — Units and Dimensionality for community review on Zulip.
Adopts pint to add, in the spirit of GEP 9:
Quantitys plus an edge-consistency pass at build time, with optional boundary assertions on user-supplied pint-tagged inputs;pint runs only at environment-build time and at the input boundary; it never wraps a live array, so the JAX/NumPy runtime and the GEP-9 type vocabulary are untouched.
Declaration model
Every node carries a
unit=token from a closed vocabulary:reference_periodsupplies the period:CURRENCY_FLOW,CURRENCY_STOCK,SHARE_FLOW,YEARS,HOURS_FLOW,SQUARE_METERS,HECTARES,CURRENCY_PER_SQUARE_METER_FLOW.nullis the dimensionless declaration; counts are dimensionless (no[count]dimension).register_currencyderives concrete declaration tokens per currency (DM_FLOW,EURO_STOCK, …). Parameters must name their legal currency; columns and functions may only use the agnosticCURRENCY_*tokens — that is what makes them provably currency-agnostic. For dimensionality checks every concrete token resolves to its agnostic counterpart; the concrete token additionally drives the build-time numeric conversion. A DM→EUR changeover is written as a per-entry unit override in the parameter's history.piecewise_*, lookup tables, phase-in/out) declareinput_unit:/output_unit:per axis, which resolves which of a schedule's numbers a currency conversion rescales.Status — implementation complete
The full implementation is pushed and CI-green; only review + Zulip acceptance remain. The text is deliberately ahead of the merge: no ttsim PR merges before the gettsim rollout (#1192) is ready.
ttsim-dev/ttsim#122(framework + tracer bullet),ttsim-dev/ttsim#123(time as a dimension),ttsim-dev/ttsim#124(mandatory units + edge check),ttsim-dev/ttsim#125(currency knob + Layer-2 boundary),ttsim-dev/ttsim#126(annotate mettsim + switch the check on + CI over all policy dates). mettsim is fully annotated as the end-to-end proof of the currency-agnostic / non-EUR path: a 2020 silver-penny→castar reform exercises the per-entry changeover and the run-currency knob.Reading order for reviewers
Conceptual spec is
docs/geps/gep-10.md— §Usage for the daily-driver view, §Currency for the union model. To see it run:ttsim-dev/ttsim#125tests/test_currency_knob.py(the DM/EUR mechanic, in mettsim's castar/silver_penny) →ttsim-dev/ttsim#126mettsim YAMLs (annotations at scale) → then the machinery bottom-up#122→#126. The DM/EUR examples in the doc are illustrative until the gettsim rollout (#1192) lands.Tracking issues:
ttsim-dev/ttsim#117,#118,#119,#120,#121🤖 Generated with Claude Code