0404a2e to
42ca8a2
Compare
Conversion factors are stored as integer Numerator/Denominator pairs. ConvertUnitValue combines source and target factors over a common denominator and delegates to ApplyUnitConversion, which computes (value × numerator + offset) / denominator using Number arithmetic.
42ca8a2 to
4fc61a6
Compare
Split the offset argument into offsetNumerator/offsetDenominator and reduce conversionNumerator/conversionDenominator by their GCD before the floating-point arithmetic. ConvertUnitValue now passes the scale factor and offset as independent rationals.
gibson042
left a comment
As I mentioned in the ECMA-402 meeting, I think this is doing far too much. If we pursue such an approach, I would instead expect it to take the form of expression-aware consumption of units.xml, from which point the specification could either require mathematical value calculations with a final conversion to Number or an up-front elementary arithmetic symbolic reduction followed by Number calculations. And in neither case would I expect to duplicate CLDR contents into large tables in ECMA specifications.
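As a rough illustration of what expression-aware consumption could involve, the sketch below parses a `factor` or `offset` attribute into an exact rational using BigInt. It is only an assumption about one possible shape: `parseDecimal` and `parseFactor` are hypothetical names, and the sketch handles only decimal literals joined by `*` and `/`, not the named constants or exponents that units.xml also uses.

```javascript
// Greatest common divisor for non-negative BigInts.
const gcd = (a, b) => (b === 0n ? a : gcd(b, a % b));

// Parse a decimal literal such as "2298.35" into an exact [numerator, denominator] pair.
function parseDecimal(text) {
  const match = /^(\d+)(?:\.(\d+))?$/.exec(text.trim());
  if (!match) throw new SyntaxError(`unsupported operand: ${text}`);
  const fraction = match[2] ?? "";
  return [BigInt(match[1] + fraction), 10n ** BigInt(fraction.length)];
}

// Evaluate a left-to-right chain of "*" and "/" over decimal literals,
// returning the result as a rational in lowest terms.
function parseFactor(expression) {
  const tokens = expression.split(/([*\/])/);
  let [num, den] = parseDecimal(tokens[0]);
  for (let i = 1; i < tokens.length; i += 2) {
    const [n, d] = parseDecimal(tokens[i + 1]);
    if (tokens[i] === "*") {
      num *= n;
      den *= d;
    } else {
      if (n === 0n) throw new RangeError("division by zero");
      num *= d;
      den *= n;
    }
  }
  const g = gcd(num, den);
  return [num / g, den / g];
}

console.log(parseFactor("5/9"));       // [ 5n, 9n ]
console.log(parseFactor("2298.35/9")); // [ 45967n, 180n ]
```

Keeping numerator and denominator as BigInts defers all rounding until a final conversion to Number.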
For a maximally complex example (i.e., involving both a non-unit factor and a non-zero offset), consider converting 100°C to Fahrenheit, which relies upon the following definitions in reverse order:
<convertUnit source='fahrenheit' baseUnit='kelvin' factor='5/9' offset='2298.35/9' systems="ussystem uksystem"/>
<convertUnit source='celsius' baseUnit='kelvin' offset='273.15' systems="si metric"/>
- A naïve Number calculation gives the wrong answer: ((celsiusInput * (celsius.factor ?? 1) + (celsius.offset ?? 0)) - (fahrenheit.offset ?? 0)) / (fahrenheit.factor ?? 1) = ((100 * (undefined ?? 1) + (273.15 ?? 0)) - (2298.35/9 ?? 0)) / (5/9 ?? 1) = ((100 * 1 + 273.15) - 2298.35/9) / (5/9) = 211.99999999999997 = 211.99999999999997𝔽.
- A mathematical value calculation gives the right answer: ((100 × 1 + 273.15) − 2298.35÷9) ÷ (5÷9) = ℝ(212) → 212𝔽.
- Number calculation of the elementary arithmetic reduction of the expression also gives the right answer: ((celsiusInput × 1 + 273.15) - 2298.35÷9) ÷ (5÷9) = ((celsiusInput × 9 + 273.15 × 9) - 2298.35) ÷ 5 = (celsiusInput × 9 + 160) ÷ 5 = celsiusInput × 1.8 + 32 → 100 * 1.8 + 32 = 212 = 212𝔽.
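The first and third calculations above can be tried directly in JavaScript. This is a minimal sketch: the `celsius` and `fahrenheit` objects are hypothetical stand-ins for the two `<convertUnit>` entries, and the function names are illustrative only.

```javascript
// Hypothetical stand-ins for the two <convertUnit> entries quoted above.
const celsius = { offset: 273.15 }; // factor omitted, so it defaults to 1
const fahrenheit = { factor: 5 / 9, offset: 2298.35 / 9 };

// Naive approach: every intermediate step is a binary64 operation.
function naiveCelsiusToFahrenheit(c) {
  const kelvin = c * (celsius.factor ?? 1) + (celsius.offset ?? 0);
  return (kelvin - (fahrenheit.offset ?? 0)) / (fahrenheit.factor ?? 1);
}

// Reduced approach: the expression is first simplified on paper to
// c × 1.8 + 32, and only that multiply and add use Number arithmetic.
function reducedCelsiusToFahrenheit(c) {
  return c * 1.8 + 32;
}

console.log(naiveCelsiusToFahrenheit(100));   // reported above as 211.99999999999997
console.log(reducedCelsiusToFahrenheit(100)); // 212
```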
Further, because arithmetic reduction in the mathematical value domain is guaranteed to be result-preserving, it is equally applicable to both sensible approaches—the above mathematical value calculation is exactly equivalent to 100 × 1.8 + 32 = ℝ(212) → 212𝔽. And since only linear conversions are in scope, that means conversion requires at most one multiplication and one addition—and even further still, given the current contents of units.xml, addition is necessary only for temperature conversions involving Celsius and/or Fahrenheit (every other non-special conversion is possible with just a single multiplication).
However, note that the mathematical value calculation and Number calculation of the elementary arithmetic reduction approaches are not definitionally equivalent—converting 80063993375475600°C with the former would produce 144115188075856128𝔽 (ℝ(144115188075856112) being snapped per the Number value for x from exactly halfway between two Number values with a mutual separation of 32 to that [higher] one with even significand), while the latter would produce 144115188075856096𝔽 (80063993375475600 * 1.8 first snapping midpoint ℝ(144115188075856080) down to even-significand 144115188075856064𝔽 in Number::multiply and then the subsequent + 32 effecting no further rounding).
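The mathematical value side of this comparison can be reproduced with BigInt, which keeps the intermediate result exact and rounds only once at the end. The sketch below is an illustration under assumptions: `celsiusToFahrenheitExact` is a hypothetical helper, and it accepts only inputs whose exact result is an integer, because a general rational-to-Number rounding step is out of scope here.

```javascript
// Mathematical value sketch for Celsius to Fahrenheit: evaluate
// (c × 9 + 160) ÷ 5 exactly, then round to a Number exactly once.
function celsiusToFahrenheitExact(c) {
  if (!Number.isInteger(c)) {
    throw new RangeError("this sketch handles integral inputs only");
  }
  const numerator = BigInt(c) * 9n + 160n; // exact, no rounding
  if (numerator % 5n !== 0n) {
    throw new RangeError("this sketch handles integral results only");
  }
  // Number(bigint) rounds to the nearest Number value, ties to even.
  return Number(numerator / 5n);
}

console.log(celsiusToFahrenheitExact(100)); // 212
// The exact result 144115188075856112 lies halfway between two Number values
// and is rounded to the one with the even significand.
console.log(celsiusToFahrenheitExact(80063993375475600) === 144115188075856128); // true
```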
Note also that even mathematical perfection will not eliminate surprises with rounding modes that are not based on "nearest" behavior (because e.g. 7 inches converts to 0.5833333333333333𝔽 feet, which converts to 6.999999999999999𝔽 inches). But I don't think such repeated conversions are in scope, which means both approaches are equally viable, especially given that only temperature conversions would be subject to intermediate rounding in the Number-calculations one, and even then the error would be like any other binary64 operation. What we're left with seems like immaterial end-user differences and a tradeoff between easy-to-specify mathematical value operations and easy-to-implement Number operations.
An earlier version of the spec (now since squashed out of existence) was literally nothing more than |
Remove embedded conversion factor tables from spec.emu and intl.emu, reference CLDR units.xml instead. Add CreateFormatterObject AO and resolve FormatNumericToString TODOs. Remove 402-specific convertTo.
@gibson042 Thanks for the review. I've reworked this quite a bit. The CLDR data tables have been removed entirely. In their place, the spec now normatively references CLDR's units.xml. The approach is your option (B): reduction in Number-land. Agree that fidelity of multi-step conversions might still result in some loss of precision. In particular, we make no round-trip guarantees. But this is better than naive calculation with JS floats. I also took the opportunity to clean up things on the 402 side, too, as discussed in the call. The |
Co-authored-by: Richard Gibson <richard.gibson@gmail.com>
Looks like this file needs to be removed, and package.json and package-lock.json reverted.
| <emu-clause id="sec-amount-unit-conversion-data"> | ||
| <h1>Unit Conversion Data</h1> | ||
| <p>Unit conversion data is derived from CLDR file <a href="https://github.com/unicode-org/cldr/blob/main/common/supplemental/units.xml"><code>units.xml</code></a>. As described in <a href="https://unicode.org/reports/tr35/tr35-info.html#conversion-data">Unicode Technical Standard #35 Part 6 Supplemental, Conversion Data</a>, each <code><convertUnit></code> element defines how to convert a <code>source</code> unit into a compatible <code>baseUnit</code>. An ECMAScript implementation must ignore all <code>special</code> conversions and support all conversions based on <code>factor</code> and/or <code>offset</code>, interpreting the value for each as an arithmetic expression with mathematical value operands (noting the respective defaults of 1 and 0 and the implicit presence of an identity mapping for each unit identified as the value of a <code>baseUnit</code>).</p> | ||
| <p>Two units are in the same <dfn id="dfn-unit-category">unit category</dfn> if and only if they share the same <code>baseUnit</code> value in CLDR. A <dfn id="dfn-base-unit">base unit</dfn> is any unit that appears as a <code>baseUnit</code> value; its conversion factor is 1 and its offset is 0.</p> |
I'm not going to push too hard on this here, but I don't think I see sufficient value in defining "unit category" and "base unit" terms.
| 1. If the CLDR unit conversion data specifies a conversion offset for _unit_, let _offset_ be the Number value closest to that rational offset; otherwise, let _offset_ be *+0*<sub>𝔽</sub>. | ||
| 1. If _unit_ is the <code>source</code> of a <code><convertUnit></code> element in the <emu-xref href="#sec-amount-unit-conversion-data">unit conversion data</emu-xref>, then | ||
| 1. Let _element_ be that <code><convertUnit></code> element. | ||
| 1. Let _category_ be the <code>baseUnit</code> of _element_. |
Seeing this in practice, I think we should use baseUnit values directly rather than introducing a "category" concept.
...but looking ahead, we will want the actual category for ECMA-402 unit conversion. That should come from a reference to UTS #35 Part 6 Supplemental, Compute the category.
| 1. Let _factor_ be the Number value closest to the rational conversion factor for _unit_ as specified by the CLDR unit conversion data. | ||
| 1. If the CLDR unit conversion data specifies a conversion offset for _unit_, let _offset_ be the Number value closest to that rational offset; otherwise, let _offset_ be *+0*<sub>𝔽</sub>. | ||
| 1. If _unit_ is the <code>source</code> of a <code><convertUnit></code> element in the <emu-xref href="#sec-amount-unit-conversion-data">unit conversion data</emu-xref>, then | ||
| 1. Let _element_ be that <code><convertUnit></code> element. |
| 1. Let _element_ be that <code><convertUnit></code> element. | |
| 1. Let _element_ be that <code><convertUnit></code> element. | |
| 1. If _element_ has an attribute <code>special</code>, throw a *TypeError* exception. |
| 1. Let _factor_ be 1. | ||
| 1. Let _offset_ be 0. | ||
| 1. Else, | ||
| 1. Throw a *RangeError* exception. |
I think TypeError is a better fit for attempts to convert unknown units.
| 1. Throw a *RangeError* exception. | |
| 1. Throw a *TypeError* exception. |
| 1. Let _sourceConv_ be ? GetUnitConversionFactor(_sourceUnit_). | ||
| 1. Let _targetConv_ be ? GetUnitConversionFactor(_targetUnit_). | ||
| 1. If SameValue(_sourceConv_.[[Category]], _targetConv_.[[Category]]) is *false*, throw a *RangeError* exception. | ||
| 1. If _sourceConv_.[[Category]] is not _targetConv_.[[Category]], throw a *RangeError* exception. |
Here as well.
| 1. If _sourceConv_.[[Category]] is not _targetConv_.[[Category]], throw a *RangeError* exception. | |
| 1. If _sourceConv_.[[Category]] is not _targetConv_.[[Category]], throw a *TypeError* exception. |
| 1. If _sourceOffset_ = 0 and _targetOffset_ = 0, then | ||
| 1. If _value_ is *+0*<sub>𝔽</sub> or _value_ is *-0*<sub>𝔽</sub>, return _value_. | ||
| 1. Else, | ||
| 1. If _value_ is *+0*<sub>𝔽</sub> or _value_ is *-0*<sub>𝔽</sub>, set _value_ to *+0*<sub>𝔽</sub>. | ||
| 1. Return _value_ × 𝔽(_sourceFactor_ / _targetFactor_) + 𝔽((_sourceOffset_ - _targetOffset_) / _targetFactor_). | ||
| </emu-alg> | ||
| <emu-note> | ||
| <p>The conversion from _sourceUnit_ to _targetUnit_ through the base unit is: _result_ = _value_ × _sourceFactor_ / _targetFactor_ + (_sourceOffset_ − _targetOffset_) / _targetFactor_. The expressions _sourceFactor_ / _targetFactor_ and (_sourceOffset_ − _targetOffset_) / _targetFactor_ are computed as mathematical values, and only the multiplication and addition are performed as Number operations. For non-offset conversions (the vast majority), the second term is 𝔽(0) = *+0*<sub>𝔽</sub> and the addition is a no-op.</p> |
Multiplication should preserve negative zero, but there's no need for this much special-casing.
| 1. If _sourceOffset_ = 0 and _targetOffset_ = 0, then | |
| 1. If _value_ is *+0*<sub>𝔽</sub> or _value_ is *-0*<sub>𝔽</sub>, return _value_. | |
| 1. Else, | |
| 1. If _value_ is *+0*<sub>𝔽</sub> or _value_ is *-0*<sub>𝔽</sub>, set _value_ to *+0*<sub>𝔽</sub>. | |
| 1. Return _value_ × 𝔽(_sourceFactor_ / _targetFactor_) + 𝔽((_sourceOffset_ - _targetOffset_) / _targetFactor_). | |
| </emu-alg> | |
| <emu-note> | |
| <p>The conversion from _sourceUnit_ to _targetUnit_ through the base unit is: _result_ = _value_ × _sourceFactor_ / _targetFactor_ + (_sourceOffset_ − _targetOffset_) / _targetFactor_. The expressions _sourceFactor_ / _targetFactor_ and (_sourceOffset_ − _targetOffset_) / _targetFactor_ are computed as mathematical values, and only the multiplication and addition are performed as Number operations. For non-offset conversions (the vast majority), the second term is 𝔽(0) = *+0*<sub>𝔽</sub> and the addition is a no-op.</p> | |
| 1. If _sourceOffset_ is _targetOffset_, then | |
| 1. NOTE: This preserves a _value_ of *-0*<sub>𝔽</sub>. | |
| 1. Return _value_ × 𝔽(_sourceFactor_ / _targetFactor_). | |
| 1. Return _value_ × 𝔽(_sourceFactor_ / _targetFactor_) + 𝔽((_sourceOffset_ - _targetOffset_) / _targetFactor_). | |
| </emu-alg> | |
| <emu-note> | |
| <p>The conversion from _sourceUnit_ to _targetUnit_ through the base unit is: _result_ = _value_ × (_sourceFactor_ / _targetFactor_) + (_sourceOffset_ − _targetOffset_) / _targetFactor_. The subexpressions _sourceFactor_ / _targetFactor_ and (_sourceOffset_ − _targetOffset_) / _targetFactor_ are computed as mathematical values, and then converted to Number values for the multiplication and addition. For non-offset conversions (the vast majority), the offset term is 0 and addition is skipped in order to preserve an input _value_ of *-0*<sub>𝔽</sub>.</p> |
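The suggested steps can be sketched in JavaScript as follows. This is an illustration under stated assumptions, not the proposal's implementation: the `units` table is hypothetical (its values are assumed to mirror CLDR, with inch = 0.0254 m and foot = 0.3048 m), rationals are `[numerator, denominator]` BigInt pairs, and `toNumber` stands in for 𝔽 only when both parts are exactly representable as Numbers.

```javascript
const abs = (n) => (n < 0n ? -n : n);
const gcd = (a, b) => (b === 0n ? a : gcd(b, a % b));
// Reduce a rational to lowest terms with a positive denominator.
const reduce = ([n, d]) => {
  const g = gcd(abs(n), abs(d));
  const sign = d < 0n ? -1n : 1n;
  return [(sign * n) / g, (sign * d) / g];
};
const sub = ([a, b], [c, d]) => reduce([a * d - c * b, b * d]);
const div = ([a, b], [c, d]) => reduce([a * d, b * c]);

// Stand-in for 𝔽: one correctly rounded division, valid only while both
// parts are safe integers (an assumption this sketch checks).
const LIMIT = BigInt(Number.MAX_SAFE_INTEGER);
function toNumber([n, d]) {
  if (abs(n) > LIMIT || d > LIMIT) throw new RangeError("outside sketch range");
  return Number(n) / Number(d);
}

// Hypothetical conversion data, expressed relative to each base unit.
const units = {
  inch: { factor: [127n, 5000n], offset: [0n, 1n] },
  foot: { factor: [381n, 1250n], offset: [0n, 1n] },
  celsius: { factor: [1n, 1n], offset: [27315n, 100n] },
  fahrenheit: { factor: [5n, 9n], offset: [229835n, 900n] },
};

function convertUnitValue(value, sourceUnit, targetUnit) {
  const source = units[sourceUnit];
  const target = units[targetUnit];
  // sourceFactor / targetFactor is reduced exactly, then rounded once.
  const scale = toNumber(div(source.factor, target.factor));
  const offsetDifference = sub(source.offset, target.offset);
  // Equal offsets: skip the addition so that -0 survives.
  if (offsetDifference[0] === 0n) return value * scale;
  return value * scale + toNumber(div(offsetDifference, target.factor));
}

console.log(convertUnitValue(18, "inch", "foot"));            // 1.5
console.log(convertUnitValue(100, "celsius", "fahrenheit"));  // 212
console.log(Object.is(convertUnitValue(-0, "inch", "foot"), -0)); // true
```

Because the two ratios are reduced before any rounding, the inch-to-foot scale is 𝔽(1/12) rather than a quotient of two already-rounded factors.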
| 1. If _value_ is *+0*<sub>𝔽</sub> or _value_ is *-0*<sub>𝔽</sub>, return _value_. | ||
| 1. Else, | ||
| 1. If _value_ is *+0*<sub>𝔽</sub> or _value_ is *-0*<sub>𝔽</sub>, set _value_ to *+0*<sub>𝔽</sub>. | ||
| 1. Return _value_ × 𝔽(_sourceFactor_ / _targetFactor_) + 𝔽((_sourceOffset_ - _targetOffset_) / _targetFactor_). |
Question: should this be (A)
𝔽(_sourceFactor_ / _targetFactor_)
or should it be (B)
(𝔽(_sourceFactor_) / 𝔽(_targetFactor_))
(A) means that engines need to perform MV steps before they perform the float steps, but this is for a fixed number of unit pairs so it could be theoretically cached.
(B) means that we are exposing a greater degree of floating point arithmetic madness, but it is probably easier to implement, since every unit can have a single pre-computed 64-bit number value.
Note: here is an example where (A) and (B) are different:
1e5 / 0.3
// 333333.3333333334
333333.333333333333
// 333333.3333333333
I haven't yet found an example using actual CLDR conversion data, but I haven't done an exhaustive search.
It should be (A). With (B), 18 inches would convert to 1.5000000000000002 feet.
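The two options can be compared side by side for the inch-to-foot case. This is a small sketch assuming factors of 0.0254 and 0.3048 metres for inch and foot; the variable names are illustrative.

```javascript
// Assumed factors in metres (illustrative values for inch and foot).
const inchFactor = 0.0254;
const footFactor = 0.3048;

// (A): reduce the ratio exactly first (0.0254 / 0.3048 = 1/12), then round once.
const scaleA = 1 / 12;
// (B): round each factor to binary64 first, then divide the rounded values.
const scaleB = inchFactor / footFactor;

console.log(18 * scaleA); // 1.5
console.log(18 * scaleB); // reported above as 1.5000000000000002
```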
It most certainly should be (A); that's the very purpose here. Converting 5 inches to feet should behave like 5 * (1/12) (0.41666666666666663), not like 5 * ((0.3048/12) / 0.3048) (0.41666666666666674).
Implementations are free to optimize performance by caching binary64 values for conversion pairs and even to compile in pre-computed values since there are so few (I count 155 <convertUnit> elements, split into 39 shared-baseUnit groups, most of which include just a single element and the largest of which includes 31, for a total of no more than 2276 conversion factors [accounting for forward and reverse permutations] but in practice more like 2028 because of the many unit factors—and even fewer if complex and/or trivial special cases are separated).
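One way to picture the caching mentioned here is a lazily filled table keyed by the unit pair. The sketch below is only an assumption about how an engine might do it: `cachedScale` and its `computeScale` callback are hypothetical, and the callback stands in for whatever exact reduction produces the binary64 scale.

```javascript
// Cache of binary64 scale values, keyed by "source\u0000target".
const scaleCache = new Map();

// Return the scale for a unit pair, computing it at most once.
// `computeScale` is a hypothetical callback that performs the exact
// reduction and the final rounding to a Number.
function cachedScale(sourceUnit, targetUnit, computeScale) {
  const key = `${sourceUnit}\u0000${targetUnit}`;
  if (!scaleCache.has(key)) {
    scaleCache.set(key, computeScale(sourceUnit, targetUnit));
  }
  return scaleCache.get(key);
}

// Usage: the second lookup is served from the cache.
const inchToFoot = () => 1 / 12;
console.log(cachedScale("inch", "foot", inchToFoot) * 18); // 1.5
console.log(cachedScale("inch", "foot", inchToFoot) * 18); // 1.5
```

An engine could equally compile the values in ahead of time; the lazy table just avoids committing to the full set of pairs up front.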
It's a good point to avoid repeated factors like that.
The ICU implementation, I believe, cancels out identical factors and then performs floating point arithmetic on the remaining factors, although it might have the option of using a decimal library instead. Based on the direction of the proposal being Number-centric, it would be best to avoid requiring an engine to figure out how to do this math in MV space.
2276 factors to hard-code is a lot, and it will grow quadratically when we add more units.
Thought: could we somehow specify that the factor computation uses something like Math.mulPrecise? (we have Math.sumPrecise since https://github.com/tc39/proposal-math-sum)
Observation: we only have a problem when there are more than 2 non-identical factors. Also, factors of 10 are fairly easy for an engine to special-case.
Do we have an idea of how many convertible unit pairs involve 3 or more non-equal, non-power-of-10 factors?
Adds a lightweight unit conversion system ("baby CAS") to the Amount proposal. The spec text introduces a static conversion factor table covering length, mass, volume, temperature, area, speed, concentration, and digital units, along with the ConvertUnitValue abstract operation that performs exact rational arithmetic over these factors. A TypeScript generator script (scripts/generate-conversion-table.ts) derives the ecmarkup table rows from CLDR unit data, ensuring the spec table stays in sync with upstream definitions.