feat(decimal): implement fraction grouping support compliant with TR35#7651
feat(decimal): implement fraction grouping support compliant with TR35#7651mega123-art wants to merge 13 commits intounicode-org:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds fractional-digit grouping support to icu_decimal (per UTS #35 / NIST guidance) by introducing a configurable fraction grouping size, implementing the grouping decision logic, and applying it during formatting with accompanying tests.
Changes:
- Added
fraction_grouping_size: Option<u8>toDecimalFormatterOptions. - Implemented
grouper::check_fractionand integrated it into fractional formatting output. - Added unit tests covering multiple fraction grouping sizes and Min2 suppression behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| components/decimal/src/provider.rs | Makes DecimalSymbols::new_en_for_testing() available under cfg(test) to support new unit tests. |
| components/decimal/src/options.rs | Introduces fraction_grouping_size option for opt-in fractional grouping behavior. |
| components/decimal/src/grouper.rs | Adds check_fraction algorithm and new fraction-grouping test cases. |
| components/decimal/src/decimal_formatter.rs | Inserts grouping separators into the fractional part when check_fraction indicates. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
components/decimal/src/options.rs
Outdated
| /// The size of groups in the fractional part of the number. | ||
| /// | ||
| /// If `None`, grouping separators will never be shown in the fractional part. | ||
| pub fraction_grouping_size: Option<u8>, |
There was a problem hiding this comment.
This should probably be Option<GroupingStrategy> since the grouping strategy options can equivalently be applied to the fraction part, correct?
There was a problem hiding this comment.
That makes sense. I've updated the field to fraction_grouping: Option<GroupingStrategy>. This allows the fractional part to respect the same Min2 logic as the integer part, which is core to the NIST/SI requirement of only grouping when there are at least 5 digits.
| ) && range.peek().is_some() | ||
| { | ||
| w.with_part(parts::GROUP, |w| { | ||
| w.write_str(self.symbols.grouping_separator()) |
There was a problem hiding this comment.
Should fractional grouping use the same symbol as integer grouping? Can you show where exactly in TR 35 this behavior is specified?
There was a problem hiding this comment.
Yes, they use the same symbol. TR 35 Part 3 specifies this in two places:
- In Section 3, Table 'Number Pattern Character Definitions', the
,(grouping separator) is defined as: 'May occur in both the integer part and the fractional part.' - In Section 3.10 'Number Patterns', it notes: 'The grouping separator may also occur in the fractional part, such as in “#,##0.###,#”. This is most commonly done where the grouping separator character is a thin, non-breaking space (U+202F)...'
There was a problem hiding this comment.
Those sections of the spec are talking about the pattern syntax. I'm asking about the localized grouping symbol. For example, are there any locales that use, say, a comma (,) for thousands but a NNBSP (U+202F) for fractional grouping? If the spec is not clear about this, we should bring it up with CLDR.
There was a problem hiding this comment.
there can only be a single grouping separator per locale: https://github.com/unicode-org/cldr/blob/main/common/main/root.xml#L3028
components/decimal/src/grouper.rs
Outdated
| min_grouping: u8, | ||
| grouping_size: u8, |
There was a problem hiding this comment.
nit: check accepts a GroupingSizes, check_fraction individual fields. please align
components/decimal/src/options.rs
Outdated
| /// The grouping strategy to use for the fractional part of the number. | ||
| /// | ||
| /// If `None`, grouping separators will never be shown in the fractional part. | ||
| pub fraction_grouping: Option<GroupingStrategy>, |
There was a problem hiding this comment.
question: do we actually want to expose this as a separate option? what's the use case for using a different grouping strategy for fractions than for integers?
There was a problem hiding this comment.
The main use case is scientific and technical formatting (e.g., NIST SP 811), where requirements for fractional grouping are often stricter or different than integer grouping.
For example, a user might want standard locale grouping for integers (1,000,000), but need to suppress fractional grouping (0.12345) or enforce the Min2 rule strictly for fractions (0.1234 not grouped, 0.12345 grouped), while keeping the integer part standard. separating the options provides the granularity needed to cover these mixed scenarios.
There was a problem hiding this comment.
but fraction grouping should only be used when the CLDR pattern explicitly opts in to it. it only makes sense when the grouping separator is a space, we don't want commas or periods in the fractional part
There was a problem hiding this comment.
for the NIST use case we would want a special constructor, it should not be achieved through options on localised patterns data
There was a problem hiding this comment.
I understand. The concern is that mixing locale data (commas) with fractional grouping options can easily produce invalid output.
However, DecimalFormatter needs some mechanism to enable fractional grouping when the data supports it (e.g. space separators). If we remove fraction_grouping from DecimalFormatterOptions, how should we expose this capability for locales that do use space separators?
Is the suggestion to make fraction_grouping strictly data-driven (i.e. if the pattern has grouping in the fraction part, we do it; otherwise we don't), without a user option to override?
Or should we hide fraction_grouping from the general options and only expose it via a specific constructor/setter for advanced use cases?
There was a problem hiding this comment.
Is the suggestion to make fraction_grouping strictly data-driven (i.e. if the pattern has grouping in the fraction part, we do it; otherwise we don't), without a user option to override?
Yes.
Or should we hide fraction_grouping from the general options and only expose it via a specific constructor/setter for advanced use cases?
I think we might need a constructor where you can set all the data-driven values (grouping separator, primary and secondary grouping lengths, fractional grouping, digits, signs), which can then be used to construct a DecimalFormatter for the NIST style.
There was a problem hiding this comment.
what's the use case for using a different grouping strategy for fractions than for integers?
It seems you'd want it on for integers and off for fractions, or on for integers and locale-default for fractions.
components/decimal/src/options.rs
Outdated
| fn from(grouping_strategy: GroupingStrategy) -> Self { | ||
| Self { | ||
| grouping_strategy: Some(grouping_strategy), | ||
| fraction_grouping: None, |
There was a problem hiding this comment.
issue: should this not set the fraction_grouping?
| self.symbols.grouping_sizes.min_grouping, | ||
| self.symbols.grouping_sizes.primary, |
There was a problem hiding this comment.
question: should this actually use the integer grouping values, or do we need to parse fractional grouping values from the patterns?
There was a problem hiding this comment.
TR 35 is unfortunately a bit vague, but typically fractional grouping is implied to mirror integer grouping size (usually 3). The spec says:
"The grouping separator may also occur in the fractional part, such as in “#,##0.###,#”. This is most commonly done where the grouping separator character is a thin, non-breaking space..."
In practice (and in CLDR data), fractional grouping sizes aren't explicitly defined separately from integer primary/secondary sizes in the standard patterns. So reusing the primary integer grouping size is the standard interpretation. If we want to support explicit fractional grouping sizes (e.g., grouping by 2s in fractions but 3s in integers), we'd likely need new data model support, but using the primary size covers the vast majority of cases (like the NIST 'group by 3' rule).
|
|
||
| // 5 digits fraction: grouping appears | ||
| let d5 = "12345.12345".parse().unwrap(); | ||
| assert_writeable_eq!(fmt.format(&d5), "12,345.123,45"); |
There was a problem hiding this comment.
this is not a desired output
There was a problem hiding this comment.
Agreed, grouping with commas in the fraction part is non-standard for general usage. This test case was intended to verify the programmatic control over the formatter (i.e., that fraction_grouping works), but I can update it to use a mock locale with space separators so the output is valid scientific notation (e.g., 12 345.123 45)
There was a problem hiding this comment.
the API should not allow constructing formatters that produce bogus output like this. this is worse than not having this functionality
components/decimal/src/options.rs
Outdated
| /// The grouping strategy to use for the fractional part of the number. | ||
| /// | ||
| /// If `None`, grouping separators will never be shown in the fractional part. | ||
| pub fraction_grouping: Option<GroupingStrategy>, |
There was a problem hiding this comment.
but fraction grouping should only be used when the CLDR pattern explicitly opts in to it. it only makes sense when the grouping separator is a space, we don't want commas or periods in the fractional part
- Remove fraction_grouping from DecimalFormatterOptions per reviewer feedback - Add fraction field to GroupingSizes for data-driven fraction grouping - Update check_fraction to use sizes.fraction (0 = no fraction grouping) - All CLDR baked data defaults to fraction: 0 (no fraction grouping) - Update tests to use thin space (U+202F) separator for valid NIST/SI output - Remove test_mixed_grouping that produced bogus comma-in-fraction output
Fix compilation in provider/source and ffi/capi that were missing the new fraction field in GroupingSizes.
- Update fingerprints.csv from cargo make bakeddata - Add fraction: 0 to provider/source and ffi/capi GroupingSizes - Regenerate hardcoded bn.blob in JS FFI test with new GroupingSizes format
- Add #[non_exhaustive] to GroupingSizes to prevent future semver breaks - Add GroupingSizes::new() constructor for external crate construction - Add semver-checks lint allow for constructible_struct_adds_field - Update provider/source and ffi/capi to use GroupingSizes::new() - Regenerate debug test data JSON files with fraction field
# Conflicts: # provider/source/data/debug/decimal/symbols/v1/arab/ar-EG.json # provider/source/data/debug/decimal/symbols/v1/en-001.json
This is needed because GroupingSizes is now non_exhaustive.
| /// The size of groups in the fractional part of the number. | ||
| /// | ||
| /// If 0, grouping separators will never be shown in the fractional part. | ||
| pub fraction: u8, |
There was a problem hiding this comment.
issue: this breaks deserialization of existing data. the way to solve this is to introduce a new data marker (V2) to use in try_new_unstable, and some code in try_new_with_buffer_provider that deserializes either the old or the new marker
There was a problem hiding this comment.
The use case is so super niche that I don't really want to disrupt data in 2.x, especially for a low-level component. We already know that zero locales have fractional grouping data. I prefer to copy the grouping sizes from the integer portion and leave an issue open to investigate this further and add data in 3.0.
components/decimal/src/provider.rs
Outdated
| #[cfg_attr(feature = "serde", derive(serde::Deserialize))] | ||
| #[cfg_attr(feature = "datagen", derive(serde::Serialize, databake::Bake))] | ||
| #[cfg_attr(feature = "datagen", databake(path = icu_decimal::provider))] | ||
| #[non_exhaustive] |
There was a problem hiding this comment.
code in provider is inherently unstable, so it doesn't need to be marked non-exhaustive
components/decimal/Cargo.toml
Outdated
|
|
||
| [package.metadata.cargo-semver-checks.lints] | ||
| workspace = true | ||
| constructible_struct_adds_field = "allow" |
There was a problem hiding this comment.
don't add these. we just ignore errors for provider modules
There was a problem hiding this comment.
Yaa actually I did these bec semver check wasn't passing
| 0x03, 0x04, 0x2d, 0x2b, 0x2e, 0x2c, 0x6c, 0x61, 0x74, 0x6e, 0x03, 0x02, | ||
| 0x01, 0x0e, 0x01, 0x01, 0x02, 0x02, 0x03, 0x04, 0x2d, 0x2b, 0x2e, 0x2c, | ||
| 0x6c, 0x61, 0x74, 0x6e, 0x03, 0x03, 0x01 | ||
| 0x03, 0x08, 0x31, 0x43, 0x45, 0x84, 0x99, 0x47, 0xd1, 0xf9, 0x35, 0x02, |
There was a problem hiding this comment.
this should not change, the old bytes need to keep working
|
@robertbastian Thanks for the review. I've reverted the Regarding the data compatibility:
|
|
I don't know, this is a tricky migration that is easier to do myself than explain to someone through Github. How about you merge the internal changes, leaving the datagen changes alone. You haven't implemented them anyway, you just set it to 0. |
|
Yes, I prefer implementing this without changing the data (which it seems like we can do) and keep any potential data changes to a standalone PR. |
This PR implements support for grouping fractional digits in
icu_decimal, following UTS #35 (LDML) and NIST/SI guidelines. fixes #7403Changes:
fraction_grouping_size: Option<u8>toDecimalFormatterOptions.check_fractioningrouper.rswith support formin_grouping(Min2 rule).format.rsto insert grouping separators in the fractional part when applicable.