* AbstractCodePointTrie allows code to be generic over both typed and untyped tries.
* UTF-8 accessors allow optimal access from within a UTF-8 decoder.
* Latin1 accessor allows optimal access with Latin1.
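A rough sketch of what "generic over both typed and untyped tries" can look like, for readers who want something concrete. Every name here (`TrieLike`, `RawTrie`, `TypedTrie`, `lookup_all`, `demo_trie`) is hypothetical and chosen for illustration; this is not the `AbstractCodePointTrie` API from this PR, and the flat 256-entry table is only a stand-in for a real trie.

```rust
use std::marker::PhantomData;

// Hypothetical abstraction; NOT the actual icu_collections API.
trait TrieLike {
    /// General lookup for any code point.
    fn get32(&self, cp: u32) -> u32;

    /// Latin1 lookup. The input is at most U+00FF by construction,
    /// so implementations may skip range checks for larger code points.
    fn get32_latin1(&self, b: u8) -> u32 {
        self.get32(u32::from(b))
    }
}

/// Toy "untyped" trie: a flat Latin1 table plus a fallback value.
struct RawTrie {
    latin1: [u32; 256],
    fallback: u32,
}

impl TrieLike for RawTrie {
    fn get32(&self, cp: u32) -> u32 {
        match self.latin1.get(cp as usize) {
            Some(v) => *v,
            None => self.fallback,
        }
    }

    fn get32_latin1(&self, b: u8) -> u32 {
        // A u8 index is always in range for a 256-entry table.
        self.latin1[usize::from(b)]
    }
}

/// Toy "typed" trie: the same storage, tagged with a value type T.
struct TypedTrie<T> {
    raw: RawTrie,
    _marker: PhantomData<T>,
}

impl<T: From<u32>> TypedTrie<T> {
    /// Typed lookup: converts the raw u32 into T.
    fn get(&self, cp: u32) -> T {
        T::from(self.raw.get32(cp))
    }
}

impl<T> TrieLike for TypedTrie<T> {
    fn get32(&self, cp: u32) -> u32 {
        self.raw.get32(cp)
    }

    fn get32_latin1(&self, b: u8) -> u32 {
        self.raw.get32_latin1(b)
    }
}

/// Code written once against the trait works with either kind of trie.
fn lookup_all<A: TrieLike>(trie: &A, text: &str) -> Vec<u32> {
    text.chars().map(|c| trie.get32(u32::from(c))).collect()
}

/// Small fixture: 'a' maps to 1, 'b' maps to 2, non-Latin1 maps to 99.
fn demo_trie() -> RawTrie {
    let mut latin1 = [0u32; 256];
    latin1[usize::from(b'a')] = 1;
    latin1[usize::from(b'b')] = 2;
    RawTrie { latin1, fallback: 99 }
}
```

The point of the sketch is only that `lookup_all` is monomorphized per trie type, so the abstraction itself should not add dynamic dispatch.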
This now statically assumes small tries in the collator, which is around a 5% perf win, give or take a percentage point. Our data pipeline has never supported generating fast tries. Putting this point on the discussion agenda.
We discussed this and concluded that statically assuming small-mode tries is not a semver break given what the data tooling has been like.
    Ok(self
        .icuexport()?
        .list(&format!("collation/{}", self.collation_root_han()))?
        .filter(|name| !name.contains("POSIX")) // No known use cases
I'd like to know more about this. If it's actually not needed, then ICU shouldn't support and export it, and CLDR shouldn't have data for it.
I'd prefer removing it from CLDR, or from ICU export data. We should not remove things here, because then there's no way to include them, even if the data is there and the caller selects this.
I thought we were asked about it recently? Maybe I'm misremembering...
I think there was a comment from someone external. @sffc might remember? But yeah, this is something CLDR can figure out.
WG discussion:
#7528 needs to land first, but opening a PR to let reviewers take a look before that.
The key insight here is that the vast majority of comparisons stop at the first primary difference, at the position of the first code unit difference, and that this comparison is most often between simple collation unit types.
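To make that insight concrete, here is a toy sketch of the fast-path shape, not the code in this PR. It assumes a made-up `simple_primary` table (ASCII letters only, case-insensitive at the primary level) and a made-up `slow_compare` that stands in for the full algorithm. It also ignores what a real collator has to handle around the first difference, such as contractions and combining marks that may require backing up to a safe boundary.

```rust
use std::cmp::Ordering;

/// Hypothetical primary-weight table: Some(weight) for "simple" characters,
/// None for anything that would need the full collation algorithm.
fn simple_primary(c: char) -> Option<u32> {
    match c {
        'a'..='z' => Some(u32::from(c) - u32::from('a') + 1),
        'A'..='Z' => Some(u32::from(c) - u32::from('A') + 1),
        _ => None,
    }
}

/// Stand-in for the full algorithm: compare whole primary-key sequences,
/// then break ties on the raw strings. Deliberately simplistic.
fn slow_compare(a: &str, b: &str) -> Ordering {
    let key = |s: &str| -> Vec<u32> {
        s.chars()
            .map(|c| simple_primary(c).unwrap_or(1000 + u32::from(c)))
            .collect()
    };
    key(a).cmp(&key(b)).then_with(|| a.cmp(b))
}

/// Fast path: skip the identical prefix, then try to decide on the
/// primary weights of the first differing pair alone.
fn compare(a: &str, b: &str) -> Ordering {
    let mut ai = a.chars();
    let mut bi = b.chars();
    loop {
        match (ai.next(), bi.next()) {
            // Identical so far: keep scanning.
            (Some(x), Some(y)) if x == y => continue,
            // First difference: decide here if both sides are simple
            // and their primaries differ.
            (Some(x), Some(y)) => {
                if let (Some(p), Some(q)) = (simple_primary(x), simple_primary(y)) {
                    if p != q {
                        return p.cmp(&q);
                    }
                }
                return slow_compare(a, b);
            }
            // One or both strings ended: defer to the full comparison.
            _ => return slow_compare(a, b),
        }
    }
}
```

In this toy version the fast path agrees with `slow_compare` by construction, since both look at the same primary weights in the same order; the fast path merely avoids building keys for the whole string.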