Data struct with codepoint builder for radicals #7805
opnuub wants to merge 2 commits into unicode-org:main from
This is the third PR doing the same thing. There is significant discussion in #7646; why do you keep creating new PRs?
Because there's a merge conflict, and I don't think I'm supposed to git rebase.
Ideally you'd resolve merge conflicts using a merge commit, but rebasing is better than creating a new PR that drops all previous context.
Got it, thank you. I'll follow this practice in the future.
mod lstm;
mod radical;
pub use lstm::*;
pub use radical::*;
Suggested change:
- mod lstm;
- mod radical;
- pub use lstm::*;
- pub use radical::*;
+ mod lstm;
+ #[cfg(feature = "unstable")]
+ pub mod radical;
+ pub use lstm::*;
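The effect of this suggestion can be sketched in isolation: the stable items stay flat-exported, while the radical items live in a named public module that only exists when the `unstable` feature is on. This is a minimal illustration, not the crate's actual layout; `LstmData` and `RadicalData` are hypothetical placeholder types.

```rust
pub mod provider {
    mod lstm {
        /// Hypothetical placeholder for the stable LSTM data types.
        pub struct LstmData;
    }

    // Compiled only when the `unstable` Cargo feature is enabled, and
    // reachable only as `provider::radical::...` (no glob re-export).
    #[cfg(feature = "unstable")]
    pub mod radical {
        /// Hypothetical placeholder for the radical data types.
        pub struct RadicalData;
    }

    // Stable items remain available directly under `provider`.
    pub use lstm::*;
}
```

With the feature disabled, `provider::radical` does not exist at all, so unstable types cannot leak into the stable API surface through the glob re-export.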
);

icu_provider::data_marker!(
    /// `SegmenterUnihanRadicalV1`
@robertbastian commented on the old PR about these docs. Please write what the marker represents.
use std::collections::HashMap;

fn load_irg_from_baked() -> &'static UnihanIrgData<'static> {
    Baked::SINGLETON_SEGMENTER_UNIHAN_RADICAL_V1
Inline this. You can even inline it all the way into Predictor::for_test.
let unihan_cache = self.unihan()?;
let ucd = self.ucd()?;
let irg_map = unihan_cache.irg_sources(ucd)?;
I still disagree with three things: the meat of this implementation living outside this module, the need for it to be cached, and the overhead of creating an intermediate LiteMap.
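One possible reading of the intermediate-map concern is that radical values could be streamed from the source lines straight into whatever builder produces the final structure, with no map or cache in between. The following is a minimal sketch under stated assumptions, not the PR's code: it assumes tab-separated Unihan-style lines of the form `U+XXXX`, `kRSUnicode`, `radical.strokes`, and `RadicalSink`, `VecSink`, and `stream_radicals` are hypothetical names rather than ICU4X APIs.

```rust
/// Hypothetical stand-in for a code point trie builder: anything that can
/// receive (code point, radical) pairs one at a time.
pub trait RadicalSink {
    fn set_value(&mut self, code_point: u32, radical: u8);
}

/// Toy sink for illustration; a real one would wrap the trie builder.
#[derive(Default)]
pub struct VecSink(pub Vec<(u32, u8)>);

impl RadicalSink for VecSink {
    fn set_value(&mut self, code_point: u32, radical: u8) {
        self.0.push((code_point, radical));
    }
}

/// Parses one line. Returns `None` for comments, blank lines, other
/// fields, and malformed input.
fn parse_line(line: &str) -> Option<(u32, u8)> {
    let mut fields = line.split('\t');
    let code_point = fields.next()?.strip_prefix("U+")?;
    if fields.next()? != "kRSUnicode" {
        return None;
    }
    // Assumption: use only the first value when several are listed.
    let first = fields.next()?.split(' ').next()?;
    // Keep the part before '.', dropping any trailing apostrophes
    // (this sketch ignores the simplified-radical marker).
    let radical = first.split('.').next()?.trim_end_matches('\'');
    Some((u32::from_str_radix(code_point, 16).ok()?, radical.parse().ok()?))
}

/// Feeds every radical entry in `source` into `sink` as it is parsed and
/// returns how many entries were written.
pub fn stream_radicals(source: &str, sink: &mut impl RadicalSink) -> usize {
    let mut count = 0;
    for (code_point, radical) in source.lines().filter_map(parse_line) {
        sink.set_value(code_point, radical);
        count += 1;
    }
    count
}
```

Because the sink is a trait, the parsing logic can sit next to the data marker's provider implementation and write directly into the builder, and a test can swap in `VecSink` to inspect the output.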
let unihan_cache = self.unihan()?;
let ucd = self.ucd()?;
let irg_map = unihan_cache.irg_sources(ucd)?;
You still have IRG naming everywhere; these variables and functions should be named after radicals now.
}

#[test]
fn test_chinese_irg_values_trie() {
This is a good test because it tests the DataProvider<SegmenterUnihanRadicalV1> implementation. The test above tests the same thing, but by accessing the internal cache; why?
Issue: #6941
Modified from #7722 and #7646
Changelog
- icu_segmenter: Add unstable Unihan radical provider data and baked support
  - `icu_segmenter::provider::UnihanIrgData<'data>`, `icu_segmenter::provider::SegmenterUnihanRadicalV1`
  - `icu_segmenter::provider::Baked::SINGLETON_SEGMENTER_UNIHAN_RADICAL_V1`
  - `experimental_segmenter` example now uses `radaboost` for the Chinese radical model and adds `thadaboost` for Thai
- icu_provider_source: Add Unihan radical trie generation for `icu_segmenter::provider::SegmenterUnihanRadicalV1`
  - `SourceDataProvider` can now load this marker from Unihan IRG data
- icu_provider_registry: Export `icu_segmenter::provider::SegmenterUnihanRadicalV1`