Commit 164e249

jabagawee authored and markusicu committed
ICU-23360 Deduplicate subdivision suffix strings in ValiditySet
When ValiditySet constructs subdivisionData, it creates a separate String object for each subdivision suffix via substring(). Suffixes like "01", "02", "zzzz" repeat across many regions: of ~5,910 expanded entries, only 1,992 suffixes are unique, leaving 3,918 duplicate String objects (~195 KB).

Fix: call intern() on each suffix string. The JVM's intern pool ensures that identical suffixes share a single String instance.

On Android, ValidityData is initialized in the zygote process, which is forked to launch every app. Deduplicating these strings means the shared pages backing them are less likely to be copy-on-written post-fork, so the ~195 KB saving is effectively per-process across every running app.

No parallel icu4c change needed. The C++ side doesn't have an equivalent ValidIdentifiers/ValiditySet class; subdivision validity goes through a different code path.
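
The effect described above can be illustrated with a small standalone sketch. It is not ICU code: the class name InternSuffixSketch, the helper methods, and the sample entries are all hypothetical, and only the prefix-length rule is borrowed from the diff context. It counts String objects by identity to show that interned suffixes collapse to one instance per distinct value.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not part of ICU4J.
public class InternSuffixSketch {

    /** Strips the region prefix from each entry and returns the suffixes. */
    static List<String> suffixes(List<String> entries, boolean intern) {
        List<String> result = new ArrayList<>();
        for (String s : entries) {
            // Prefix length rule taken from the diff context:
            // 3 chars for numeric region codes, otherwise 2.
            int pos = s.charAt(0) < 'A' ? 3 : 2;
            String suffix = s.substring(pos);
            result.add(intern ? suffix.intern() : suffix);
        }
        return result;
    }

    /** Counts String objects by identity (==) rather than by equals(). */
    static int distinctInstances(List<String> strings) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (String s : strings) {
            seen.put(s, Boolean.TRUE);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Made-up sample data; real subdivision data is far larger.
        List<String> entries =
                List.of("us01", "fr01", "de02", "00101", "gbzzzz", "auzzzz");

        System.out.println("instances without intern(): "
                + distinctInstances(suffixes(entries, false)));
        System.out.println("instances with intern():    "
                + distinctInstances(suffixes(entries, true)));
    }
}
```

With intern(), the instance count equals the number of distinct suffix values, since String.intern() returns one canonical instance per value. Without it, typical JVMs allocate a new String on every substring() call with a nonzero start index, though the language specification does not promise that.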
1 parent 8be6575 commit 164e249

1 file changed (+1, -1 lines changed)

icu4j/main/core/src/main/java/com/ibm/icu/impl/ValidIdentifiers.java

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@ public ValiditySet(Set<String> plainData, boolean makeMap) {
                     pos2 = pos = s.charAt(0) < 'A' ? 3 : 2;
                 }
                 final String key = s.substring(0, pos);
-                final String subdivision = s.substring(pos2);
+                final String subdivision = s.substring(pos2).intern();
 
                 Set<String> oldSet = _subdivisionData.get(key);
                 if (oldSet == null) {