
[stdlib] Add allDistinct APIs for iterables, sequences and arrays#6187

Open
Dmitry Nekrasov (DmitryNekrasov) wants to merge 4 commits into master from rr/stdlib/dmitry.nekrasov/KT-30270-allDistinct

@DmitryNekrasov

Contributor

Adds experimental allDistinct() and allDistinctBy { } ("are all elements different from each other?") for Iterable, Sequence, and the object, primitive, and unsigned array families — the dual of allEqual. @ExperimentalStdlibApi, @SinceKotlin("2.4").

Distinctness goes through a hash set, so elements compare by equals/hashCode: NaN equals NaN, and -0.0 is not equal to 0.0 — the same as distinct()/toSet() and the sibling allEqual.
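To make the floating-point semantics above concrete, here is a minimal sketch of a hash-set based check. The name `allDistinctSketch` is hypothetical and the body is only an illustration of the approach described in this PR, not the code it adds; the results mentioned afterwards assume the JVM, where boxed `Double` values compare by `equals`/`hashCode`.

```kotlin
// Hypothetical stand-in for the proposed API: true when no two elements are equal.
// Elements are compared through HashSet, i.e. by equals/hashCode.
fun <T> Iterable<T>.allDistinctSketch(): Boolean {
    val seen = HashSet<T>()
    for (element in this) {
        // add() returns false when an equal element is already in the set.
        if (!seen.add(element)) return false
    }
    return true
}
```

With this sketch, `listOf(Double.NaN, Double.NaN).allDistinctSketch()` is `false` because boxed `NaN` equals `NaN`, while `listOf(0.0, -0.0).allDistinctSketch()` is `true` because boxed `-0.0` and `0.0` are not equal.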

^KT-30270 Fixed

@kotlin-safemerge

kotlin-safemerge Bot commented Jun 9, 2026

Code Owners

| Rule | Owners | Approval |
| --- | --- | --- |
| /libraries/stdlib/, /libraries/tools/kotlin-stdlib-gen/ | @DmitryNekrasov, @fzhinkin, @ilya-g, @qwwdfsad | @ilya-g, @fzhinkin |
| /libraries/tools/binary-compatibility-validator/ | kotlin-libraries | @ilya-g, @fzhinkin |

@DmitryNekrasov Dmitry Nekrasov (DmitryNekrasov) force-pushed the rr/stdlib/dmitry.nekrasov/KT-30270-allDistinct branch 3 times, most recently from 31228ac to ebfd3a6 Compare June 9, 2026 13:03
@DmitryNekrasov Dmitry Nekrasov (DmitryNekrasov) marked this pull request as draft June 9, 2026 13:04
@DmitryNekrasov

Contributor Author

/dry-run

@KotlinBuild

Build Server (KotlinBuild) commented Jun 9, 2026

THIS IS A DRY RUN

Quality gate is triggered at https://buildserver.labs.intellij.net/build/969309422 — use this link to get full insight.

Quality gate was triggered with the following revisions:

kotlin
Branch: refs/merge/GITHUB-6187/safe-merge
Commit: 1ad454d


Quality gate failed. See https://buildserver.labs.intellij.net/build/969309422 to get full insight.

@DmitryNekrasov

Contributor Author

The failing cases are allDistinctDifferentNaNBits{Double,Float} (and the By variants): they expect two NaNs with different bit patterns to count as duplicates. allDistinct dedups with a HashSet, and on JS the number hashCode disagrees with equals for NaN — equals treats all NaNs as equal, but getNumberHashCode hashes the raw IEEE bits without canonicalizing NaN, so two NaNs with different bits get different hashes, never share a bucket, and the set keeps both. JVM passes because Double.hashCode/Float.hashCode canonicalize NaN via doubleToLongBits. (allEqual is unaffected — it compares with equals, no hashing.)

It's engine-dependent: I checked Float, which fails on Node and Chrome (both V8) and passes on Safari and Firefox. I think Double takes the same path, but I didn't check it across engines. Both js/node and js/browser fail. Canonical NaN and -0.0/0.0 pass everywhere.

Reproducer (https://pl.kotl.in/P6_xrfPbh):

val a = Float.NaN
val b = Float.fromBits(0xFFFC0000.toInt())  // a different NaN bit pattern
println(a.equals(b))                        // true everywhere
println(a.hashCode() == b.hashCode())       // JVM: true; JS: false on Node/Chrome, true on Safari/Firefox

Two options:

  1. Skip these cases on JS with testExceptOn(TestPlatform.Js); they keep running on the JVM, where NaN canonicalization makes the result well-defined. Optionally note the platform-dependence in the KDoc.
  2. File a Kotlin/JS issue to canonicalize NaN in getNumberHashCode — engine-independent and stdlib-side, but it touches every hash-based collection with Double/Float keys, so it's broader than this PR.

I'd take option 1 now; option 2 is a follow-up that would let us drop the guard.
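For illustration, option 2 amounts to mapping every NaN to one bit pattern before hashing. The helper below is a hypothetical sketch of that idea for `Float`; it is not the actual `getNumberHashCode` implementation, and its name and shape are assumptions made for this example only.

```kotlin
// Hypothetical helper, not the stdlib's getNumberHashCode:
// hash a Float by its raw bits, but collapse every NaN to the canonical NaN first,
// so that values equal under equals() always produce the same hash.
fun canonicalFloatHash(value: Float): Int =
    if (value.isNaN()) Float.NaN.toRawBits() else value.toRawBits()
```

Under this sketch the two NaNs from the reproducer hash identically, while `0.0f` and `-0.0f` still hash differently, which matches `equals` treating them as unequal.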

@DmitryNekrasov Dmitry Nekrasov (DmitryNekrasov) force-pushed the rr/stdlib/dmitry.nekrasov/KT-30270-allDistinct branch from ebfd3a6 to ad57343 Compare June 9, 2026 17:41
@DmitryNekrasov

Contributor Author

/dry-run

@KotlinBuild

Build Server (KotlinBuild) commented Jun 9, 2026

THIS IS A DRY RUN

Quality gate is triggered at https://buildserver.labs.intellij.net/build/969660301 — use this link to get full insight.

Quality gate was triggered with the following revisions:

kotlin
Branch: refs/merge/GITHUB-6187/safe-merge
Commit: 005fd06


Quality gate finished successfully.

@ExperimentalStdlibApi
public fun BooleanArray.allDistinct(): Boolean {
    if (size < 2) return true
    val seen = HashSet<Boolean>()

Contributor

For booleans, there are only three cases:

  • the size is < 2
  • the size is 2 and values are different
  • in all other cases, there are duplicates
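The three cases listed above collapse into a single expression. This is a minimal sketch under the hypothetical name `allDistinctSketch`; the version that landed in the PR may be written differently.

```kotlin
// Hypothetical sketch: a BooleanArray has only two possible values, so
// - fewer than 2 elements: trivially distinct;
// - exactly 2 elements: distinct only when they differ;
// - 3 or more elements: a duplicate is guaranteed.
fun BooleanArray.allDistinctSketch(): Boolean =
    size < 2 || (size == 2 && this[0] != this[1])
```

No set is allocated and at most two elements are read.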

Contributor Author

Thank you! Totally agree.

Contributor Author

Done.

@SinceKotlin("2.4")
@ExperimentalStdlibApi
public fun ByteArray.allDistinct(): Boolean {
    if (size < 2) return true

@fzhinkin Filipp Zhinkin (fzhinkin) Jun 9, 2026

Contributor

And if the size is greater than 512, then there are certainly some duplicates :)

Contributor Author

Done. The threshold is 256 — the number of distinct Byte values; ShortArray, UShortArray, and CharArray got the same check at 65536.
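The size check is a pigeonhole argument: an array with more elements than its element type has values must contain a repeat. A minimal sketch for `ShortArray`, assuming a plain `HashSet` for the remaining cases and using the hypothetical name `allDistinctSketch`:

```kotlin
// Hypothetical sketch of the pigeonhole shortcut for a 16-bit element type.
fun ShortArray.allDistinctSketch(): Boolean {
    if (size < 2) return true
    // Short has 65536 distinct values, so a longer array must repeat one.
    if (size > 65536) return false
    val seen = HashSet<Short>()
    for (value in this) {
        if (!seen.add(value)) return false
    }
    return true
}
```

An array of exactly 65536 elements can still be all-distinct, so the comparison has to be strict.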

@DmitryNekrasov Dmitry Nekrasov (DmitryNekrasov) marked this pull request as ready for review June 9, 2026 19:35
@ExperimentalStdlibApi
public fun ByteArray.allDistinct(): Boolean {
    if (size < 2) return true
    val seen = HashSet<Byte>()

Contributor

There's not much we can do for other integer types, but all byte values could be captured using 4 longs, so it might be worth implementing teeny-tiny bitset with a single add operation.
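A rough sketch of the suggested bitset follows. It assumes a `LongArray(4)` as backing storage for brevity, whereas the PR's follow-up commit describes four `Long`s; the name `allDistinctSketch` is hypothetical.

```kotlin
// Hypothetical sketch: track the 256 possible byte values in 4 x 64 bits.
fun ByteArray.allDistinctSketch(): Boolean {
    if (size < 2) return true
    if (size > 256) return false // pigeonhole: only 256 distinct Byte values exist
    val words = LongArray(4)
    for (value in this) {
        val index = value.toInt() and 0xFF   // map -128..127 onto 0..255
        val word = index ushr 6              // which of the four 64-bit words
        val mask = 1L shl (index and 63)     // bit position inside that word
        if ((words[word] and mask) != 0L) return false // bit already set: duplicate
        words[word] = words[word] or mask
    }
    return true
}
```

Values such as 1 and 65 share the same low six bits but land in different words, so they are not mistaken for duplicates.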

Contributor Author

Done.

Comment on lines +1475 to +1477
seen.add(selector(first))
do {
    if (!seen.add(selector(iterator.next()))) return false

@fzhinkin Filipp Zhinkin (fzhinkin) Jun 9, 2026

Contributor

selector will be inlined twice (kudos to @qwwdfsad for the hint), needlessly emitting extra bytecode at allDistinctBy's call sites.

So it makes sense to rewrite it into something like:

val iterator = iterator()
if (!iterator.hasNext()) return true
var element = iterator.next()
if (!iterator.hasNext()) return true
val seen = HashSet<K>()
while (true) {
    // selector appears only once, so it is inlined only once
    if (!seen.add(selector(element))) return false
    if (!iterator.hasNext()) break
    element = iterator.next()
}
return true

It would be nice to reduce the size of the loop's preamble, but I'm not sure there's a reasonable way to achieve it.
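Wrapped into a complete function, the single-call-site shape could look like the sketch below. The name `allDistinctBySketch` and the `Iterable` receiver are assumptions for illustration; the declaration that landed in the PR may differ.

```kotlin
// Hypothetical sketch: selector is referenced exactly once in the body,
// so inlining copies the lambda's code into each call site only once.
inline fun <T, K> Iterable<T>.allDistinctBySketch(selector: (T) -> K): Boolean {
    val iterator = iterator()
    if (!iterator.hasNext()) return true   // empty: trivially distinct
    var element = iterator.next()
    if (!iterator.hasNext()) return true   // single element: selector is never called
    val seen = HashSet<K>()
    while (true) {
        if (!seen.add(selector(element))) return false
        if (!iterator.hasNext()) break
        element = iterator.next()
    }
    return true
}
```

For example, `listOf("a", "bb", "cc").allDistinctBySketch { it.length }` is `false` because two strings share length 2.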

Contributor Author

Done.

Dmitry Nekrasov (DmitryNekrasov) added a commit that referenced this pull request Jun 10, 2026
Review follow-up for #6187:
- BooleanArray: only three outcomes exist (size < 2; a distinct pair;
  otherwise a guaranteed duplicate), so compute the answer directly
  with no HashSet.
- ByteArray/UByteArray: more than 256 elements can't be all-distinct
  (pigeonhole), return false right away; values that fit are tracked
  in a 256-bit set of four Longs instead of a HashSet, avoiding
  boxing and a hash-table allocation.
- ShortArray/UShortArray/CharArray: same pigeonhole shortcut at 65536.
  No bitset here: zeroing 8 KiB per call would penalize the common
  small-array case.

Wider element types (Int and above) can't benefit: their value domain
exceeds the maximum array size, so the size check would never fire.

Tests cover the domain boundaries (full domain distinct, same size
with a duplicate, domain size + 1), all BooleanArray shapes up to
size 2 exhaustively, and byte values that collide in the low six
bits of the bitset words.

KT-30270
@DmitryNekrasov Dmitry Nekrasov (DmitryNekrasov) force-pushed the rr/stdlib/dmitry.nekrasov/KT-30270-allDistinct branch 3 times, most recently from e59ec6c to 529aeb5 Compare June 10, 2026 07:13
Adds experimental `allDistinct()` and `allDistinctBy { }` ("are all
elements different from each other?") for `Iterable`, `Sequence`, and the
object, primitive, and unsigned array families — the dual of `allEqual`.
`@ExperimentalStdlibApi`, `@SinceKotlin("2.4")`.

Distinctness uses `equals`/`hashCode`, so for floating-point elements
`NaN` equals `NaN` and `-0.0` is not equal to `0.0`, consistent with
`Double.equals` and the existing `allEqual`/`isSorted`.

^KT-30270 Fixed

On JS, Double and Float hashCode don't canonicalize NaN, so a HashSet
keeps NaN values with different bit patterns apart.

Only the byte arrays get a bitset: zeroing the 8 KiB needed for the
16-bit types on every call would penalize the common small-array case.

The selector is inlined, so each call site used to receive two copies
of the lambda's bytecode.
@DmitryNekrasov Dmitry Nekrasov (DmitryNekrasov) force-pushed the rr/stdlib/dmitry.nekrasov/KT-30270-allDistinct branch from 529aeb5 to 5ab237c Compare June 10, 2026 07:18
@DmitryNekrasov

Contributor Author

/dry-run

@KotlinBuild

Build Server (KotlinBuild) commented Jun 10, 2026

THIS IS A DRY RUN

Quality gate is triggered at https://buildserver.labs.intellij.net/build/970073740 — use this link to get full insight.

Quality gate was triggered with the following revisions:

kotlin
Branch: refs/merge/GITHUB-6187/safe-merge
Commit: 90f8fc3


Quality gate failed. See https://buildserver.labs.intellij.net/build/970073740 to get full insight.

3 participants