Reimplement ArrayPools using StampedLock to control concurrent access#3158

Draft
fzhinkin wants to merge 3 commits into master from fzhinkin/3016/final-implementation

Conversation

@fzhinkin
Contributor

@fzhinkin fzhinkin commented Mar 3, 2026

No description provided.

To ensure better performance on Android,
a synchronized block will be used there instead.
ArrayList shows slightly better performance compared to ArrayDeque.
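The actual pool implementation is not shown in this thread; the following is a minimal sketch of the approach the commit messages describe (an ArrayList-backed pool guarded by `StampedLock`), with hypothetical names (`CharArrayPool`, the array size of 128) chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.concurrent.locks.StampedLock;

// Hypothetical sketch: an array pool whose mutations are guarded by a
// StampedLock write lock instead of a synchronized block.
class CharArrayPool {
    private final StampedLock lock = new StampedLock();
    // ArrayList used as a LIFO stack; commit message reports it slightly
    // outperforms ArrayDeque for this workload.
    private final ArrayList<char[]> arrays = new ArrayList<>();

    char[] take() {
        long stamp = lock.writeLock();
        try {
            int size = arrays.size();
            if (size == 0) return new char[128]; // pool empty: allocate fresh
            return arrays.remove(size - 1);      // reuse most recently released
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    void release(char[] array) {
        long stamp = lock.writeLock();
        try {
            arrays.add(array);
        } finally {
            lock.unlockWrite(stamp);
        }
    }
}

public class PoolDemo {
    public static void main(String[] args) {
        CharArrayPool pool = new CharArrayPool();
        char[] a = pool.take();       // pool is empty, so this allocates
        pool.release(a);
        char[] b = pool.take();       // returns the released array
        System.out.println(a == b);   // true
        System.out.println(a.length); // 128
    }
}
```

On Android, where `StampedLock` historically performed worse, the commits indicate the same structure would fall back to a plain `synchronized` block around `take`/`release`.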
@fzhinkin fzhinkin linked an issue Mar 3, 2026 that may be closed by this pull request
@fzhinkin
Contributor Author

fzhinkin commented Mar 4, 2026

See benchmarking results below. tl;dr: on the JVM, the proposed implementation shows the same performance characteristics as the existing one in the single-threaded scenario, and outperforms it when there is contention.

===========================
= Baseline implementation =
===========================

Benchmark                       Mode  Cnt      Score     Error   Units
MemoryPoolBenchmark.encodeT1   thrpt   15  22322.747 ± 471.225  ops/ms
MemoryPoolBenchmark.encodeT12  thrpt   15   3223.363 ± 135.406  ops/ms
MemoryPoolBenchmark.encodeT16  thrpt   15   3125.608 ± 128.414  ops/ms
MemoryPoolBenchmark.encodeT2   thrpt   15   3629.936 ± 388.224  ops/ms
MemoryPoolBenchmark.encodeT24  thrpt   15   3069.192 ±  80.097  ops/ms
MemoryPoolBenchmark.encodeT4   thrpt   15   2938.843 ±  87.479  ops/ms
MemoryPoolBenchmark.encodeT8   thrpt   15   2999.605 ± 104.911  ops/ms

===========================
= Proposed implementation =
===========================

Benchmark                       Mode  Cnt      Score     Error   Units
MemoryPoolBenchmark.encodeT1   thrpt   15  22586.808 ± 394.487  ops/ms
MemoryPoolBenchmark.encodeT12  thrpt   15  15608.034 ± 471.527  ops/ms
MemoryPoolBenchmark.encodeT16  thrpt   15  14922.820 ± 358.430  ops/ms
MemoryPoolBenchmark.encodeT2   thrpt   15   9073.148 ± 599.410  ops/ms
MemoryPoolBenchmark.encodeT24  thrpt   15  15600.991 ± 284.742  ops/ms
MemoryPoolBenchmark.encodeT4   thrpt   15  14743.174 ± 361.471  ops/ms
MemoryPoolBenchmark.encodeT8   thrpt   15  14981.614 ± 206.089  ops/ms

https://jmh.morethan.io/?sources=https://gist.githubusercontent.com/fzhinkin/c8975199aec707ab3e99d1dba2c32339/raw/7e43ac41e2c4c344fe65156bef55f31bb812e3af/baseline.json,https://gist.githubusercontent.com/fzhinkin/c8975199aec707ab3e99d1dba2c32339/raw/7e43ac41e2c4c344fe65156bef55f31bb812e3af/proposed.json

Raw results:
baseline.json
proposed.json
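The `encodeT1` … `encodeT24` suffixes correspond to the number of benchmark threads. The JMH sources are not included in the thread; the snippet below is a rough, non-JMH sketch of the contended take/release pattern the benchmark exercises (all names hypothetical; 8 threads stand in for the `T8` configuration).

```java
import java.util.ArrayList;
import java.util.concurrent.locks.StampedLock;

// Hypothetical contention smoke test: N threads repeatedly take and
// release arrays from a shared StampedLock-guarded pool.
public class ContentionDemo {
    static final StampedLock lock = new StampedLock();
    static final ArrayList<char[]> pool = new ArrayList<>();

    static char[] take() {
        long stamp = lock.writeLock();
        try {
            return pool.isEmpty() ? new char[128] : pool.remove(pool.size() - 1);
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    static void release(char[] a) {
        long stamp = lock.writeLock();
        try {
            pool.add(a);
        } finally {
            lock.unlockWrite(stamp);
        }
    }

    public static void main(String[] args) {
        int threads = 8, iterations = 100_000;
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < iterations; j++) {
                    release(take());
                }
            });
            ts[i].start();
        }
        try {
            for (Thread t : ts) t.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        // Each thread holds at most one array at a time, so at most
        // `threads` arrays were ever allocated, and all are back in the pool.
        System.out.println(pool.size() >= 1 && pool.size() <= threads); // true
    }
}
```

A real measurement should of course go through JMH (as the results above did), since hand-rolled loops like this are unreliable for timing; the sketch only illustrates the access pattern.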

@fzhinkin
Contributor Author

fzhinkin commented Mar 4, 2026

While the proposed solution looks nice in theory, I don't see any improvement in the TechEmpower benchmark results (which is more or less expected, given that array acquisition from the pool takes a relatively small portion of the time). However, when profiled, the TechEmpower benchmarks show that the StampedLock-based solution takes more time compared to the baseline. This somewhat contradicts the results I see in microbenchmarks, and so far I don't understand why it is happening.

@sandwwraith
Member

@e5l Do you have any other workloads besides TechEmpower that you can test this change on?

@e5l
Member

e5l commented Apr 7, 2026

Nope, not at the moment. But most of the JSONs are small, so it should be easy to spot on a medium-load workload.

@sandwwraith
Member

While it is visible in the profile, the change does not affect the score. So maybe it's not needed after all.

@e5l
Member

e5l commented Apr 8, 2026

The bottleneck has shifted after this; we will keep pushing further.



Development

Successfully merging this pull request may close these issues.

Synchronize block cause performance drop

3 participants