
feat: speed up PersistentHashMapDeserializer #92

Open
miikka wants to merge 2 commits into metosin:master from miikka:speed-up-phm

Conversation

@miikka (Contributor) commented Mar 19, 2026

A speculative benchmark optimization for the decode code path. I found this by running pi-autoresearch with gpt-5.4-medium against the benchmark. The idea is similar in spirit to #17 (which didn't work), but here the PersistentArrayMap construction is inlined.

This is what I got for the jsonista.jmh/decode-jsonista benchmark on my laptop, master vs this branch:

| Size | Main (ops/s) | PR (ops/s) | Change |
|------|--------------|------------|--------|
| 10b | 9,285,018.601 | 11,050,553.982 | +19.01% |
| 100b | 2,026,489.218 | 2,367,886.089 | +16.85% |
| 1k | 363,486.397 | 418,933.267 | +15.25% |
| 10k | 32,911.114 | 37,307.349 | +13.36% |
| 100k | 3,342.422 | 3,644.178 | +9.03% |

I'm not sure what to make of this - is this a good idea, or is it possibly overfitting to the benchmark dataset?
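For readers who want to see the shape of the trick, here is a minimal, hypothetical sketch of the control flow described above. Plain `java.util` types stand in for Clojure's PersistentArrayMap and transient PersistentHashMap, and an iterator of `[key, value]` pairs stands in for the Jackson token stream, so it compiles without any Clojure dependency; it illustrates the idea only, not the actual patch.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Sketch of the small-map fast path: buffer the first few key/value
// pairs in a flat array (mirroring PersistentArrayMap's internal
// layout) and only fall back to a general-purpose map once the
// small-map threshold is exceeded.
public class SmallMapFastPath {
    // Clojure's PersistentArrayMap holds at most 8 entries before
    // switching to PersistentHashMap.
    static final int THRESHOLD = 8;

    static Map<Object, Object> decodeObject(Iterator<Object[]> pairs) {
        Object[] entries = new Object[THRESHOLD * 2];
        int size = 0; // number of buffered pairs
        while (size < THRESHOLD && pairs.hasNext()) {
            Object[] kv = pairs.next();
            entries[2 * size] = kv[0];
            entries[2 * size + 1] = kv[1];
            size++;
        }
        // In the real patch, a small object would be returned here as a
        // PersistentArrayMap built directly from `entries`. With plain
        // Java types we just seed a HashMap once from the buffer.
        Map<Object, Object> m = new HashMap<>();
        for (int i = 0; i < 2 * size; i += 2) {
            m.put(entries[i], entries[i + 1]);
        }
        // Large objects: add the remaining pairs directly (assoc on a
        // transient PersistentHashMap in the real code).
        while (pairs.hasNext()) {
            Object[] kv = pairs.next();
            m.put(kv[0], kv[1]);
        }
        return m;
    }
}
```

The point of the buffering is that small JSON objects (the common case in many workloads) never touch the hash-map machinery at all.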

Version info
% java --version
openjdk 25.0.2 2026-01-20
OpenJDK Runtime Environment Homebrew (build 25.0.2)
OpenJDK 64-Bit Server VM Homebrew (build 25.0.2, mixed mode, sharing)
% clj --version
Clojure CLI version 1.12.4.1618

@opqdonut (Member)

Interesting that this would work while #17 didn't. How could we figure out if it makes sense to ship this?

@opqdonut opqdonut moved this to 📬 Inbox in Metosin Open Source Backlog Mar 20, 2026
@miikka (Contributor) commented Mar 20, 2026

We could try running it against a somewhat bigger benchmark suite - I need to go spelunking on the Internet to see if there are any ready-to-use ones - and see if the results still hold (and possibly check different Java versions too). The change seems sensible to me, so if it holds up on a bigger benchmark, I'd ship it.

@opqdonut (Member)

That sounds good. The code makes sense and is a nice surgical change. Before merging, though, I'd like to simplify the diff by removing the nested whiles and the shift operators.

@opqdonut (Member)

Oh and a comment explaining that this is an optimisation would probably also be apropos.

@miikka (Contributor) commented Mar 20, 2026

I'll edit it for clarity if the benchmarking pays off.

Comment on lines +60 to +62
for (int i = 0; i < size << 1; i += 2) {
t = t.assoc(entries[i], entries[i + 1]);
}
Member:

It seems to me like this for loop is run again for each kv-pair after the size of the temporary Object array goes over 8? Though I would think the temporary array's contents need to be copied into the transient only once.

Or maybe there is something I don't understand.

Member:

Okay, now I think I got it.

There is a similar while loop after this copy step, reading until the JSON object end. So the loop above will not continue when this else branch is hit.

@opqdonut (Member) commented Mar 20, 2026

Yep, this is why I said I want to simplify the control flow in the diff. I imagine it could be something like

while (size < 8) {
   if END_OBJECT { return PersistentArrayMap(entries, ...) }
   Object key = ...
   Object val = ...
   entries[2 * size] = key
   entries[2 * size + 1] = val
   size++
}
PersistentHashMap phm = ...   // seeded once from entries
while (! END_OBJECT) {
   phm = phm.assoc(key, val)
}
return phm

@miikka (Contributor) commented Mar 31, 2026

I gathered a new benchmark suite from two sources. The benchmark setup can be seen here: miikka#2.

I ran the benchmarks on a Linux server with different JDKs (Oracle/Temurin/Corretto; 21/25/26). The results were positive but mixed: while the patch improved the average performance across the whole suite in almost all runs¹, the performance typically suffered in a few cases. The nativejson-benchmark's canada.json seems to consistently suffer from this patch.

In any case, I have now collected a bunch of data and I'm realizing that I'm out of my depth in making sense of microbenchmark results! I still think the patch makes sense, but the situation is much less clear-cut than I hoped for.


Examples of benchmark results:

Benchmark results, Temurin 25

Run 1: 2026-03-25 09:05:55 · c393cbd2 chore: now with all benchmarks · gob-jsonista · Temurin 25.0.2
Run 2: 2026-03-25 10:05:38 · ede5c6d8 feat: speed up PersistentHashMapDeserializer · gob-jsonista · Temurin 25.0.2

| Benchmark | Params | Run 1 (ops/s) | Run 2 (ops/s) | Δ (%) |
|-----------|--------|---------------|---------------|-------|
| decode | "json-size-benchmark/circleciblank" | 2.5M | 2.83M | +13.4% |
| decode | "json-size-benchmark/circlecimatrix" | 504K | 607K | +20.5% |
| decode | "json-size-benchmark/commitlint" | 590K | 642K | +8.7% |
| decode | "json-size-benchmark/commitlintbasic" | 2.81M | 2.97M | +5.6% |
| decode | "json-size-benchmark/epr" | 190K | 212K | +11.3% |
| decode | "json-size-benchmark/eslintrc" | 84.6K | 93.2K | +10.1% |
| decode | "json-size-benchmark/esmrc" | 695K | 822K | +18.3% |
| decode | "json-size-benchmark/geojson" | 137K | 137K | +0.2% |
| decode | "json-size-benchmark/githubfundingblank" | 587K | 608K | +3.6% |
| decode | "json-size-benchmark/githubworkflow" | 254K | 307K | +21.1% |
| decode | "json-size-benchmark/gruntcontribclean" | 623K | 722K | +15.9% |
| decode | "json-size-benchmark/imageoptimizerwebjob" | 728K | 755K | +3.7% |
| decode | "json-size-benchmark/jsonereversesort" | 602K | 732K | +21.5% |
| decode | "json-size-benchmark/jsonesort" | 1.32M | 1.43M | +8.4% |
| decode | "json-size-benchmark/jsonfeed" | 304K | 346K | +13.5% |
| decode | "json-size-benchmark/jsonresume" | 53.4K | 57.6K | +7.9% |
| decode | "json-size-benchmark/netcoreproject" | 136K | 129K | -5.0% |
| decode | "json-size-benchmark/nightwatch" | 72.5K | 73.6K | +1.6% |
| decode | "json-size-benchmark/openweathermap" | 149K | 163K | +9.6% |
| decode | "json-size-benchmark/openweatherroadrisk" | 167K | 184K | +10.4% |
| decode | "json-size-benchmark/packagejson" | 66.8K | 69.9K | +4.7% |
| decode | "json-size-benchmark/packagejsonlintrc" | 82.5K | 81.1K | -1.7% |
| decode | "json-size-benchmark/sapcloudsdkpipeline" | 1.77M | 2.07M | +17.2% |
| decode | "json-size-benchmark/travisnotifications" | 278K | 383K | +37.9% |
| decode | "json-size-benchmark/tslintbasic" | 1.04M | 1.4M | +34.4% |
| decode | "json-size-benchmark/tslintextend" | 1.4M | 1.75M | +24.6% |
| decode | "json-size-benchmark/tslintmulti" | 642K | 773K | +20.3% |
| decode | "json100b" | 716K | 854K | +19.2% |
| decode | "json100k" | 1.42K | 1.67K | +17.3% |
| decode | "json10b" | 3.31M | 4.02M | +21.4% |
| decode | "json10k" | 14K | 16.7K | +18.8% |
| decode | "json1k" | 145K | 172K | +18.3% |
| decode | "nativejson-benchmark/canada" | 20.29 | 20.07 | -1.1% |
| decode | "nativejson-benchmark/citm_catalog" | 135.7 | 156.9 | +15.6% |
| decode | "nativejson-benchmark/twitter" | 301.2 | 304.4 | +1.1% |
| **Average** | | | | **+12.4%** |
Benchmark results, Oracle Java 26

Run 1: 2026-03-26 06:59:54 · c393cbd2 chore: now with all benchmarks · gob-jsonista · java version "26" 2026-03-17
Run 2: 2026-03-26 07:59:35 · ede5c6d8 feat: speed up PersistentHashMapDeserializer · gob-jsonista · java version "26" 2026-03-17

| Benchmark | Params | Run 1 (ops/s) | Run 2 (ops/s) | Δ (%) |
|-----------|--------|---------------|---------------|-------|
| decode | "json-size-benchmark/circleciblank" | 2.58M | 3.04M | +17.9% |
| decode | "json-size-benchmark/circlecimatrix" | 478K | 610K | +27.6% |
| decode | "json-size-benchmark/commitlint" | 616K | 654K | +6.1% |
| decode | "json-size-benchmark/commitlintbasic" | 2.93M | 3.59M | +22.3% |
| decode | "json-size-benchmark/epr" | 195K | 221K | +13.2% |
| decode | "json-size-benchmark/eslintrc" | 92.3K | 102K | +11.0% |
| decode | "json-size-benchmark/esmrc" | 648K | 875K | +34.9% |
| decode | "json-size-benchmark/geojson" | 142K | 137K | -3.5% |
| decode | "json-size-benchmark/githubfundingblank" | 599K | 574K | -4.2% |
| decode | "json-size-benchmark/githubworkflow" | 259K | 310K | +19.6% |
| decode | "json-size-benchmark/gruntcontribclean" | 651K | 753K | +15.8% |
| decode | "json-size-benchmark/imageoptimizerwebjob" | 744K | 803K | +8.0% |
| decode | "json-size-benchmark/jsonereversesort" | 638K | 779K | +22.1% |
| decode | "json-size-benchmark/jsonesort" | 1.32M | 1.49M | +13.2% |
| decode | "json-size-benchmark/jsonfeed" | 306K | 355K | +15.8% |
| decode | "json-size-benchmark/jsonresume" | 56.3K | 57.1K | +1.5% |
| decode | "json-size-benchmark/netcoreproject" | 120K | 134K | +11.2% |
| decode | "json-size-benchmark/nightwatch" | 71K | 76.2K | +7.3% |
| decode | "json-size-benchmark/openweathermap" | 152K | 157K | +3.3% |
| decode | "json-size-benchmark/openweatherroadrisk" | 169K | 191K | +13.4% |
| decode | "json-size-benchmark/packagejson" | 69.3K | 69.8K | +0.8% |
| decode | "json-size-benchmark/packagejsonlintrc" | 86.2K | 85.2K | -1.1% |
| decode | "json-size-benchmark/sapcloudsdkpipeline" | 1.78M | 2.1M | +17.9% |
| decode | "json-size-benchmark/travisnotifications" | 289K | 362K | +25.6% |
| decode | "json-size-benchmark/tslintbasic" | 1.11M | 1.49M | +34.3% |
| decode | "json-size-benchmark/tslintextend" | 1.64M | 1.85M | +13.4% |
| decode | "json-size-benchmark/tslintmulti" | 668K | 770K | +15.3% |
| decode | "json100b" | 800K | 874K | +9.3% |
| decode | "json100k" | 1.5K | 1.6K | +7.0% |
| decode | "json10b" | 3.34M | 3.87M | +15.9% |
| decode | "json10k" | 14.5K | 16K | +9.9% |
| decode | "json1k" | 152K | 177K | +16.6% |
| decode | "nativejson-benchmark/canada" | 33.23 | 33.16 | -0.2% |
| decode | "nativejson-benchmark/citm_catalog" | 142.2 | 160.7 | +13.0% |
| decode | "nativejson-benchmark/twitter" | 317.9 | 320.8 | +0.9% |
| **Average** | | | | **+12.0%** |
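An aside on summarizing such results: the bottom-line figures read like arithmetic means of the per-case deltas, in which case a single outlier (e.g. +37.9%) pulls the average up. A geometric mean of the throughput ratios is a common, slightly more outlier-robust alternative. A minimal sketch (the helper name and sample numbers are illustrative, not taken from the tables):

```java
// Hypothetical helper: summarize per-benchmark speedups as the
// geometric mean of ops/s ratios, converted back to a percentage.
// Averaging in log space weighs large outliers less heavily than an
// arithmetic mean of percentage deltas does.
public class GeoMean {
    static double geometricMeanDelta(double[] baseline, double[] patched) {
        double logSum = 0.0;
        for (int i = 0; i < baseline.length; i++) {
            logSum += Math.log(patched[i] / baseline[i]);
        }
        // Mean ratio, expressed as a percentage change.
        return (Math.exp(logSum / baseline.length) - 1.0) * 100.0;
    }
}
```

For example, ratios of 1.21 and 1.00 give a geometric-mean speedup of 10%, where the arithmetic mean of the deltas would report 10.5%.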

Footnotes

¹ I reported to @opqdonut that in my first run with Java 26, the existing code was faster than the patch! However, this turned out to be a fluke, and in the runs afterwards the patch was faster.

@Deraen (Member) commented Apr 1, 2026

One example of a performance-critical JSON parsing case: reading large GeoJSON files. They also repeat a lot of patterns, so it would be interesting to see if that has an effect.

And well, the nativejson-benchmark already has the 2 MB canada.json file. I might also be interested in whether there is a difference with ~100-1000 MB files.

@opqdonut (Member) left a review:

I think I'm happy with this.
