feat: speed up PersistentHashMapDeserializer #92
miikka wants to merge 2 commits into metosin:master from
Conversation
Interesting that this would work while #17 didn't. How could we figure out if it makes sense to ship this?

We could try running a benchmark against a somewhat bigger benchmark suite - I need to go spelunking the Internet to see if there are any ready-to-use ones - and see if the results still hold (+ possibly see about different Java versions). The change seems sensible to me, so if it holds on a bigger benchmark, then I'd ship it.

That sounds good. The code makes sense and is a nice surgical change. I'd like to simplify the diff by removing the nested whiles and the

Oh, and a comment explaining that this is an optimisation would probably also be apropos.

I'll edit it for clarity if the benchmarking pays off.
```java
for (int i = 0; i < size << 1; i += 2) {
    t = t.assoc(entries[i], entries[i + 1]);
}
```
It seems to me like this for loop is run again for each kv-pair after the size of the temporary Object array goes over 8? Though I would think the temporary array contents need to be copied into the transient only once.
Or maybe there is something I don't understand.
Okay, now I think I got it.
There is a similar while loop after this copy step, reading until the JSON object end. So the above while loop will not continue when this else branch is hit.
Yep, this is why I said I want to simplify the control flow in the diff. I imagine it could be something like

```
while (size < 8) {
    if (END_OBJECT) { return PersistentArrayMap(...); }
    Object key = ...;
    Object val = ...;
    entries[size * 2] = key;
    entries[size * 2 + 1] = val;
    size++;
}
PersistentHashMap phm = ...;
while (!END_OBJECT) {
    phm = phm.assoc(key, val);
}
return phm;
```
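To make the two-phase control flow above concrete, here is a minimal, self-contained sketch of the same pattern. Everything in it is a stand-in: a plain `HashMap` replaces Clojure's `PersistentArrayMap`/`PersistentHashMap`, an `Iterator` of key/value pairs replaces Jackson's token stream, and the class and method names are hypothetical - only the control-flow shape (buffer up to 8 pairs in a flat array, spill into the map exactly once, then read the rest directly) mirrors the proposal.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class SmallMapSketch {
    // PersistentArrayMap holds at most 8 entries; past that, switch
    // to the general map representation.
    static final int THRESHOLD = 8;

    static Map<String, String> readObject(Iterator<String[]> pairs) {
        String[] entries = new String[THRESHOLD * 2];
        int size = 0; // number of filled slots (2 per key/value pair)

        // Phase 1: small objects fit in a flat key/value array.
        while (size < THRESHOLD * 2) {
            if (!pairs.hasNext()) {
                // "END_OBJECT" while still small: build the small map.
                Map<String, String> small = new HashMap<>();
                for (int i = 0; i < size; i += 2) {
                    small.put(entries[i], entries[i + 1]);
                }
                return small;
            }
            String[] kv = pairs.next();
            entries[size] = kv[0];
            entries[size + 1] = kv[1];
            size += 2;
        }

        // Phase 2: spill the buffered entries into the big map once...
        Map<String, String> phm = new HashMap<>();
        for (int i = 0; i < size; i += 2) {
            phm.put(entries[i], entries[i + 1]);
        }
        // ...then read the remaining pairs straight into it, with no
        // re-copying of the already-inserted entries.
        while (pairs.hasNext()) {
            String[] kv = pairs.next();
            phm.put(kv[0], kv[1]);
        }
        return phm;
    }
}
```

The point of the shape is that the copy loop runs exactly once, at the moment the 8-entry threshold is crossed, which is the concern raised above.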
I gathered a new benchmark suite from two sources:

The benchmark setup can be seen here: miikka#2. I ran the benchmarks on a Linux server with different JDKs (Oracle/Temurin/Corretto, 21/25/26). The results were positive but mixed: while the patch improved the average performance across the whole suite in almost all runs[^1], typically the performance suffered in some cases. Native JSON Benchmark's

In any case, I have now collected a bunch of data and I'm realizing that I'm out of my depth in making sense of microbenchmark results! I still think the patch makes sense, but the situation is much less clear-cut than I hoped for.

Examples of benchmark results:

Benchmark results, Temurin 25 (Run 1: 2026-03-25 09:05:55)

Benchmark results, Oracle Java 26 (Run 1: 2026-03-26 06:59:54)

Footnotes
One example of a performance-critical JSON parsing case: reading large GeoJSON files. They also repeat a lot of patterns, so it would be interesting to see if that has an effect. And well, the nativejson-benchmark already has a 2MB canada.json file. I would also be interested in whether there is a difference with ~100-1000MB files.
A speculative benchmark optimization for the decode code path. I found this by running pi-autoresearch with gpt-5.4-medium against the benchmark. The idea is similar in spirit to #17 (which didn't work), but the PersistentArrayMap construction is inlined.

This is what I got for the jsonista.jmh/decode-jsonista benchmark on my laptop, master vs this branch:

I'm not sure what to make of this - is this a good idea, or is it possibly overfitting to the benchmark dataset?
Version info