ring: reduce allocs and ~5–10% faster Sign/Verify (precomputed H_p, pooled scratch, EncodeInto)#1
Conversation
…oled scratch, EncodeInto, ensureHP)
0858c33 to
2f93470
Compare
Olshansk
left a comment
There was a problem hiding this comment.
There's A LOT of intricate logic in here so hard to review without going into all the details.
That said, I'm sue it'll help with performance!
- Added @noot in case she has time to review.
- Requested a couple of minor changes
This will go very well hand-in-hand with some of the other changes we're working on here: pokt-network/shannon-sdk#47
| BenchmarkVerify32_Ed25519-32 246 4,801,361 ns/op 51,904 B/op 704 allocs/op | ||
| BenchmarkVerify64_Ed25519-32 121 9,620,050 ns/op 104,307 B/op 1,407 allocs/op | ||
| BenchmarkVerify128_Ed25519-32 60 19,432,527 ns/op 211,098 B/op 2,869 allocs/op | ||
|
|
There was a problem hiding this comment.
BenchmarkVerify128_Ed25519-32 60 19,432,527 ns/op 211,098 B/op 2,869 allocs/op
#### After (v0.2.0)
```bash
-------------------------
goos: linux
goarch: amd64
| ``` | ||
| Before (v0.1.0) | ||
| ------------------------- | ||
| goos: linux |
| **Summary:** | ||
|
|
||
| - **Secp256k1**: ~**5–10% faster** across typical sizes. Examples: | ||
| - Sign32: **11.70ms → 10.60ms** (~**9.4%** faster) | ||
| - Sign64: **22.93ms → 21.02ms** (~**8.3%** faster) | ||
| - Verify32: **11.32ms → 10.32ms** (~**8.9%** faster) | ||
| - Verify64: **22.91ms → 20.69ms** (~**9.7%** faster) | ||
| - **Ed25519**: similar gains on average (**~5–8%** for common sizes). | ||
| - Verify32: **4.80ms → 4.41ms** (~**8.0%**) | ||
| - Sign32: **4.96ms → 4.62ms** (~**7.0%**) | ||
| - **Allocations**: consistently down **15–30%** depending on bench. | ||
| (e.g., Verify32 secp256k1: **843 → 682 allocs**; Sign2 secp256k1: **84 → 75 allocs**) |
There was a problem hiding this comment.
| **Summary:** | |
| - **Secp256k1**: ~**5–10% faster** across typical sizes. Examples: | |
| - Sign32: **11.70ms → 10.60ms** (~**9.4%** faster) | |
| - Sign64: **22.93ms → 21.02ms** (~**8.3%** faster) | |
| - Verify32: **11.32ms → 10.32ms** (~**8.9%** faster) | |
| - Verify64: **22.91ms → 20.69ms** (~**9.7%** faster) | |
| - **Ed25519**: similar gains on average (**~5–8%** for common sizes). | |
| - Verify32: **4.80ms → 4.41ms** (~**8.0%**) | |
| - Sign32: **4.96ms → 4.62ms** (~**7.0%**) | |
| - **Allocations**: consistently down **15–30%** depending on bench. | |
| (e.g., Verify32 secp256k1: **843 → 682 allocs**; Sign2 secp256k1: **84 → 75 allocs**) | |
| #### Summary | |
| **Secp256k1**: ~**5–10% faster** across typical sizes. Examples: | |
| - Sign32: **11.70ms → 10.60ms** (~**9.4%** faster) | |
| - Sign64: **22.93ms → 21.02ms** (~**8.3%** faster) | |
| - Verify32: **11.32ms → 10.32ms** (~**8.9%** faster) | |
| - Verify64: **22.91ms → 20.69ms** (~**9.7%** faster) | |
| **Ed25519**: similar gains on average (**~5–8%** for common sizes). | |
| - Verify32: **4.80ms → 4.41ms** (~**8.0%**) | |
| - Sign32: **4.96ms → 4.62ms** (~**7.0%**) | |
| **Allocations**: consistently down **15–30%** depending on bench. | |
| (e.g., Verify32 secp256k1: **843 → 682 allocs**; Sign2 secp256k1: **84 → 75 allocs**) |
|
|
||
| toolchain go1.24.3 | ||
|
|
||
| replace github.com/athanorlabs/go-dleq => github.com/jorgecuesta/go-dleq v0.0.0-20250918223310-7a1fc288336f |
There was a problem hiding this comment.
Please add the following:
- TODO to remove this
- Link to the PR we are waiting to be approved
- Explain why a fork was needed
There was a problem hiding this comment.
- ok
- ok
- The PR explain it self, is redundant, but the general idea is to reduce allocations.
| type Ring struct { | ||
| pubkeys []types.Point | ||
| curve types.Curve | ||
| // hp[i] = hashToCurve(pubkeys[i]); precomputed once to avoid recomputing in Sign/Verify loops. |
There was a problem hiding this comment.
No, just update to explain why it exists. Removing the part that looks like code.
| } | ||
|
|
||
| // PublicKeysRef returns a read-only view of the ring's public keys without copying. | ||
| // NOTE: Do not mutate the returned slice or its elements. |
There was a problem hiding this comment.
Probably the wording is wrong, the point on this is avoid clone the PublicKey into a new object. Basically use Reference to reduce allocations.
Olshansk
left a comment
There was a problem hiding this comment.
@pokt-network/ring-go can you reopen a pull request from ring-go proper?
I merged my most recent changes, so this will be a good addition.
This PR tunes the hot paths in ring signature Sign/Verify without changing any public APIs.
What changed
H_p(P_i)inRingand reuse in Sign/Verify loops.c[]challenge slice to cut GC churn.types.PointEncodeInto(falls back toEncode()).PublicKeys()copy semantics, addPublicKeysRef()for zero-copy reads.Why
H_p(P_i)and allocated on each iteration.Results (Ryzen 9 5950X, Go1.24.3 linux/amd64)
Secp256k1
Ed25519
Full before/after benchmark output
**Before** ```text goos: linux goarch: amd64 pkg: github.com/pokt-network/ring-go cpu: AMD Ryzen 9 5950X 16-Core Processor BenchmarkSign2_Secp256k1-32 1174 1,026,322 ns/op 5,021 B/op 84 allocs/op BenchmarkSign4_Secp256k1-32 674 1,733,693 ns/op 8,536 B/op 140 allocs/op BenchmarkSign8_Secp256k1-32 373 3,251,426 ns/op 15,567 B/op 252 allocs/op BenchmarkSign16_Secp256k1-32 201 6,046,313 ns/op 29,633 B/op 476 allocs/op BenchmarkSign32_Secp256k1-32 92 11,699,021 ns/op 57,812 B/op 925 allocs/op BenchmarkSign64_Secp256k1-32 49 22,926,795 ns/op 114,472 B/op 1,825 allocs/op BenchmarkSign128_Secp256k1-32 24 45,918,672 ns/op 227,992 B/op 3,634 allocs/op BenchmarkVerify2_Secp256k1-32 1726 725,005 ns/op 3,404 B/op 53 allocs/op BenchmarkVerify4_Secp256k1-32 849 1,384,662 ns/op 6,814 B/op 105 allocs/op BenchmarkVerify8_Secp256k1-32 423 2,760,680 ns/op 13,646 B/op 209 allocs/op BenchmarkVerify16_Secp256k1-32 210 5,600,718 ns/op 27,366 B/op 419 allocs/op BenchmarkVerify32_Secp256k1-32 104 11,323,208 ns/op 55,044 B/op 843 allocs/op BenchmarkVerify64_Secp256k1-32 49 22,912,945 ns/op 111,577 B/op 1,708 allocs/op BenchmarkVerify128_Secp256k1-32 24 47,007,356 ns/op 228,492 B/op 3,502 allocs/opBenchmarkSign2_Ed25519-32 2856 417,582 ns/op 4,672 B/op 70 allocs/op
BenchmarkSign4_Ed25519-32 1622 721,830 ns/op 8,032 B/op 119 allocs/op
BenchmarkSign8_Ed25519-32 888 1,305,862 ns/op 14,706 B/op 214 allocs/op
BenchmarkSign16_Ed25519-32 481 2,494,322 ns/op 28,041 B/op 403 allocs/op
BenchmarkSign32_Ed25519-32 241 4,961,154 ns/op 54,882 B/op 791 allocs/op
BenchmarkSign64_Ed25519-32 123 9,775,632 ns/op 109,027 B/op 1,579 allocs/op
BenchmarkSign128_Ed25519-32 57 19,487,283 ns/op 216,505 B/op 3,103 allocs/op
BenchmarkVerify2_Ed25519-32 3784 294,033 ns/op 3,217 B/op 44 allocs/op
BenchmarkVerify4_Ed25519-32 1953 584,609 ns/op 6,436 B/op 87 allocs/op
BenchmarkVerify8_Ed25519-32 975 1,209,195 ns/op 12,993 B/op 180 allocs/op
BenchmarkVerify16_Ed25519-32 500 2,421,635 ns/op 25,969 B/op 356 allocs/op
BenchmarkVerify32_Ed25519-32 246 4,801,361 ns/op 51,904 B/op 704 allocs/op
BenchmarkVerify64_Ed25519-32 121 9,620,050 ns/op 104,307 B/op 1,407 allocs/op
BenchmarkVerify128_Ed25519-32 60 19,432,527 ns/op 211,098 B/op 2,869 allocs/op
Notes
types.PointEncodeIntois used when available; otherwise we fall back toEncode().