Drop in SmallFp Goldilocks by z-tech · Pull Request #242 · WizardOfMenlo/whir

z-tech · 2026-03-26T13:56:46Z

What does this PR do?

Drops in the newly merged SmallFp implementation of the Goldilocks field.
I don't know exactly how these benches work, but the ones I found show some improvement.
Presumably there are extended benches that could be run to see more results.

Results of running cargo bench

SmallFp vs Fp64<MontBackend> for the Goldilocks field ($p = 2^{64} - 2^{32} + 1$). Same machine, single-threaded, --release.

Interleaved RS Encode (median)

| Size | `Fp64` | `SmallFp` | Δ |
|---|---|---|---|
| (16, 2, 2) | 1.241 ms | 1.053 ms | −15% |
| (16, 4, 3) | 2.450 ms | 1.909 ms | −22% |
| (18, 2, 2) | 4.579 ms | 4.521 ms | −1% |
| (18, 4, 3) | 7.857 ms | 8.662 ms | +10% |
| (20, 2, 3) | 17.17 ms | 16.18 ms | −6% |
| (20, 4, 4) | 35.58 ms | 33.92 ms | −5% |
| (22, 4, 4) | 163.5 ms | 160.5 ms | −2% |

Sumcheck First Round (median)

| Size | `Fp64` | `SmallFp` | Δ |
|---|---|---|---|
| 65536 | 322.9 µs | 230.7 µs | −29% |
| 262144 | 577.1 µs | 460.9 µs | −20% |
| 1048576 | 1.490 ms | 1.228 ms | −18% |

Switch the Goldilocks field (p = 2^64 - 2^32 + 1) from Fp64<MontBackend<FConfig64, 1>> to SmallFp<FConfig64> using #[derive(SmallFpConfig)]. This leverages native u64 Montgomery arithmetic for 15-22% faster sumcheck and 4-12% faster NTT operations.

Key changes:
- fields.rs: SmallFp field definition with const fn goldilocks_mont() for compile-time Montgomery constant computation in Fp2/Fp3 extension configs
- Cargo.toml: patch ark-ff/ark-std/ark-serialize to git master for SmallFp; update spongefish to patched version fixing BigInt<2> encoding/deserialization
- cooley_tukey.rs: fix test using BigInt<2> (16 bytes) instead of BigInt<1> (8 bytes)

Spongefish patches (in /tmp/spongefish-patched):
- Fix impl_encoding! macro: drain leading zeros from BigInt<2>.to_bytes_be()
- Fix impl_deserialize! macro: use BigInt::from_bits_be + from_bigint instead of from_be_bytes_mod_order (broken for SmallFp due to from_raw Montgomery bug)

codspeed-hq · 2026-03-26T14:02:36Z

Merging this PR will not alter performance

✅ 10 untouched benchmarks
⏩ 22 skipped benchmarks¹

Comparing smallfp-goldilocks (48c7c8e) with main (790bdf0)

¹ 22 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, archive them to remove them from the performance reports.

recmo · 2026-03-27T21:18:22Z

+[patch.crates-io]
+ark-ff = { git = "https://github.com/arkworks-rs/algebra" }
+ark-std = { git = "https://github.com/arkworks-rs/std" }
+ark-serialize = { git = "https://github.com/arkworks-rs/algebra" }


Let's wait until this is in a published ark-ff crate before merging?
I prefer not depending on main branches directly (even though we already do it for spongefish, but at least we pin a rev).

Depending on an unpinned main branch is pretty fragile. For example, everything would break if ark-std decides to finally upgrade their rand version.

z-tech · 2026-03-29

We'd be happy to get a list of items you think could be cleaned up if you want to chat, btw.

recmo · 2026-03-27T21:21:44Z

+/// Since p ≈ 2^64, we have k = 64 and R = 2^64.
+///
+/// Montgomery form of `v` is `v · R mod p`, computed in u128 to avoid overflow.
+const fn goldilocks_mont(v: u64) -> u64 {


Why is this not part of SmallFp? I would expect an

impl SmallFp { pub const fn from_u64(n: u64) -> Self; }

or similar being generated by the macro that does exactly this.

It also leaks the abstraction in that it assumes SmallFp uses Montgomery representation, which is not obvious for small fields (i.e. M31 works better without, IIRC).
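
A minimal sketch of the kind of constructor suggested above, assuming a hypothetical `Goldilocks` newtype (not ark-ff's `SmallFp`, whose generated API is not shown in this thread). The conversion to Montgomery form, `v · R mod p` with `R = 2^64`, stays inside `from_u64`, so callers never depend on the internal representation:

```rust
/// Goldilocks modulus p = 2^64 - 2^32 + 1.
pub const P: u64 = 0xFFFF_FFFF_0000_0001;

/// Hypothetical element type for illustration only; it stores the value in
/// Montgomery form (v * R mod p with R = 2^64).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Goldilocks {
    mont: u64,
}

impl Goldilocks {
    /// Build an element from a plain integer, at compile time or at run time.
    pub const fn from_u64(n: u64) -> Self {
        // Widen to u128 so that n * 2^64 cannot overflow, then reduce mod p.
        let wide = (n as u128) << 64;
        Self {
            mont: (wide % (P as u128)) as u64,
        }
    }

    /// Raw Montgomery limb, exposed here only so the example can be checked.
    pub const fn raw(self) -> u64 {
        self.mont
    }
}

/// Usable in const contexts, e.g. for extension-field constants.
pub const SEVEN: Goldilocks = Goldilocks::from_u64(7);
```

If the backend later switched to a non-Montgomery representation, only the body of `from_u64` would need to change; constants such as `SEVEN` would stay as written.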

z-tech · 2026-03-29

Fair point, I wrote the little helper here: arkworks-rs/algebra#1084

Also, we're happy to further optimize any Mersenne primes when/ if you have a specific need: https://andrewzitek.xyz/smallfp-site/

I'm confident this would be a reasonable change in the framework we've created.

recmo · 2026-04-01

> Also, we're happy to further optimize any Mersenne primes when/ if you have a specific need: https://andrewzitek.xyz/smallfp-site/

The main performance benefits from small fields come from using SIMD instructions (and their bigger cousins, GPUs). Unfortunately supporting this requires a very different field Trait than ark-ff::Field (I think Plonky3 got this part figured out well). In fact, we also found SIMD to be beneficial for large fields: doing 4 bn254 multiplications in parallel can be done 2x faster than 4 sequentially. In the Whir crate we are prepared to use such an API: all the large batch operations have dedicated parallel fns in the algebra module. This is where SIMD would apply. What's missing is ark-ff support.
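
To illustrate the shape of a batch-oriented entry point as described above, here is a scalar sketch for Goldilocks. The name `batch_mul` and the lane width are assumptions for illustration, not the Whir `algebra` module's actual functions, and the inner loop is a plain fallback rather than real SIMD code:

```rust
/// Goldilocks modulus p = 2^64 - 2^32 + 1.
pub const P: u64 = 0xFFFF_FFFF_0000_0001;

/// Assumed lane width; a SIMD backend would pick this per target.
const LANES: usize = 4;

/// Scalar reference multiplication mod p via a 128-bit product.
fn mul_mod(a: u64, b: u64) -> u64 {
    (((a as u128) * (b as u128)) % (P as u128)) as u64
}

/// Hypothetical batch entry point: out[i] = a[i] * b[i] mod p.
/// Working chunk by chunk gives one place where a vectorized kernel
/// could later replace the scalar inner loop.
pub fn batch_mul(a: &[u64], b: &[u64], out: &mut [u64]) {
    assert_eq!(a.len(), b.len(), "input lengths differ");
    assert_eq!(a.len(), out.len(), "output length differs");
    for ((co, ca), cb) in out
        .chunks_mut(LANES)
        .zip(a.chunks(LANES))
        .zip(b.chunks(LANES))
    {
        // The final chunk may be shorter than LANES; zip handles that.
        for ((o, &x), &y) in co.iter_mut().zip(ca).zip(cb) {
            *o = mul_mod(x, y);
        }
    }
}
```

Callers that go through a function like this never touch single-element multiplication, which is what makes it possible to swap in a SIMD kernel without changing call sites.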

One easy way to help us is if ark-ff::Field implements the zerocopy traits. Right now certain optimizations are blocked by not being able to cast from &[F] to &[[u64; LIMBS]]. Hashing a large vector of field elements, for example. Mutable casting is a bit trickier as the values are no longer guaranteed reduced, but would be very useful to have as well (you can do this cleanly using a MaybeReduced field type, similar to MaybeUninit). That would allow us to implement our own SIMD methods, for example. It gives us an efficient and safe backdoor into the field internals. (Also vec![F::ZERO; large_size] is very slow right now; implementing zerocopy::FromZeros would give us a workaround.)
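
A std-only sketch of the two casts discussed above, assuming a hypothetical single-limb element type `Fe` and a hypothetical `MaybeReduced` wrapper (neither is an ark-ff or zerocopy type). With zerocopy trait implementations the `unsafe` block would not be needed; it is written by hand here only to show what the cast amounts to:

```rust
use std::slice;

/// Number of 64-bit limbs per element; 1 for a 64-bit field such as Goldilocks.
pub const LIMBS: usize = 1;
/// Goldilocks modulus p = 2^64 - 2^32 + 1.
pub const P: u64 = 0xFFFF_FFFF_0000_0001;

/// Hypothetical field element. `repr(transparent)` guarantees the same layout
/// as the inner limb array, which is what makes the cast below sound.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Fe(pub [u64; LIMBS]);

/// Read-only view of field elements as raw limbs, without copying.
pub fn as_limbs(elems: &[Fe]) -> &[[u64; LIMBS]] {
    // SAFETY: `Fe` is `repr(transparent)` over `[u64; LIMBS]`, so size and
    // alignment match, every bit pattern is a valid limb array, and the
    // returned slice borrows `elems` for the same lifetime.
    unsafe { slice::from_raw_parts(elems.as_ptr().cast::<[u64; LIMBS]>(), elems.len()) }
}

/// Limbs that may hold a non-canonical value (>= p) after raw writes.
/// The type forces an explicit `reduce` before the value is used as `Fe`.
#[repr(transparent)]
#[derive(Clone, Copy, Debug)]
pub struct MaybeReduced(pub [u64; LIMBS]);

impl MaybeReduced {
    /// Bring the value back into the canonical range [0, p).
    /// Written for LIMBS == 1 only.
    pub fn reduce(self) -> Fe {
        Fe([self.0[0] % P])
    }
}
```

The read-only direction is harmless because reduced elements are always valid limbs. The reverse direction is where a `MaybeReduced`-style type earns its keep, since arbitrary limbs are not always valid elements.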

This SIMD support is critical to us (ProveKit). If we can't find a way to do this cleanly in ark-ff we will be forced to write our own field impls.

z-tech · 2026-04-02

Hi, thanks.

I've been able to squeeze good performance in my efficient sumcheck repo (maybe of independent interest btw) but I am transmuting the memory block there and I agree that zerocopy would be better. Here the idea is if the user calls the sumcheck lib with specific primes they get autodispatched into the vectorized path without ever knowing what that is or how it works.
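
One way such dispatch could look, as a sketch only: the trait `SmallField`, the type `Gl`, and the function names below are hypothetical and do not come from the sumcheck repo or ark-ff. The public `sum` picks a specialized path when the modulus is Goldilocks, and the caller never sees the choice:

```rust
/// Goldilocks modulus p = 2^64 - 2^32 + 1.
pub const GOLDILOCKS_P: u64 = 0xFFFF_FFFF_0000_0001;

/// Hypothetical minimal field interface; not ark-ff's `Field` trait.
pub trait SmallField: Copy {
    const MODULUS: u64;
    fn to_u64(self) -> u64;
    fn from_u64(n: u64) -> Self;
}

/// Public entry point. The branch is on an associated constant, so it is
/// resolved per field type and the user only ever calls `sum`.
pub fn sum<F: SmallField>(xs: &[F]) -> F {
    if F::MODULUS == GOLDILOCKS_P {
        sum_goldilocks(xs)
    } else {
        sum_generic(xs)
    }
}

/// Generic path: reduce after every addition.
fn sum_generic<F: SmallField>(xs: &[F]) -> F {
    let p = F::MODULUS as u128;
    let acc = xs
        .iter()
        .fold(0u128, |acc, x| (acc + x.to_u64() as u128) % p);
    F::from_u64(acc as u64)
}

/// Stand-in for a specialized (e.g. vectorized) path: accumulate lazily in
/// u128 and reduce once at the end.
fn sum_goldilocks<F: SmallField>(xs: &[F]) -> F {
    let acc: u128 = xs.iter().map(|x| x.to_u64() as u128).sum();
    F::from_u64((acc % (GOLDILOCKS_P as u128)) as u64)
}

/// Example field type for the sketch, storing canonical values.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Gl(pub u64);

impl SmallField for Gl {
    const MODULUS: u64 = GOLDILOCKS_P;
    fn to_u64(self) -> u64 {
        self.0
    }
    fn from_u64(n: u64) -> Self {
        Gl(n % GOLDILOCKS_P)
    }
}
```

Both paths return the same result; only the cost differs, which is the property that lets the library change the fast path without the caller noticing.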

That's roughly the vision for how vectorization would ideally be supported in Arkworks, with the user unaware of it, as summarized on slide 20 here: https://andrewzitek.xyz/images/small_fp_slides.pdf#page=20 (the slides are a bit outdated otherwise, btw). We have students working on this and it's of high interest to me.

That being said, there are things to work out before arriving there and if you're able to point me toward what functionality is most important for your efforts I can do my best to prioritize these. Ideally from my end, we could select some discrete pieces and collab.

Related but different:

I am also working on a rather large effort on Merkle trees with both security and performance enhancements. I think it's relevant to your projects; if you want a sneak peek, lmk.

Generally speaking, the hope for all of these components (vectorization, sumcheck, vector commitments) is that they are easy to integrate and support what you're doing well. I appreciate the feedback and would like to have a closer loop in the future.

z-tech added 2 commits March 26, 2026 09:46

deps: point spongefish at z-tech fork, update ark-ff to include #1082…

83d2897

… fix - spongefish: z-tech/spongefish rev 2613967 (encoding + NargDeserialize fixes for SmallFp extension fields) - ark-ff: algebra a2d4d660 (includes #1082: fix SmallFp from_random_bytes Montgomery confusion)

z-tech requested review from Bisht13, WizardOfMenlo and recmo March 26, 2026 13:58

z-tech added 3 commits March 26, 2026 15:04

fmt

9e062f3

clippy

2b5c459

fix ci

939f02a

recmo requested changes Mar 27, 2026

z-tech added 3 commits March 29, 2026 11:46

const smallfp

b33ae53

Merge branch 'main' into smallfp-goldilocks

c65615e

autodetect subgroup patch

48c7c8e

z-tech wants to merge 8 commits into main from smallfp-goldilocks
