Drop in SmallFp Goldilocks#242

Open
z-tech wants to merge 8 commits into main from smallfp-goldilocks

Conversation

Collaborator

@z-tech z-tech commented Mar 26, 2026

What does this PR do?

  • Just drops in the newly merged SmallFp impl of the Goldilocks field
  • I don't exactly know how these benches work but the ones I found show some improvement
  • Presumably there are extended benches that could be run to see more results

Results of running cargo bench

SmallFp vs Fp64<MontBackend> for the Goldilocks field ($p = 2^{64} - 2^{32} + 1$). Same machine, single-threaded, --release.

Interleaved RS Encode (median)

| Size | Fp64 | SmallFp | Δ |
|---|---|---|---|
| (16, 2, 2) | 1.241 ms | 1.053 ms | −15% |
| (16, 4, 3) | 2.450 ms | 1.909 ms | −22% |
| (18, 2, 2) | 4.579 ms | 4.521 ms | −1% |
| (18, 4, 3) | 7.857 ms | 8.662 ms | +10% |
| (20, 2, 3) | 17.17 ms | 16.18 ms | −6% |
| (20, 4, 4) | 35.58 ms | 33.92 ms | −5% |
| (22, 4, 4) | 163.5 ms | 160.5 ms | −2% |

Sumcheck First Round (median)

| Size | Fp64 | SmallFp | Δ |
|---|---|---|---|
| 65536 | 322.9 µs | 230.7 µs | −29% |
| 262144 | 577.1 µs | 460.9 µs | −20% |
| 1048576 | 1.490 ms | 1.228 ms | −18% |

z-tech added 2 commits March 26, 2026 09:46
Switch the Goldilocks field (p = 2^64 - 2^32 + 1) from Fp64<MontBackend<FConfig64, 1>>
to SmallFp<FConfig64> using #[derive(SmallFpConfig)]. This leverages native u64
Montgomery arithmetic for 15-22% faster sumcheck and 4-12% faster NTT operations.

Key changes:
- fields.rs: SmallFp field definition with const fn goldilocks_mont() for
  compile-time Montgomery constant computation in Fp2/Fp3 extension configs
- Cargo.toml: patch ark-ff/ark-std/ark-serialize to git master for SmallFp;
  update spongefish to patched version fixing BigInt<2> encoding/deserialization
- cooley_tukey.rs: fix test using BigInt<2> (16 bytes) instead of BigInt<1> (8 bytes)

Spongefish patches (in /tmp/spongefish-patched):
- Fix impl_encoding! macro: drain leading zeros from BigInt<2>.to_bytes_be()
- Fix impl_deserialize! macro: use BigInt::from_bits_be + from_bigint instead
  of from_be_bytes_mod_order (broken for SmallFp due to from_raw Montgomery bug)
… fix

- spongefish: z-tech/spongefish rev 2613967 (encoding + NargDeserialize fixes for SmallFp extension fields)
- ark-ff: algebra a2d4d660 (includes #1082: fix SmallFp from_random_bytes Montgomery confusion)

codspeed-hq Bot commented Mar 26, 2026

Merging this PR will not alter performance

✅ 10 untouched benchmarks
⏩ 22 skipped benchmarks¹


Comparing smallfp-goldilocks (48c7c8e) with main (790bdf0)

Footnotes

  1. 22 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Comment thread Cargo.toml Outdated
[patch.crates-io]
ark-ff = { git = "https://github.com/arkworks-rs/algebra" }
ark-std = { git = "https://github.com/arkworks-rs/std" }
ark-serialize = { git = "https://github.com/arkworks-rs/algebra" }
Collaborator

@recmo recmo Mar 27, 2026

Let's wait until this is in a published ark-ff crate before merging?
I prefer not depending on main branches directly (even though we already do it for spongefish, but at least we pin a rev).

Depending on an unpinned main branch is pretty fragile. For example, everything would break if ark-std finally decides to upgrade their rand version.
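
For illustration, a pinned version of the patch section might look like the fragment below. The `rev` for the algebra repo is the one quoted in this PR's commit notes ("ark-ff: algebra a2d4d660"); applying the same rev to `ark-serialize` is an assumption based on it living in the same repository, and the `ark-std` value is a placeholder because no commit for it is named in this thread.

```toml
[patch.crates-io]
# rev quoted in this PR's commit notes for the algebra repo
ark-ff = { git = "https://github.com/arkworks-rs/algebra", rev = "a2d4d660" }
# assumption: same repository, so the same rev is reused here
ark-serialize = { git = "https://github.com/arkworks-rs/algebra", rev = "a2d4d660" }
# placeholder: substitute a real commit hash before use
ark-std = { git = "https://github.com/arkworks-rs/std", rev = "<ark-std-commit-sha>" }
```

With a `rev` on every entry, an upstream change on the default branch cannot alter the build until the hash is bumped deliberately.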

Collaborator Author

We'd be happy to get a list of items you think could be cleaned up if you want to chat btw.

Comment thread src/algebra/fields.rs Outdated
/// Since p ≈ 2^64, we have k = 64 and R = 2^64.
///
/// Montgomery form of `v` is `v · R mod p`, computed in u128 to avoid overflow.
const fn goldilocks_mont(v: u64) -> u64 {
Collaborator

Why is this not part of SmallFp? I would expect an

impl SmallFp {
   pub const fn from_u64(n: u64) -> Self;
}

or similar being generated by the macro that does exactly this.

It also leaks the abstraction in that it assumes SmallFp uses Montgomery representation, which is not obvious for small fields (i.e. M31 works better without it, IIRC).
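
For readers following the thread: the body of `goldilocks_mont` is not shown in the diff excerpt, so the following is only a guess at what such a helper could look like, based on its doc comment (R = 2^64, product computed in u128). The constant names `GOLDILOCKS_P` and `ONE_MONT` are made up for this sketch.

```rust
/// Goldilocks modulus p = 2^64 - 2^32 + 1.
pub const GOLDILOCKS_P: u64 = 0xFFFF_FFFF_0000_0001;

/// Montgomery form of `v` with R = 2^64, i.e. (v * 2^64) mod p.
/// The product is formed in u128: v < 2^64, so v * 2^64 < 2^128 and cannot overflow.
pub const fn goldilocks_mont(v: u64) -> u64 {
    (((v as u128) << 64) % (GOLDILOCKS_P as u128)) as u64
}

/// Evaluated at compile time, e.g. for use inside an extension-field config.
/// Since 2^64 - p = 2^32 - 1, the Montgomery form of 1 is 0xFFFF_FFFF.
pub const ONE_MONT: u64 = goldilocks_mont(1);
```

A helper like this hard-codes the Montgomery representation at the call site, which is the leak described above; a constructor on the field type would keep that choice internal.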

Collaborator Author

Fair point, I wrote the little helper here: arkworks-rs/algebra#1084

Also, we're happy to further optimize any Mersenne primes when/if you have a specific need: https://andrewzitek.xyz/smallfp-site/

I'm confident this would be a reasonable change in the framework we've created.

Collaborator

@recmo recmo Apr 1, 2026

> Also, we're happy to further optimize any Mersenne primes when/ if you have a specific need: https://andrewzitek.xyz/smallfp-site/

The main performance benefits from small fields come from using SIMD instructions (and their bigger cousins, GPUs). Unfortunately, supporting this requires a very different field trait than ark-ff::Field (I think Plonky3 got this part figured out well). In fact, we also found SIMD to be beneficial for large fields: 4 bn254 multiplications done in parallel run 2x faster than 4 done sequentially. In the Whir crate we are prepared to use such an API: all the large batch operations have dedicated parallel fns in the algebra module. This is where SIMD would apply. What's missing is ark-ff support.

One easy way to help us would be for ark-ff::Field to implement the zerocopy traits. Right now certain optimizations, such as hashing a large vector of field elements, are blocked by not being able to cast from &[F] to &[[u64; LIMBS]]. Mutable casting is a bit trickier, as the values are no longer guaranteed to be reduced, but it would be very useful to have as well (you can do this cleanly using a MaybeReduced field type, similar to MaybeUninit). That would allow us to implement our own SIMD methods, for example. It gives us an efficient and safe backdoor into the field internals. (Also, vec![F::ZERO; large_size] is very slow right now; implementing zerocopy::FromZeros would give us a workaround.)
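
To make the requested cast concrete, here is a minimal sketch of the read-only direction using only the standard library. `Elem` is a hypothetical stand-in for a field element and is not an ark-ff type; the sketch assumes the element is a `#[repr(transparent)]` wrapper around its limbs, which is what makes the reinterpretation sound. The zerocopy traits would let the compiler check that layout claim instead of relying on a hand-written `unsafe` block.

```rust
/// Hypothetical field element: a transparent wrapper around its limbs.
/// (Not an ark-ff type; ark-ff does not currently expose this guarantee.)
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Elem<const LIMBS: usize>(pub [u64; LIMBS]);

/// View a slice of elements as a slice of raw limb arrays without copying.
pub fn as_limbs<const LIMBS: usize>(elems: &[Elem<LIMBS>]) -> &[[u64; LIMBS]] {
    // SAFETY: `Elem<LIMBS>` is repr(transparent) over `[u64; LIMBS]`, so both
    // types have identical size and alignment, and every bit pattern of the
    // wrapper is a valid limb array. The length and lifetime are unchanged.
    unsafe {
        core::slice::from_raw_parts(elems.as_ptr().cast::<[u64; LIMBS]>(), elems.len())
    }
}
```

The mutable direction is deliberately left out: writing arbitrary limbs through such a view could produce unreduced values, which is the problem a `MaybeReduced`-style type is meant to address.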

This SIMD support is critical to us (ProveKit). If we can't find a way to do this cleanly in ark-ff we will be forced to write our own field impls.

Collaborator Author

@z-tech z-tech Apr 2, 2026

Hi, thanks.

I've been able to squeeze out good performance in my efficient sumcheck repo (maybe of independent interest btw), but I am transmuting the memory block there and I agree that zerocopy would be better. The idea there is that if the user calls the sumcheck lib with specific primes, they get autodispatched into the vectorized path without ever knowing what that is or how it works.

That's kind of the vision for how vectorization would ideally be supported in Arkworks, with the user unaware of it, as summarized on slide 20 here: https://andrewzitek.xyz/images/small_fp_slides.pdf#page=20 (the slides are a bit outdated otherwise btw). We have students working on this and it's of high interest to me.
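
A rough sketch of that dispatch idea follows. Every name in it (`SmallPrime`, `sum_mod`, `sum_goldilocks_fast`, `sum_generic`) is hypothetical and comes from neither Arkworks nor the sumcheck repo, and the "fast" path is a plain scalar stand-in (deferred reduction) rather than real SIMD code. It only shows how a caller could be routed by modulus without knowing a specialized path exists.

```rust
/// Hypothetical trait exposing the modulus as a compile-time constant.
pub trait SmallPrime {
    const MODULUS: u64;
}

pub struct Goldilocks;
impl SmallPrime for Goldilocks {
    const MODULUS: u64 = 0xFFFF_FFFF_0000_0001; // 2^64 - 2^32 + 1
}

pub struct Mersenne31;
impl SmallPrime for Mersenne31 {
    const MODULUS: u64 = (1 << 31) - 1; // 2^31 - 1
}

/// Generic path: reduce after every addition.
fn sum_generic(xs: &[u64], p: u64) -> u64 {
    xs.iter()
        .fold(0u64, |acc, &x| ((acc as u128 + x as u128) % (p as u128)) as u64)
}

/// Stand-in for a specialized kernel: accumulate in u128 and reduce once.
/// A real implementation would use SIMD here; this is scalar on purpose.
fn sum_goldilocks_fast(xs: &[u64]) -> u64 {
    let total: u128 = xs.iter().map(|&x| x as u128).sum();
    (total % (Goldilocks::MODULUS as u128)) as u64
}

/// Single public entry point. The branch is on a constant, so after
/// monomorphization the compiler can drop the path that is not taken.
pub fn sum_mod<F: SmallPrime>(xs: &[u64]) -> u64 {
    if F::MODULUS == Goldilocks::MODULUS {
        sum_goldilocks_fast(xs)
    } else {
        sum_generic(xs, F::MODULUS)
    }
}
```

Callers write `sum_mod::<Goldilocks>(&values)` or `sum_mod::<Mersenne31>(&values)` and get the same result either way; only the route taken differs.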

That being said, there are things to work out before arriving there and if you're able to point me toward what functionality is most important for your efforts I can do my best to prioritize these. Ideally from my end, we could select some discrete pieces and collab.

Related but different:

I am also doing a rather large effort on Merkle Trees with both security and performance enhancements. I think it's relevant to your projects; if you want a sneak peek, lmk.

Generally speaking, the hope for all of these components (vectorization, sumcheck, vector commitments) is that they are easy to integrate and support what you're doing well. Appreciate the feedback and would like to have a closer loop in the future.


2 participants