
feat: add parsigex #291

Open
varex83 wants to merge 26 commits into main from bohdan/dkg-parsigex

Conversation


@varex83 varex83 commented Mar 19, 2026

Summary

Implements the partial signature exchange (parsigex) libp2p protocol — the mechanism by which distributed validator nodes share the partial BLS signatures they produce for each duty before threshold-combining them into a full aggregate.

New crate: pluto-parsigex

| Component | Description |
| --- | --- |
| Behaviour | libp2p NetworkBehaviour that manages per-peer connections and routes inbound/outbound events |
| Handler | Per-connection ConnectionHandler; opens an outbound substream per broadcast, accepts inbound substreams for receives |
| protocol.rs | Wire layer: length-delimited protobuf framing over a raw libp2p stream (/charon/parsigex/2.0.0) |
| error.rs | Failure (handler ↔ behaviour), VerifyError (caller-supplied verifier), Error (public) |
| Handle | Cloneable async control handle — broadcast(duty, data_set) fans a signed set out to all connected peers; subscribe(cb) registers a callback for verified inbound sets |

Receive pipeline (handler side):

  1. Read length-delimited bytes from the inbound stream
  2. Decode protobuf → (Duty, ParSignedDataSet)
  3. Gate the duty via caller-supplied DutyGater
  4. Verify every partial signature via caller-supplied async Verifier
  5. Emit Event::Received to the behaviour on success, Event::Error on any failure
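The five steps above can be sketched with dependency-free stand-ins. Everything here — the Duty shape, the frame layout, the gater and verifier signatures — is a simplified placeholder rather than the real pluto-parsigex types, but the control flow is the same:

```rust
// Stand-ins for the real pluto-core types; the real pipeline is async and
// decodes protobuf, but the gate/verify/emit flow is identical.
#[derive(Debug, Clone, PartialEq)]
struct Duty { slot: u64, kind: u8 }

type ParSignedDataSet = Vec<(String, Vec<u8>)>; // pubkey -> partial signature

#[derive(Debug, PartialEq)]
enum Event { Received(Duty, ParSignedDataSet), Error(String) }

// Steps 1-2: parse a trivial frame (stand-in for the length-delimited
// protobuf decode): 8-byte big-endian slot, 1-byte duty kind, then sig bytes.
fn decode(frame: &[u8]) -> Result<(Duty, ParSignedDataSet), String> {
    if frame.len() < 9 {
        return Err("frame too short".to_string());
    }
    let header: [u8; 8] = frame[..8].try_into().map_err(|_| "bad header".to_string())?;
    let slot = u64::from_be_bytes(header);
    Ok((Duty { slot, kind: frame[8] }, vec![("key0".to_string(), frame[9..].to_vec())]))
}

// Steps 3-5: gate the duty, verify every partial signature, emit an event.
fn receive(
    frame: &[u8],
    gate: impl Fn(&Duty) -> bool,   // stand-in for the caller-supplied DutyGater
    verify: impl Fn(&[u8]) -> bool, // stand-in for the caller-supplied Verifier
) -> Event {
    let (duty, set) = match decode(frame) {
        Ok(decoded) => decoded,
        Err(e) => return Event::Error(e),
    };
    if !gate(&duty) {
        return Event::Error("duty gated".to_string());
    }
    if set.iter().any(|(_, sig)| !verify(sig)) {
        return Event::Error("bad signature".to_string());
    }
    Event::Received(duty, set)
}
```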

Core domain types (crates/core)

New types in types.rs with full protobuf round-trip support:

  • DutyType — enum covering all validator duties (attester, proposer, builder, randao, exit, sync, …)
  • SlotNumber / Duty — slot + duty type pair that identifies a signing round
  • PubKey — 48-byte BLS public key newtype
  • ParSignedData — a Box<dyn SignedData> paired with a share index
  • ParSignedDataSet — HashMap<PubKey, ParSignedData> (one partial sig per validator key)

New parsigex_codec.rs — JSON serialise/deserialise for every SignedData variant, with type-erased dispatch via Any::downcast_ref. Fallback order for Attester and Aggregator duties matches Go's Charon for wire compatibility.
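A minimal sketch of that type-erased dispatch pattern, using hypothetical Attestation / VersionedAttestation stand-ins rather than the real SignedData types:

```rust
use std::any::Any;

// Hypothetical stand-ins for two signed-data variants.
struct Attestation(u64);
struct VersionedAttestation(u64);

// Type-erased dispatch: probe concrete types in a fixed fallback order, the
// way the codec picks a serializer for a type-erased value. Attestation is
// tried before VersionedAttestation, mirroring the wire-compatibility
// ordering described above.
fn encode(data: &dyn Any) -> Option<String> {
    if let Some(a) = data.downcast_ref::<Attestation>() {
        Some(format!("{{\"attestation\":{}}}", a.0))
    } else if let Some(v) = data.downcast_ref::<VersionedAttestation>() {
        Some(format!("{{\"versioned_attestation\":{}}}", v.0))
    } else {
        None // unknown variant: the real codec returns an error here
    }
}
```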

Other changes

  • crates/p2p — wires parsigex Behaviour into the combined pluto swarm behaviour
  • crates/peerinfo — removes the local MAX_MESSAGE_SIZE constant in favour of the shared one from pluto-p2p

Test plan

  • cargo test --workspace --all-features
  • cargo clippy --workspace --all-targets --all-features -- -D warnings
  • cargo +nightly fmt --all --check
  • Run parsigex example end-to-end against a local cluster

@varex83 varex83 marked this pull request as ready for review March 26, 2026 14:49
@varex83 varex83 changed the title feat: add parsigex [wip] feat: add parsigex Mar 26, 2026
@varex83 varex83 mentioned this pull request Apr 3, 2026
Base automatically changed from bohdan/parasigdb to main April 3, 2026 15:47
@emlautarom1
Collaborator

Could you merge main to facilitate reviewing?

@emlautarom1 emlautarom1 linked an issue Apr 3, 2026 that may be closed by this pull request
@varex83 varex83 marked this pull request as draft April 6, 2026 10:36
@varex83
Collaborator Author

varex83 commented Apr 6, 2026

Found a few issues while testing with pluto, so converted to draft

@varex83agent
Collaborator

Code Review

Summary

This PR adds the parsigex crate — a libp2p NetworkBehaviour/ConnectionHandler pair that implements the /charon/parsigex/2.0.0 partial-signature exchange protocol — together with proto conversion helpers in pluto-core and a multi-node example. The approach is sound and the structure closely follows the peerinfo pattern already in the repo. A few correctness and parity issues are noted below.


Findings

[Critical] Cargo.toml workspace path typo — build error

Impact: The workspace will fail to build because the path points to a non-existent directory.
Evidence: Cargo.toml:103 — pluto-parsigex = { path = "crates/parasigex" } (note "parasigex" vs. the actual directory crates/parsigex). The workspace member on line 3 correctly spells "crates/parsigex".
Recommendation: path = "crates/parsigex"


[High] Max message size mismatch: parsigex uses 16 MB, Go uses 128 MB

Impact: Any cluster where a ParSigExMsg exceeds 16 MB will be rejected by the Rust node but accepted by Go peers. Large clusters or future duty types with verbose signed data may trigger this.
Evidence: crates/parsigex/src/protocol.rs:17 — const MAX_MESSAGE_SIZE: usize = 16 * 1024 * 1024;
Go reference: charon/p2p/sender.go:27 — maxMsgSize = 128 << 20 // 128MB
Recommendation: Use pluto_p2p::proto::MAX_MESSAGE_SIZE (128 MB, now exported) instead of the local constant, matching Go and the recent peerinfo unification.


[High] Handler only processes one inbound stream at a time — silent drop under concurrency

Impact: If a second inbound stream is fully negotiated while the first is still being read/verified, on_connection_event overwrites self.inbound, silently dropping the first receive future. Go's handler processes each inbound stream in its own goroutine; there is no such cap.
Evidence: crates/parsigex/src/handler.rs — inbound: Option<RecvFuture> is a single slot. ConnectionEvent::FullyNegotiatedInbound unconditionally assigns self.inbound = Some(recv_message(...)) with no check for an existing future.
Go reference: charon/p2p/receive.go:40-65 — SetStreamHandlerMatch forks a goroutine per stream.
Recommendation: Use a VecDeque<RecvFuture> (mirroring outbound_queue) and drive all pending inbound futures in poll, emitting events as each one completes.
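A dependency-free sketch of the queued-inbound shape. RecvJob stands in for the real boxed receive future, and poll here is synchronous; the point is only that inbound is a queue, never an overwritable slot:

```rust
use std::collections::VecDeque;

// Stand-in for a pending receive future: returns Some(event) once complete.
type RecvJob = Box<dyn FnMut() -> Option<String>>;

struct Handler {
    inbound: VecDeque<RecvJob>, // queue instead of a single Option slot
    events: Vec<String>,
}

impl Handler {
    // On FullyNegotiatedInbound: enqueue, never overwrite.
    fn on_inbound(&mut self, job: RecvJob) {
        self.inbound.push_back(job);
    }

    // In poll(): drive every pending receive, emitting events as they finish
    // and re-queueing the ones that are still pending.
    fn poll(&mut self) {
        let mut still_pending = VecDeque::new();
        while let Some(mut job) = self.inbound.pop_front() {
            match job() {
                Some(event) => self.events.push(event),
                None => still_pending.push_back(job),
            }
        }
        self.inbound = still_pending;
    }
}
```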


[Medium] SignedDataError::Custom is not Send + Sync

Impact: Box<dyn std::error::Error> is not Send + Sync. Any async code that holds a SignedDataError::Custom across an await point will fail to compile.
Evidence: crates/core/src/signeddata.rs:51 — Custom(Box<dyn std::error::Error>)
Recommendation: Custom(Box<dyn std::error::Error + Send + Sync>)


[Medium] Signature error boxed as a fake serde_json::Error — API abuse

Impact: Error information is discarded and the type contract is violated. serde_json::Error::io is intended for I/O failures during JSON serialization, not arbitrary domain errors. Downstream callers catching ParSigExCodecError::Serialize for JSON-decode failures will also receive what are actually signature errors.
Evidence: crates/core/src/types.rs:597-601

let signature = data.signed_data.signature().map_err(|err| {
    ParSigExCodecError::Serialize(serde_json::Error::io(std::io::Error::other(
        err.to_string(),
    )))
})?;

Recommendation: Add a dedicated ParSigExCodecError::Signature(String) (or SignedDataError) variant, or re-use ParSigExCodecError::InvalidParSignedProto and attach the message via a separate field.
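A sketch of the dedicated-variant option, with a simplified stand-in for the real ParSigExCodecError enum:

```rust
// Simplified stand-in for the real error enum; Signature is the new variant.
#[derive(Debug, PartialEq)]
enum ParSigExCodecError {
    #[allow(dead_code)]
    Serialize(String), // stand-in for the serde_json::Error-carrying variant
    Signature(String), // dedicated variant for signature-extraction failures
}

// Map a signature-extraction failure directly, without synthesizing a fake
// JSON/IO error on the way.
fn extract_signature(res: Result<Vec<u8>, String>) -> Result<Vec<u8>, ParSigExCodecError> {
    res.map_err(ParSigExCodecError::Signature)
}
```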


[Medium] Behaviour::poll processes at most one command per wake cycle

Impact: Under high broadcast load the command channel can starve: each poll call dequeues at most one Command from rx, even if many are queued. This can add per-message latency proportional to the queue depth.
Evidence: crates/parsigex/src/behaviour.rs — poll calls self.rx.poll_recv(cx) exactly once; handle_command pushes to pending_actions, but only one action is returned per poll invocation (the pending_actions check pops only one before returning Ready).
Recommendation: Drain rx in a while let Poll::Ready(Some(cmd)) loop (bounded by a budget, e.g. 32) before returning, similar to how libp2p's own behaviours handle this.
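A sketch of the budgeted drain, using std::sync::mpsc as a runnable stand-in for the Tokio channel and poll context:

```rust
use std::sync::mpsc::Receiver;

const COMMAND_BUDGET: usize = 32;

// Drain up to COMMAND_BUDGET queued commands per poll instead of exactly one,
// so a deep queue no longer costs one wake-up of latency per command.
fn drain_commands(rx: &Receiver<u32>, pending: &mut Vec<u32>) {
    for _ in 0..COMMAND_BUDGET {
        match rx.try_recv() {
            Ok(cmd) => pending.push(cmd),
            Err(_) => break, // empty (or disconnected): stop for this poll
        }
    }
}
```

The budget bounds how long one poll call can monopolise the executor while still clearing a backlog in a handful of wake-ups.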


[Medium] .with_quic_enabled(true) added to relay-node builder without explanation

Impact: One Node::new_* variant now enables QUIC when it previously did not. QUIC-over-relay is not a standard libp2p transport combination and may cause unexpected connection failures or interop issues with Go peers.
Evidence: crates/p2p/src/p2p.rs:341 — .with_quic_enabled(true) added to the builder that uses .with_relay_client(...).
Recommendation: Confirm this is intentional and add a comment explaining why this variant needs QUIC enabled.


[Low] Redundant peer_id parameter in Behaviour::new

Impact: None at runtime, but the API is confusing — callers must pass the same value twice.
Evidence: crates/parsigex/src/behaviour.rs:167 — pub fn new(config: Config, peer_id: PeerId) while config.peer_id already holds the same value; equality is enforced only via debug_assert_eq!.
Recommendation: Remove the second peer_id parameter and derive it from config.peer_id internally.


[Low] DutySentinel serialization is asymmetric

Impact: From<&DutyType> maps DutySentinel(_) => 14, but TryFrom<i32> maps 14 (and anything ≥ 14) to InvalidDuty. Sentinels are never transmitted so this has no operational impact, but it's a latent footgun.
Evidence: crates/core/src/types.rs:84 and types.rs:105.
Recommendation: Either exclude DutySentinel from the From impl (unreachable!/panic) or round-trip it correctly.


Parity Matrix

| Component | Go | Rust | Match | Notes |
| --- | --- | --- | --- | --- |
| Protocol ID | /charon/parsigex/2.0.0 | /charon/parsigex/2.0.0 | yes | |
| Wire format | protobuf length-delimited | protobuf length-delimited | yes | |
| Max message size | 128 MB | 16 MB | no | See High finding above |
| DutyType integer mapping | 0–13 | 0–13 | yes | |
| Empty data set rejection | yes | yes | yes | InvalidParSignedDataSetFields |
| Duty validity check on recv | yes | yes | yes | duty_gater |
| Per-entry signature verification | yes | yes | yes | verifier callback |
| Concurrent inbound streams | yes (goroutine per stream) | no (single slot) | no | See High finding above |
| BuilderProposer rejected | yes (deprecated) | yes (DeprecatedBuilderProposer) | yes | |

Tests

Tests were not run locally. The example (crates/parsigex/examples/parsigex.rs) provides an integration-level smoke test for the broadcast path but requires a live multi-node setup. Unit tests for encode_message/decode_message round-trips and codec edge cases (empty set, unknown duty type, oversized message) appear absent — these would be worth adding.


Open Questions

  1. Is the 16 MB cap intentional (memory-safety limit) or an oversight vs. Go's 128 MB?
  2. Was .with_quic_enabled(true) on the relay node builder intentional? The other three Node::new_* builders in the same file were not changed.
  3. SignedDataError::Custom is only added in signeddata.rs but never constructed in this PR — is it intended for follow-up work, or can it be deferred?

@varex83 varex83 marked this pull request as ready for review April 13, 2026 14:29
Comment thread crates/core/src/parsigex_codec.rs
Comment thread crates/parsigex/src/behaviour.rs Outdated
Comment thread crates/parsigex/src/error.rs Outdated
Comment thread crates/parsigex/src/error.rs Outdated
Comment thread crates/core/Cargo.toml Outdated
Comment thread crates/core/src/parsigex_codec.rs Outdated
Comment thread crates/parsigex/src/behaviour.rs Outdated
Comment thread crates/parsigex/src/handler.rs Outdated
Comment thread crates/parsigex/src/protocol.rs Outdated
Collaborator

@varex83agent varex83agent left a comment


Review: feat: add parsigex

This PR ports the Charon parsigex libp2p protocol to Rust. The architecture is sound — the Behaviour/Handler split, the per-stream one-shot send/receive design, and the duty codec fallback order (Attestation before VersionedAttestation, SignedAggregateAndProof before versioned) all match Go correctly.

There are 3 bugs that must be fixed before merge:

  1. The codec is JSON-only — Go writes SSZ-first since v0.17, causing a hard interoperability failure for all major duty types (crates/core/src/parsigex_codec.rs)
  2. DutySentinel encodes as 14 but TryFrom<i32> has no arm for 14, making the round-trip fail with InvalidDuty (crates/core/src/types.rs)
  3. Broadcast with no connected peers silently drops the request_id without emitting a terminal event, leaving callers blocked forever (crates/parsigex/src/behaviour.rs:304)

Additionally there are 8 major issues: the peerinfo max-message-size regression (64 KiB → 128 MiB), pending_broadcasts leaking on connection close, unbounded tokio::spawn in notify_subscribers, FuturesUnordered with no concurrency cap, VerifyError being silently discarded, SignedDataError::Custom not being Send + Sync, a roundabout error conversion through serde_json::Error::io, and a panic!-eligible .expect() in non-test code. Inline comments are filed for each issue individually.

Comment thread crates/core/src/parsigex_codec.rs
Comment thread crates/core/src/types.rs Outdated
Comment thread crates/parsigex/src/behaviour.rs Outdated
Comment thread crates/parsigex/src/behaviour.rs Outdated
/// Each subscriber is invoked in a spawned task since `poll()` is
/// synchronous. This matches Go's intended behaviour (see Go TODO to call
/// subscribers async).
fn notify_subscribers(&self, duty: Duty, data_set: ParSignedDataSet) {
Collaborator


Major: notify_subscribers spawns an unbounded number of Tokio tasks — DoS amplification and head-of-line blocking

Every verified inbound message calls tokio::spawn with no bound. Under sustained inbound traffic, the number of live tasks grows unboundedly, consuming memory proportional to message rate.

Additionally, all subscriber callbacks are awaited sequentially inside a single task. If any subscriber is slow (e.g., blocked on a back-pressured channel), all subsequent subscribers in the list are stalled for that entire duration.

Fix: feed messages through a bounded tokio::sync::mpsc channel to a dedicated delivery task. Back-pressure is then applied at the receive side rather than accumulating as spawned tasks:

// In Behaviour:
dispatch_tx: mpsc::Sender<(Duty, ParSignedDataSet)>,  // bounded capacity
// In a dedicated spawned task:
while let Some((duty, set)) = rx.recv().await {
    for sub in subs.read().await.iter() { sub(duty.clone(), set.clone()).await; }
}

Collaborator


Agreed that sequential execution of subscriber callbacks is not good. Still, I believe we can drop the entire subscribe mechanism, given that Rust's libp2p already uses events.

Collaborator Author


I'll revisit this a bit later. I've explored other implementations, but concluded that the current one is the best way to go. The main issue arises when we use it in the exchanger, which relies heavily on the subscriber pattern.

Comment thread crates/parsigex/src/behaviour.rs Outdated
Comment thread crates/parsigex/src/handler.rs Outdated
Comment thread crates/parsigex/src/lib.rs
Comment thread crates/parsigex/src/protocol.rs Outdated
Comment thread crates/core/src/parsigex_codec.rs Outdated
@NethermindEth NethermindEth deleted a comment from claude bot Apr 15, 2026
@varex83
Collaborator Author

varex83 commented Apr 15, 2026

/claude-review

@github-actions

github-actions bot commented Apr 15, 2026

Claude finished @varex83's task in 12m 36s


PR #291 — parsigex Review

  • Gather PR context and diff
  • Run parallel specialist reviews (functional, security, Rust style, code quality)
  • Deduplicate and assess findings
  • Post review findings
  • Final summary

Verdict: REQUEST_CHANGES — 3 bugs + 7 major findings

The architecture is sound: Behaviour/Handler split mirrors Charon, duty-codec fallback order (Attestation before VersionedAttestation, SignedAggregateAndProof before versioned) matches Go, and the protocol ID /charon/parsigex/2.0.0 is correct. However, the issues below must be addressed before merge.


Bugs (must fix)

1. behaviour.rs:304 — Silent terminal event drop when no peers connected

When targeted == 0, the function returns early without emitting BroadcastComplete or BroadcastFailed. Any caller awaiting a terminal event for the request_id will block forever.

if targeted == 0 {
    self.pending_events
        .push_back(ToSwarm::GenerateEvent(Event::BroadcastFailed { request_id }));
    return;
}

2. crates/core/src/types.rs:91 — DutySentinel round-trip broken

From<&DutyType> for i32 maps DutySentinel(_) => 14, but TryFrom<i32> has no arm for 14 — it falls through to _ => Err(InvalidDuty). In Go, dutySentinel = 14 is an unexported constant never transmitted on the wire. DutySentinel should not be serialisable to the wire format at all.

Fix: unreachable!() or return an error in From for DutySentinel; add 14 => Err(UnsupportedDutyType) in TryFrom<i32>.
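A sketch of the recommended mapping, with a two-variant stand-in for the real 0–13 enum:

```rust
// Simplified stand-in: only two real duty arms are shown; the real enum
// covers the full 0..=13 range.
#[derive(Debug, PartialEq)]
#[allow(dead_code)]
enum DutyType { Attester, Proposer, DutySentinel }

#[derive(Debug, PartialEq)]
struct UnsupportedDutyType(i32);

impl TryFrom<i32> for DutyType {
    type Error = UnsupportedDutyType;
    fn try_from(v: i32) -> Result<Self, Self::Error> {
        match v {
            0 => Ok(DutyType::Attester),
            1 => Ok(DutyType::Proposer),
            // ... arms 2..=13 elided ...
            14 => Err(UnsupportedDutyType(14)), // sentinel: never valid on the wire
            other => Err(UnsupportedDutyType(other)),
        }
    }
}
```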

3. crates/core/src/signeddata.rs:53 — SignedDataError::Custom missing Send + Sync

Box<dyn std::error::Error> is not Send + Sync. signature() is called inside the parsigex async pipeline across .await points in spawned tasks.

// Fix:
Custom(Box<dyn std::error::Error + Send + Sync>),

Major Findings

4. behaviour.rs:431 — on_swarm_event ignores ConnectionClosed, causing pending_broadcasts to leak

fn on_swarm_event(&mut self, _event: FromSwarm) {} is a no-op. When a connection drops mid-broadcast, the handler is destroyed but finish_broadcast_result is never called. PendingBroadcast entries persist forever; callers waiting for completion events are blocked indefinitely.

Fix: handle FromSwarm::ConnectionClosed and finalise any pending broadcast to the disconnected peer with a failure result.
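A dependency-free sketch of that cleanup, with u32/u64 stand-ins for the libp2p identifier types:

```rust
use std::collections::HashMap;

type PeerId = u32;    // stand-in for libp2p::PeerId
type RequestId = u64; // stand-in for the real request identifier

struct Behaviour {
    // request -> peers whose send has not completed yet
    pending_broadcasts: HashMap<RequestId, Vec<PeerId>>,
    events: Vec<String>,
}

impl Behaviour {
    // Sketch of a FromSwarm::ConnectionClosed arm: record a failure for every
    // broadcast still waiting on the disconnected peer, and drop entries with
    // no outstanding peers left so pending_broadcasts cannot leak.
    fn on_connection_closed(&mut self, peer: PeerId) {
        for (req, peers) in self.pending_broadcasts.iter_mut() {
            if let Some(pos) = peers.iter().position(|p| *p == peer) {
                peers.remove(pos);
                self.events.push(format!("request {req}: peer {peer} failed"));
            }
        }
        self.pending_broadcasts.retain(|_, peers| !peers.is_empty());
    }
}
```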

5. handler.rs:81 — FuturesUnordered has no concurrency cap; stream-flood enables memory exhaustion

Every inbound stream pushes a BoxFuture with no limit. A peer can open 2048 concurrent streams (Yamux cap), each holding a Verifier clone, DutyGater clone, and a live stream buffer up to MAX_MESSAGE_SIZE = 128 MiB.

Fix: check active_futures.len() before pushing and reject streams above a per-connection ceiling (e.g. 16).

6. behaviour.rs:395 — notify_subscribers spawns unbounded Tokio tasks

Every verified inbound message unconditionally calls tokio::spawn with no bound. Under sustained traffic task count grows without bound. All subscriber callbacks are also awaited sequentially inside each task, so a slow subscriber head-of-line blocks all subsequent ones.

Fix: use a bounded mpsc channel to a dedicated dispatch task for back-pressure.

7. protocol.rs:21 — .expect() in non-test code violates the project no-panic rule

AGENTS.md prohibits expect()/unwrap()/panic!() outside test code. Use the infallible encode_to_vec() from prost:

pub fn encode_protobuf<M: Message>(message: &M) -> Vec<u8> {
    message.encode_to_vec()
}

8. crates/core/src/parsigex_codec.rs:55 — JSON-only codec while Go uses SSZ-first since v0.17

serialize_signed_data / deserialize_signed_data use serde_json exclusively. Go's marshal() (charon/core/proto.go) tries ssz.Marshaler first (sszMarshallingEnabled = true by default since v0.17). Types Go writes as SSZ: VersionedSignedProposal, Attestation, VersionedAttestation, SignedAggregateAndProof, VersionedSignedAggregateAndProof, SignedSyncMessage, SignedSyncContributionAndProof. A Go peer writes SSZ; Rust returns InvalidPayload and silently drops — complete interop failure. (Tracked in #322.)

9. error.rs:26 — Failure::Io(String) discards io::ErrorKind

Converting io::Error to String loses ErrorKind, making it impossible to distinguish connection resets from timeouts. Consider Io(Arc<std::io::Error>) to preserve ErrorKind while remaining Clone.
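A sketch of the Arc-wrapped variant, with Failure reduced to the one variant in question:

```rust
use std::io;
use std::sync::Arc;

// Wrapping io::Error in Arc keeps the enum Clone (io::Error itself is not
// Clone) while preserving the ErrorKind for callers.
#[derive(Debug, Clone)]
enum Failure {
    Io(Arc<io::Error>),
}

impl Failure {
    fn io_kind(&self) -> Option<io::ErrorKind> {
        match self {
            Failure::Io(err) => Some(err.kind()),
        }
    }
}
```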

10. types.rs:601 — Signature extraction error laundered through serde_json::Error::io

A SignedDataError from signature() is tunnelled through String -> io::Error -> serde_json::Error -> ParSigExCodecError::Serialize. A signature extraction failure is not a JSON error. Add a dedicated ExtractSignature(String) variant and map to it directly.


Minor Findings

| Location | Finding |
| --- | --- |
| handler.rs:263 | VerifyError silently discarded — rogue share attribution impossible. Log before mapping with tracing::warn!. |
| handler.rs:256 | Decode error silently discarded — log at debug level before the map_err conversion. |
| types.rs:619 | share_idx = 0 should be rejected early — Charon share indices are 1-based. |
| behaviour.rs:125 | SharedSubs uses a Mutex for a read-mostly subscriber list — use RwLock instead. |
| behaviour.rs:213 | Redundant peer_id parameter in Behaviour::new — only used in a debug_assert, never stored; the config already carries it. |
| handler.rs:90 | _peer: PeerId accepted but immediately discarded — remove it from Handler::new. |
| lib.rs:22 | protocols() is dead public API — never called anywhere; wire it in or remove it. |

Nits

  • protocol.rs:15 — Result as ParasigexResult is a misspelling; should be ParsigexResult
  • behaviour.rs:454 — while let drains the entire command channel per poll call; consider a budget of ~32 to bound executor latency

Parity Matrix

| Component | Go | Rust | Match |
| --- | --- | --- | --- |
| Protocol ID | /charon/parsigex/2.0.0 | /charon/parsigex/2.0.0 | yes |
| Wire format | protobuf length-delimited | protobuf length-delimited | yes |
| Max message size | 128 MiB | 128 MiB (via pluto_p2p::proto) | yes |
| DutyType integer mapping | 0–13 | 0–13 | yes |
| Duty validity check on recv | yes | yes (duty_gater) | yes |
| Per-entry signature verification | yes | yes (verifier callback) | yes |
| Fallback codec order (Attester, Aggregator) | yes | yes | yes |
| SSZ-first serialization | yes (v0.17+) | no (JSON only) | no |
| Empty data set rejection | yes | yes | yes |
| Concurrent inbound streams | yes (goroutine per stream) | no (unbounded FuturesUnordered) | no |

Collaborator

@emlautarom1 emlautarom1 left a comment


In general LGTM, I cannot fully comment on the correctness of the Handler though.

The example works as intended (with the updated command) but there are some small things to sort out before merging.

Comment thread crates/core/Cargo.toml Outdated
Comment thread crates/core/src/types.rs Outdated
Comment thread crates/core/src/types.rs Outdated
Comment thread crates/peerinfo/src/protocol.rs Outdated
Comment thread crates/parsigex/src/protocol.rs Outdated
Comment thread crates/parsigex/src/behaviour.rs Outdated

Comment thread crates/parsigex/src/handler.rs Outdated
Comment thread crates/parsigex/src/handler.rs
Comment thread crates/parsigex/src/handler.rs Outdated
Collaborator

@iamquang95 iamquang95 left a comment


Code LGTM, the only missing part is #322


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement core/parsigex

4 participants