Add Devnet-4 metrics: block production, gossip sizes, sync status#279
pablodeymo wants to merge 3 commits into main
…d bucket update (leanMetrics PR #29)
- Block production: building time, payload aggregation time, aggregated payload count, success/failure counters
- Gossip message sizes: block, attestation, and aggregation compressed bytes
- Sync status gauge with idle/syncing/synced labels (wiring deferred to PR #246)
- Update committee_signatures_aggregation buckets from [0.005..1s] to [0.05..4s]
🤖 Codex Code Review
No blocking findings. The patch appears telemetry-only: it does not modify fork choice, attestation validation, STF, SSZ decoding, or XMSS verification behavior. I did not see a consensus or security regression in the added code paths. One non-blocking observability note:
Residual risk: I reviewed the diff and surrounding code, but I did not run tests.
Automated review by OpenAI Codex · gpt-5.4 · custom prompt
🤖 Kimi Code Review
The PR adds comprehensive metrics for block building and gossipsub message sizes. Overall the implementation is correct and follows Rust best practices, but there are two items to address:
1. Type safety issue in sync status metric
// Current: typo results in all zeros
pub fn set_sync_status(status: &str) {
    for label in &["idle", "syncing", "synced"] {
        LEAN_NODE_SYNC_STATUS
            .with_label_values(&[label])
            .set(i64::from(*label == status)); // All false if typo
    }
}
Recommendation: Use an enum to enforce valid states at compile time:
pub enum SyncStatus { Idle, Syncing, Synced }
pub fn set_sync_status(status: SyncStatus) {
    // Map each variant to the label value it should set to 1.
    let status_str = match status {
        SyncStatus::Idle => "idle",
        SyncStatus::Syncing => "syncing",
        SyncStatus::Synced => "synced",
    };
    for label in &["idle", "syncing", "synced"] {
        let value = i64::from(label == &status_str);
        LEAN_NODE_SYNC_STATUS.with_label_values(&[label]).set(value);
    }
}
2. Missing initialization for P2P metrics (the new gossip size histograms)
Minor observations:
Automated review by Kimi (Moonshot AI) · kimi-k2.5 · custom prompt
Greptile Summary
This PR adds nine new Prometheus metrics covering block production timing, gossip wire sizes, and node sync status, plus a bucket-range correction for lean_committee_signatures_aggregation_time_seconds.
Confidence Score: 5/5

| Filename | Overview |
|---|---|
| crates/blockchain/src/lib.rs | Adds block-building RAII timer and success/failure counters to propose_block(); the sign_attestation failure path is missing an inc_block_building_failures() call, creating a counter/histogram count mismatch. |
| crates/blockchain/src/metrics.rs | Adds 5 new block-production metrics (2 histograms, 2 counters, 1 payload histogram) and 1 IntGaugeVec for sync status; all are force-initialized in init() and have appropriate bucket ranges. Bucket update for lean_committee_signatures_aggregation_time_seconds is correct. |
| crates/blockchain/src/store.rs | Adds payload-aggregation RAII timer around build_block() and observes signatures.len() (one proof per attestation, confirmed by verify_signatures invariant) as the payload count; explicit drop to stop timer before the cheap Ok return is clean and correct. |
| crates/net/p2p/src/gossipsub/handler.rs | Wires three gossip-size observations before decompression on each topic branch; measuring compressed wire size is consistent with the PR's stated intent. |
| crates/net/p2p/src/metrics.rs | Adds three gossip-size histograms with sensible bucket ranges (block 10KB–5MB, attestation 512B–16KB, aggregation 1KB–1MB); follows existing lazy-init pattern of the p2p metrics module. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[propose_block] -->|start| T1["_block_timing = time_block_building()"]
T1 --> B[produce_block_with_signatures]
B -->|start| T2["_payload_timing = time_block_building_payload_aggregation()"]
T2 --> C[build_block]
C --> D["drop(_payload_timing) — stops payload timer"]
D --> E["observe_block_aggregated_payloads(signatures.len())"]
E -->|Error| F["inc_block_building_failures() + return"]
E -->|Ok| G[sign_attestation]
G -->|Error| H["return ⚠️ no counter"]
G -->|Ok| I[process_block]
I -->|Error| J["inc_block_building_failures() + return"]
I -->|Ok| K["inc_block_building_success()"]
K --> L[publish to gossip]
L --> M["_block_timing drops — records full duration"]
Comments Outside Diff (1)
crates/blockchain/src/lib.rs, line 259-268
**Missing failure counter on `sign_attestation` error path**
If `sign_attestation` fails, `_block_timing` is dropped and records a duration, but neither `inc_block_building_failures()` nor `inc_block_building_success()` is called. This means `lean_block_building_time_seconds._count` will exceed `lean_block_building_success_total + lean_block_building_failures_total`, making it impossible to reconcile the two metrics accurately.
```suggestion
let Ok(proposer_signature) = self
    .key_manager
    .sign_attestation(validator_id, &proposer_attestation.data)
    .inspect_err(|err| {
        metrics::inc_block_building_failures();
        error!(%slot, %validator_id, %err, "Failed to sign proposer attestation");
    })
else {
    return;
};
```
Reviews (1): Last reviewed commit: "Add Devnet-4 metrics: block production, ..."
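The reconciliation problem described in this comment can be reproduced with a small standalone sketch. It assumes nothing about the real crate: `Metrics`, `TimingGuard`, and the simplified `propose_block` signature below are hypothetical stand-ins built on `std` only, with plain counters in place of Prometheus types. The point it illustrates is that a drop-based timer records on every exit path, so every early return also needs an outcome counter if the histogram count is to equal successes plus failures.

```rust
use std::cell::Cell;
use std::time::Instant;

/// Hypothetical stand-in for the Prometheus metrics discussed in the review.
#[derive(Default)]
struct Metrics {
    duration_count: Cell<u64>, // plays the role of the histogram's observation count
    success_total: Cell<u64>,
    failures_total: Cell<u64>,
}

fn inc(counter: &Cell<u64>) {
    counter.set(counter.get() + 1);
}

/// RAII guard: records one duration observation whenever it goes out of scope.
struct TimingGuard<'a> {
    metrics: &'a Metrics,
    started: Instant,
}

impl Drop for TimingGuard<'_> {
    fn drop(&mut self) {
        // A real histogram would observe this elapsed value.
        let _elapsed = self.started.elapsed();
        inc(&self.metrics.duration_count);
    }
}

/// Simplified proposal flow (assumed shape): signing and processing can each fail.
fn propose_block(metrics: &Metrics, sign: Result<(), &str>, process: Result<(), &str>) {
    let _block_timing = TimingGuard { metrics, started: Instant::now() };

    if sign.is_err() {
        // Without this increment the guard still records a duration,
        // and the histogram count drifts above success + failures.
        inc(&metrics.failures_total);
        return;
    }
    if process.is_err() {
        inc(&metrics.failures_total);
        return;
    }
    inc(&metrics.success_total);
}
```

Calling `propose_block` once for each of the three outcomes leaves `duration_count` equal to `success_total + failures_total`; removing the increment in the signing branch breaks that equality.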
🤖 Claude Code Review
Now I have enough context to write a thorough review.
Review: PR #279 (Devnet-4 Metrics)
Summary
Clean, well-scoped implementation. The structure is consistent with existing metrics patterns.
Findings
Non-issues
Automated review by Claude (Anthropic) · sonnet · custom prompt
crates/blockchain/src/store.rs (Outdated)
let known_block_roots = store.get_block_roots();
let _payload_timing = metrics::time_block_building_payload_aggregation();
This should go only when we are aggregating payloads
Done. Moved the timer inside build_block, so it only starts when !aggregated_payloads.is_empty().
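One way to express "only time when there is something to aggregate" is to hold the guard in an `Option`. The sketch below is an illustration under assumptions, not the code from the PR: `PayloadTimer`, the `observations` counter, and this `build_block` signature are hypothetical, and the byte-length sum merely stands in for the real aggregation work.

```rust
use std::cell::Cell;
use std::time::Instant;

/// Hypothetical timer guard: counts one observation when dropped.
struct PayloadTimer<'a> {
    observations: &'a Cell<u32>,
    started: Instant,
}

impl Drop for PayloadTimer<'_> {
    fn drop(&mut self) {
        // A real histogram would observe this elapsed value.
        let _elapsed = self.started.elapsed();
        self.observations.set(self.observations.get() + 1);
    }
}

/// Assumed, simplified shape of `build_block`: returns the total payload bytes.
fn build_block(aggregated_payloads: &[Vec<u8>], observations: &Cell<u32>) -> usize {
    // `bool::then` yields `Some(guard)` only for a non-empty input, so the
    // empty case creates no guard and records no timing observation.
    let _payload_timing = (!aggregated_payloads.is_empty()).then(|| PayloadTimer {
        observations,
        started: Instant::now(),
    });

    // Placeholder for the aggregation work being timed.
    aggregated_payloads.iter().map(Vec::len).sum()
}
```

Holding `Option<PayloadTimer>` keeps the drop-based recording while letting the empty case skip the metric entirely, which avoids filling the lowest histogram bucket with near-zero durations.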
…et_sync_status Address PR #279 review comments: the payload aggregation timing guard now only starts when there are payloads to aggregate, and the unused set_sync_status function and LEAN_NODE_SYNC_STATUS metric are removed.
Motivation
Implements the metrics defined in leanMetrics PR #29 (Devnet-4 metrics). These metrics provide observability into block production performance, gossip message sizes, and node sync status — critical for monitoring multi-client devnets.
Description
Block Production Metrics (5 new)
Instrumented in `propose_block()` (lib.rs) and `produce_block_with_signatures()` (store.rs):

| Metric | Instrumented at |
|---|---|
| lean_block_building_time_seconds | propose_block() |
| lean_block_building_payload_aggregation_time_seconds | build_block() call |
| lean_block_aggregated_payloads | build_block() returns |
| lean_block_building_success_total | process_block() |
| lean_block_building_failures_total | produce_block_with_signatures or process_block error |

Gossip Message Size Metrics (3 new)
Instrumented in `handle_gossipsub_message()` (handler.rs), measuring compressed (wire) size:
- lean_gossip_block_size_bytes
- lean_gossip_attestation_size_bytes
- lean_gossip_aggregation_size_bytes
Sync Status Gauge (1 new)
lean_node_sync_status, with label status=idle|syncing|synced
The metric and API (`set_sync_status()`) are defined but not yet wired; they will be activated when #246 (skip validator duties while syncing) merges.
Bucket Update (1 change)
lean_committee_signatures_aggregation_time_seconds buckets updated from [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 0.75, 1.0] to [0.05, 0.1, 0.25, 0.5, 0.75, 1.0, 2.0, 3.0, 4.0]: the old upper bound of 1s was too low for production aggregation times.
Files changed
- crates/blockchain/src/metrics.rs
- crates/blockchain/src/lib.rs (propose_block())
- crates/blockchain/src/store.rs
- crates/net/p2p/src/metrics.rs
- crates/net/p2p/src/gossipsub/handler.rs
Test plan
- cargo fmt --all -- --check passes
- cargo clippy --workspace -- -D warnings passes
- cargo test --workspace --release passes (all 97 tests)
- /metrics endpoint
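To make the bucket update concrete, the following sketch shows where a single observation lands under the old and new bucket lists quoted in the description. The helper `first_bucket` is hypothetical and the 2.5 s sample is an assumed value for illustration, not a measurement from this PR; the only behavior relied on is that Prometheus histogram buckets are inclusive upper bounds (`le`), with an implicit +Inf bucket above the last one.

```rust
/// Bucket upper bounds (seconds) before and after the change, as listed in the PR description.
const OLD_BUCKETS: [f64; 9] = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 0.75, 1.0];
const NEW_BUCKETS: [f64; 9] = [0.05, 0.1, 0.25, 0.5, 0.75, 1.0, 2.0, 3.0, 4.0];

/// Hypothetical helper: returns the upper bound of the smallest bucket that
/// contains `value`, or `None` when only the implicit +Inf bucket does.
fn first_bucket(buckets: &[f64], value: f64) -> Option<f64> {
    buckets.iter().copied().find(|&le| value <= le)
}

/// Assumed sample: an aggregation that takes 2.5 seconds.
fn classify_sample() -> (Option<f64>, Option<f64>) {
    let sample = 2.5;
    (
        first_bucket(&OLD_BUCKETS, sample), // above every old bound: only +Inf matches
        first_bucket(&NEW_BUCKETS, sample), // falls in the bucket with le = 3.0
    )
}
```

With the old bounds, every aggregation slower than 1 s is indistinguishable from any other, so quantile estimates above that point carry no information; the new bounds give up resolution below 50 ms in exchange for resolution between 1 s and 4 s.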