
api: add static IP management failure metrics #841

Open
KevinSailema wants to merge 2 commits into NVIDIA:main from KevinSailema:fix/issue-837-static-ip-metrics

@KevinSailema

Description

This PR adds observability for static IP reservation/assignment management failures to address issue #837.

It introduces a new counter metric:

  • carbide_static_ip_management_failures_total
  • Labels:
    • operation
    • reason
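For orientation, a Prometheus scrape of this counter surfaces one time series per (operation, reason) pair. The stdlib-only Rust sketch below shows how such samples look in the Prometheus text exposition format, including the label-value escaping that format requires. It is not the PR's code (the PR records through an OpenTelemetry counter and leaves rendering to the exporter), and the label values used here (`assign`, `invalid_ip`, `dhcp_discover`, `no_static_reservation`) are hypothetical examples, not values taken from the PR.

```rust
use std::collections::BTreeMap;

/// Escapes a label value per the Prometheus text exposition format:
/// backslash, double quote, and newline must be escaped.
fn escape_label_value(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for c in value.chars() {
        match c {
            '\\' => out.push_str("\\\\"),
            '"' => out.push_str("\\\""),
            '\n' => out.push_str("\\n"),
            other => out.push(other),
        }
    }
    out
}

/// Renders one counter family. Keys are (operation, reason) label pairs;
/// a BTreeMap keeps the output order deterministic.
fn render_counter(name: &str, samples: &BTreeMap<(String, String), u64>) -> String {
    let mut out = format!("# TYPE {name} counter\n");
    for ((operation, reason), value) in samples {
        out.push_str(&format!(
            "{name}{{operation=\"{}\",reason=\"{}\"}} {value}\n",
            escape_label_value(operation),
            escape_label_value(reason),
        ));
    }
    out
}

fn main() {
    let mut samples = BTreeMap::new();
    // Hypothetical label values, for illustration only.
    samples.insert(("assign".to_string(), "invalid_ip".to_string()), 2);
    samples.insert(
        ("dhcp_discover".to_string(), "no_static_reservation".to_string()),
        1,
    );
    print!(
        "{}",
        render_counter("carbide_static_ip_management_failures_total", &samples)
    );
}
```

Keeping `operation` and `reason` as separate labels lets dashboards aggregate by workflow or by failure cause independently; both label sets should stay small and fixed so the number of series remains bounded.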

The metric is emitted across failure paths in static IP workflows (preallocate/update/assign/remove) and in DHCP discover when a reserved segment has no matching static reservation.

It also updates tests to validate metric emission for key failure scenarios.
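The pattern of recording a labeled failure on each early-return error path, and then asserting on it in a test, can be sketched in Rust as follows. This is a hypothetical illustration, not the handler code from `machine_interface_address.rs`: the `FailureRecorder` trait, the `RecordingFake` test double, the `assign_static_ip` function, and the `"assign"` / `"invalid_ip"` / `"missing_interface_id"` label values are all assumptions made for the example.

```rust
use std::net::IpAddr;

/// Hypothetical sink for failure events; in the PR this role is played
/// by the API metrics emitter.
trait FailureRecorder {
    fn record_failure(&mut self, operation: &'static str, reason: &'static str);
}

/// Test double that remembers every (operation, reason) pair it sees.
#[derive(Default)]
struct RecordingFake {
    events: Vec<(&'static str, &'static str)>,
}

impl FailureRecorder for RecordingFake {
    fn record_failure(&mut self, operation: &'static str, reason: &'static str) {
        self.events.push((operation, reason));
    }
}

/// Hypothetical "assign" handler: each error path records a failure
/// immediately before returning, so the metric and the error stay in sync.
fn assign_static_ip(
    interface_id: Option<u32>,
    ip: &str,
    metrics: &mut impl FailureRecorder,
) -> Result<(u32, IpAddr), String> {
    let Some(id) = interface_id else {
        metrics.record_failure("assign", "missing_interface_id");
        return Err("interface_id is required".to_string());
    };
    let addr = match ip.parse::<IpAddr>() {
        Ok(addr) => addr,
        Err(_) => {
            metrics.record_failure("assign", "invalid_ip");
            return Err(format!("invalid IP address: {ip}"));
        }
    };
    Ok((id, addr))
}

fn main() {
    let mut fake = RecordingFake::default();
    let _ = assign_static_ip(Some(7), "10.0.0.300", &mut fake); // octet out of range
    let _ = assign_static_ip(None, "10.0.0.5", &mut fake); // no interface_id
    let _ = assign_static_ip(Some(7), "10.0.0.5", &mut fake); // succeeds, records nothing
    println!("recorded: {:?}", fake.events);
}
```

Putting the recorder behind a trait lets a unit test pass a fake and assert on exact label pairs without scraping an exporter; the PR's own tests may instead inspect the exported metrics directly.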

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

  • Full crate test run used for validation in this branch:
    • cargo test -p carbide-api --no-default-features
    • Result: 1155 passed, 0 failed, 4 ignored
  • The static IP scope tests were also exercised as part of this run.

copy-pr-bot bot commented Apr 7, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Copilot AI review requested due to automatic review settings April 7, 2026 21:09
@KevinSailema KevinSailema requested a review from a team as a code owner April 7, 2026 21:09

Copilot AI left a comment

Pull request overview

This PR adds a new Prometheus-exported counter for static IP reservation/assignment management failures, with operation and reason labels, and wires it into key failure paths across static IP workflows and DHCP discover (reserved segments).

Changes:

  • Introduces ApiMetricsEmitter::record_static_ip_management_failure(), backed by a new OTel counter (carbide_static_ip_management_failures, exported to Prometheus as carbide_static_ip_management_failures_total).
  • Emits the failure metric across multiple error paths in static IP preallocation/update/assign/remove handlers.
  • Extends static address management tests to assert the failure metric is emitted for representative failure scenarios (invalid IP, missing interface_id, reserved segment without reservation).
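Because request handlers run concurrently, a helper like `record_static_ip_management_failure()` needs to take `&self` and be safe to call from many tasks at once. The sketch below shows one stdlib-only way to get that shape with a `Mutex`-guarded map. It stands in for the OTel counter the PR actually uses; the struct name `FailureCounterSketch`, its `get` method, and the `"preallocate"` / `"address_in_use"` label values are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for an emitter backed by a labeled counter.
#[derive(Default)]
struct FailureCounterSketch {
    counts: Mutex<HashMap<(String, String), u64>>,
}

impl FailureCounterSketch {
    /// Increments the counter for one (operation, reason) label pair.
    /// Takes &self so a single shared instance can serve all handlers.
    fn record_static_ip_management_failure(&self, operation: &str, reason: &str) {
        let mut counts = self.counts.lock().expect("counter mutex poisoned");
        *counts
            .entry((operation.to_string(), reason.to_string()))
            .or_insert(0) += 1;
    }

    /// Reads the current value for a label pair (0 if never recorded).
    fn get(&self, operation: &str, reason: &str) -> u64 {
        let counts = self.counts.lock().expect("counter mutex poisoned");
        counts
            .get(&(operation.to_string(), reason.to_string()))
            .copied()
            .unwrap_or(0)
    }
}

fn main() {
    let emitter = Arc::new(FailureCounterSketch::default());
    // Four workers each record 250 failures against the same label pair.
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let emitter = Arc::clone(&emitter);
            thread::spawn(move || {
                for _ in 0..250 {
                    emitter.record_static_ip_management_failure("preallocate", "address_in_use");
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    println!("{}", emitter.get("preallocate", "address_in_use")); // prints 1000
}
```

The lock here only stands in for whatever synchronization the real counter provides; the point of the sketch is the `&self` signature that lets one emitter be passed into handlers such as the expected switch and power shelf flows.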

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

Summary per file:

  • crates/api/src/api/metrics.rs: Adds the static IP management failures counter and a helper method to record it with labels.
  • crates/api/src/handlers/machine_interface_address.rs: Records the new failure metric across error paths in static IP preallocate/update/assign/remove flows.
  • crates/api/src/handlers/expected_switch.rs: Passes the metrics emitter into static IP preallocation/update calls.
  • crates/api/src/handlers/expected_power_shelf.rs: Passes the metrics emitter into static IP preallocation/update calls.
  • crates/api/src/dhcp/discover.rs: Records a failure metric when DHCP discover fails due to a reserved segment without a matching reservation.
  • crates/api/src/tests/static_address_management.rs: Adds integration tests asserting metric emission for key failure cases.
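The DHCP discover change can be pictured as a guard in the address-selection step: when a segment is reserved for static addressing and the requesting client has no reservation, the request fails and a failure is recorded. The Rust sketch below is hypothetical; the `Segment` type, the MAC-keyed reservation map, the `reserved_address` function, and the `"dhcp_discover"` / `"no_static_reservation"` labels are assumptions, not code from `crates/api/src/dhcp/discover.rs`.

```rust
use std::collections::HashMap;
use std::net::Ipv4Addr;

/// Hypothetical segment model: `reserved` means addresses are handed out
/// only from explicit static reservations.
struct Segment {
    reserved: bool,
}

#[derive(Debug, PartialEq)]
enum DiscoverError {
    NoStaticReservation,
}

/// Returns the statically reserved address for `mac`, if the segment
/// requires one. Ok(None) means "not a reserved segment": normal dynamic
/// allocation would proceed elsewhere (outside this sketch).
fn reserved_address(
    segment: &Segment,
    mac: &str,
    reservations: &HashMap<String, Ipv4Addr>,
    record: &mut dyn FnMut(&'static str, &'static str),
) -> Result<Option<Ipv4Addr>, DiscoverError> {
    if !segment.reserved {
        return Ok(None);
    }
    match reservations.get(mac) {
        Some(ip) => Ok(Some(*ip)),
        None => {
            // Reserved segment but no matching reservation: count it.
            record("dhcp_discover", "no_static_reservation");
            Err(DiscoverError::NoStaticReservation)
        }
    }
}

fn main() {
    let mut reservations = HashMap::new();
    reservations.insert("aa:bb:cc:dd:ee:01".to_string(), Ipv4Addr::new(10, 0, 0, 10));
    let segment = Segment { reserved: true };
    let mut failures = Vec::new();
    let mut record = |op: &'static str, reason: &'static str| failures.push((op, reason));

    let hit = reserved_address(&segment, "aa:bb:cc:dd:ee:01", &reservations, &mut record);
    let miss = reserved_address(&segment, "aa:bb:cc:dd:ee:02", &reservations, &mut record);
    println!("{hit:?} {miss:?} {failures:?}");
}
```

Recording at the point where the guard rejects the client, rather than in a generic error handler, keeps the `reason` label specific to this cause.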

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

add metrics related to static IP reservation/assignment management

2 participants