api: add static IP management failure metrics #841
Open
KevinSailema wants to merge 2 commits into NVIDIA:main
Pull request overview
This PR adds a new Prometheus-exported counter for static IP reservation/assignment management failures, with operation and reason labels, and wires it into key failure paths across static IP workflows and DHCP discover (reserved segments).
Changes:
- Introduces `ApiMetricsEmitter::record_static_ip_management_failure()`, backed by a new OTel counter (`carbide_static_ip_management_failures`, exported with the `_total` suffix).
- Emits the failure metric across multiple error paths in static IP preallocation/update/assign/remove handlers.
- Extends static address management tests to assert the failure metric is emitted for representative failure scenarios (invalid IP, missing interface_id, reserved segment without reservation).
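The recording helper described above can be pictured with a std-only sketch. This is not the PR's implementation: `FailureCounter` and its `get` method are hypothetical stand-ins for the OpenTelemetry counter held by `ApiMetricsEmitter`, and the `(operation, reason)` signature is an assumption based on the labels named in this overview. It only illustrates that each distinct label pair forms its own monotonically increasing series.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Std-only stand-in for a labeled monotonic counter (hypothetical type;
/// the PR uses an OpenTelemetry counter instead).
#[derive(Default)]
pub struct FailureCounter {
    // Key: (operation, reason) label pair; value: number of failures seen.
    counts: Mutex<HashMap<(String, String), u64>>,
}

impl FailureCounter {
    /// Increment the series identified by the given label values by one.
    /// The parameter list is an assumption, not the PR's actual signature.
    pub fn record_static_ip_management_failure(&self, operation: &str, reason: &str) {
        let mut counts = self.counts.lock().unwrap();
        *counts
            .entry((operation.to_string(), reason.to_string()))
            .or_insert(0) += 1;
    }

    /// Read the current value of one series (0 if it was never recorded).
    pub fn get(&self, operation: &str, reason: &str) -> u64 {
        let counts = self.counts.lock().unwrap();
        counts
            .get(&(operation.to_string(), reason.to_string()))
            .copied()
            .unwrap_or(0)
    }
}
```

A handler's error path would call `record_static_ip_management_failure` just before returning the error, and a test would read the series back to assert that it was incremented. Label values such as `"assign"` or `"invalid_ip"` are illustrative only.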
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| crates/api/src/api/metrics.rs | Adds the static IP management failures counter and a helper method to record it with labels. |
| crates/api/src/handlers/machine_interface_address.rs | Records the new failure metric across error paths in static IP preallocate/update/assign/remove flows. |
| crates/api/src/handlers/expected_switch.rs | Passes the metrics emitter into static IP preallocation/update calls. |
| crates/api/src/handlers/expected_power_shelf.rs | Passes the metrics emitter into static IP preallocation/update calls. |
| crates/api/src/dhcp/discover.rs | Records a failure metric when DHCP discover fails due to reserved segment without a matching reservation. |
| crates/api/src/tests/static_address_management.rs | Adds integration tests asserting metric emission for key failure cases. |
Description
This PR adds observability for static IP reservation/assignment management failures to cover issue #837.
It introduces a new counter metric:
- Name: `carbide_static_ip_management_failures_total`
- Labels: `operation`, `reason`

The metric is emitted across failure paths in static IP workflows (preallocate/update/assign/remove) and in DHCP discover when a reserved segment has no matching static reservation.
It also updates tests to validate metric emission for key failure scenarios.
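For reviewers unfamiliar with how such a counter appears when scraped, the following sketch renders one sample in the Prometheus text exposition format. `render_sample` and `escape_label_value` are hypothetical helpers written for illustration, not part of this PR or of any exporter crate, and the label values passed in are made up. The real exporter may attach additional labels.

```rust
/// Render one counter sample in Prometheus text exposition format,
/// e.g. `name{key="value",...} 1`. Hypothetical helper for illustration.
pub fn render_sample(name: &str, labels: &[(&str, &str)], value: u64) -> String {
    let rendered: Vec<String> = labels
        .iter()
        .map(|(k, v)| format!("{}=\"{}\"", k, escape_label_value(v)))
        .collect();
    // `{{` and `}}` emit literal braces around the label set.
    format!("{}{{{}}} {}", name, rendered.join(","), value)
}

/// Escape backslash, double quote, and newline, which the exposition
/// format requires inside label values.
pub fn escape_label_value(v: &str) -> String {
    v.replace('\\', "\\\\")
        .replace('"', "\\\"")
        .replace('\n', "\\n")
}
```

Calling `render_sample("carbide_static_ip_management_failures_total", &[("operation", "assign"), ("reason", "invalid_ip")], 1)` yields a single line with both labels in braces followed by the count, which is the shape a dashboard or alert query would match on.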
Type of Change
Related Issues (Optional)
Breaking Changes
Testing
Additional Notes
`cargo test -p carbide-api --no-default-features`: 1155 passed, 0 failed, 4 ignored