feat: add flagger_canary_phase metric with granular phase values #1927
Open
Softer wants to merge 1 commit into
flagger_canary_status collapses the 11 canary phases into 3 values (0 running, 1 successful, 2 failed), so dashboards cannot tell WaitingPromotion, Promoting, Finalising or Succeeded apart on a Grafana state-timeline.

Add a new flagger_canary_phase gauge that exposes each phase as a unique value (0=Initializing ... 10=Terminated) via a deterministic phase-to-value map. SetStatus now also sets the new gauge, so every existing call site is covered without touching the scheduler. flagger_canary_status is left unchanged to avoid breaking existing dashboards and alerts.

The Terminating (9) phase is recorded from the finalizer and the Terminated (10) phase from the informer delete handler, so deleted canaries keep emitting a filterable value (flagger_canary_phase < 9) instead of leaving a stale series. This addresses the stale-metric problem from fluxcd#1029 without deleting metrics, which was flagged as a breaking change in fluxcd#1856.

Signed-off-by: Softer <sft.nik@gmail.com>
Motivation
flagger_canary_status collapses the 11 canary phases into only 3 values (0 running, 1 successful, 2 failed). On a Grafana state-timeline this makes it impossible to distinguish WaitingPromotion, Promoting, Finalising and Succeeded: they all map to 1.

Changing flagger_canary_status would break every existing dashboard and alert, so this PR adds a new metric instead and leaves the existing one untouched.

What this PR does

Adds a new gauge flagger_canary_phase (labels: name, namespace) that exposes each phase as a unique value via a deterministic phase-to-value map (0 = Initializing ... 10 = Terminated).

SetStatus now also sets the new gauge (via SetPhase), so every existing call site is covered without changing the scheduler.

flagger_canary_status is not modified.

The Terminating (9) phase is recorded from the finalizer (for revertOnDeletion: true canaries), and Terminated (10) from the informer delete handler for any deletion. A deleted canary therefore keeps emitting a filterable value, so queries can exclude removed canaries with flagger_canary_phase < 9.

This gives a non-breaking answer to the stale-metric problem in #1029: instead of deleting metrics on canary removal (flagged as a breaking change in #1856), the phase metric exposes a terminated sentinel that dashboards and alerts can filter on. It is also relevant to #1819, where distinguishing WaitingPromotion matters.
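
For reviewers who want to see the idea in isolation, here is a minimal, standard-library-only sketch of a phase-to-value lookup and the `< 9` filter. It is not the code in this PR. Only Initializing = 0, Terminating = 9 and Terminated = 10 are stated in the description; the values 1 through 8, the names `phaseValues`, `phaseValue` and `isLive`, and the use of plain strings instead of Flagger's phase type are assumptions made for illustration.

```go
package main

import "fmt"

// phaseValues is a hypothetical phase-to-value table.
// Stated in the PR description: Initializing=0, Terminating=9, Terminated=10.
// ASSUMPTION: the ordering of values 1 through 8 is illustrative only.
var phaseValues = map[string]float64{
	"Initializing":     0,
	"Initialized":      1, // assumed
	"Waiting":          2, // assumed
	"Progressing":      3, // assumed
	"WaitingPromotion": 4, // assumed
	"Promoting":        5, // assumed
	"Finalising":       6, // assumed
	"Succeeded":        7, // assumed
	"Failed":           8, // assumed
	"Terminating":      9,
	"Terminated":       10,
}

// phaseValue returns the gauge value for a phase name and reports
// whether the phase is present in the table.
func phaseValue(phase string) (float64, bool) {
	v, ok := phaseValues[phase]
	return v, ok
}

// isLive mirrors the flagger_canary_phase < 9 filter from the description:
// values 9 and 10 mark canaries that are being deleted or are already gone.
func isLive(v float64) bool {
	return v < 9
}

func main() {
	for _, p := range []string{"WaitingPromotion", "Succeeded", "Terminated"} {
		v, ok := phaseValue(p)
		fmt.Printf("%s known=%t value=%v live=%t\n", p, ok, v, isLive(v))
	}
}
```

A fixed table keeps the values stable across releases, which matters because dashboards and alerts would hard-code thresholds such as 9.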