Skip to content

fix metrics type metadata when some quantiles are cleaned due to summary window#121

Merged
hagen1778 merged 3 commits intomasterfrom
fix/metrics-metadata-after-cleanup
Apr 3, 2026
Merged

fix metrics type metadata when some quantiles are cleaned due to summary window#121
hagen1778 merged 3 commits intomasterfrom
fix/metrics-metadata-after-cleanup

Conversation

@jiekun
Copy link
Copy Markdown
Member

@jiekun jiekun commented Mar 30, 2026

fix #120

For summary data, it contains *Summary and multiple *quantileValue. When printing, they'll be sorted by their metricType:

  1. *Summary has unsupported.
  2. *quantileValue has summary.

So *quantileValue are always printed first.

However, the *quantileValue could be removed if stale for 5 minutes, in this case, the metricType of *Summary will be printed (as unsupported), while the expectation would be summary.

This pull request specific the metric type for both to summary and sort them by isAux.


Summary by cubic

Fixes TYPE metadata for summary metrics when quantiles expire. Always emits TYPE summary and sorts quantiles before _sum/_count using an auxiliary flag, not metric-type sorting.

  • Bug Fixes
    • Set Summary.metricType() to return "summary" so metadata stays correct even if only _sum/_count remain.
    • Use isAux to sort quantiles before _sum/_count within the same family; remove metric-type-based sorting.
    • Add a unit test for a short-window SummaryExt to verify TYPE summary after quantiles expire; add an isAux comment to clarify the sorting behavior.

Written for commit ef2c19a. Summary will update on new commits.

@jiekun jiekun requested review from Copilot, hagen1778 and makasim March 30, 2026 10:20
@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.75%. Comparing base (368479b) to head (ef2c19a).
⚠️ Report is 5 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #121      +/-   ##
==========================================
+ Coverage   88.34%   89.75%   +1.40%     
==========================================
  Files          12       12              
  Lines        1399     1572     +173     
==========================================
+ Hits         1236     1411     +175     
- Misses        112      115       +3     
+ Partials       51       46       -5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 4 files

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes incorrect # TYPE ... unsupported metadata for summary metric families when quantile samples are no longer emitted (e.g., after the summary window rotates), ensuring the family always exposes summary while preserving Prometheus-like ordering (quantiles before _sum/_count).

Changes:

  • Set Summary.metricType() to return summary (matching quantileValue) so metadata remains correct even when quantiles aren’t printed.
  • Replace metricType-based ordering with a summaryMetric/sortPriority() mechanism to keep quantiles ordered before the summary’s _sum/_count.
  • Extend ExampleExposeMetadata to cover the “quantiles expire” scenario and assert that TYPE remains summary.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File Description
summary.go Unifies summary metric type and adds summary-specific sorting helper.
set.go Updates stable sorting to handle summary vs quantiles without relying on metricType differences.
metrics.go Introduces summaryMetric interface for summary-specific ordering.
set_example_test.go Adds example coverage for summary metadata correctness after quantiles stop emitting.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread set_example_test.go Outdated
Comment thread set_example_test.go Outdated
Comment thread summary.go Outdated
Comment thread summary.go Outdated
Comment thread summary.go Outdated
Comment thread metrics.go Outdated
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread set.go Outdated
Copy link
Copy Markdown
Contributor

@hagen1778 hagen1778 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

Comment thread metrics.go Outdated
}

// sortableMetric is a special metric that may require sorting (backed summary and quantileValue).
type sortableMetric interface {
Copy link
Copy Markdown
Contributor

@hagen1778 hagen1778 Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the quick fix. But I think it kinda gets overcomplicated.
I think we did not make the best decision by relying on logic based on unsupported type here https://github.com/VictoriaMetrics/metrics/pull/105/changes#diff-0543e8750caa826e80d331a4c6c9234e227ebaaacd232db41e060b0b9380a7eeR129. It is brittle and not reliable. And the linked issue to this PR only demonstrates it.

I suggest we try to make it simpler.
The {quantile=<>} metrics in this library are marked as auxilary (see isAux attr). These are only metrics that are marked like this, it seems. We need to guarantee the following sorting logic:

  1. Sort by metric family, so all my_metric_whatever metrics go together
  2. Sort by metric name, so all my_summary{quantile="0.5"} and my_summary{quantile="0.75"} go one after another, respecting quantile="" sorting
  3. Special sorting for summaries that consists of auxilary quantile metrics and main _sum and _count metrics.

p1 and p2 are easy to achieve. And the struggle was only with p3, so we added sorting by type relying on sorting order between summary and unsupported, which I don't like.

Let's try to solve p3 by simply saying that auxilary metrics go first:

	lessFunc := func(i, j int) bool {
		a, b := s.a[i], s.a[j]
		fName1, fName2 := getMetricFamily(a.name), getMetricFamily(b.name)
		if fName1 != fName2 {
			return fName1 < fName2
		}
		if a.isAux != b.isAux {
			return a.isAux
		}
		return a.name < b.name
	}

That should work for all described cases. @jiekun could you please check if my suggestion is viable?
If yes, we can get rid of all the rest complicated sorting logic.

Copy link
Copy Markdown
Member Author

@jiekun jiekun Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. I had trouble understanding isAux by the git history, which was the motivation to use a seperate attribute for sorting, and it's clear now. Let me see if I can wrote a better one with this.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I preserved the unit test, and it's all green for the suggested sorting logic. Thank you.

@jiekun jiekun force-pushed the fix/metrics-metadata-after-cleanup branch from 6d9232e to 1d02475 Compare April 2, 2026 08:47
@jiekun jiekun requested a review from hagen1778 April 2, 2026 08:49
Comment thread set.go
if mType1 != mType2 {
return mType1 < mType2
// if both has same family, check the auxiliary first.
// only summary quantile(s) is registered as auxiliary, and quantile(s) in summary should go first.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to add comment to the isAux field in metric struct to note where and why it is used. So in future we can quickly figure out what it is for.

Comment thread set_example_test.go Outdated
@hagen1778
Copy link
Copy Markdown
Contributor

Once merged, this fix should be delivered to vm, vl and vt projects. It also should be backported to vm lts branches.

Copy link
Copy Markdown
Contributor

@f41gh7 f41gh7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment thread set_example_test.go Outdated
@hagen1778
Copy link
Copy Markdown
Contributor

Thanks!

@hagen1778 hagen1778 merged commit 5dbd3cd into master Apr 3, 2026
30 checks passed
@hagen1778 hagen1778 deleted the fix/metrics-metadata-after-cleanup branch April 3, 2026 07:16
@hagen1778
Copy link
Copy Markdown
Contributor

This change was included into https://github.com/VictoriaMetrics/metrics/releases/tag/v1.43.1
@jiekun pls take care of making corresponding update PRs to depending repos.

makasim added a commit to VictoriaMetrics/VictoriaMetrics that referenced this pull request Apr 3, 2026
…ummary (#10745)

Fix `unsupported` metric type display in exposed metric metadata for
summaries and quantiles by bumping `metrics` SDK version.

This `unsupported` type exists when a summary is not updated within a
certain time window. See VictoriaMetrics/metrics#120 and pull
request VictoriaMetrics/metrics#121 for details.

Signed-off-by: Zhu Jiekun <jiekun@victoriametrics.com>
Signed-off-by: Max Kotliar <kotlyar.maksim@gmail.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
makasim added a commit to VictoriaMetrics/VictoriaMetrics that referenced this pull request Apr 3, 2026
…ummary (#10745)

Fix `unsupported` metric type display in exposed metric metadata for
summaries and quantiles by bumping `metrics` SDK version.

This `unsupported` type exists when a summary is not updated within a
certain time window. See VictoriaMetrics/metrics#120 and pull
request VictoriaMetrics/metrics#121 for details.

Signed-off-by: Zhu Jiekun <jiekun@victoriametrics.com>
Signed-off-by: Max Kotliar <kotlyar.maksim@gmail.com>
Co-authored-by: Max Kotliar <mkotlyar@victoriametrics.com>
NickPak pushed a commit to NickPak/metrics that referenced this pull request Apr 7, 2026
…ary window (VictoriaMetrics#121)

Summary metric type, when exposed in Prometheus format,
consists of quantiles, _sum and _count metrics. The order of these metrics
should be respected as listed above.

Before, this order was implemented by comparing `metricType`, which returned
`summary` for quantiles and `unsupported` for _sum and _count. The sorting
logic was based on the fact that quantiles are always present and they will
be always printed first, so their `metricType` will be printed first too.

This logic is complicated and unreliable, as shown in VictoriaMetrics#120.

This commit removes logic based on `metricType`.

The `{quantile=<>}` metrics are currently marked as auxilary (see `isAux` attr).
These are only metrics that are marked like this. We need to guarantee the following sorting logic:

1. Sort by metric family, so all my_metric_whatever metrics go together
2. Sort by metric name, so all my_summary{quantile="0.5"} and my_summary{quantile="0.75"} go one after another, respecting quantile="" sorting
3. Special sorting for summaries that consists of auxilary quantile metrics and main _sum and _count metrics.

The new sorting logic now relies on `isAux` field for printing quantiles before _sum and _count metrics
for summaries.
@makasim
Copy link
Copy Markdown
Member

makasim commented Apr 17, 2026

The fix is available in VictoriaMetrics since v1.140.0, v1.136.4, and v1.122.19.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Delivering no logs for a while leads to unsupported metric type for quantile metrics, causing Prometheus scrapes to fail

5 participants