TBD: I'll add more details a bit later: reproduction steps, our topology, configs, etc. For now this is just an umbrella issue for a few PRs.
We've already briefly discussed this with @MichaHoffmann, and found the root cause.
Thanos, Prometheus and Golang version used:
Thanos: v0.39.2 (an internal fork with a few patches, mostly irrelevant to the query component).
Object Storage Provider:
What happened:
I've noticed a significant performance degradation when running a global Thanos querier in "distributed" mode (--query.mode=distributed) compared to "local" mode, in an environment with a very large number of external labels (~1-2 million).
A simple instant query like the one below takes ~30ms in local mode vs ~2-3s in distributed mode:
sum by (cluster, job) (up{
cluster="prod",
})
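One way to compare the two modes is to time the same instant query against each querier through the Prometheus-compatible HTTP API (`/api/v1/query`). The following is a minimal sketch, not the method used to obtain the numbers above: the helper names `buildInstantQueryURL` and `timeInstantQuery` are made up for illustration, and the querier base URLs are assumptions passed on the command line.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"time"
)

// buildInstantQueryURL returns the /api/v1/query URL for a PromQL expression.
// (Hypothetical helper, named for this sketch only.)
func buildInstantQueryURL(base, expr string) string {
	v := url.Values{}
	v.Set("query", expr)
	return base + "/api/v1/query?" + v.Encode()
}

// timeInstantQuery runs the query once and returns the wall-clock latency,
// including the time needed to read the full response body.
func timeInstantQuery(client *http.Client, base, expr string) (time.Duration, error) {
	start := time.Now()
	resp, err := client.Get(buildInstantQueryURL(base, expr))
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return 0, err
	}
	if resp.StatusCode != http.StatusOK {
		return 0, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return time.Since(start), nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: querytime <querier-base-url> [more-base-urls...]")
		return
	}
	// Same query as in the report, written on one line.
	const expr = `sum by (cluster, job) (up{cluster="prod"})`
	client := &http.Client{Timeout: 30 * time.Second}
	for _, base := range os.Args[1:] {
		d, err := timeInstantQuery(client, base, expr)
		if err != nil {
			fmt.Printf("%s: error: %v\n", base, err)
			continue
		}
		fmt.Printf("%s: %v\n", base, d)
	}
}
```

Passing the base URLs of a local-mode and a distributed-mode querier prints one latency per endpoint; a single request is noisy, so repeating the run a few times gives a fairer comparison.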
What you expected to happen:
How to reproduce it (as minimally and precisely as possible):
Full logs to relevant components:
Anything else we need to know:
PRs:
- remoteEndpoints to reuse computed MinT/MaxT/LabelSets values across Engines() calls #8598
- query.remoteEndpoints.Engines() #8599 - this change dropped the latency (in our case) from ~3s down to ~40ms.
- RemoteEndpoints interface to support remote engines pruning promql-engine#680
- remoteEndpoints for remote engine pruning #8653

CC: @MichaHoffmann, @SuperQ
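For readers unfamiliar with the idea behind #8598 (reusing computed MinT/MaxT/LabelSets values instead of recomputing them on every call), here is a generic memoization sketch. It is not the Thanos implementation: the types `endpointInfo` and `cachedEndpoint` are hypothetical, and a real querier would also need to invalidate the cached values when endpoint metadata changes.

```go
package main

import (
	"fmt"
	"sync"
)

// endpointInfo groups the values that are expensive to derive when there
// are very many external label sets. (Hypothetical type for this sketch.)
type endpointInfo struct {
	MinT, MaxT int64
	LabelSets  []map[string]string
}

// cachedEndpoint computes its info at most once and reuses it afterwards.
type cachedEndpoint struct {
	once    sync.Once
	compute func() endpointInfo
	info    endpointInfo
}

// Info returns the memoized values, running compute only on the first call.
func (c *cachedEndpoint) Info() endpointInfo {
	c.once.Do(func() { c.info = c.compute() })
	return c.info
}

func main() {
	calls := 0
	ep := &cachedEndpoint{compute: func() endpointInfo {
		calls++ // stands in for the expensive computation
		return endpointInfo{
			MinT:      0,
			MaxT:      1000,
			LabelSets: []map[string]string{{"cluster": "prod"}},
		}
	}}
	for i := 0; i < 3; i++ {
		_ = ep.Info()
	}
	fmt.Println("compute calls:", calls) // prints "compute calls: 1"
}
```

`sync.Once` keeps the sketch safe under concurrent callers; the trade-off is that the values never refresh, which is why an actual cache would be scoped to a period during which the endpoint set is known not to change.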