This guide provides information about error budget management in the ML Systems Evaluation Framework, including definitions, calculation, monitoring, and best practices.
Error budgets define the acceptable amount of time a system can be unavailable or not meeting Service Level Objectives (SLOs). They help teams make informed decisions about deployments, changes, and risk-taking.
An error budget is the inverse of your SLO target. For example:
- If your SLO is 99.9% availability, your error budget is 0.1%
- If your SLO is 95% accuracy, your error budget is 5%
# Error budget calculation
# Always inferred from SLO target
error_budget = 1 - slo_targetNote: You do not specify error_budget in your configuration. It is always inferred from the SLO target.
slos:
availability:
target: 0.9999
window: "30d"
description: "System availability"
accuracy:
target: 0.95
window: "30d"
description: "Model accuracy"The framework automatically tracks error budget consumption, burn rate, and remaining budget for each SLO based on the target value.
You can configure alerts based on error budget consumption, burn rate, or remaining budget, but you do not specify error_budget in SLO config.
- Conservative: Deploy only if a large portion of the error budget remains.
- Aggressive: Deploy with a smaller remaining error budget for faster iteration.
- Balanced: Balance stability and innovation based on error budget consumption.
Error budgets are always calculated as 1.0 - target for each SLO, regardless of industry.
The framework provides dashboards and reports showing error budget consumption, burn rate, and remaining budget for each SLO.
- Set realistic SLO targets based on historical performance
- Monitor error budget consumption and burn rate
- Use error budget status to inform deployment and risk decisions
- Communicate error budget status to stakeholders
- Automate alerting and reporting based on error budget status
slos:
availability:
target: 0.999
window: "30d"
description: "System availability"
accuracy:
target: 0.95
window: "30d"
description: "Model accuracy"The error budget for each SLO is always calculated as 1.0 - target (e.g., 0.001 for 99.9% availability).