What Is an Error Budget?
Reviewed by Ionut Caval · Updated June 2026
An error budget is the amount of unreliability a Service Level Objective (SLO) permits, equal to 100% minus the SLO target. It is a spendable resource: a 99.9% availability SLO leaves a 0.1% budget, about 43 minutes of downtime in a 30-day month, that teams draw down with each failure and that gates how fast they can ship.
An error budget is the practical, day-to-day expression of a Service Level Objective (SLO). If the SLO is the target you steer toward, the error budget is how far you can fall short of perfect before the SLO is breached. The arithmetic is simple: error budget = 100% minus the SLO target. A 99.9% availability SLO leaves a 0.1% budget, which over a 30-day month is roughly 43 minutes of allowed downtime. Tighten the SLO to 99.95% and the budget shrinks to 0.05%, about 22 minutes. Loosen it to 99% and the budget grows to 1%, or 7 hours 12 minutes. The stricter the target, the smaller the budget, and the less margin a bad week leaves you.
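The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `error_budget` and the default 30-day window are assumptions chosen to match the examples in the text.

```python
from datetime import timedelta


def error_budget(slo_percent: float, window: timedelta = timedelta(days=30)) -> timedelta:
    """Allowed downtime for an availability SLO over a measurement window.

    `error_budget` is a hypothetical helper; the 30-day default mirrors
    the "30-day month" used in the article.
    """
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in (0, 100]")
    # Budget = 100% minus the SLO target, expressed as a fraction.
    budget_fraction = (100 - slo_percent) / 100
    return window * budget_fraction


for slo in (99.0, 99.9, 99.95):
    minutes = error_budget(slo).total_seconds() / 60
    print(f"{slo}% SLO -> {minutes:.1f} minutes per 30 days")
```

For the three targets above this works out to 432, 43.2, and 21.6 minutes, which the text rounds to 7 hours 12 minutes, 43 minutes, and 22 minutes.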
How error budgets gate shipping
The point of an error budget is that it makes reliability a shared, spendable resource rather than an argument. Every failed deploy, every uptime dip, and every slow-recovery incident draws the budget down; a clean period lets it refill as the measurement window rolls forward. Teams typically wire it into release policy:
- Budget healthy: ship at full speed and take more risk, since there is room to absorb a small failure.
- Budget low: slow releases, add review gates, and prioritise reliability work until the budget recovers.
- Budget exhausted: freeze non-essential changes; only fixes that improve reliability go out until the window rolls forward.
- Budget never touched: the SLO may be too strict, leaving speed on the table, so the target is worth revisiting.
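One way to wire the four states above into a release check is sketched below. Everything here is an assumption for illustration: the names `ReleasePolicy`, `release_policy`, and `slo_may_be_too_strict` are hypothetical, and the 25% cut-off for a "low" budget is an arbitrary example, since the right threshold is a team decision.

```python
from enum import Enum


class ReleasePolicy(Enum):
    SHIP_FREELY = "ship at full speed"
    SLOW_DOWN = "slow releases and add review gates"
    FREEZE = "freeze non-essential changes"


# Hypothetical threshold: less than 25% of the budget left counts as "low".
LOW_BUDGET_THRESHOLD = 0.25


def release_policy(budget_total: float, budget_spent: float) -> ReleasePolicy:
    """Map the remaining error budget to a release posture.

    Both arguments share one unit, e.g. minutes of downtime in the window.
    """
    if budget_total <= 0:
        raise ValueError("budget_total must be positive")
    remaining = 1 - budget_spent / budget_total
    if remaining <= 0:
        return ReleasePolicy.FREEZE
    if remaining < LOW_BUDGET_THRESHOLD:
        return ReleasePolicy.SLOW_DOWN
    return ReleasePolicy.SHIP_FREELY


def slo_may_be_too_strict(spent_per_window: list[float]) -> bool:
    """True when no budget was spent in any recorded window."""
    return bool(spent_per_window) and all(s == 0 for s in spent_per_window)


print(release_policy(budget_total=43.2, budget_spent=40.0).value)
```

With a 43.2-minute budget and 40 minutes already spent, about 7% remains, so the sketch returns the "slow down" posture.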
Error budgets are not limited to availability. Any SLI can carry one. A latency SLO of "99% of responses under 300 ms" leaves a 1% budget for slow responses. A Core Web Vitals SLO might require LCP at or under 2.5s, INP at or under 200ms, and CLS at or under 0.1 at the 75th percentile of real visits, with the budget being the share of page views allowed to fall outside those thresholds. Tracking that share is exactly what real user monitoring measures from actual sessions.
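For request-based SLIs such as latency, the budget is counted in events rather than minutes. The sketch below assumes a flat list of response times in milliseconds and uses the 300 ms / 99% example from the text; the function name `budget_consumed` is hypothetical, and a response at exactly 300 ms is treated as slow because the SLO says "under 300 ms".

```python
def budget_consumed(
    latencies_ms: list[float],
    threshold_ms: float = 300.0,
    slo_fraction: float = 0.99,
) -> float:
    """Fraction of the slow-response budget used by a batch of requests.

    Returns about 1.0 when exactly the allowed share of requests was slow,
    and more than 1.0 when the SLO has been breached.
    """
    if not 0 < slo_fraction < 1:
        raise ValueError("slo_fraction must be strictly between 0 and 1")
    if not latencies_ms:
        return 0.0
    # "Under 300 ms" is good, so anything at or above the threshold is slow.
    slow = sum(1 for ms in latencies_ms if ms >= threshold_ms)
    allowed_slow = (1 - slo_fraction) * len(latencies_ms)
    return slow / allowed_slow


# Hypothetical sample: 1,000 requests, 5 of them slow.
sample = [120.0] * 995 + [450.0] * 5
print(f"{budget_consumed(sample):.0%} of the latency budget used")
```

The same shape applies to a Core Web Vitals SLO: swap the latency list for per-page-view measurements and count the views that fall outside the thresholds.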
Budget, recovery, and the customer promise
How fast you burn the budget depends on both how often failures happen and how long they last. The relationship is captured by Availability = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR is the mean time to recovery, so a slow recovery from a single incident can drain a tight budget in one outage. Teams also deliberately set the internal SLO stricter than any customer-facing SLA: the gap between the two is a second buffer, so exhausting the error budget does not by itself breach the contract. Measuring availability continuously from outside your infrastructure is what keeps the count honest. Uptime and SLA monitoring records the data behind the budget, and the guide "What 99.9% uptime actually means" breaks each "nine" down in full.
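To see how recovery time alone moves availability, the formula can be evaluated for a fixed failure rate and several recovery times. This is a rough sketch with made-up numbers: the service is assumed to fail on average once per 720 hours of operation, and the function name `availability` is hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical service: one failure per 720 hours (30 days) on average.
MTBF_HOURS = 720.0
SLO = 0.999

for mttr_minutes in (10, 45, 120):
    a = availability(MTBF_HOURS, mttr_minutes / 60)
    verdict = "within" if a >= SLO else "outside"
    print(f"MTTR {mttr_minutes:>3} min -> availability {a:.4%} ({verdict} a 99.9% SLO)")
```

Under these assumptions a 10-minute recovery stays inside a 99.9% SLO, while a 45-minute recovery from the same single failure already falls outside it, which matches the roughly 43-minute budget discussed earlier.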
See also: Uptime & SLA monitoring
Frequently asked questions
How do you calculate an error budget?
Subtract the SLO target from 100%. A 99.9% SLO leaves a 0.1% budget; a 99.95% SLO leaves 0.05%. To turn that into time, multiply by the length of the measurement window: 0.1% of a 30-day month is roughly 43 minutes of allowed downtime, and 0.05% is about 22 minutes.
What does it mean to "burn" an error budget?
Burning the budget means consuming the allowed unreliability. Every minute of downtime, failed request, or out-of-threshold page view draws it down within the current window. When the budget is healthy, teams ship faster; when it is nearly spent, they slow or freeze releases and focus on reliability until the rolling window refills it.
What is the difference between an error budget and an SLO?
The SLO is the reliability target, such as 99.95% availability. The error budget is its complement: the 0.05% of failure that the target permits. They are two views of the same line: the SLO is what you aim to stay above, and the error budget is how far short of 100% you can fall before you miss it.
Can you have an error budget for performance, not just uptime?
Yes. Any SLI can carry a budget. A latency SLO of 99% of responses under 300 ms leaves a 1% budget for slow responses, and a Core Web Vitals SLO (LCP at or under 2.5s, INP at or under 200ms, CLS at or under 0.1 at the 75th percentile) gives a budget for the share of real page views allowed to fall outside those thresholds.