Uptime & monitoring glossary
Plain-English definitions of the uptime, SLA and monitoring terms that actually matter.
Alerting is the automatic process of notifying the right people the moment a monitored check fails or a metric crosses a defined threshold, so problems are caught and routed to an on-call responder before users start reporting them.
API monitoring is the continuous, automated checking of REST and GraphQL endpoints for availability, correctness, and speed.
Availability is the share of time a system is operational and usable, expressed as a percentage over a defined window.
CLS (Cumulative Layout Shift) is the Core Web Vital that measures visual stability: how much a page's content unexpectedly moves while it loads.
Core Web Vitals are Google's three field metrics for real-world page experience: LCP (loading), INP (responsiveness) and CLS (visual stability).
DNS monitoring continuously checks, from outside your infrastructure, that a domain's DNS records resolve correctly, return the expected values, and respond quickly. It catches resolution failures, slow lookups, and unexpected record changes before they take an otherwise healthy site offline.
Domain expiration monitoring is the practice of tracking a domain name's registration expiry date through WHOIS or RDAP so the domain is renewed before it lapses.
Downtime is any period during which a website or service is unavailable or not working correctly for its users.
An error budget is the amount of unreliability a Service Level Objective (SLO) permits, equal to 100% minus the SLO target.
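As a rough illustration of the arithmetic, the sketch below converts an SLO target into the minutes of unreliability it permits. The function name and the 30-day window are assumptions chosen for the example, not part of any standard.

```python
def error_budget_minutes(slo_percent: float, window_days: int) -> float:
    """Minutes of unreliability an SLO permits over a window.

    Hypothetical helper: error budget = 100% minus the SLO target,
    applied to the length of the window.
    """
    if not 0 < slo_percent <= 100:
        raise ValueError("slo_percent must be in (0, 100]")
    window_minutes = window_days * 24 * 60
    budget_fraction = (100.0 - slo_percent) / 100.0
    return window_minutes * budget_fraction


# A 99.9% SLO over an assumed 30-day window leaves roughly 43.2 minutes.
budget = error_budget_minutes(99.9, 30)
```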
An escalation policy is a predefined set of rules that decides who is notified about an alert and how it advances to the next person or tier if no one acknowledges it within a set time.
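One way to picture such a policy is as an ordered list of steps, each with a delay. The sketch below is a minimal model under that assumption; `EscalationStep`, `targets_to_notify`, and the responder names are hypothetical and do not correspond to any particular tool's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    notify: str         # hypothetical responder or tier name
    after_minutes: int  # minutes without acknowledgement before this step fires


def targets_to_notify(policy, minutes_unacknowledged):
    """Return everyone who should have been notified so far, in escalation order."""
    ordered = sorted(policy, key=lambda step: step.after_minutes)
    return [s.notify for s in ordered if s.after_minutes <= minutes_unacknowledged]


# Assumed example policy: primary immediately, secondary after 15 min, manager after 30.
example_policy = [
    EscalationStep("primary-on-call", 0),
    EscalationStep("secondary-on-call", 15),
    EscalationStep("engineering-manager", 30),
]
```

With this example policy, an alert left unacknowledged for 20 minutes would have reached the primary and secondary responders but not yet the manager.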
A health check is an automated probe of an endpoint that reports whether a service is up and working correctly.
Heartbeat monitoring watches for a regular signal (a "heartbeat") sent by a job or service and raises an alert when that signal does not arrive on time.
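The core check can be sketched in a few lines: compare the time since the last heartbeat with the expected interval. The function name and the optional grace period are assumptions for illustration; real monitors may decide lateness differently.

```python
from datetime import datetime, timedelta, timezone


def heartbeat_is_late(last_seen: datetime, now: datetime,
                      expected_interval: timedelta,
                      grace: timedelta = timedelta(0)) -> bool:
    """True when no heartbeat has arrived within the interval plus grace period."""
    return now - last_seen > expected_interval + grace


# Assumed scenario: a job that should report every 5 minutes, with 1 minute of grace.
last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
late = heartbeat_is_late(last, last + timedelta(minutes=7),
                         timedelta(minutes=5), timedelta(minutes=1))
```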
An incident is an unplanned event that disrupts or degrades a service below its expected level, from a full outage to slow or partial failures.
Incident management is the structured process of detecting, triaging, responding to, resolving, and learning from any unplanned event that disrupts a service or degrades its quality.
Incident severity is a classification scheme that ranks an incident by its business impact, so teams know how urgently to respond, who to wake up, and how widely to communicate.
INP (Interaction to Next Paint) is the Core Web Vital that measures how quickly a page responds to user interactions, recording the delay between an action such as a tap or click and the next frame the browser paints in response.
Latency is the delay between a request being sent and the response beginning to arrive, usually measured in milliseconds.
LCP (Largest Contentful Paint) is the Core Web Vital that measures loading performance: the time from when a page starts loading until its largest visible element (usually a hero image, video, or heading block) is rendered.
MTBF (Mean Time Between Failures) is the average amount of time a system operates normally between one failure and the next.
MTTA (Mean Time to Acknowledge) is the average time between an alert firing and a responder confirming they have seen it and are taking ownership.
MTTR (Mean Time to Recovery, sometimes Repair) is the average time it takes to restore a service after a failure, measured from the moment the failure begins to the moment normal operation returns.
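The three averages above (MTBF, MTTA, MTTR) can be computed from incident timestamps. The sketch below follows the definitions given here; the `Incident` record and its field names are hypothetical, and teams often differ on exactly where each clock starts and stops.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Incident:
    started: datetime       # when the failure began
    alerted: datetime       # when the alert fired
    acknowledged: datetime  # when a responder took ownership
    resolved: datetime      # when normal operation returned


def _mean(durations):
    return sum(durations, timedelta(0)) / len(durations)


def mtta(incidents):
    """Average time from alert firing to acknowledgement."""
    return _mean([i.acknowledged - i.alerted for i in incidents])


def mttr(incidents):
    """Average time from failure start to restored service."""
    return _mean([i.resolved - i.started for i in incidents])


def mtbf(incidents):
    """Average normal-operation time between one failure and the next."""
    ordered = sorted(incidents, key=lambda i: i.started)
    if len(ordered) < 2:
        raise ValueError("MTBF needs at least two incidents")
    gaps = [nxt.started - prev.resolved for prev, nxt in zip(ordered, ordered[1:])]
    return _mean(gaps)


# Made-up sample data for illustration only.
sample = [
    Incident(datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 0, 2),
             datetime(2024, 1, 1, 0, 7), datetime(2024, 1, 1, 0, 30)),
    Incident(datetime(2024, 1, 2, 0, 30), datetime(2024, 1, 2, 0, 31),
             datetime(2024, 1, 2, 0, 34), datetime(2024, 1, 2, 1, 0)),
]
```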
On-call is the practice of designating engineers, on a rotating schedule, to respond to alerts and incidents at any hour, including nights and weekends.
Page load time is the total elapsed time for a web page to download, render, and become usable, measured from navigation start to the point the page is fully painted and interactive.
Ping monitoring sends ICMP echo requests to a host and waits for echo replies to confirm reachability and measure response speed.
A postmortem is a structured, blameless retrospective written after an incident to record what happened, how long it lasted, why it happened, and the concrete follow-up actions that will stop it recurring.
Real user monitoring (RUM) measures the experience of actual visitors by collecting performance data from their real browsing sessions.
Response time is the elapsed interval from the moment a client sends a request until it receives the response, usually measured for a single request.
A Service Level Agreement (SLA) is a formal, usually contractual commitment between a provider and its customers that defines the expected level of service, such as 99.9% uptime, and the remedies (like service credits) that apply if that level is not met.
A Service Level Indicator (SLI) is the specific, measured metric that quantifies a service level, such as the percentage of successful requests, the uptime percentage, or the share of responses served under a latency threshold.
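Two of the SLIs mentioned above can be sketched as simple ratios. The function names are hypothetical, and counting every non-5xx status code as a success is an assumption; each team defines "successful" for itself.

```python
def success_rate_sli(status_codes):
    """Percentage of requests that succeeded (assumption: any non-5xx status)."""
    good = sum(1 for code in status_codes if code < 500)
    return 100.0 * good / len(status_codes)


def latency_sli(latencies_ms, threshold_ms):
    """Percentage of responses served strictly under the latency threshold."""
    fast = sum(1 for ms in latencies_ms if ms < threshold_ms)
    return 100.0 * fast / len(latencies_ms)


# Made-up sample data for illustration only.
availability = success_rate_sli([200, 200, 503, 404])
fast_share = latency_sli([120, 80, 450, 300], threshold_ms=300)
```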
A Service Level Objective (SLO) is the internal target a team sets for a particular service level, such as 99.95% uptime or a maximum response time, measured over a defined time window.
SSL certificate monitoring automatically and continuously checks that a site's TLS certificate is valid, trusted, matched to the hostname, and not close to expiry, and sends alerts days before an expired certificate can cause downtime.
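The expiry part of such a check comes down to date arithmetic. The sketch below assumes the certificate's `notAfter` string has already been obtained, for example from `getpeercert()` on a Python `ssl` socket; the function name and the sample date are assumptions, and validity, trust, and hostname checks are out of scope here.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days remaining before a certificate's notAfter time.

    not_after uses the format Python's ssl module reports,
    e.g. "Jan  5 09:34:43 2018 GMT".
    """
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after),
                                     tz=timezone.utc)
    return (expires - now).total_seconds() / 86400


# Illustrative values only: four days before the sample expiry time.
remaining = days_until_expiry(
    "Jan  5 09:34:43 2018 GMT",
    datetime(2018, 1, 1, 9, 34, 43, tzinfo=timezone.utc),
)
```

A monitor could compare the result with an alert threshold, such as a number of days chosen by the team, and notify when it drops below that value.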
A status page is a public web page that shows the current operational state of a service and communicates outages, incidents and scheduled maintenance to users.
Synthetic monitoring runs automated, scripted checks against a website or API from outside your infrastructure on a fixed schedule.
TTFB (Time to First Byte) is the time between a browser sending an HTTP request and the first byte of the server's response arriving.
Uptime is the percentage of time a website or service is available and working as expected, measured over a defined period such as a month or a year.
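As a worked example of that percentage, the sketch below derives uptime from the total downtime recorded in a window. The function name and the sample figures are assumptions for illustration.

```python
from datetime import timedelta


def uptime_percent(window: timedelta, downtime: timedelta) -> float:
    """Uptime as a percentage of the measurement window."""
    if window <= timedelta(0):
        raise ValueError("window must be positive")
    if not timedelta(0) <= downtime <= window:
        raise ValueError("downtime must fall within the window")
    return 100.0 * (1 - downtime / window)


# Assumed figures: 43.2 minutes of downtime in a 30-day month is about 99.9% uptime.
monthly = uptime_percent(timedelta(days=30), timedelta(minutes=43.2))
```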
By Designmodo
Designmodo Inc. 169 Madison Ave, #79627, New York, NY 10016, United States
Copyright © 2010-2026