What Is Alerting?

Alerting is the automatic process of notifying the right people the moment a monitored check fails or a metric crosses a defined threshold, so problems are caught and routed to an on-call responder before users start reporting them.

Alerting sits between detection and response. A monitoring system runs checks on a schedule (every 30 to 60 seconds is common for HTTP health checks), evaluates each result, and decides whether that result is worth waking someone for. The decision logic, the channels, and the routing rules together are what make alerting a useful signal instead of noise.

How alerts are triggered

Two trigger models dominate. Threshold alerts fire when a value crosses a fixed line: status code is not 200, response time exceeds 2000 ms, or certificate expiry is under 14 days. Anomaly or baseline alerts compare current behavior to a learned normal, so a checkout flow that usually returns in 300 ms triggers when it drifts to 900 ms even though no hard limit was set. Threshold alerts are predictable and easy to audit; baseline alerts catch slow degradations that a static number misses.
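The two trigger models can be sketched in a few lines of Python. This is a minimal illustration, not how any particular monitoring product works: the function names and constants are hypothetical, the fixed limits are borrowed from the examples above, and the "learned normal" is assumed to be a simple mean plus three standard deviations of recent response times.

```python
from statistics import mean, stdev

# Hypothetical fixed limits, taken from the examples in the text.
MAX_RESPONSE_MS = 2000
MIN_CERT_DAYS = 14


def threshold_alert(status_code: int, response_ms: float, cert_days_left: int) -> list[str]:
    """Return one reason for every fixed limit the check result crosses."""
    reasons = []
    if status_code != 200:
        reasons.append(f"status code {status_code} is not 200")
    if response_ms > MAX_RESPONSE_MS:
        reasons.append(f"response time {response_ms} ms exceeds {MAX_RESPONSE_MS} ms")
    if cert_days_left < MIN_CERT_DAYS:
        reasons.append(f"certificate expires in {cert_days_left} days")
    return reasons


def baseline_alert(history_ms: list[float], current_ms: float, sigmas: float = 3.0) -> bool:
    """Flag a value that sits far above the recent normal.

    Assumption: "normal" is mean + `sigmas` standard deviations of the
    recent samples. Real baselining is usually more robust than this.
    """
    if len(history_ms) < 2:
        return False  # not enough data to define a baseline
    limit = mean(history_ms) + sigmas * stdev(history_ms)
    return current_ms > limit
```

With a history of roughly 300 ms responses, a 900 ms result passes the threshold check (it is under 2000 ms) but trips the baseline check, which is the slow-degradation case a static number misses.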

Once an alert triggers, most teams route it to one of several channels according to urgency:

  • Email for low-severity or informational alerts that do not need an immediate response.
  • Slack or Microsoft Teams for team-visible warnings and acknowledgments.
  • SMS and phone calls for high-severity events that justify paging a human, even at 3 a.m.
  • Webhooks to push alerts into ticketing, ChatOps bots, or custom automation.
  • On-call tools (PagerDuty, Opsgenie) that apply rotation and escalation policy logic before notifying a person.
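Routing by urgency often comes down to a lookup table from severity to channels. The sketch below assumes a four-tier severity scale and uses made-up channel labels; it only decides where an alert should go and does not call any real email, chat, SMS, or on-call API.

```python
from enum import IntEnum


class Severity(IntEnum):
    SEV1 = 1  # most urgent
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4  # informational


# Hypothetical routing table: the channel names are labels, not integrations.
ROUTES = {
    Severity.SEV1: ["on_call", "phone", "sms"],
    Severity.SEV2: ["on_call", "sms", "chat"],
    Severity.SEV3: ["chat", "webhook"],
    Severity.SEV4: ["email"],
}


def channels_for(severity: Severity) -> list[str]:
    """Return a copy of the channel list so callers cannot mutate the table."""
    return list(ROUTES[severity])
```

Keeping the table as data rather than as branching logic makes the routing policy easy to review and change without touching the dispatch code.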

Suppressing false positives and alert fatigue

A single failed check is weak evidence. A monitor in one region can fail because of a local network blip, not a real outage. Confirming the failure from a second or third geographic location, or retrying after a short delay, removes most transient noise before an alert is sent. Many platforms require 2 to 3 consecutive failed checks, which trades slightly slower detection (the time needed for the extra checks or retries) for far fewer false pages.
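The confirmation logic described above can be combined into one small state machine. This is a sketch under stated assumptions: the class name, the parameters, and the idea of a "round" (one check run from every location) are all hypothetical, and a real platform may confirm failures differently.

```python
from collections import deque


class FailureConfirmer:
    """Confirm a failure only after `consecutive` rounds in a row in which
    at least `min_locations` probe locations reported a failed check."""

    def __init__(self, consecutive: int = 3, min_locations: int = 2):
        self.consecutive = consecutive
        self.min_locations = min_locations
        # Keeps only the most recent `consecutive` round outcomes.
        self._recent = deque(maxlen=consecutive)

    def record(self, results: dict[str, bool]) -> bool:
        """Record one round of checks.

        `results` maps a location name to True (check passed) or False
        (check failed). Returns True when an alert should fire.
        """
        failed = sum(1 for ok in results.values() if not ok)
        self._recent.append(failed >= self.min_locations)
        return len(self._recent) == self.consecutive and all(self._recent)
```

A single-region blip never satisfies `min_locations`, and a one-off multi-region failure never satisfies `consecutive`, so both kinds of transient noise are filtered before anyone is paged.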

When alerts fire too often or carry too little signal, responders start ignoring them, and the one real incident gets lost in the stream. This is alert fatigue. The fixes are structural: deduplicate repeated alerts for the same root cause into one notification, group related alerts so a region-wide failure is one page instead of forty, and assign clear severity tiers so a SEV-1 reaches a phone while a SEV-4 lands quietly in email. Pulsetic supports multi-location confirmation, retries, and per-channel routing so that what reaches on-call engineers is worth their attention.
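Deduplication and grouping are both simple keyed operations. The sketch below is illustrative only and does not describe any vendor's implementation: the `Alert` fields are hypothetical, duplicates are assumed to share a monitor and a cause, and "related" alerts are assumed to be those from the same region.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    monitor: str
    region: str
    cause: str


def deduplicate(alerts: list[Alert]) -> list[Alert]:
    """Keep the first alert for each (monitor, cause) pair, preserving order."""
    seen = set()
    unique = []
    for alert in alerts:
        key = (alert.monitor, alert.cause)
        if key not in seen:
            seen.add(key)
            unique.append(alert)
    return unique


def group_by_region(alerts: list[Alert]) -> dict[str, list[Alert]]:
    """Bundle alerts per region so each region produces one notification."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[alert.region].append(alert)
    return dict(groups)
```

Running `deduplicate` first and `group_by_region` second means the number of notifications is bounded by the number of affected regions rather than by the number of failing monitors.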

See also: Monitoring for DevOps teams

Frequently asked questions

  • What is the difference between alerting and monitoring?

    Monitoring is the continuous collection and evaluation of data, such as running an HTTP check every 30 seconds. Alerting is the layer that decides when a result is worth notifying a human and which channel to use. You can monitor without alerting, but the value of a check that fails at 3 a.m. is near zero if no one is paged.

  • How do I reduce false positive alerts?

    Confirm each failure from multiple geographic locations or retry after a short delay before firing, so a single regional network blip does not trigger a page. Requiring 2 to 3 consecutive failed checks eliminates most transient noise. Tuning thresholds to real baselines, rather than guessing, also cuts false alarms significantly.

  • What causes alert fatigue and how is it fixed?

    Alert fatigue happens when responders receive so many low-value alerts that they begin ignoring all of them, including the critical one. It is fixed structurally: deduplicate repeated alerts for the same cause, group related alerts into a single notification, and use severity tiers so only high-severity events trigger SMS or phone calls. Good routing keeps a noisy SEV-4 out of the same channel as a SEV-1 page.

  • Which alert channel should I use for critical outages?

    Use a channel that demands immediate attention, such as SMS, a phone call, or an on-call tool like PagerDuty that enforces escalation. Reserve email and Slack for warnings and informational alerts that do not need a response within minutes. A common setup sends anything more urgent than a 14-day SSL expiry warning, such as a full-site outage, straight to a phone and the on-call rotation.