What Is Latency?
Reviewed by Ionut Caval · Updated June 2026
Latency is the delay between a request being sent and the response beginning to arrive, usually measured in milliseconds. Because a few slow requests can hide behind a fast average, latency is best reported at percentiles such as p95 and p99 rather than as a mean.
Latency measures time spent waiting, not time spent transferring data (that is throughput or bandwidth). For a web request it covers the round trip across the network plus the time the server takes to start responding: DNS lookup, the TCP and TLS handshakes, the request travelling to the server, the server processing it, and the first byte of the response coming back. The moment that first byte arrives is captured by a closely related metric, TTFB (Time to First Byte), which is essentially the latency of the initial response as a browser experiences it.
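As a rough illustration of the sequence above, the sketch below times a plain HTTP request from the moment it is issued until the status line and headers have been read, which approximates TTFB from the client side. The function name `time_to_first_byte_ms` and its parameters are assumptions made for this example; it uses only Python's standard library, covers plain HTTP (so no TLS handshake), and does not break the total into DNS, TCP, and server phases.

```python
import http.client
import time


def time_to_first_byte_ms(host: str, port: int = 80, path: str = "/",
                          timeout: float = 5.0) -> float:
    """Approximate TTFB in milliseconds for a plain HTTP GET.

    The clock starts before the connection is opened, so the result
    includes name resolution, the TCP handshake, sending the request,
    server processing, and the arrival of the response headers.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    start = time.perf_counter()
    try:
        # request() opens the connection lazily, then sends the request.
        conn.request("GET", path)
        # getresponse() returns once the status line and headers are read.
        response = conn.getresponse()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        response.read()  # drain the body; not counted in the measurement
        return elapsed_ms
    finally:
        conn.close()
```

A single call gives one sample; in practice you would collect many samples and summarise them with percentiles rather than trusting any one reading.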
Why latency is reported at percentiles
An average latency is misleading because response times are rarely symmetric: most requests are fast, but a long tail of slow ones drags real-world experience down. Percentiles describe that tail directly. A p95 of 400 ms means 95% of requests completed in 400 ms or less and the slowest 5% took longer; p99 marks where the slowest 1% begins. Consider 100 requests where 98 return in 100 ms and two take 5,000 ms: the mean is 198 ms, which looks healthy, yet the p99 is 5,000 ms, which is the experience your unluckiest users actually get. This is why teams quote p95 and p99, and why Core Web Vitals field data is assessed at the 75th percentile (Google's "good" thresholds are LCP at or under 2.5 s, INP at or under 200 ms, and CLS at or under 0.1).
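The gap between a mean and a tail percentile is easy to check by hand. The sketch below uses the nearest-rank definition of a percentile (the smallest sample at or below which at least p% of the samples fall); monitoring tools may interpolate instead, so their figures can differ slightly. The function name `percentile` and the sample data are illustrative assumptions, not taken from any particular tool.

```python
import math
from typing import Sequence


def percentile(samples_ms: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical sample: 98 fast requests and 2 slow ones.
samples = [100.0] * 98 + [5000.0] * 2

mean_ms = sum(samples) / len(samples)
print(mean_ms)                   # 198.0
print(percentile(samples, 50))   # 100.0
print(percentile(samples, 95))   # 100.0
print(percentile(samples, 99))   # 5000.0
```

The mean and the median both look healthy here; only the p99 exposes the slow tail.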
Latency as a monitored metric
Latency is one of the most common service level indicators. A latency SLI is usually expressed as the share of requests served under a threshold (for example, the percentage answered in under 300 ms), and many teams treat a response that exceeds its latency budget as a failure that counts against uptime, because a request slow enough to time out is effectively unavailable. There are two complementary ways to watch it:
- Synthetic checks run scripted requests from outside your infrastructure on a schedule, measuring latency consistently from fixed locations and alerting when it breaches a threshold.
- Real user monitoring records latency from genuine visitor sessions, capturing the device, network, and geographic variation a single test machine never sees.
The two approaches answer different questions, and the post "synthetic monitoring vs. real user monitoring" compares them in depth. Pulsetic's uptime and API monitoring records response-time latency on every check from multiple regions, so a slow endpoint surfaces as a trend at the percentiles that matter rather than being averaged away.
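A latency SLI of the kind described in this section, the share of requests served under a threshold, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `latency_sli` is hypothetical, a timed-out request is represented as `None`, and a timeout counts as a miss, mirroring the idea that a request slow enough to time out is effectively unavailable.

```python
from typing import Optional, Sequence


def latency_sli(latencies_ms: Sequence[Optional[float]],
                threshold_ms: float) -> float:
    """Fraction of requests answered in under threshold_ms.

    A value of None stands for a request that timed out and is
    counted as a miss.
    """
    if not latencies_ms:
        raise ValueError("need at least one request")
    good = sum(1 for value in latencies_ms
               if value is not None and value < threshold_ms)
    return good / len(latencies_ms)


# Hypothetical check results: two fast, one slow, one timeout.
results = [120.0, 250.0, 310.0, None]
print(latency_sli(results, threshold_ms=300.0))  # 0.5
```

Comparing that fraction with a target (for example, 99% of requests under 300 ms) is what turns the indicator into an objective you can alert on.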
See also: API & uptime monitoring
Frequently asked questions
What is the difference between latency and bandwidth?
Latency is the delay before data starts arriving, measured in milliseconds; bandwidth (or throughput) is how much data can move per second once it is flowing. A connection can have high bandwidth and still feel slow if latency is high, because every request waits before any data transfers.
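A back-of-the-envelope model makes the point concrete: the time to fetch a response is roughly the latency plus the payload size divided by the bandwidth. The sketch below is a deliberate simplification (it ignores handshakes, TCP slow start, and packet loss), and the function name `fetch_time_ms` and the link figures are hypothetical.

```python
def fetch_time_ms(latency_ms: float, size_bytes: int,
                  bandwidth_mbps: float) -> float:
    """Simplified fetch time: wait for the latency, then transfer
    the payload at the given bandwidth (megabits per second)."""
    transfer_ms = size_bytes * 8 / (bandwidth_mbps * 1_000_000) * 1000
    return latency_ms + transfer_ms


# A 50 kB response over two hypothetical links.
fast_pipe_far_away = fetch_time_ms(latency_ms=200, size_bytes=50_000,
                                   bandwidth_mbps=1000)
slow_pipe_nearby = fetch_time_ms(latency_ms=20, size_bytes=50_000,
                                 bandwidth_mbps=100)

print(round(fast_pipe_far_away, 1))  # 200.4
print(round(slow_pipe_nearby, 1))    # 24.0
```

For small responses the latency term dominates, so the link with ten times less bandwidth still finishes far sooner.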
What are p95 and p99 latency?
They are percentiles of the response-time distribution. A p95 of 400 ms means 95% of requests finished in 400 ms or less and the slowest 5% took longer; p99 describes the slowest 1%. Percentiles are used instead of an average because a small number of very slow requests can hide behind a healthy-looking mean.
What is a good latency for a website or API?
It depends on the use case, but a common goal is a p95 server response under a few hundred milliseconds. For user-facing experience, Google's Core Web Vitals are a useful reference: at the 75th percentile, LCP should be 2.5 seconds or less and INP 200 ms or less. The right target is whatever keeps your slowest realistic users inside an acceptable experience.
Is latency the same as TTFB?
They are closely related but not identical. TTFB (Time to First Byte) measures the time from a browser issuing a request to the first byte of the response arriving, so it is the latency of the initial response as the client sees it. General latency can also refer to round-trip network delay or the response time of a single API call, without the full page-load context TTFB implies.