Incident with Copilot

Minor · Started May 19, 2026 · 5:30 AM

Duration

9h 20m
Severity

Minor

Summary

On May 19, 2026, between 05:30 UTC and 14:50 UTC, some Copilot users experienced failures when using code completions, chat sessions, and cloud agent sessions. At peak impact, approximately 13% of Copilot API requests failed, and approximately 24% of remote sessions failed to initialize. A partial mitigation at 08:16 UTC reduced the Copilot API error rate to approximately 0.3%, but intermittent failures persisted until a full fix was deployed at 14:15 UTC and recovery was verified by 14:50 UTC.

The incident was caused by rate limits being exceeded on a shared infrastructure component. A recently enabled feature increased call volume to this component, and the combined load exceeded capacity limits as traffic increased during business hours.

We mitigated the incident by deploying a caching layer to reduce load on shared infrastructure. To prevent recurrence, we are separating rate limit scopes between services, adding monitoring for internal dependency rate limiting, and reducing redundant calls.
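The two remediation ideas named in the summary, a caching layer in front of the shared component and separate rate limit scopes per service, can be illustrated with a small sketch. This is not GitHub's implementation: the report gives no implementation details, and every name here (`TokenBucket`, `ScopedLimiter`, `TTLCache`, the scope strings, the rates and TTL) is a hypothetical chosen for illustration.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class ScopedLimiter:
    """One bucket per calling service (hypothetical scopes), so a busy
    caller cannot use up the budget of an unrelated one."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate, self.capacity, self.clock = rate, capacity, clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, scope: str) -> bool:
        if scope not in self.buckets:
            self.buckets[scope] = TokenBucket(self.rate, self.capacity, self.clock)
        return self.buckets[scope].allow()


class TTLCache:
    """Caches loader results for `ttl_seconds` to avoid redundant calls."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no call to the shared component
        value = loader()
        self.entries[key] = (now + self.ttl, value)
        return value


if __name__ == "__main__":
    limiter = ScopedLimiter(rate=1.0, capacity=2.0)
    cache = TTLCache(ttl_seconds=30.0)

    def load_config() -> str:
        # Stand-in for a call to the shared, rate-limited component.
        if not limiter.allow("completions"):
            raise RuntimeError("rate limited")
        return "config-v1"

    # Five lookups, but only the first one reaches the limiter.
    print([cache.get_or_load("config", load_config) for _ in range(5)])
```

In this sketch the cache absorbs repeated lookups before they reach the limiter, and each scope draws from its own bucket, so exhausting the hypothetical "completions" budget leaves other scopes unaffected.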

Started

May 19, 2026 · 5:30 AM
Resolved

May 19, 2026 · 2:50 PM

Event timeline

How this incident unfolded

Resolved
May 28 · 12:40 PM GitHub
