GitHub Status · History · Incident #3884
RESOLVED · Incident with Actions
Minor · Started May 20, 2026 · 4:58 PM
Duration
3h 15m
Severity
Minor
Summary
On May 20, 2026, between 16:00 UTC and 17:45 UTC, GitHub Actions customers experienced run start delays exceeding 5 minutes. Approximately 4.5% of all runs were delayed during the impact window, with scale set jobs disproportionately affected: 30% of scale set jobs were delayed and 4% failed to start entirely.

The incident was caused by a misconfigured health check on an internal service that assigns jobs to runners. A brief latency spike in an upstream dependency triggered health check failures across several pods, removing them from service and concentrating load on the remaining capacity. The added load drove memory pressure that escalated into a cascading failure in one regional cluster, leaving it unable to self-recover.

Responders mitigated the incident by scaling capacity in the healthy regional clusters and draining traffic away from the impaired one, after which run start latency recovered. To prevent recurrence, we are strengthening our health check configuration to avoid cascading failure scenarios and evaluating automated mitigations to rebalance traffic when a region is degraded.
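The cascade described in the summary can be illustrated with a toy model. This is a hypothetical sketch, not GitHub's job-assignment service or its actual health check configuration: the pod count, load, per-pod capacity, the "one more pod fails per round" rule, and the `max_ejected` cap are all illustrative assumptions. It shows how removing a few pods pushes per-pod load past capacity on the survivors, and how capping the number of pods health checks may eject (a "fail open" policy, similar in spirit to the maximum-ejection limits found in some proxies' outlier detection) stops the chain.

```python
from typing import List, Optional


def simulate_cascade(
    num_pods: int,
    total_load: float,
    capacity_per_pod: float,
    spike_failures: int,
    max_ejected: Optional[int] = None,
) -> List[int]:
    """Toy model of a health-check-driven cascading failure (hypothetical).

    Returns the number of in-service pods after each round.
    """
    # Upper bound on how many pods health checks may remove from service.
    # None means "no cap": every failing pod is ejected.
    limit = num_pods if max_ejected is None else min(max_ejected, num_pods)

    # A transient upstream latency spike makes some pods fail their checks.
    ejected = min(spike_failures, limit)
    history = [num_pods - ejected]

    while ejected < limit:
        in_service = num_pods - ejected  # always >= 1 inside the loop
        if total_load / in_service <= capacity_per_pod:
            break  # survivors can absorb the redistributed load: stable
        # Survivors are over capacity, so (by assumption) memory pressure makes
        # one more pod fail its health check and get ejected this round.
        ejected += 1
        history.append(num_pods - ejected)

    # With a cap below num_pods, the remaining pods stay in service even if
    # overloaded ("fail open"), so capacity never drops to zero.
    return history


if __name__ == "__main__":
    # Hypothetical pool: 10 pods at 80% utilization (800 units of load, 100 per pod).
    print(simulate_cascade(10, 800, 100, spike_failures=3))
    # [7, 6, 5, 4, 3, 2, 1, 0]
    print(simulate_cascade(10, 800, 100, spike_failures=3, max_ejected=2))
    # [8]
```

In this model the survivors can absorb the loss of at most 2 of the 10 pods, since 800 / 8 = 100 is exactly the per-pod capacity; a third removal pushes each survivor to about 114, and the uncapped run drains to zero. Capping ejections trades a degraded but still serving pool for a total loss of capacity, which is only one of several possible ways to harden a health check.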
Started
May 20, 2026 · 4:58 PM
Resolved
May 20, 2026 · 8:14 PM
Event timeline
Investigating
May 20 · 4:58 PM · GitHub: We are investigating reports of degraded performance for Actions.
Investigating
May 20 · 5:46 PM · GitHub: A subset of runners are taking longer than expected to connect, which may delay some jobs from beginning execution. We are actively working to mitigate the issue.
Monitoring
May 20 · 5:52 PM · GitHub: The degradation affecting Actions has been mitigated. We are monitoring to ensure stability.
Monitoring
May 20 · 6:17 PM · GitHub: We've applied a mitigation to fix the issues with queuing and running Actions jobs. We are seeing improvements in telemetry and are monitoring for full recovery.
Monitoring
May 20 · 7:41 PM · GitHub: Customer impact has fully subsided. We are maintaining yellow status while we deploy a permanent fix to prevent recurrence.
Resolved
May 20 · 8:14 PM · GitHub: Posted the incident summary above.