Upstash Status · History · Incident #2151
RESOLVED · QStash US Region Service Disruption
Critical · Started May 8, 2026 · 9:46 AM
Duration
22m
Severity
Critical
Summary
**Root Cause Analysis**

On **April 24**, we deployed a more optimized scheduler implementation in the **US East (N. Virginia)** region.

On **May 8**, a user who had active schedules deleted their account. Under normal behavior, scheduled tasks associated with a deleted account should wake up, detect that the account no longer exists, and exit after performing cleanup. Due to a bug introduced in the new scheduler implementation, this code path did not return early as intended. Execution continued and resulted in a nil pointer dereference.

A second issue then amplified the impact. When a panic occurs in the scheduler, it is designed to be recovered, logged, and isolated so that the process remains healthy. Because of another bug in the panic recovery path, the panic was not properly caught, which caused the worker process handling the scheduled job to terminate.

After that process exited, another worker picked up responsibility for delivering the same scheduled task. Since the same faulty execution path was still present, that worker also failed. This created a cascading failure pattern across workers attempting to process the affected schedules.

**Resolution**

We deployed two fixes:

* Added the missing early return in the deleted-account cleanup path, preventing the nil pointer dereference.
* Corrected the panic recovery logic so that future panics are safely recovered, logged, and reported without causing worker processes to terminate.

With these changes in place, the affected execution path is now safe. Even if a future bug triggers a panic in this area, it will be isolated and reported rather than causing process-level failure.
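The two fixes can be illustrated with a short Go sketch. The report does not name the scheduler's implementation language; Go is an assumption based on its wording ("nil pointer dereference", "panic", "recovered"). Every type and function below (`Account`, `Schedule`, `lookupAccount`, `cleanupSchedule`, `handleSchedule`, `buggyHandleSchedule`, `runTask`) is hypothetical and is not QStash code. `handleSchedule` shows the early return after cleanup, `buggyHandleSchedule` reproduces the described missing return, and `runTask` wraps each task in a deferred `recover` so that a panic becomes a logged error instead of terminating the worker.

```go
package main

import (
	"fmt"
	"log"
)

// Account is a hypothetical stand-in for the account record the scheduler loads.
type Account struct {
	ID string
}

// Schedule is a hypothetical scheduled task tied to an account.
type Schedule struct {
	ID        string
	AccountID string
}

// accounts simulates an account store; a missing key means the account was deleted.
var accounts = map[string]*Account{
	"acct-live": {ID: "acct-live"},
}

// lookupAccount returns nil when the account no longer exists.
func lookupAccount(id string) *Account {
	return accounts[id]
}

// cleanupSchedule is a hypothetical cleanup hook for schedules of deleted accounts.
func cleanupSchedule(s Schedule) {
	log.Printf("cleaning up schedule %s (account %s deleted)", s.ID, s.AccountID)
}

// handleSchedule (fix 1): after cleanup, return immediately so that
// acct is never dereferenced while nil.
func handleSchedule(s Schedule) string {
	acct := lookupAccount(s.AccountID)
	if acct == nil {
		cleanupSchedule(s)
		return "cleaned-up" // the early return the report says was missing
	}
	return "delivered for " + acct.ID // safe: acct is non-nil here
}

// buggyHandleSchedule mimics the described bug: without the early return,
// execution falls through and dereferences a nil *Account, which panics.
func buggyHandleSchedule(s Schedule) string {
	acct := lookupAccount(s.AccountID)
	if acct == nil {
		cleanupSchedule(s)
		// missing return
	}
	return "delivered for " + acct.ID
}

// runTask (fix 2): a panic inside one task is converted into an error
// instead of terminating the worker process. recover is called directly
// inside the deferred function, on the same goroutine that runs the task.
func runTask(s Schedule, task func(Schedule) string) (result string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("schedule %s panicked: %v", s.ID, r)
			log.Print(err) // log and report, then let the worker continue
		}
	}()
	return task(s), nil
}

func main() {
	deleted := Schedule{ID: "sched-1", AccountID: "acct-deleted"}

	res, err := runTask(deleted, handleSchedule)
	fmt.Println(res, err) // cleaned-up <nil>

	_, err = runTask(deleted, buggyHandleSchedule)
	fmt.Println(err != nil) // true: the panic was recovered and the process keeps running
}
```

In Go, `recover` only stops a panic when it is called directly by a deferred function running on the goroutine that panicked. Calling it from a helper that the deferred function invokes, or relying on a `recover` in the caller when the task body starts its own goroutine, leaves the panic uncaught and crashes the whole process. The report does not say which mistake, if either, affected the scheduler; these are simply common ways a recovery path can fail.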
Started
May 8, 2026 · 9:46 AM
Resolved
May 8, 2026 · 10:08 AM
Event timeline
Investigating
May 8 · 9:46 AM · Upstash: We are currently investigating the issue.
Investigating
May 8 · 9:46 AM · Upstash: We are continuing to investigate this issue.
Monitoring
May 8 · 10:06 AM · Upstash: A fix has been implemented and we are monitoring the results.
Resolved
May 8 · 10:08 AM · Upstash: This incident has been resolved; we will publish an RCA soon.
Postmortem
May 12 · 8:25 AM · Upstash: Published the root cause analysis reproduced in the Summary above.