Upstash Status · History · Incident #2151

RESOLVED

QStash US Region Service Disruption

Critical · Started May 8, 2026 · 9:46 AM

  • Duration

    22m

  • Severity

    Critical


Summary

**Root Cause Analysis**

On **April 24**, we deployed a more optimized scheduler implementation in the **US East (N. Virginia)** region. On **May 8**, a user who had active schedules deleted their account.

Under normal behavior, scheduled tasks associated with a deleted account should wake up, detect that the account no longer exists, and exit after performing cleanup. Due to a bug introduced in the new scheduler implementation, this code path did not return early as intended. Execution continued and resulted in a nil pointer dereference.

A second issue then amplified the impact. When a panic occurs in the scheduler, it is designed to be recovered, logged, and isolated so that the process remains healthy. Because of another bug in the panic recovery path, the panic was not properly caught, which caused the worker process handling the scheduled job to terminate.

After that process exited, another worker picked up responsibility for delivering the same scheduled task. Since the same faulty execution path was still present, that worker also failed. This created a cascading failure pattern across workers attempting to process the affected schedules.

**Resolution**

We deployed two fixes:

* Added the missing early return in the deleted-account cleanup path, preventing the nil pointer dereference.
* Corrected the panic recovery logic so that future panics are safely recovered, logged, and reported without causing worker processes to terminate.

With these changes in place, the affected execution path is now safe. Even if a future bug triggers a panic in this area, it will be isolated and reported rather than causing process-level failure.
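The two fixes described above can be illustrated with a minimal sketch. It is written in Go only because the report's wording (nil pointer dereference, panic recovery) matches Go's terminology; the source does not state the scheduler's language. Every name here (`Account`, `Schedule`, `accounts`, `deliver`, `buggyDeliver`, `safeRun`) is hypothetical and is not taken from QStash.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Account and Schedule are hypothetical types used only for this illustration.
type Account struct{ ID string }

type Schedule struct {
	ID        string
	AccountID string
}

// accounts stands in for an account store; a missing key models a deleted account.
var accounts = map[string]*Account{"acct-live": {ID: "acct-live"}}

var errAccountDeleted = errors.New("account deleted; schedule cleaned up")

func cleanup(s Schedule) { log.Printf("removing schedule %s", s.ID) }

// deliver processes one scheduled task for its owning account.
func deliver(s Schedule) error {
	acct := accounts[s.AccountID] // nil when the account no longer exists
	if acct == nil {
		cleanup(s)
		// Fix 1: return early. Without this return, execution would
		// fall through to acct.ID below and dereference a nil pointer.
		return errAccountDeleted
	}
	log.Printf("delivering schedule %s for account %s", s.ID, acct.ID)
	return nil
}

// buggyDeliver simulates a job that still dereferences a nil pointer.
func buggyDeliver() error {
	var acct *Account
	log.Printf("delivering for account %s", acct.ID) // panics at runtime
	return nil
}

// safeRun is fix 2 in miniature: it recovers a panic raised by job,
// logs it, and reports it as an error instead of letting it end the process.
func safeRun(job func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			log.Printf("recovered panic: %v", r)
			err = fmt.Errorf("job panicked: %v", r)
		}
	}()
	return job()
}

func main() {
	// Deleted account: cleanup runs and the job exits early.
	fmt.Println(safeRun(func() error {
		return deliver(Schedule{ID: "s1", AccountID: "acct-gone"})
	}))
	// Remaining bug: the panic is isolated and surfaced as an error.
	fmt.Println(safeRun(buggyDeliver))
}
```

In Go, `recover` only takes effect when called directly inside a deferred function, and only for a panic raised on the same goroutine. A wrapper like `safeRun` therefore has to wrap each goroutine that runs a job; one placed only around the code that spawns those goroutines would not catch their panics.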


  • Started

    May 8, 2026 · 9:46 AM

  • Resolved

    May 8, 2026 · 10:08 AM

Event timeline

How this incident unfolded

  • Investigating

    May 8 · 9:46 AM Upstash

    We are currently investigating the issue.

  • Investigating

    May 8 · 9:46 AM Upstash

    We are continuing to investigate this issue.

  • Monitoring

    May 8 · 10:06 AM Upstash

    A fix has been implemented and we are monitoring the results.

  • Resolved

    May 8 · 10:08 AM Upstash

    This incident has been resolved. We will publish an RCA soon.

  • Postmortem

    May 12 · 8:25 AM Upstash

    Root cause analysis and resolution published. The full text appears in the Summary above.
