Upstash Status · History · Incident #1894

RESOLVED

QStash US Region: Schedule Degradation

Minor · Started May 5, 2026 · 2:24 PM

  • Duration

    17h 40m

  • Severity

    Minor

  • Detection lead

  • User reports

Summary

# **Incident Postmortem: Scheduled Jobs Inconsistency in US Region**

On May 1, 2026, we experienced an incident affecting a subset of schedules in the US region following a recent infrastructure update. The issue has been resolved, and all affected schedules have been restored.

## **Summary**

As part of an ongoing scalability improvement, we recently updated scheduling infrastructure in the US region to a new architecture. During this transition, a legacy execution path remained in the codebase as a fallback mechanism. On May 1, a bug caused the system to revert to the legacy path. This resulted in inconsistent state between the old and new scheduling systems for some users.

## **Impact**

The incident affected a limited number of users in the US region. **Most users were not affected**, and the vast majority of schedules continued operating normally throughout the incident. Users who did not update schedules during the transition window continued operating normally throughout the incident.

**A subset of users who created, edited, paused, or deleted schedules between April 24 and May 1 may have experienced one or more of the following:**

* Schedule updates not being reflected
* Paused schedules becoming active again
* Deleted schedules reappearing
* Newly created schedules not executing as expected

Schedules created after the transition may have stopped executing briefly before recovery.

## **Root Cause**

During the transition, the new scheduling infrastructure became the source of truth for schedule state. Due to a bug, the system unexpectedly reverted traffic to the legacy scheduling path, which began accepting updates independently from the new system. This caused the two systems to diverge and resulted in inconsistent schedule state for affected users.

## **Resolution**

After identifying the issue, we:

1. Restored the new scheduling system as the active source of truth
2. Reconciled data between the legacy and new systems
3. Updated missing schedule changes back into the new infrastructure
4. Performed conflict resolution to preserve user data and schedule continuity

In some cases, schedules that had previously been paused or deleted were restored to avoid permanent data loss.

## **Preventive Measures**

We are implementing several changes to prevent similar incidents:

* Removing obsolete fallback execution paths after transitions complete
* Adding automated safeguards and alerts for unexpected system fallback behavior
* Improving consistency validation between systems
* Expanding rollback and reconciliation testing

We apologize for the disruption and appreciate everyone’s patience while we resolved the issue.
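
The divergence described in the postmortem (paused schedules becoming active, deleted schedules reappearing) is the kind of drift a consistency check between two stores can surface. The following is a minimal illustrative sketch, not Upstash's implementation: the `Schedule` record, its fields, and `diff_schedule_state` are hypothetical names, and each system's state is assumed to be available as a plain dictionary keyed by schedule ID.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Schedule:
    """Hypothetical schedule record; the fields are assumptions for illustration."""
    schedule_id: str
    cron: str
    paused: bool


def diff_schedule_state(
    new: Dict[str, Schedule], legacy: Dict[str, Schedule]
) -> List[str]:
    """Report every schedule whose state differs between the two systems."""
    findings: List[str] = []
    for sid in sorted(set(new) | set(legacy)):
        in_new, in_legacy = new.get(sid), legacy.get(sid)
        if in_new is None:
            findings.append(f"{sid}: only in legacy (deleted or never migrated)")
        elif in_legacy is None:
            findings.append(f"{sid}: only in new system")
        elif in_new != in_legacy:
            fields = [
                name
                for name in ("cron", "paused")
                if getattr(in_new, name) != getattr(in_legacy, name)
            ]
            findings.append(f"{sid}: differs in {', '.join(fields)}")
    return findings


if __name__ == "__main__":
    new_state = {
        "a": Schedule("a", "* * * * *", paused=True),
        "b": Schedule("b", "0 * * * *", paused=False),
    }
    legacy_state = {
        "a": Schedule("a", "* * * * *", paused=False),
        "c": Schedule("c", "0 0 * * *", paused=False),
    }
    for finding in diff_schedule_state(new_state, legacy_state):
        print(finding)
```

A check like this only detects drift. Deciding which side wins for each finding is the conflict-resolution step listed under Resolution, and it depends on details the postmortem does not state.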


  • Started

    May 5, 2026 · 2:24 PM

  • Resolved

    May 6, 2026 · 8:05 AM

Event timeline

How this incident unfolded

  • Identified

    May 5 · 2:24 PM Upstash

    We are currently experiencing issues in the US region.

    - Duplicate Deliveries: During this period, some scheduled jobs may be executed twice.
    - Schedule Disruption: Schedules created between April 24, 2026 and May 2, 2026 are currently not running.

    Our team is actively working on a fix. Once the migration is complete, affected schedules will resume normal operation. We will provide updates as progress continues.

  • Identified

    May 5 · 2:29 PM Upstash

    We are continuing to work on a fix for this issue.

  • Monitoring

    May 5 · 3:44 PM Upstash

    A fix has been implemented and we are monitoring the results.

  • Monitoring

    May 5 · 9:31 PM Upstash

    Main schedule functionality is back to normal. We are currently checking if previously created schedules are delivered as expected before marking the incident as resolved.

  • Resolved

    May 6 · 8:05 AM Upstash

    This incident has been resolved.

  • Postmortem

    May 6 · 8:55 AM Upstash

