Linode Status · History · Incident #4905
RESOLVEDEmerging Service Issue - GPU Instances - All Regions
Minor · Started Jun 24, 2026 · 6:53 PM
$HTTP_PROTOCOL = (isset($_SERVER['HTTPS']) && ($_SERVER['HTTPS'] == 'on' || $_SERVER['HTTPS'] == 1)) || (isset($_SERVER['HTTP_X_FORWARDED_PROTO']) && $_SERVER['HTTP_X_FORWARDED_PROTO'] == 'https') ? 'https://' : 'http://'; $SITE_URL = $HTTP_PROTOCOL . $_SERVER['SERVER_NAME'] . '/'; ?>
Linode Status · History · Incident #4905
RESOLVEDMinor · Started Jun 24, 2026 · 6:53 PM
Duration
1d 36m
Severity
Minor
Detection lead
—
User reports
—
Summary
On June 23, 2026, following a global software deployment, we identified an issue causing intermittent boot failures specifically for GPU Linodes. The impact was limited to instances where multiple GPU Linodes attempted to boot simultaneously. During the impact window customers could have experienced localized disruption or elevated error rates, particularly during automated node recycling or scaling events. During the investigation it was found that a recent software update created a conflict when multiple GPU servers tried to start up at the exact same time. The servers essentially blocked one another from loading, and our system did not automatically trigger a retry. This specific interaction only happens under heavy, simultaneous workloads, which is why it wasn't caught during our standard pre-release testing. In order to mitigate the issue, we successfully deployed a hotfix directly to all active GPU hosts across our fleet at 20:32 UTC on June 24th, 2026. The affected systems are currently operating normally as expected. In order to prevent this issue from happening in the future, we have developed and integrated a comprehensive fix into our upcoming software release scheduled to roll out globally over the next week. Additionally, we are actively prioritizing the procurement of dedicated GPU testing hardware for our development cloud to improve test coverage and ensure concurrent hardware workloads are fully simulated before future updates reach production. This summary provides an overview of our current understanding of the incident given the information available. Our investigation is ongoing and any information herein is subject to change.
Started
Jun 24, 2026 · 6:53 PM
Resolved
Jun 25, 2026 · 7:29 PM
Duration
1d 36m
Severity
None
Event timeline
Investigating
Jun 24 · 6:53 PM LinodeOur team is investigating an emerging service issue that is causing intermittent boot failures on GPU Instances in all Regions. We will share additional updates as we have more information.
Investigating
Jun 24 · 8:00 PM LinodeWe continue to investigate the issue that is causing intermittent boot failures on GPU Instances in all Regions. We will share additional updates as we have more information.
Monitoring
Jun 24 · 9:12 PM LinodeAt 20:32 UTC on June 24th, 2026 we have been able to correct the issue that is causing intermittent boot failures on GPU Instances in all Regions. We will be monitoring this to ensure that it remains stable. If you continue to experience problems, please <a href="https://cloud.linode.com/support/tickets">open a Support ticket</a> for assistance.
Resolved
Jun 25 · 7:29 PM LinodeWe haven’t observed any additional boot failures on GPU Instances in all regions, and will now consider this incident resolved. If you continue to experience problems, please <a href="https://cloud.linode.com/support/tickets">open a Support ticket</a> for assistance.
Postmortem
Jun 26 · 6:14 PM LinodeOn June 23, 2026, following a global software deployment, we identified an issue causing intermittent boot failures specifically for GPU Linodes. The impact was limited to instances where multiple GPU Linodes attempted to boot simultaneously. During the impact window customers could have experienced localized disruption or elevated error rates, particularly during automated node recycling or scaling events. During the investigation it was found that a recent software update created a conflict when multiple GPU servers tried to start up at the exact same time. The servers essentially blocked one another from loading, and our system did not automatically trigger a retry. This specific interaction only happens under heavy, simultaneous workloads, which is why it wasn't caught during our standard pre-release testing. In order to mitigate the issue, we successfully deployed a hotfix directly to all active GPU hosts across our fleet at 20:32 UTC on June 24th, 2026. The affected systems are currently operating normally as expected. In order to prevent this issue from happening in the future, we have developed and integrated a comprehensive fix into our upcoming software release scheduled to roll out globally over the next week. Additionally, we are actively prioritizing the procurement of dedicated GPU testing hardware for our development cloud to improve test coverage and ensure concurrent hardware workloads are fully simulated before future updates reach production. This summary provides an overview of our current understanding of the incident given the information available. Our investigation is ongoing and any information herein is subject to change.
Pulsetic catches degradations minutes before vendors acknowledge them.
Stay online, all the time, with Pulsetic's uptime prime.
By Designmodo
Designmodo Inc. 169 Madison Ave, #79627, New York, NY 10016, United States
Copyright © 2010-2026