Emerging Service Issue - GPU Instances - All Regions

Minor · Started Jun 24, 2026 · 6:53 PM

Duration

1d 36m
Severity

Minor

Summary

On June 23, 2026, following a global software deployment, we identified an issue causing intermittent boot failures on GPU Linodes. The impact was limited to cases where multiple GPU Linodes attempted to boot simultaneously. During the impact window, customers could have experienced localized disruption or elevated error rates, particularly during automated node recycling or scaling events.

Our investigation found that a recent software update created a conflict when multiple GPU servers tried to start at exactly the same time. The servers blocked one another from loading, and our system did not automatically trigger a retry. This interaction occurs only under heavy, simultaneous workloads, which is why our standard pre-release testing did not catch it.

To mitigate the issue, we deployed a hotfix directly to all active GPU hosts across our fleet at 20:32 UTC on June 24, 2026. The affected systems are now operating normally.

To prevent a recurrence, we have developed a comprehensive fix and integrated it into our upcoming software release, which is scheduled to roll out globally over the next week. We are also prioritizing the procurement of dedicated GPU testing hardware for our development cloud, to improve test coverage and ensure that concurrent hardware workloads are simulated before future updates reach production.

This summary provides an overview of our current understanding of the incident given the information available. Our investigation is ongoing and any information herein is subject to change.
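The summary notes that failed boots were not retried automatically and that the failures appeared when several GPU instances started at once. As an illustration only, the sketch below shows how a caller running its own scaling or node-recycling automation might retry a boot request with exponential backoff and random jitter, so that simultaneous attempts spread out over time. It is not Linode's hotfix or internal logic. The `boot` callable is hypothetical and stands in for whatever request the caller uses to start an instance; the attempt count and delay values are assumptions.

```python
import random
import time
from typing import Callable


def boot_with_retry(
    boot: Callable[[], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> int:
    """Call boot() until it returns True; return the number of attempts used.

    `boot` is a hypothetical callable that returns True when the instance
    booted and False when the boot failed. Raises RuntimeError if every
    attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        if boot():
            return attempt
        if attempt == max_attempts:
            break
        # Exponential backoff with full jitter: wait a random time between
        # 0 and min(max_delay, base_delay * 2^(attempt - 1)) seconds, so
        # callers that failed together do not all retry at the same moment.
        cap = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(rng() * cap)
    raise RuntimeError(f"boot failed after {max_attempts} attempts")
```

The `sleep` and `rng` parameters exist only so the behavior can be exercised without real waiting or randomness; in normal use the defaults apply.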

Started

Jun 24, 2026 · 6:53 PM
Resolved

Jun 25, 2026 · 7:29 PM

Event timeline

How this incident unfolded

◐

Investigating
Jun 24 · 6:53 PM Linode

Our team is investigating an emerging service issue that is causing intermittent boot failures on GPU Instances in all Regions. We will share additional updates as we have more information.
◐

Investigating
Jun 24 · 8:00 PM Linode

We continue to investigate the issue that is causing intermittent boot failures on GPU Instances in all Regions. We will share additional updates as we have more information.
◉

Monitoring
Jun 24 · 9:12 PM Linode

At 20:32 UTC on June 24th, 2026, we corrected the issue that was causing intermittent boot failures on GPU Instances in all Regions. We will be monitoring this to ensure that it remains stable. If you continue to experience problems, please <a href="https://cloud.linode.com/support/tickets">open a Support ticket</a> for assistance.
✓

Resolved
Jun 25 · 7:29 PM Linode

We haven’t observed any additional boot failures on GPU Instances in any Region, and we now consider this incident resolved. If you continue to experience problems, please <a href="https://cloud.linode.com/support/tickets">open a Support ticket</a> for assistance.
✓

Postmortem
Jun 26 · 6:14 PM Linode

