https://status.flyio.net

Visit our support site.
Get the Atom Feed or RSS Feed.
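Incident updates on this page can also be read programmatically from the Atom feed. Below is a minimal Python sketch that lists each feed entry's title and update timestamp. It assumes the feed is served at `https://status.flyio.net/history.atom`, the usual Atlassian Statuspage path. The exact URL is not given on this page, and the helper names are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Assumption: Statuspage sites usually serve incident history at /history.atom;
# this page does not state the exact feed URL.
FEED_URL = "https://status.flyio.net/history.atom"

# Atom elements live in this XML namespace.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def parse_incidents(atom_xml: str | bytes) -> list[tuple[str, str]]:
    """Return (updated, title) pairs for every <entry> in an Atom document."""
    root = ET.fromstring(atom_xml)
    incidents = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        updated = entry.findtext("atom:updated", default="", namespaces=ATOM_NS)
        incidents.append((updated, title.strip()))
    return incidents


def fetch_incidents(url: str = FEED_URL, timeout: float = 10.0) -> list[tuple[str, str]]:
    """Download the feed (network access required) and parse its entries."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_incidents(resp.read())
```

`fetch_incidents()` needs network access. `parse_incidents()` works on any Atom document you already have, so it can be tested offline. The sketch uses only the standard library.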
Elevated API Latency and Timeout Errors
Monitoring - A fix has been implemented and both Machines API and
GraphQL API performance have returned to normal.
Nov 26, 2024 - 21:13 UTC
Identified - We have identified the cause of the API latency increase
and are working to mitigate it.
Nov 26, 2024 - 20:28 UTC
Investigating - We are currently investigating elevated error rates
with our Machines and GraphQL APIs.

Users may experience slower responses or timeouts when using the
Machines API and flyctl commands.
Nov 26, 2024 - 20:23 UTC
About This Site

This page is for updates about global incidents. It does not include
updates about routine hardware failures or isolated infrastructure
events that have limited impact. For a personalized view of all
events that might affect your apps, please check the personalized
status page in your Fly Organization's dashboard. For all internal
incidents and other activities, please check Infra Log.

Customer Applications Operational
Dashboard Operational
Machines API Operational
Regional Availability Operational
  AMS - Amsterdam, Netherlands Operational
  ARN - Stockholm, Sweden Operational
  ATL - Atlanta, Georgia (US) Operational
  BOG - Bogota, Colombia Operational
  BOM - Mumbai, India Operational
  CDG - Paris, France Operational
  DEN - Denver, Colorado (US) Operational
  DFW - Dallas, Texas (US) Operational
  EWR - Secaucus, NJ (US) Operational
  EZE - Ezeiza, Argentina Operational
  FRA - Frankfurt, Germany Operational
  GDL - Guadalajara, Mexico Operational
  GIG - Rio de Janeiro, Brazil Operational
  GRU - Sao Paulo, Brazil Operational
  HKG - Hong Kong Operational
  IAD - Ashburn, Virginia (US) Operational
  JNB - Johannesburg, South Africa Operational
  LAX - Los Angeles, California (US) Operational
  LHR - London, United Kingdom Operational
  MAD - Madrid, Spain Operational
  MEL - Melbourne, Australia Operational
  MIA - Miami, Florida (US) Operational
  NRT - Tokyo, Japan Operational
  ORD - Chicago, Illinois (US) Operational
  OTP - Bucharest, Romania Operational
  PHX - Phoenix, Arizona (US) Operational
  QRO - Queretaro, Mexico Operational
  SCL - Santiago, Chile Operational
  SEA - Seattle, Washington (US) Operational
  SIN - Singapore Operational
  SJC - San Jose, California (US) Operational
  SYD - Sydney, Australia Operational
  WAW - Warsaw, Poland Operational
  YUL - Montreal, Canada Operational
  YYZ - Toronto, Canada Operational
Persistent Storage (Volumes) Operational
Deployments Operational
Remote Builds Operational
Logs Operational
Metrics Operational
SSL/TLS Certificate Provisioning Operational
UDP Anycast Operational
Fly Machine Image Registry 1 Operational
Fly Machine Image Registry 2 Operational
Extensions Operational
  Upstash for Redis Operational
DNS Operational
  Fly Machine .internal DNS Operational
  Fly Machine External DNS Operational
  *.fly.dev Nameservers Operational
  *.flyio.net Nameservers Operational
Billing Operational
  Usage Metrics API Operational
  Stripe API Connection Operational
Corrosion Operational
Legend: Operational, Degraded Performance, Partial Outage, Major Outage, Maintenance
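The component list uses five status values: Operational, Degraded Performance, Partial Outage, Major Outage and Maintenance. If you poll component status programmatically, a small helper can flag every component that is not operational and put the worst first. This sketch assumes the snake_case status strings used by the Atlassian Statuspage public API, for example in `/api/v2/summary.json`. Neither the endpoint nor the strings are documented on this page, and `non_operational` is a hypothetical helper.

```python
from __future__ import annotations

# Assumption: status strings as used by the Atlassian Statuspage public API,
# mapped to the display labels shown in the legend on this page.
STATUS_LABELS = {
    "operational": "Operational",
    "degraded_performance": "Degraded Performance",
    "partial_outage": "Partial Outage",
    "major_outage": "Major Outage",
    "under_maintenance": "Maintenance",
}

# Lower number = more severe.
SEVERITY = {
    "major_outage": 0,
    "partial_outage": 1,
    "degraded_performance": 2,
    "under_maintenance": 3,
    "operational": 4,
}


def non_operational(components: list[dict]) -> list[tuple[str, str]]:
    """Return (name, label) for each non-operational component, worst first.

    Unknown status strings get severity -1, so they sort ahead of known ones
    and are reported verbatim rather than silently dropped.
    """
    flagged = [c for c in components if c.get("status") != "operational"]
    flagged.sort(key=lambda c: SEVERITY.get(c.get("status"), -1))
    return [
        (c["name"], STATUS_LABELS.get(c.get("status"), str(c.get("status"))))
        for c in flagged
    ]
```

The sort is stable, so components with the same status keep their original order, which matches the order of the list above.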
Past Incidents
Nov 26, 2024
Degraded Connectivity
Resolved - We have determined that some customers' machines were being
throttled by our full rollout of CPU quotas, an issue separate from
yesterday's incident. The throttling in turn caused apparent
networking issues. We have temporarily rolled back these changes
while we work with customers to better adapt to CPU quotas.
Nov 26, 16:11 UTC
Investigating - We are aware of customer-reported issues with
internal networking and are investigating.
Nov 26, 14:30 UTC
Degraded API Performance
Resolved - This incident has been resolved.
Nov 26, 08:15 UTC
Update - We've scaled up our systems and applied fixes to our API.
Everything should be operational now.
Nov 26, 07:52 UTC
Update - We are scaling up our systems to handle the increased
traffic.
Nov 26, 05:43 UTC
Update - All hosts have completed the restoration process and we are
seeing our overall Corrosion cluster health and performance return to
normal.

Machines API and GraphQL API error rates are improving, but some users
may still see elevated rates of request timeouts and/or 504 errors
when using the Machines API or flyctl commands. We are continuing to
monitor these services as they recover.
Nov 26, 03:42 UTC
Monitoring - The restore process has completed on the majority of
hosts in our fleet and we are seeing overall Corrosion cluster health
and performance return to normal.

A small number of hosts are still being worked on; we aim to have
them restored shortly.
Nov 26, 02:31 UTC
Update - We are running a restoration and reseed process to bring the
Corrosion cluster back to a healthy, current state.
During this restoration process, you may see elevated error rates on
machines or apps that have been recently updated.
Nov 26, 02:06 UTC
Update - The updates have been applied; however, we are still not
seeing recovery on all Corrosion nodes. We are continuing to work on
a fix.

Machines API and proxy performance remain degraded, especially for
newly created and updated machines.
Nov 25, 23:58 UTC
Update - The Machines API issues stem from a propagation delay in our
global state store, Corrosion.

We have completed deploying a configuration change to our Corrosion
cluster and will be applying these changes to each node shortly. We
expect improvement once the changes are applied.

In the meantime, users may still see degraded Machines API and proxy
performance, especially with newly created machines.
Nov 25, 22:15 UTC
Identified - The issue has been identified and a fix is being
implemented.
Nov 25, 20:20 UTC
Investigating - We are investigating degraded API performance.
Nov 25, 20:10 UTC
Nov 25, 2024
Nov 24, 2024

No incidents reported.

Nov 23, 2024

No incidents reported.

Nov 22, 2024
Log Search unavailable
Resolved - This incident has been resolved.
Nov 22, 04:29 UTC
Monitoring - A fix has been implemented and we are monitoring the
results.
Nov 21, 21:41 UTC
Investigating - We are investigating an issue with application log
search. This impacts Fly Metrics log search panels and historical
app logs.

Streaming logs using `fly logs`, the Live Logs page in the dashboard,
and Fly Log Shipper services continue to work as expected.
Nov 21, 15:56 UTC
Nov 21, 2024
Nov 20, 2024
Emergency network maintenance in ARN
Completed - The scheduled maintenance has been completed.
Nov 20, 05:00 UTC
In progress - Scheduled maintenance is currently in progress. We will
provide updates as necessary.
Nov 20, 02:00 UTC
Scheduled - Our network provider is performing an emergency switch
replacement during this window. A network outage of up to one hour is
expected during this maintenance window. Please verify that your Fly
apps are deployed to more than one region to avoid impact.
Nov 20, 00:30 UTC
Nov 19, 2024
Log Search unavailable
Resolved - This incident has been resolved.
Nov 19, 19:48 UTC
Monitoring - A fix has been implemented and we are monitoring the
results.
Nov 19, 19:04 UTC
Investigating - We are investigating an issue causing application log
search to be unavailable. This is affecting the Fly Metrics log
search panels and the historical application logs initially returned
by the `fly logs` command.

Streaming logs using `fly logs`, the Live Logs page in the dashboard,
and Fly Log Shipper services continue to work as expected.
Nov 19, 18:46 UTC
Nov 18, 2024

No incidents reported.

Nov 17, 2024

No incidents reported.

Nov 16, 2024

No incidents reported.

Nov 15, 2024

No incidents reported.

Nov 14, 2024
Degraded IPv6 connectivity in IAD
Resolved - This incident has been resolved.
Nov 14, 09:45 UTC
Investigating - Our upstream provider has degraded IPv6 connectivity
in IAD. We are actively working with them to resolve this.
Nov 14, 09:16 UTC
Nov 13, 2024

No incidents reported.

Nov 12, 2024
QRO network issues
Resolved - This incident has been resolved.
Nov 12, 23:59 UTC
Update - We are working with our upstream provider to fix the network
issue. You can start machines in the region, but some existing
machines still have no network connectivity.
Nov 12, 20:10 UTC
Identified - The issue has been identified and a fix is being
implemented.
Nov 12, 19:09 UTC
Investigating - We are currently investigating this issue.
Nov 12, 18:54 UTC
Incident History | Powered by Atlassian Statuspage