https://status.flyio.net

Elevated API Latency and Timeout Errors

Monitoring - A fix has been implemented and both Machines API and GraphQL API performance have returned to normal.
Nov 26, 2024 - 21:13 UTC

Identified - We have identified the cause of the API latency increase and are working to mitigate it.
Nov 26, 2024 - 20:28 UTC

Investigating - We are currently investigating elevated error rates with our Machines and GraphQL APIs.
Users may experience slower responses or timeouts when using the Machines API and flyctl commands.
Nov 26, 2024 - 20:23 UTC

About This Site

This page is for updates about global incidents. It does not include updates about routine hardware failures or isolated infrastructure events that have limited impact. For a personalized view of all events that might affect your apps, please check the personalized status page in your Fly Organization's dashboard. For all internal incidents and other activities, please check Infra Log.

Customer Applications: Operational
Dashboard: Operational
Machines API: Operational
Regional Availability: Operational
  AMS - Amsterdam, Netherlands: Operational
  ARN - Stockholm, Sweden: Operational
  ATL - Atlanta, Georgia (US): Operational
  BOG - Bogota, Colombia: Operational
  BOM - Mumbai, India: Operational
  CDG - Paris, France: Operational
  DEN - Denver, Colorado (US): Operational
  DFW - Dallas, Texas (US): Operational
  EWR - Secaucus, NJ (US): Operational
  EZE - Ezeiza, Argentina: Operational
  FRA - Frankfurt, Germany: Operational
  GDL - Guadalajara, Mexico: Operational
  GIG - Rio de Janeiro, Brazil: Operational
  GRU - Sao Paulo, Brazil: Operational
  HKG - Hong Kong: Operational
  IAD - Ashburn, Virginia (US): Operational
  JNB - Johannesburg, South Africa: Operational
  LAX - Los Angeles, California (US): Operational
  LHR - London, United Kingdom: Operational
  MAD - Madrid, Spain: Operational
  MEL - Melbourne, Australia: Operational
  MIA - Miami, Florida (US): Operational
  NRT - Tokyo, Japan: Operational
  ORD - Chicago, Illinois (US): Operational
  OTP - Bucharest, Romania: Operational
  PHX - Phoenix, Arizona (US): Operational
  QRO - Queretaro, Mexico: Operational
  SCL - Santiago, Chile: Operational
  SEA - Seattle, Washington (US): Operational
  SIN - Singapore: Operational
  SJC - San Jose, California (US): Operational
  SYD - Sydney, Australia: Operational
  WAW - Warsaw, Poland: Operational
  YUL - Montreal, Canada: Operational
  YYZ - Toronto, Canada: Operational
Persistent Storage (Volumes): Operational
Deployments: Operational
Remote Builds: Operational
Logs: Operational
Metrics: Operational
SSL/TLS Certificate Provisioning: Operational
UDP Anycast: Operational
Fly Machine Image Registry 1: Operational
Fly Machine Image Registry 2: Operational
Extensions: Operational
Upstash for Redis: Operational
DNS: Operational
Fly Machine .internal DNS:
Operational
Fly Machine External DNS: Operational
*.fly.dev Nameservers: Operational
*.flyio.net Nameservers: Operational
Billing: Operational
Usage Metrics API: Operational
Stripe API Connection: Operational
Corrosion: Operational

Past Incidents

Nov 26, 2024

Degraded Connectivity

Resolved - We have determined that some customers' machines are being throttled due to our full rollout of CPU quotas, separate from the incident yesterday. This in turn caused apparent networking issues. We have now temporarily rolled back these changes while we work with customers to better adapt to CPU quotas.
Nov 26, 16:11 UTC

Investigating - We are aware of customer-reported issues with internal networking and are investigating.
Nov 26, 14:30 UTC

Degraded API Performance

Resolved - This incident has been resolved.
Nov 26, 08:15 UTC

Update - We've scaled up our systems and applied fixes to our API. Everything should be operational now.
Nov 26, 07:52 UTC

Update - We are scaling up our systems to handle the increased traffic.
Nov 26, 05:43 UTC

Update - All hosts have completed the restoration process and we are seeing our overall Corrosion cluster health and performance return to normal. Machines API and GraphQL API error rates are improving, but some users may still see elevated rates of request timeouts and/or 504 errors when using the Machines API or flyctl commands. We are continuing to monitor these services as they recover.
Nov 26, 03:42 UTC

Monitoring - The restore process has completed on the majority of hosts in our fleet and we are seeing overall Corrosion cluster health and performance return to normal. A small number of hosts are still being worked on; we aim to have them restored shortly.
Nov 26, 02:31 UTC

Update - We are running a restoration and reseed process to bring the Corrosion cluster back to a healthy, current state.
During this restoration process, you may see elevated error rates on machines or apps that have been recently updated.
Nov 26, 02:06 UTC

Update - The updates have been applied; however, we are still not seeing recovery on all Corrosion nodes. We are continuing to work on a fix. Machines API and proxy performance remain in a degraded state, especially with newly created and updated machines.
Nov 25, 23:58 UTC

Update - The Machines API issues stem from a propagation delay in our global state store, Corrosion. We have completed deploying a configuration change to our Corrosion cluster and will be applying these changes to each node shortly. We expect improvement once the changes are applied. In the meantime, users may still see degraded Machines API and proxy performance, especially with newly created machines.
Nov 25, 22:15 UTC

Identified - The issue has been identified and a fix is being implemented.
Nov 25, 20:20 UTC

Investigating - We are investigating degraded API performance.
Nov 25, 20:10 UTC

Nov 25, 2024

Nov 24, 2024

No incidents reported.

Nov 23, 2024

No incidents reported.

Nov 22, 2024

Log Search unavailable

Resolved - This incident has been resolved.
Nov 22, 04:29 UTC

Monitoring - A fix has been implemented and we are monitoring the results.
Nov 21, 21:41 UTC

Investigating - We are investigating an issue with application log search. This impacts Fly Metrics log search panels and historical app logs. Streaming logs using `fly logs`, the Live Logs page in the dashboard, and Fly Log Shipper services continue to work as expected.
Nov 21, 15:56 UTC

Nov 21, 2024

Nov 20, 2024

Emergency network maintenance in ARN

Completed - The scheduled maintenance has been completed.
Nov 20, 05:00 UTC

In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Nov 20, 02:00 UTC

Scheduled - Our network provider is performing an emergency switch replacement during this window.
A network outage of up to one hour is expected during this maintenance window. Please verify your Fly apps are deployed to more than one region to avoid impact.
Nov 20, 00:30 UTC

Nov 19, 2024

Log Search unavailable

Resolved - This incident has been resolved.
Nov 19, 19:48 UTC

Monitoring - A fix has been implemented and we are monitoring the results.
Nov 19, 19:04 UTC

Investigating - We are investigating an issue causing application log search to be unavailable. This is affecting the Fly Metrics log search panels and the historical application logs initially returned by the `fly logs` command. Streaming logs using `fly logs`, the Live Logs page in the dashboard, and Fly Log Shipper services continue to work as expected.
Nov 19, 18:46 UTC

Nov 18, 2024

No incidents reported.

Nov 17, 2024

No incidents reported.

Nov 16, 2024

No incidents reported.

Nov 15, 2024

No incidents reported.

Nov 14, 2024

Degraded IPv6 connectivity in IAD

Resolved - This incident has been resolved.
Nov 14, 09:45 UTC

Investigating - Our upstream provider has degraded IPv6 connectivity in IAD. We are actively working with them to resolve this.
Nov 14, 09:16 UTC

Nov 13, 2024

No incidents reported.

Nov 12, 2024

QRO network issues

Resolved - This incident has been resolved.
Nov 12, 23:59 UTC

Update - We are working with our upstream provider to fix the network issue. You can start machines in the region, but some existing machines are still without network connectivity.
Nov 12, 20:10 UTC

Identified - The issue has been identified and a fix is being implemented.
Nov 12, 19:09 UTC

Investigating - We are currently investigating this issue.
Nov 12, 18:54 UTC