[HN Gopher] Fly.io outage - resolved
       ___________________________________________________________________
        
       Fly.io outage - resolved
        
       Author : punkpeye
       Score  : 232 points
       Date   : 2024-11-26 01:47 UTC (21 hours ago)
        
 (HTM) web link (status.flyio.net)
 (TXT) w3m dump (status.flyio.net)
        
       | punkpeye wrote:
       | It is not reflected in their status page, but fly.io itself is
       | not even loading.
        
         | duxup wrote:
         | Confirmation ;)
        
         | nomilk wrote:
         | https://fly.io/ loading for me
        
       | arusahni wrote:
       | Oof, hugops to the team.
        
       | stevefan1999 wrote:
       | Yep...can confirm my self hosted Bitwarden there is completely
       | FUBAR connection wise even if it is in EA, so it should be a
       | worldwide outage...lemme guess, some internal tooling error,
       | consensus split brain, or did someone leak BGP routes again?
        
         | jasonjayr wrote:
         | DNS. It's always DNS. /s
        
           | jart wrote:
           | https://github.com/jart/cosmopolitan/blob/master/third_party.
           | ..
        
           | monkaiju wrote:
           | Might be! Shameless plug of a DNS tool I wrote years ago for
           | anyone this pushes to learn more about DNS
           | 
           | https://dug.unfrl.com/
        
         | satoru42 wrote:
         | Mine is in Asia and it's still accessible.
        
         | odo1242 wrote:
         | It was a consensus split-brain ("database replication failure")
         | it seems
        
       | redslazer wrote:
       | fly.io just has the weirdest outages. It has issues so regularly
       | we don't even need to run mock outages to make sure our system
       | failovers work.
        
         | duxup wrote:
         | When I worked for a company who worked with big banks /
         | financial institutions we used to run disaster recovery tests.
         | Effectively a simulated outage where the company would try to
         | run off their backup sites. They ran everything from those
         | sites, it was impressive.
         | 
         | Once in a while we'd have a real outage that matched the test
         | we ran as recently as the weekend before.
         | 
         | I was helping a bank switch over to the DR site(s) one day
         | during such a real outage and I left my mic open when someone
         | asked me what the commotion was on the upper floors of our HQ.
         | I said "super happy fun surprise disaster recovery test for
         | company X".
         | 
         | VP of BIG bank was on the line monitoring and laughed "I'm
         | using that one on the executive call in 15, thanks!" Supposedly
         | it got picked up at the bank internally after the VP made the
         | joke and was an unofficial code for such an outage for a long
         | time.
        
           | NetOpWibby wrote:
           | Thankfully your comment was positive!
        
           | latch wrote:
           | In most BIG banks, "Vice President" is almost an entry-level
           | title. Easily have 1000s of them. For example, this article
           | points out that Goldman Sachs had ~12K VPs out of more than
           | 30K employees: https://web.archive.org/web/20150311012855/htt
           | ps://www.wsj.c...
        
             | jart wrote:
             | VP at Goldman is equivalent to Senior SWE according to
             | levels.fyi and their entry level is Analyst. I'm surprised
             | by the compensation though. I would have thought people
             | working at a place with gold in the name would be making
             | more. Also apparently Morgan Stanley pays their VPs
             | $67k/year.
        
               | philipwhiuk wrote:
               | Tech outstripped big finance corps tech a while ago.
               | 
               | Traders make loads, not the SWEs
        
               | bormaj wrote:
               | That VP comp number seems quite low fwiw
        
               | jart wrote:
               | Yes how much longer till we see Morgan Stanley VPs
               | picketing outside demanding a living wage and humming The
               | Internationale.
        
             | SteveNuts wrote:
             | Just like all Sales folks have heavily inflated titles, no
             | customer wants to think they're dealing with a junior
             | salesperson/loan officer when you're about to hand over
             | your money.
             | 
             | It seems like every vendor sales team I work with is an
             | "executive" or "director of sales" even though in reality
             | they're just regular old salespeople.
        
         | benreesman wrote:
         | In fairness to the fly.io folks (who are extremely serious
         | hackers), they're standing up a whole cloud provider and
         | they've priced it attractively and they're much customer-
         | friendlier than most alternatives.
         | 
         | I don't envy the difficulty of doing this, but I'm quite
         | confident they'll iron the bugs out.
        
           | redslazer wrote:
           | The tech is impressive and the pricing is attractive which is
           | why we use them. I just wish there was less black magic.
        
             | benreesman wrote:
             | I don't always agree with @tptacek on social/political
             | issues, and I don't always agree with @xe on the direction
             | of Nix, but these are legends on the technical side of
             | things. And they're trying to build an equitable
             | relationship between the user of cloud services and the
             | provider, not fund a private space program.
             | 
             | If I were in the market for cloud services I'd highly prize
             | a long-term relationship on mutual benefit and fair
             | dealings over a short-term nuisance of being an early
             | adopter.
             | 
             | I strongly suspect your investment in fly is going to pay
             | off.
        
               | verelo wrote:
               | I want to believe, but in the meantime they're killing
               | the product I've been working hard to build trust in with
               | my own customers. There is a limit to my idealism,
               | and it's well and truly in the past.
        
               | reissbaker wrote:
               | FWIW Xe was let go from Fly earlier this year during a
               | round of layoffs.
        
               | benreesman wrote:
               | Unfortunate. Xe rocks.
        
               | xena wrote:
               | Xe here. As a sibling comment said, I didn't survive
               | layoffs. If you're looking for someone like me, I'm on
               | the market!
        
               | benreesman wrote:
               | Hiring people is above my pay grade, but I can vouch to
               | my lords and masters and anyone else who cares what I
               | think that a legend is up for grabs.
               | 
               | b7r6@b7r6.net
        
               | xena wrote:
               | I'd email but I'm about to pass out in bed. Please see
               | https://xeiaso.net/contact/ in case I don't get back to
               | you in the morning.
        
               | foldr wrote:
               | I suspect that making a cloud service provider run
               | reliably requires tons of grunt work more than it
               | requires technical heroism from a small number of highly
               | talented individuals.
        
               | tptacek wrote:
               | Yes.
        
               | tptacek wrote:
               | I'm several steps removed from day-to-day engineering at
               | this point; the team working on this is much better than
               | I am. It's just a very hard problem; biting it off is
               | something you can certainly blame me for, though.
               | 
               | (Also: not a legend, just loud.)
        
               | benreesman wrote:
               | I may be in the minority on this view, but I think that it's
               | possible to be both a recognized expert aka legend and
               | loud ("visible" might be a kinder word).
               | 
               | When you talk technology, I listen, and I doubt I'm alone
               | in that. Keep up the good work with fly.io!
        
       | shubhamjain wrote:
       | This is probably the 5th or 6th major outage from Fly.io that I
       | have personally seen. Pretty sure there were many others and some
       | just went unnoticed. I recommended the service to a friend, and
       | within two days he faced two outages.
       | 
       | Fly.io seriously needs to get it together. Why it hasn't happened
       | yet is a mystery to me. They have a good product but stability
       | needs to be the absolute top priority for a hosting service.
       | Everything else is secondary.
        
         | mcqueenjordan wrote:
         | Reliability is hard when your volume is (presumably) scaling
         | geometrically.
        
           | paxys wrote:
           | Can't use the "reliability is hard" excuse when you are quite
           | literally in the business of selling reliability.
        
             | mcqueenjordan wrote:
             | It's just not that big of a mystery. It's not an excuse;
             | it's just true. Also, they're not especially selling
             | reliability as much as they're selling small geo-
             | distributed deployments.
        
         | ilrwbwrkhv wrote:
         | Does anyone use them beyond the free tier? Same with Vercel for
         | example.
        
           | gk1 wrote:
           | Vercel has revenue of over $100M. So yes at least a few
           | companies use them beyond the free tier.
        
           | dizhn wrote:
           | Which company? GitHub? As far as I know fly.io does not have
           | a free tier.
        
         | adityapatadia wrote:
         | We left it about a year ago due to reliability issues. We now
         | use DigitalOcean apps and they work like a charm. Zero downtime
         | with DO.
        
           | subarctic wrote:
           | You mean their App Platform right? How does the pricing
           | compare to fly?
        
             | adityapatadia wrote:
             | Yes, App Platform. Pricing is a little higher than Fly but
             | way lower than AWS, and it is fully justified. Zero downtime
             | in the last year.
             | 
             | With Fly, we had 3-4 downtimes in 2023 in a span of 4
             | months.
        
         | SOLAR_FIELDS wrote:
         | I get this but I think if people can give GitHub a pass for
         | shitting the bed every two weeks maybe Fly should get a bit of
         | goodwill here. I am not affiliated with Fly at all but I do
         | think that people should temper their expectations when even
         | mega corp can't get it right
         | 
         | I guess the secret is to be the incumbent with no suitable
         | replacement. Then you can be complete garbage in terms of
         | reliability and everyone will just hand wave away your poor ops
         | story
        
           | ojame wrote:
           | The biggest difference is GitHub in your infrastructure is
           | (nearly always) internal. Fly in your infrastructure is
           | external. Users generally don't see when you have issues with
           | GitHub, but they do generally see when you have issues with
           | Fly.
           | 
           | That's the core difference.
        
           | fragmede wrote:
           | Who's giving GitHub a pass on shitting the bed? They go down
           | often enough that if you don't have an internal git server
           | set up for your CICD to hit, that's on you.
        
             | SOLAR_FIELDS wrote:
             | My point is made by your very post - getting off GitHub
             | onto alternatives is not seriously discussed as an option -
             | instead it's "well, why didn't you prepare better to deal
             | with your vendor's poor ops story"
        
               | fragmede wrote:
               | I wasn't going to bring up being on an internally hosted
               | gitlab instead of github, but that would be the "not
               | giving them a pass" part.
        
       | benhoyt wrote:
       | My fly.io-hosted website went down for 5 minutes (6 hours ago),
       | but then came right back up, and has been up ever since. I use a
       | free monitoring service that checks it every 5 minutes, so it's
       | possible it missed another short bit of downtime. But fly.io has
       | been pretty reliable overall for me!
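
A rough way to reason about what a 5-minute check can miss, as a sketch only. It assumes the outage begins at a uniformly random moment relative to the probe schedule and that any probe landing inside the outage reports a failure; neither assumption comes from the monitoring service's documentation, and the function name is made up for illustration.

```python
def detection_probability(outage_minutes: float, interval_minutes: float = 5.0) -> float:
    """Chance that at least one fixed-interval probe lands inside an outage.

    Assumes the outage start is uniformly random relative to the probe
    schedule and that a probe inside the outage always reports a failure.
    """
    if outage_minutes < 0 or interval_minutes <= 0:
        raise ValueError("outage must be non-negative and interval positive")
    # An outage at least as long as the interval always contains a probe;
    # a shorter one is caught in proportion to how much of the interval it covers.
    return min(1.0, outage_minutes / interval_minutes)
```

Under these assumptions a one-minute blip is seen only about one time in five, while anything lasting a full interval or longer is always caught.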
        
         | nomilk wrote:
         | Would be fascinated to see your data over a period of months.
         | 
         | Application up time is flakey, but what was worse were fly
         | deploys failing for no clear reason. Sometimes layers would
         | just hang and eventually fail for no particular reason; I'd run
         | the same command an hour or two later without any changes and
         | it would just work as expected.
         | 
         | I'd love to make a monitoring service to _deploy_ a basic app
         | (i.e. run the fly deploy command) every 5 minutes and see how
         | often those deploys fail or hang. I'd guess ~5% inexplicably
         | fail, which is frustrating unless you've got a lot of spare
         | time.
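
The deploy prober described above could be sketched roughly as follows. This is a minimal illustration, not an existing tool: `run_once` and `failure_rate` are hypothetical names, the 600-second timeout is an arbitrary choice, and it assumes `fly` is on the PATH and is run from a directory containing a throwaway test app.

```python
import subprocess
import time
from typing import Callable, Sequence


def run_once(command: Sequence[str], timeout_s: float) -> bool:
    """Run the command once; True only if it exits 0 within the timeout."""
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        # A hang (timeout) or a command that cannot be started counts as a failure.
        return False
    return result.returncode == 0


def failure_rate(attempt: Callable[[], bool], attempts: int, pause_s: float = 0.0) -> float:
    """Call `attempt` repeatedly and return the fraction of calls that failed."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    failures = 0
    for i in range(attempts):
        if not attempt():
            failures += 1
        if pause_s and i < attempts - 1:
            time.sleep(pause_s)  # wait between probes, but not after the last one
    return failures / attempts


# Example wiring (assumed setup): one deploy every 5 minutes for an hour.
# rate = failure_rate(lambda: run_once(["fly", "deploy"], timeout_s=600),
#                     attempts=12, pause_s=300)
```

Keeping the attempt as a plain callable means the counting logic can be exercised with a stub before pointing it at a real deploy.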
        
           | sanswork wrote:
           | My downtimes from fly are pretty rare but generally global
           | when they happen. In this outage we had no downtime but
           | couldn't deploy for a few hours. I have issues with deploying
           | about once per quarter (deploy most days across a few apps).
        
             | nomilk wrote:
             | If that's the case I suspect fly is getting a lot more
             | reliable. I stopped using them about a year ago so haven't
             | kept up on their reliability since. Glad to hear, it's good
             | for a competitive market to have many providers, and fly
             | might have issues but hopefully has a bright future
        
               | sanswork wrote:
               | They are definitely getting more reliable. I was an early
               | user and moved off them to self hosted for quite a while
               | because of the frequent downtime in early days.
               | 
               | Their support still leaves a lot to be desired even as
               | someone that pays for it but the ease of running and
               | deploying a distributed front end keeps bringing me back.
        
           | rozenmd wrote:
           | This may be of interest to you:
           | https://news.ycombinator.com/item?id=42243282
        
           | jrockway wrote:
           | I used to run a service that created k8s clusters on GCP for
           | our customers. We did want to check that that functionality
           | kept working and had a prober test it periodically. It was
           | actually broken a lot.
           | 
           | Always good to monitor your dependencies if you have the
           | time. Then when someone complains about an issue in your
           | service, you can check your monitoring to see if your
           | upstream services are broken. If they are, at least you know
           | where to start debugging.
        
         | beezlewax wrote:
         | Do you mind if I ask what monitoring service that is?
        
           | benhoyt wrote:
           | Sure, it's UptimeRobot: https://uptimerobot.com/
        
           | andrew-jack wrote:
           | Use https://pulsetic.com/
        
             | vextea wrote:
             | Is it your service?
        
           | buzzier wrote:
           | https://github.com/louislam/uptime-kuma
        
         | rozenmd wrote:
         | I externally monitor fly.io and its docs here:
         | https://flyio.onlineornot.com/
         | 
         | Looks like it lasted 16 minutes for them.
        
         | davidgl wrote:
         | Same for us, down for ~5 mins, back up and fine, error was 501
        
           | TacticalCoder wrote:
           | Someone said 16 minutes: so it's not even 5 nines service.
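
To put a number on that remark, here is a small sketch that converts a year's total downtime into a count of nines. It assumes a 365-day year and that the 16 minutes were the only downtime in that year, which the thread does not establish; the function name is illustrative.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def nines_achieved(downtime_minutes: float) -> int:
    """Whole number of nines of availability over one year, given total downtime."""
    if not 0 < downtime_minutes <= MINUTES_PER_YEAR:
        raise ValueError("downtime must be positive and at most one year")
    unavailability = downtime_minutes / MINUTES_PER_YEAR
    # 1e-4 unavailability is four nines, 1e-5 is five nines, and so on.
    return math.floor(-math.log10(unavailability))
```

With these assumptions, 16 minutes in a year lands at four nines; five nines leaves room for only a little over 5 minutes.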
        
         | dprotaso wrote:
         | What free monitoring tool do you use?
        
       | MaxfordAndSons wrote:
       | Kinda funny that they've named their global state store
       | "Corrosion"... not really a word I'd associate with stability and
       | persistence.
        
         | kermatt wrote:
         | https://community.fly.io/t/reliability-its-not-great/11253
         | 
         | https://github.com/superfly/corrosion
        
         | lordofgibbons wrote:
         | It's an internal project based on Rust, not a product. So I
         | don't think it matters too much what they name it. It's open
         | source which is great, but still not a product that they need
         | to market.
        
           | SOLAR_FIELDS wrote:
           | And to be fair, it's a bit of a cute meme to name rust
           | projects things that relate to it. Oxide, etc
        
         | dumah wrote:
         | I take your point but corrosion-resistant metals such as
         | Aluminum, Titanium, Weathering Steel and Stainless Steel don't
         | avoid corrosion entirely but form a thin and extremely stable
         | corrosion layer (under the right conditions).
        
           | littlestymaar wrote:
           | Gold and platinum really are corrosion resistant though (but
           | have questionable mechanical properties...)
        
         | toast0 wrote:
         | I stored important data in mnesia, so who would I be to talk.
         | :p
        
           | throwawaymaths wrote:
           | amnesia means forget, so mnesia means remember, I would
           | guess?
        
       | veggieWHITES wrote:
       | I was considering these guys the other day until I saw their
       | pricing page: https://fly.io/pricing/
       | 
       | (There's not a single price on there, why even create the page?)
        
         | schmichael wrote:
         | The prices are just one click deeper. Hardly a nefarious dark
         | pattern.
        
         | rascul wrote:
         | There's a link to what appears to be the actual pricing page
         | https://fly.io/docs/about/pricing/
         | 
         | There's also a link to the pricing calculator
         | https://fly.io/calculator
        
           | totetsu wrote:
           | Is that calculator hourly or monthly?
        
             | radicalriddler wrote:
             | Literally says "Monthly Costs" in the green panel on the
             | right that calculates the total.
        
             | eviks wrote:
             | It's right there: "Monthly Cost"
        
         | Aeolun wrote:
         | OMG, that's hilarious. I use them, and I know what my prices
         | are, but I'd never noticed that the page called pricing doesn't
         | actually have any.
        
           | tptacek wrote:
           | We've always had public pricing; you can't do a metered cloud
           | provider without a rate sheet. But it's been part of our
           | product documentation, rather than the front page of the
           | website, until recently; there's a whole saga behind it,
           | which gets into whether we offer "plans" or not, how support
           | works, all that jazz, all of which kept us from putting
           | together a marketing pricing page.
        
       | HellsMaddy wrote:
       | Suspiciously, Turso started having issues around the same time.
       | Their CEO confirmed on Discord it's due to the Fly outage:
       | 
       | > Ok.I caught up with our oncall and This seems related to the
       | Fly.io incident that is reported in our status page. Our login
       | does call things in the Fly.io API
       | 
       | > we are already in touch with Fly and will see if we can speed
       | this up
        
         | pier25 wrote:
         | Not the first time Turso goes down because of Fly issues. It
         | must suck to have built a db service and have this downtime.
         | 
         | Apparently Turso are going to offer an AWS tier at some point.
        
           | jonasdoesthings wrote:
           | Last month Turso released AWS-hosted databases to the public
           | (still in Beta): https://turso.tech/blog/turso-aws-beta
        
             | pier25 wrote:
             | Thanks!
        
       | DataOverload wrote:
       | We switched from Fly to CF workers a while ago, and never looked
       | back
        
         | eek2121 wrote:
         | congrats on not developing a playbook for the time you have to
         | 'look back'.
         | 
         | Providers will fail. good contingencies won't.
         | 
         | ...hears faint sound...I SAID GOOD, QUIET YOU!
        
         | punkpeye wrote:
         | They are fundamentally different. If Cloudflare provided a way
         | to host docker containers with volumes though, that would be
         | game over for so many paas platforms.
        
           | stoicjumbotron wrote:
           | Can't wait: https://blog.cloudflare.com/container-platform-
           | preview/
        
             | punkpeye wrote:
             | wow, this will be huge
        
               | Aeolun wrote:
               | Only if they can sort out their atrocity of a
               | documentation website.
        
         | rstupek wrote:
         | How are they equivalent?
        
         | frakkingcylons wrote:
         | I switched from apples to oranges and never looked back.
        
         | pier25 wrote:
         | Our stuff on CF Workers has been working non stop for years
         | now.
         | 
         | About 6 months ago we migrated our most critical stuff from Fly
         | to CF and boy every time Fly has issues I'm so glad we did.
        
           | jpgvm wrote:
           | Too much custom stuff too quickly, there is a lot of
           | efficiency in vertical integration and a fully cohesive stack
           | but it takes a very long time to stabilize if you take that
           | route.
           | 
           | We spent months trying to convince them of problems with
           | their H2 implementation in their LB/proxy (they insisted
           | nginx was at fault, spoiler - it wasn't) but had to leave (we
           | also went to CF, which has its own problems). Eventually one
           | of their employees wrote a long blog post about H2 that made
           | it obvious they finally found and fixed those problems but
           | months too late for my employer at the time.
           | 
           | It would have been infinitely better for us if they could
           | have just fixed their stability problems, that abstraction
           | suited us as did their LB/proxy impl and SNI pricing.
           | 
           | I wish them well, some really smart folk over there but I can
           | imagine these reliability problems are probably really
           | grinding down morale.
        
       | EGreg wrote:
       | What exactly does flyio.net do?
        
         | michaelbuckbee wrote:
         | Hosting service that has a lot of interesting distributed
         | features.
        
         | HellsMaddy wrote:
         | If you mean specifically flyio.net and not just fly.io the
         | company, I'm guessing they host their status page on a separate
         | domain in case of DNS/registrar issues with their primary
         | domain.
        
         | stackghost wrote:
         | IIRC their value prop is that they let you rapidly spin up
         | deployments/machines in regions that are closest to your users,
         | the idea being that it will be lower latency and thus better
         | UX.
        
         | eek2121 wrote:
         | WEB 2.0. SEE. TOLD YA! THEY SHOULDA UPGRADED TO THAT NEWFANGLED
         | 3.0! ;)
        
         | vachina wrote:
         | It's basically what Heroku used to be but with CDN-like
         | presence.
        
       | mrcwinn wrote:
       | I tried Fly early. I was very excited about this service, but
       | I've never had a worse hosting experience. So I left.
       | Coincidentally I tried it again a few days ago. Surely things
       | must be better. Nope. Auth issues in the CLI, frustrations
       | deploying a Docker app to a Fly machine. I wouldn't recommend it
       | to anyone.
        
         | steve_adams_86 wrote:
         | I find their user experience to be exceptional. The only flake
         | I've encountered is in uptime and general reliability of
         | services I don't interface with directly. They've done a
         | stellar job on the stuff you actually deal with, but the glue
         | holding your services together seems pretty wobbly.
        
       | teaearlgraycold wrote:
       | I'm grateful to HN for keeping me well aware of Fly's issues.
       | I'll never use them.
        
         | kachapopopow wrote:
         | It's still 99.99+% SLA? Would you really pay 100% more for
         | <0.01% more uptime?
        
           | cj wrote:
           | I think what a lot of people fail to understand is that there
           | are certain categories of apps that simply "can never go
           | down"
           | 
           | Examples include basically any PaaS, IaaS, or any company
           | that provides a mission-critical service to another company
           | (B2B SaaS).
           | 
           | If you run a basic B2C CRUD app, maybe it's not a big deal if
           | your service goes down for 5 minutes. Unfortunately there are
           | quite a few categories of companies where downtime simply
           | isn't tolerated by customers. (I operate a company with a
           | "zero downtime" expectation from customers - it's no joke,
           | and I would never use any infrastructure abstraction layer
           | other than AWS, GCP or Azure - preferably AWS us-east-1
           | because, well, if you know the joke...)
        
             | macNchz wrote:
             | Every PaaS and IaaS I've ever used has had some amount of
             | downtime, often considerably more than 5 minutes, and I've
             | run production services on many of them. Plenty of random
             | issues on major cloud providers as well. Certainly plenty
             | of situations with dozens of Twitter posts happening but
             | never any acknowledgement on the AWS status page. Nothing's
             | perfect.
        
               | cj wrote:
               | Yea, when running services where 5 minutes of downtime
               | results in lots of support tickets, you learn to accept
               | that the incident will happen and learn to manage the
               | incident rather than relying on it never occurring.
        
             | MobiusHorizons wrote:
             | you realize all of those services you mention can't give
             | you zero downtime, they would never even advertise that.
             | They have quite good reliability certainly, but on long
             | enough time horizons absolutely no-one has zero downtime.
        
             | littlestymaar wrote:
             | If your app cannot go down ever, then you cannot use a
             | cloud provider either (because even AWS and Azure do fail
             | sometimes, just look up "Azure down" on HN).
             | 
             | But the truth is everybody can afford _some_ level of
             | outage, simply because nobody has the budget to provision
             | an infra that can _never_ fail.
        
               | vrosas wrote:
               | I've seen a team try to be truly "multi-cloud" but then
               | end up with this Frankenstein architecture where
               | instead of being able to weather one cloud going down,
               | their app would die if _any_ cloud had an issue. It was
               | also surprisingly hard to convince people it doesn't
               | matter how many globally distributed clusters you have if
               | all your data is in us-east.
        
             | toast0 wrote:
             | > I think what a lot of people fail to understand is that
             | there are certain categories of apps that simply "can never
             | go down"
             | 
             | I refuse to believe that this category still exists, when I
             | need to keep my county's alternate number for 911 in my
             | address book, because CenturyLink had a 6 hour outage in
             | 2014 and a two day outage in 2018. If the phone company
             | can't manage to keep 911 running anymore, I'd be very
             | surprised what does have zero downtime over a ten year
             | period.
             | 
             | Personally, nine nines is too hard, so I shoot for eight
             | eights.
        
             | bri3d wrote:
             | My experience with very large scale B2B SaaS and PaaS has
             | been that customers like to get money, if allowed by
             | contract, by complaining about outages, but that overall,
             | B2B SaaS is actually very forgiving.
             | 
             | Most B2B SaaS solutions have very long sales cycles and a
             | high total cost to implement, so there is a lot of inertia
             | to switching that "a few annoying hours of downtime a year"
             | isn't going to cover. Also, the metric that will drive
             | churn isn't actually zero downtime, it's "nearest
             | competitor's downtime," which is usually a very different
             | number.
        
             | sgrove wrote:
             | All of your examples have had multiple cases of going down,
             | some for multiple days (2011 AWS was the first really long
             | one I think) - or potentially worse, just deleting all
             | customer data permanently and irretrievably.
             | 
             | Meaning empirically, downtime seems to be tolerated by
             | their customers up to some point?
        
           | mrcwinn wrote:
           | This is not my experience at all, as a former paying
           | customer.
        
           | runako wrote:
           | No dog in this fight, all props to the Fly.io team for having
           | the gumption to do what they are doing, I genuinely hope they
           | are successful...
           | 
           | > It's still 99.99+% SLA
           | 
           | But this is simply not accurate. 99.99% uptime is < 52m 9.8s
           | _annually_ of downtime. They apparently blew well through
           | that today. Looks like they essentially burned the equivalent
           | of 4 years of 99.99% downtime budget this evening.
           | 
           | Four nines is so unforgiving that it's almost the case that
           | if people are required to be in the loop at any point during
           | an incident, you will blow the fourth nine for the whole year
           | in a single incident.
           | 
           | Again, I know it's hard. I would not want to be in the space.
           | That fourth nine is really difficult to earn.
           | 
           | In the meanwhile, <hugops> to the Fly team as they work to
           | resolve this (and hopefully get some rest).
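The arithmetic behind the "four nines" budget can be sketched in a few lines. This is only an illustration: the 365-day year is an assumption, and calculators that use a different year length land on slightly different figures (such as the 52m 9.8s quoted above). The function names are made up for this example.

```python
# Allowed downtime for a given availability target.
# Assumption: a 365-day year; other conventions shift the result slightly.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def downtime_budget_seconds(availability_percent: float) -> float:
    """Seconds of downtime per year permitted by an availability target."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


def years_of_budget_used(outage_seconds: float, availability_percent: float) -> float:
    """How many years' worth of downtime budget a single outage consumes."""
    return outage_seconds / downtime_budget_seconds(availability_percent)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        minutes = downtime_budget_seconds(target) / 60
        print(f"{target}% -> {minutes:.2f} minutes/year")
```

Under this convention a multi-hour outage consumes several years of a 99.99% budget, which is the point being made in the comment.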
        
             | fulafel wrote:
             | 99.99+% SLA typically means you get some billing credits
             | for the downtime exceeding 99.99+% availability. So
             | technically you do get a "99.99+% SLA", but you don't get
             | 99.99+% availability.
             | 
             | Other circles use "SLO" (where the O stands for objective).
             | 
             | (Anyone know what the details in fly.io SLA are?)
        
               | runako wrote:
               | You are correct in the legal/technical sense!
               | 
               | Technically, anyone could offer five- or six-nines and
               | just depend on most customers not to claim the credits
               | :-D
               | 
               | Actually hitting/exceeding four nines is still tough.
        
           | PUSH_AX wrote:
           | You say that like it's their only issue.
           | 
           | Earlier in the year they had a catastrophic outage in LHR, we
           | lost all our data. Yes this is also on me, I'm aware. Still,
           | that's a hard nope from me, we migrated.
        
       | akshayshah wrote:
       | The series of outages early in 2023 also had some Corrosion-
       | related pain: https://community.fly.io/t/reliability-its-not-
       | great/11253
        
         | __turbobrew__ wrote:
         | Seems like rolling their own datastore turned out to be a bad
         | bet.
         | 
         | I'm not super familiar with their constraints but ScyllaDB can
         | do eventual consistency and is generally quite flexible.
         | CouchDB is also an option for multi-leader replication.
        
       | pier25 wrote:
       | My apps on Fly have not gone down this time.
        
       | marvin-hansen wrote:
       | No surprise. About a year ago, I looked at fly.io because of its
       | low pricing and I was wondering where they were cutting corners
       | to still make some money. Ultimately, I found the answer in their
       | tech docs where it was spelled out clearly that a Fly instance
       | is hardwired to one physical server and thus cannot fail over in
       | case that server dies. Not sure if that part still is in the
       | official documentation.
       | 
       | In practice, that means if a server goes down, they have to load
       | the last snapshot of that instance from the backup and push it
       | to a new server, update the network path, and pray to god that
       | no more servers fail than there is spare capacity. Otherwise
       | you have to wait for a restore until the datacenter mounts a few
       | more boxes in the rack.
       | 
       | That explains quite a bit of the randomness of those outage
       | reports, i.e. my app is down vs the other is fine and mine came
       | back in 5 minutes vs the other took forever.
       | 
       | As a business on a budget, I think anything else, e.g. a small
       | Civo cluster, serves you better.
        
         | fulafel wrote:
         | The status tells a story about a high-availability/clustering
         | system failure so I think in this case the problem is rather
         | the complexity of the HA machinery hurting the system's
         | availability vs something like a simple VPS.
        
         | ignoramous wrote:
         | Fly.io can migrate vm+volume now:
         | https://fly.io/docs/reference/machine-migration/ /
         | https://archive.md/rAK0V
         | 
         | > _a fly instance is hardwired to one physical server and thus
         | cannot fail over_
         | 
         | I'm having trouble understanding how _else_ this is supposed to
         | be? I understand that _live migration_ is a thing, but even in
         | those cases, a VM is "hardwired" to some physical server, no?
        
           | mzi wrote:
           | > I'm having trouble understanding how else this is supposed
           | to be? I understand that live migration is a thing, but even
           | in those cases, a VM is "hardwired" to some physical server,
           | no?
           | 
           | You can run your workload (in this case a VM) on top of a
           | scheduler, so if one node goes down the workload is just spun
           | up on another available node.
           | 
           | You will have downtime, but it will be limited.
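The idea of a scheduler respawning a workload on another node can be sketched as a toy model. Everything here (the `Node` class, `reschedule`, the least-loaded placement rule) is hypothetical and is not how Fly.io or any particular orchestrator works; it also ignores storage, which is the hard part when a VM's state lives on the failed machine.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: int                      # max workloads this node can host
    healthy: bool = True
    workloads: list = field(default_factory=list)


def reschedule(nodes: list, failed: str) -> dict:
    """Move workloads off a failed node onto healthy nodes with room.

    Returns a mapping of workload -> new node name. A workload that
    fits nowhere maps to "" and stays queued on the failed node.
    """
    by_name = {n.name: n for n in nodes}
    dead = by_name[failed]
    dead.healthy = False
    placements = {}
    for workload in dead.workloads:
        candidates = [
            n for n in nodes if n.healthy and len(n.workloads) < n.capacity
        ]
        if not candidates:
            placements[workload] = ""  # no spare capacity anywhere
            continue
        # Toy placement rule: pick the least-loaded healthy node.
        target = min(candidates, key=lambda n: len(n.workloads))
        target.workloads.append(workload)
        placements[workload] = target.name
    # Keep only the workloads that could not be placed.
    dead.workloads = [w for w in dead.workloads if placements[w] == ""]
    return placements
```

The unplaced case mirrors the worry raised earlier in the thread: if more servers fail than there is spare capacity, some workloads simply wait.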
        
             | ignoramous wrote:
             | > _so if one goes down ... just spun up on another_
             | 
             | On Fly, one can absolutely set this up. Multiple ways:
             | https://fly.io/docs/apps/app-availability /
             | https://archive.md/SJ32K
        
           | sofixa wrote:
           | > I'm having trouble understanding how else this is supposed
           | to be? I understand that live migration is a thing, but even
           | in those cases, a VM is "hardwired" to some physical server,
           | no?
           | 
           | They mean the storage part. If your VM's storage(state) is on
           | one server and that server dies, you have to restore from
           | backup. If your VM's storage is on remote shared storage
           | mounted to that server and the server dies, your VM can be
           | restarted elsewhere that has access to that shared storage.
           | 
           | In AWS land it's the difference between instance store (local
           | to a server) and EBS (remote, attached locally).
           | 
           | There's a tradeoff in that shared storage will be slightly
           | slower due to having to traverse networking, and it's harder
           | to manage properly; but the reliability gain is massive.
        
         | dilyevsky wrote:
         | > Ultimately, I found the answer in their tech docs where it
         | was spelled out clearly that a Fly instance is hardwired to
         | one physical server and thus cannot fail over in case that
         | server dies.
         | 
         | The majority of EC2 instance types did not have live migration
         | until very recently. Some probably still don't (they don't
         | really spell out how and when it's supposed to work). It is
         | also not free - there's a noticeable brown-out when your VM
         | gets migrated on GCP for example.
        
           | ixaxaar wrote:
           | Can you shed some more light on this "browning out"
           | phenomenon?
        
             | toast0 wrote:
             | Here's the GCP doc [1]. Other live migration products are
             | similar.
             | 
             | Generally, you have worse performance while in the
             | preparing-to-move state, then an actual pause, then worse
             | performance as the move finishes up. Depending on the
             | networking setup, some inbound packets may be lost or
             | delayed.
             | 
             | [1] https://cloud.google.com/compute/docs/instances/live-
             | migrati...
        
         | pier25 wrote:
         | If you want HA on Fly you need to deploy an app to multiple
         | regions (multiple machines).
         | 
         | Fly might still go down completely if their proxy layer fails
         | but it's much less common.
        
           | sb8244 wrote:
           | The proxy layer was the cause of yesterday's outage according
           | to support.
        
             | pier25 wrote:
             | Yes but the previous comment was about hardware failure.
        
       | theideaofcoffee wrote:
       | Color me not surprised. My few interactions with people there
       | just gave off the impression of them being in a bit over their
       | heads. I don't know how well that translated to their actual ops,
       | but it's difficult to not connect the two when they continue to
       | have major outage after major outage for a product that 'should'
       | be their customers' bedrock upon which they build everything
       | else.
        
       | xyst wrote:
       | Recurring pattern I notice is outages tend to occur the week of
       | major holidays in US.
       | 
       | - MS 365/Teams/Exchange had a blip in the morning
       | 
       | - Fly.io with complete outage
       | 
       | - then a handful of sites and services impacted due to those
       | outages
       | 
       | Usually advocate against "change freezes" but I think a change
       | freeze around major holidays makes sense. Give all teams a
       | recharge/pause/whatever.
       | 
       | Don't put too much pressure on the B-squads that were unfortunate
       | enough to draw the short straw.
        
         | aaomidi wrote:
         | What do "Freezes" mean? Like, do you stop renewing your
         | certificates? Do you stop taking in security updates for your
         | software?
         | 
         | Sure maybe "unnecessary" changes, but the line gets very gray
         | very fast.
        
           | vrosas wrote:
           | No unnecessary code deployments.
        
           | Spivak wrote:
           | It's not very grey, prod becomes as if you told everyone but
           | your ops team to go home and then sent your ops team on a
           | cruise with pagers. If it's not important enough to merit
           | interrupting their vacation you don't do it.
        
           | fragmede wrote:
           | Certs shouldn't still be done by hand at this point; if
           | another heartbleed comes out in the next 7 days then the risk
           | can be examined, escalated, and the CISO can overrule the
           | freeze. If it's a patch for remote root via Bluetooth drivers
           | on a server that has no Bluetooth hardware, it's gonna wait.
           | 
           | you're right that there's a grey line, but crossing that line
           | involves waking up several people and the on call person
           | makes a judgement call. if it's not important enough to wake
           | up several people over, then things stay frozen.
        
             | aaomidi wrote:
             | Right, that's basically what I mean. There are a lot of
             | automated changes happening in the background for services.
             | I guess the whole thing I'm saying is that not every
             | breakage is happening because of a code change.
        
             | kbolino wrote:
             | There are still a lot of situations where automatic
             | certificate enrollment and renewal are not possible. TLS is
             | not the only use of X.509 certificates, and even then,
             | public facing HTTPS is not the only use of TLS.
             | 
             | It needs to get better but it's not there yet.
        
         | vrosas wrote:
         | Then you just get devs rushing out changes before the freeze...
        
           | fragmede wrote:
           | and stampeding changes in after the thaw, also leading to
           | downtime. so it depends on the org, but doing a freeze is
           | still reasonable policy. Downtime on December 15th is less
           | expensive than on black Friday or cyber Monday for most
           | retailers, so it's just a business decision at that point.
        
           | subarctic wrote:
           | As a developer I don't see why I would rush out a change
           | before the freeze when I could just wait until after. Maybe a
           | stakeholder that really wants it would press for it to get
           | out but personally I'd rather wait until after so I'm not
           | fixing a bug during my holiday.
        
             | vrosas wrote:
             | Congrats on not working for the product team I work for
        
         | ploxiln wrote:
         | I think you can't avoid the fact that these holiday weeks are
         | different from regular weeks. If you "change freeze" then you
         | also freeze out the little fixes and perf tuning that usually
         | happens across these systems, because they're not "critical".
         | 
         | And then inevitably it turns out that there's a special
         | marketing/product push, with special pricing logic that needs
         | new code, and new UI widgets, causing a huge traffic/load
         | surge, and it needs to go out NOW during the freeze, and this
         | is revenue, so it is critical to the business leaders. Most of
         | eng, and all of infra, didn't know about it, because the
         | product team was cramming until the last minute, and it was
         | kinda secret. So it turns out you can freeze the high-quality
         | little fixes, but you can't really freeze the flaky brand-new
         | features ...
         | 
         | It's just a struggle, and I still advise to forget the freeze,
         | and try to be reasonable and not rush things (before, during,
         | or after the freeze).
        
           | ignoramous wrote:
           | Some shops conduct _game days_ as the freeze approaches.
           | 
           | https://wa.aws.amazon.com/wellarchitected/2020-07-02T19-33-2.
           | .. / https://archive.md/uaJlR
        
           | willsmith72 wrote:
           | Any big tech company with large peak periods disagrees with
           | you. It's absolutely worth freezing non-critical changes.
           | 
           | Urgent business change needs to go through? Sure, be prepared
           | to defend to a vp/exec why it needs to go in now.
           | 
           | Urgent security fix? Yep same vp will approve it.
           | 
           | It's a no-brainer to stop your typical changes which aren't
           | needed for a couple of weeks. By the way, it doesn't mean
           | your whole pipeline needs to stop. You can still have stuff
           | ready to go to prod or pre prod after the freeze
        
         | paxys wrote:
         | Bad code rarely causes outages at this scale. The culprit is
         | always configuration changes.
         | 
         | Sure you can try and reduce those as well during the holiday
         | season, but what if a certificate has to be renewed? What if a
         | critical security patch needs to be applied? What if a set of
         | servers need to be reprovisioned? What if a hard disk is
         | running out of space?
         | 
         | You cannot plan your way out of operational challenges,
         | regardless of what time of year it is.
        
           | bobsyourbuncle wrote:
           | This is a good observation. Do you have any resources I can
           | read up on to make this safer?
        
           | jimmyl02 wrote:
           | I think a good way of looking at it is risk. Is the change
           | (whether it is code or configuration, etc.) worth the risk it
           | brings on.
           | 
           | For example if it's a small feature then it probably makes
           | sense to wait and keep things stable. But, if it's something
           | that itself causes larger imminent danger like security
           | patches / hard disk space constraints, then it's worth taking
           | on the risk of change to mitigate the risk of not doing it.
           | 
           | At the end of the day no system is perfect and it ends up
           | being judgement calls but I think viewing it as a risk
           | tradeoff is helpful to understand.
        
           | oarsinsync wrote:
           | > Sure you can try and reduce those as well during the
           | holiday season, but what if a certificate has to be renewed?
           | What if a critical security patch needs to be applied? What
           | if a set of servers need to be reprovisioned? What if a hard
           | disk is running out of space?
           | 
           | Reading this, I see two routine operational issues, one
           | security issue and one hardware issue.
           | 
           | You can't plan your way around security issues or hardware
           | failures, but operational issues you both can and should plan
           | around. Holiday schedules like this are fixed points in time,
           | so there's absolutely no reason why you can't plan all
           | routine work to be completed either a week in advance, or a
           | week after, the holiday period.
           | 
           | Certificates don't need to be near the point of expiry to be
           | renewed. Capacity doesn't need to be at critical levels to be
           | expanded. Ultimately, this is a risk management question (as
           | a sibling has also commented). Is the organisation willing to
           | take on increased risk in exchange for deferring operational
           | expenses?
           | 
           | If the operational expense is inevitable (the certificate
           | will need renewing), that seems like an easy answer when it
           | comes to risk management over holidays.
           | 
           | If the operational expense is not inevitable (will we really
           | need to expand capacity?), it then becomes a game of
           | probabilities and financials - likelihood of expense being
           | incurred, amount of expense incurred if done ahead of time,
           | impact to business if something goes wrong during a holiday.
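Planning routine renewals around a fixed freeze window can be expressed as a simple date check. This is a sketch under assumptions: the function names, the one-week margin, and the example dates are all hypothetical, chosen only to mirror the "a week in advance, or a week after" rule of thumb in the comment.

```python
from datetime import date, timedelta

ONE_WEEK = timedelta(days=7)  # assumed safety margin around the freeze


def needs_early_renewal(expires: date, freeze_start: date, freeze_end: date,
                        margin: timedelta = ONE_WEEK) -> bool:
    """True if the certificate would expire during the freeze, or within
    `margin` after it ends, and so should be renewed ahead of time."""
    return freeze_start <= expires <= freeze_end + margin


def renew_by(freeze_start: date, margin: timedelta = ONE_WEEK) -> date:
    """Latest date on which the early renewal should be completed."""
    return freeze_start - margin


if __name__ == "__main__":
    # Hypothetical freeze window and expiry date.
    start, end = date(2024, 11, 25), date(2024, 12, 2)
    cert_expiry = date(2024, 11, 29)
    if needs_early_renewal(cert_expiry, start, end):
        print("renew by", renew_by(start))
```

The same shape works for other inevitable operational work: anything whose deadline falls inside the window gets pulled forward.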
        
           | tptacek wrote:
           | We'll have a postmortem in next week's infra log update, but
           | here it was a particularly ambitious customer app pushing our
           | state sync service into a corner case; it's one we knew
           | about, but the solution (federating regional state sharing
           | clusters rather than running one globally) is taking time to
           | roll out.
        
         | cess11 wrote:
         | Blip? 365 has an ongoing incident since yesterday morning,
         | european timezone. The reason I know is because I use their
         | compliance tools to secure information in a rather large
         | bankruptcy.
        
       | jart wrote:
       | fly.io publishes their post-mortems here: https://fly.io/infra-
       | log/
       | 
       | The last post-mortem they wrote is very interesting and full of
       | details. Basically back in 2016 the heart or keystone component
       | of fly.io production infrastructure was called consul, which is a
       | highly secure TLS server that tracks shared state and it requires
       | that both the server certificate and the client certificate be
       | authenticated. Since it was centralized, it had scaling issues,
       | so fly.io wrote a replacement for it in 2020 called corrosion,
       | and quickly forgot about consul, but didn't have the heart to
       | kill it. Then in October 2024 consul's root key signing key
       | expired, which brought down all connectivity, and since it uses
       | bidirectional authentication, they couldn't bring it back online
       | until they deployed new SSL certificates to every machine in
       | their fleet. Somehow they did this in half an hour, but the chain
       | of dominoes had already been set in motion to reveal other
       | weaknesses in their infrastructure that they could eliminate.
       | There was this other internal service whose own independent set
       | of TLS keys had also expired long ago, but they didn't notice
       | until they tried rebooting it as part of the consul rekey, since
       | doing so severed the TCP connections it had established way back
       | when its certificate was valid. Plus the whole time this is
       | happening, their logging tools are DDOSing their network
       | provider. It took some real heroes to save the company and all
       | their customers too when that many things explode at once.
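One takeaway from that summary is that a healthy long-lived connection says nothing about whether the certificate behind it is still valid, so expiry has to be checked against the certificate itself. A minimal sketch, assuming the `notAfter` string is in the format Python's `ssl` module reports (for example from `SSLSocket.getpeercert()`); the function names and the 30-day threshold are hypothetical, and this is not Fly.io's tooling.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter timestamp.

    `not_after` uses the OpenSSL text format, e.g.
    "Jan  5 09:34:43 2018 GMT". A negative result means already expired.
    """
    expires_epoch = ssl.cert_time_to_seconds(not_after)
    return (expires_epoch - now.timestamp()) / 86400


def should_alert(not_after: str, now: datetime, threshold_days: float = 30) -> bool:
    """True if the certificate expires within `threshold_days` (assumed value)."""
    return days_until_expiry(not_after, now) <= threshold_days


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(should_alert("Jan  5 09:34:43 2018 GMT", now))
```

Running a check like this on a schedule against every certificate on disk, rather than only at service start, is what would surface a certificate that expired while its connections stayed up.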
        
         | ignoramous wrote:
         | On that Consul outage, Fly Infra concludes, "The moral of the
         | story is, no more half-measures."
         | 
         | On their careers page [1], the Fly team goes, "We're not big
         | believers in tech debt."
         | 
         | As an outsider, reads like a cacophony of contradictions?
         | 
         | [1] https://fly.io/docs/hiring/working/#we-re-ruthless-about-
         | doi...
        
           | jart wrote:
           | No one actually lives up to their principles, but it's still
           | important that we have them.
           | 
           | If you actually do live up to yours, then you need to adopt
           | better principles.
        
             | whilenot-dev wrote:
             | Any principle in itself isn't without critique, agree, but
             | it's still the choice being made to pick this specific
             | principle that tells the whole story. There are so many
             | principles to pick from and the tech debt pick follows up
             | with a _"We have a 3-month "no refactoring" rule for new
             | hires. This isn't everyone's preferred work style! We try
             | to be up front about stuff."_, which sounds a bit like an
             | additional _perform or else..._ principle that just delays
             | ownership of the stuff you're supposed to work with. In
             | the best case that sounds like naive optimism and in the
             | worst case that's gross negligence... neither one speaks
             | "engineering" to me.
        
           | Aeolun wrote:
           | Two contradictory statements do not read like a 'cacophony'
           | of anything to me xD I think you need a whole lot more than
           | two to do that word justice.
        
             | JimDabell wrote:
             | _"No more half-measures"_ and _"We're not big believers in
             | tech debt"_ aren't even contradictory statements, let alone
             | a cacophony of them.
        
             | mattgreenrocks wrote:
             | The comment section doing what it does best!
        
               | ignoramous wrote:
               | For brevity I chose to put up only the conclusion from a
               | postmortem (of which I've read plenty by now) and another
               | point from their otherwise comparatively shorter careers
               | page, which imo capture the inherent tension between
               | building out fast & building out right. This is not
               | something I've started complaining about today or
               | yesterday. I've used Fly in prod for 4 years and spilled
               | much ink on this topic on their forums already. Even if I
               | critique, I remain optimistic about Fly despite the
               | seemingly endless list of failure modes building such
               | complex systems entail: https://community.fly.io/t/fly-
               | down/10224/15
               | 
               | (personally speaking, I'm humble enough because I can
               | hardly build a toy side-project right!)
        
           | bdcravens wrote:
           | "full measures" aren't the same thing as tech debt.
           | Complexity isn't even the same thing as tech debt.
        
       | gigapotential wrote:
       | HUGOPS
       | 
       | Everything is going to be 200 OK!
        
       | xyst wrote:
       | I can't even login to my old account. Password reset is timing
       | out yet still receive password reset e-mail. Password reset link
       | broken, with 500 status code.
        
       | neya wrote:
       | Personal experience between Fly.io and Railway.com - Railway wins
       | for me hands down. I have used both and Railway's support is
       | stellar too, in comparison. Fly.io has not responded to my query
       | about data deletion to date, despite my emailing their support
       | address.
       | 
       | I have had my Railway app online to date without any major
       | downtime too. I recommend anyone looking for a decent
       | replacement to try them.
        
         | punkpeye wrote:
         | How does it compare in terms of price?
        
           | justjake wrote:
           | We actually only charge you for what you use. As a result
           | people often see 30%+ savings when moving stuff over from
           | other providers (especially Heroku)
           | 
           | https://railway.com/pricing
        
         | andai wrote:
         | I've used Railway control panel maybe a total of 10 times in my
         | life and half the time it was having weird issues. Control
         | panel UI not loading or not working, actions failing, deploys
         | randomly failing... I love the idea but in practice it's not
         | something I'd want to use for anything serious.
        
           | justjake wrote:
           | While we've always aimed for great reliability on compute,
           | the dashboard reliability wasn't very good at the start of
           | the year.
           | 
           | We ack'd this and then committed pretty heavily to making it
           | stellar, so if you're still having issues please let us know
           | (that should not be the case)
           | 
           | Best, Jake from Railway
        
         | ignoramous wrote:
         | Fly builds on their own hardware. Is Railway doing the same? If
         | not, that'd explain some of why Railway has relatively fewer
         | outages (they're engineering fewer things).
         | 
         | I understand that end-users want reliability (and Fly gets a
         | bad rep despite pretty _significant_ investment on this front
         | in the past 2 years), but such outages aren't exclusive to one
         | provider & not the other. Building cloud infra is no one's
         | definition of easy.
        
           | justjake wrote:
           | We run on Google Cloud, AWS, and our own hardware now since
           | the middle of this year :)
           | 
           | https://railway.com/changelog/2024-09-20-railway-metal-
           | beta#...
        
       | punkpeye wrote:
       | Contrary to the title of the post, Fly.io API remains
       | inaccessible. Meaning, users still cannot access
       | deploys/databases, etc.
       | 
       | For accurate updates, follow https://community.fly.io/t/fly-io-
       | site-is-currently-inaccess...
        
       | cryptos wrote:
       | Fly.io seems to be a bit of a mixed bag:
       | 
       | https://news.ycombinator.com/item?id=41917436
       | 
       | https://news.ycombinator.com/item?id=35044516
       | 
       | https://news.ycombinator.com/item?id=34742946
       | 
       | https://news.ycombinator.com/item?id=34229751
       | 
       | If a cloud platform doesn't really provide reliability, I'd say
       | it's probably not worth it. You could better just rent a
       | (virtual) server and save the cloud tax.
        
         | qeternity wrote:
         | I don't really understand the value prop of fly.io. They seem
         | to have an impressive engineering team despite the outages, but
         | is edge compute really something that 99.9% of devs need? There
         | are tons of large companies that operate out of a single AWS
         | region and those services are used by millions around the
         | globe. It just strikes me as something that enables premature
         | optimization right out of the box.
        
           | k__ wrote:
           | It's basically the new Heroku with less lock-in, because it
           | works with Docker.
           | 
           | You get edge computing, autoscaling, and load balancing
           | without additional configuration.
           | 
           | Not as flexible as AWS, but also much easier to set up and
           | maintain.
           | 
           | But the reliability issues suck now and then.
        
             | nikodotio wrote:
             | This is precisely it. The ease of deploy, https domain
             | configuration, scaling.
             | 
             | Additionally, having machines that turn off when not in use
             | is easy to configure, which I never managed on AWS.
        
               | ignoramous wrote:
               | > _which I never managed on AWS_
               | 
               | I haven't looked at it recently, but _App Runner_ could
               | do a few Fly.io-esque things (but slightly more
               | expensive): https://aws.amazon.com/apprunner/
        
             | ignoramous wrote:
             | > _Not as flexible as AWS_
             | 
             | Today, Fly.io is more or less in the same market as
             | _Lightsail_, not AWS. And when you compare it to
             | _Lightsail_, it blows it away.
        
               | mtlynch wrote:
               | _And when you compare it to Lightsail, it blows it away._
               | 
               | This is a bit of a confusing sentence because there are
               | so many pronouns. Do all of the "it"s refer to Fly.io?
        
               | dijksterhuis wrote:
               | > And when you compare [fly.io] to Lightsail, [fly.io]
               | blows [Lightsail] away.
        
               | watermelon0 wrote:
               | Did you count reliability into your assessment here? I'm
               | reading about Fly.io outages multiple times a year,
               | whereas Lightsail seems to be as stable as AWS EC2.
        
             | gurgunday wrote:
             | DigitalOcean has been doing this for years, and their value
             | proposition is unmatched IMO
             | 
             | For $5 you get:
             | 
             | Latest gen CPUs and RAM
             | 
             | HTTPS
             | 
             | DDoS protection
             | 
             | Cloudflare CDN
             | 
             | Autoscale
             | 
             | Competent support
             | 
             | I'd say the best part is the predictable monthly prices
             | 
             | And while most people probably don't care, they are an
             | established public company, so there is more chance they
             | will exist in 10 years
        
               | dijksterhuis wrote:
               | are global r/w token permissions still a thing, or did
               | the token scopes thing finally come out of beta?
               | 
               | also, my experience with support was not the same as
               | yours. they were utterly useless for the most part.
               | 
               | for a personal web dev (or similar) project, like, i
               | agree, they've got good value.
               | 
               | but having worked in a small biz where DO was what they
               | built everything on -- no. bad idea. spend more. use aws
               | (graviton ec2 instances)/azure.
        
               | fragmede wrote:
               | the $5 droplet is underpowered and can't run anything
               | substantial. it's just the price to get you in the door.
        
               | yabones wrote:
               | It doesn't really need to run anything "substantial"
               | though. Running some janky wordpress site with some
               | scabbed-on ecommerce customizations is like 50% of the
               | internet.
        
               | infecto wrote:
               | A 1 vCPU, 512 MB instance is plenty for most base cases.
               | Maybe you need one additional machine to act as a
               | background worker. I am sure there are some noisy
               | neighbors but to say it's underpowered is silly.
        
               | fragmede wrote:
               | I'm calling it underpowered because the $5 one had
               | trouble running my custom ssh daemon. ssh! the
               | cryptography for that shouldn't chug down the server I'm
               | renting from them. a bigger instance from them isn't
               | having the same problems.
        
               | pajeetz wrote:
               | you wouldn't be able to run anything substantial with
               | that kind of budget
               | 
               | but Go and PocketBase are on record for supporting 10k
               | concurrent requests per second on low powered VPS
        
             | infecto wrote:
             | I have asked this multiple times but is anyone really using
             | edge compute and getting value out of it? I am certain
             | there are cases but I have not seen any of them written up
             | before.
        
               | pier25 wrote:
               | We have an embeddable audio player served globally with
               | very low latency. This wouldn't be possible without edge
               | compute/data.
        
               | sofixa wrote:
               | Depends on what you mean by edge compute, but you
               | probably are.
               | 
               | 5G towers are a ton of compute on the edge to secure and
               | protect the traffic passing through them.
               | 
               | Or if by edge you mean having stuff close to your
               | consumers, every non trivial operation does that.
        
           | victorbjorklund wrote:
           | If half your customers are in New York and half in Sydney, it
           | makes your app faster if you run it in both places.
           | 
           | There are a lot of things we do for our users that we don't
           | need (no one "needs" SPA etc). But if it is easy to make your
           | app faster for your users, why not?
        
             | victorbjorklund wrote:
             | And it is easier than AWS to deploy.
        
           | jrockway wrote:
           | I would take edge compute if it's free and easy. That's
           | fly.io's value prop.
           | 
           | In a world where much web browsing starts with SYN, SYN-ACK,
           | ACK, it is nice if the server is close to you.
        
           | brainzap wrote:
           | I typed fly launch, fly deploy and my node.js project was
           | deployed. So I guess hobby projects?
        
           | austinpena wrote:
           | I have an SSR Astro project. Using Fly makes my project fast.
           | 
           | For dynamic data I use SWR.
           | 
           | I could use Cloudflare workers but it doesn't play so nice
           | with Astro.
           | 
           | I also have a "form submission service" where I receive a
           | POST and send an email.
           | 
           | I need maximum uptime to avoid revenue loss.
           | 
           | It's a Go service so I deploy ~6 machines across the US to
           | ensure I don't drop any requests.
           | 
           | I haven't had downtime in years.
        
           | infecto wrote:
           | I am going to go out on a limb and say there is no real value
           | prop to fly.io. I could completely be wrong but it always
           | feels like the modern MongoDB. Everyone wants to use it but I
           | am not sure they are extracting value from it and instead it's
           | a shiny toy that is fun to build from.
        
         | huijzer wrote:
         | For experiments and hobby projects the value proposition is
         | amazing. Where else can you spin up an independent instance for
         | $1.94 per month?*
         | 
         | *Note this is for an instance with only 256MB RAM
         | (https://fly.io/docs/about/pricing/), but it's definitely
         | possible to run non-trivial projects on that. Rust-based web
         | servers like Rocket require only about 10MB RAM. Basic PHP
         | servers should also fit from what I can find.
        
           | hobo_mark wrote:
           | One such microVM per month used to be within the free monthly
           | allowance, is that not the case anymore?
        
           | oefrha wrote:
           | There are plenty of better deals as long as you don't limit
           | yourself to big clouds and clouds with startup-esque landing
           | pages frequently posted to HN. LowEndTalk may be the most
           | well-known place for finding such deals.
           | 
           | (Not saying the typical cheap VPS on LowEndTalk has
           | comparable PaaS features. Only responding to parent's use
           | case of a single cheap instance.)
        
           | belter wrote:
           | Sounds like a Lambda function....
        
           | input_sh wrote:
           | Nowhere? Because that's a ridiculously low amount of RAM to
           | offer even in your cheapest offerings?
           | 
           | You can easily get 4 GB of RAM for $5 from the likes of
           | Hetzner or Hostinger, so that's 16x more RAM for 2.5x the
           | price. One relatively unknown provider I have used in the
           | past offers 2 GB of RAM for EUR3.6/month (if paid monthly,
           | EUR3 if annually), so 8x more RAM for 1.5-2x the price. I'm
           | sure I could find something even cheaper, but I'm just
           | looking at providers I have personally used.
           | 
           | BTW that dropdown seems to be sorted cheapest > most
           | expensive. If you go to the bottom of the list the price for
           | that same VPS doubles.
        
             | KomoD wrote:
             | > Nowhere? Because that's a ridiculously low amount of RAM
             | to offer even in your cheapest offerings?
             | 
             | There's definitely places that offer it... also 512m
             | 
             | I know because I've personally bought such plans and that
             | was $5-10/yr because I didn't need dedicated ipv4.
        
           | kelvinjps10 wrote:
           | I'm paying $1 for a 2GB RAM VPS at OVH for the first year
        
           | TiredOfLife wrote:
           | Oracle free is one 4 core 24gb ram vps + 2 dualcore amd vps.
        
             | treesknees wrote:
             | And actually, it's the resources that are free (CPU,
             | memory, network) and you're allowed to split them up into
             | multiple VMs if you want to.
             | 
             | One of my VMs had an uptime of more than 1050 days before
             | the infrastructure rebooted it, so in terms of availability
             | they've certainly surprised me.
             | 
             | The only downside I've come across with Oracle Free is that
             | the 'best' regions are typically full. I ended up
             | provisioning my free VMs in another region/country and it
             | works fine.
             | 
             | I suppose another downside (if you want to view it this
             | way) is they will delete idle unused free VMs after a
             | certain time period. You have to add a credit card to your
             | account to "upgrade" your account and run free resources
             | indefinitely. While you're not charged for anything, it
             | makes me nervous forking over a CC number to Oracle.
        
           | throwaway63467 wrote:
           | Best business model in the world, buy stuff in big bags, put
           | it in smaller ones, sell at a multiple of the original price.
           | 
           | Fly is mostly (to my knowledge) reselling Netactuate and OVH
           | servers, their main innovation is the developer experience on
           | top, using Docker on a MicroVM based approach. Of course not
           | only that, but I think it's their main differentiator.
           | 
           | Haven't used that in a while but Scaleway offered
           | ridiculously cheap dedicated ARM hardware close to these
           | price points, not sure if they still do.
        
           | pc86 wrote:
           | Maybe if you're limiting yourself to AWS-wrapper cloud
           | companies. What good is a $2/mo cloud instance if it's down
           | multiple times a month?
           | 
           | Just get a $5/mo VPS instead if you're really concerned about
           | a few dollars a month.
        
             | cxr wrote:
             | > What good is a $2/mo cloud instance if it's down multiple
             | times a month?
             | 
             | The perverse irony is that the most common reason cited by
             | cloud providers for not letting people set a hard cap on
             | charges is an insistence that surely the last thing you
             | want in the world is for your service to be taken offline,
             | even if it does mean avoiding a $1k-$100k bill at the end
             | of the month.
        
           | hansvm wrote:
           | I used to use Racknerd for that sort of thing, and the costs
           | were around there -- maybe $1.90/mo for a 512MB instance. It
           | was easy to squeeze several hobby projects onto the machine.
        
           | pajeetz wrote:
           | i recommend lowendtalk. what fly.io is doing is running
           | colocated bare-metal servers and using firecracker to
           | overcommit (probably via memory ballooning and other disk
           | compression on demand)
           | 
           | if you are going to haggle over $2/month then you are better
           | off just connecting your raspberry pi with
           | wireguard/cloudflare tunnel on a residential connection
        
         | akoculu wrote:
         | Also:
         | 
         | https://news.ycombinator.com/item?id=36808296
        
         | zackify wrote:
         | The reliability is very very bad. It was really insane that 2
         | times in the past few months the main dashboard was down as I'm
         | demoing something. Not to mention the deploy outages and almost
         | daily some random thing was unavailable or delayed.
         | 
         | I had to leave a few months ago after the price raises and how
         | many times my boss saw some issue in the project I had with
         | them.
         | 
         | They also deprecated and removed their sqlite backup service.
         | Back to GCP and not worrying about so many outages now.
        
           | pc86 wrote:
           | Now just to worry about GCP getting shut down with a few
           | days' notice. /s
           | 
           | But in all seriousness the gall to raise prices before
           | actually fixing the reliability problems is pretty shocking.
           | I understand it's a bit of a chicken-and-egg thing where you
           | maybe are tight on resources but there's no scenario where
           | it's acceptable to have a product with these kinds of
           | problems and then raise prices on existing customers who are
           | putting up with it.
        
             | encom wrote:
             | No /s is needed. Relying on any Google product long term is
             | crazy.
        
               | sofixa wrote:
               | Google's b2b products are relatively stable (relative to
               | their b2c free services). You generally get somewhere
               | like a year of notice if they shut it down.
        
           | pajeetz wrote:
           | there's just so many anecdotes/nightmare stories from people
           | using fly.io here, many more than the ones linked by GP
           | 
           | expect to see more of these "post-mortem apologies" from
           | fly.io in the future because it won't be the last
        
             | tptacek wrote:
             | You're right. It won't. Nobody could claim otherwise.
        
         | pajeetz wrote:
         | fly.io has a very bad reputation for reliability. there doesn't
         | seem to be any damage control beyond hackernews and even here
         | the consensus seems to be "don't run anything mission critical
         | on fly.io or expect data redundancy"
         | 
         | in fact, you can almost get the same thing fly.io does by
         | running firecracker on your own bare metal servers and cheaper
         | too.
         | 
         | I'm afraid the public sentiment towards fly.io has been tainted
         | for good (I can't count how many times they apologized now).
        
           | tptacek wrote:
           | This is the second place you've offered this sentiment. Was
           | it your expectation that we were going to hit some point,
           | sometime in the near future, where we weren't going to have
           | deployment-blocking outages? I'd like to better understand
           | your premise. If it's "I can get more reliability by
           | deploying on a hyperscaler cloud", who ever told you
           | otherwise?
        
         | ARCarr wrote:
         | I tried out Fly.io and deployed a little test app. I couldn't
         | even access the app, because they put it onto a server that was
         | under "emergency maintenance" and had been that way for twelve
         | days.
        
       | mattbee wrote:
       | It feels like fly is trying to repeat a growth model that worked
       | 20 years ago: throw interesting toys at engineers, then wait for
       | engineers to recommend their services as they move on in their
       | careers.
       | 
       | Part of that playbook is the old Move Fast & Break Things. That
       | can still be the right call for young projects, but it has two
       | big problems:
       | 
       | 1) AWS successfully moved themselves into the position of "safe"
       | hosting choice, so it's much rarer for engineers to have
       | influence on something that's seen by money men as a humdrum,
       | solved problem;
       | 
       | 2) engineers are not the internal influencers they used to be,
       | being laid off left and right the last few years, and without
       | time for hobby projects.
       | 
       | (maybe also 3) it's much harder to build a useful free tier on a
       | hosting service, which used to be a necessary marketing expense
       | to reach those engineers).
       | 
       | So idk, I feel like the bar is just higher for hosting stability
       | than it used to be, and novelty is a much harder sell, even here.
       | Or rather: if you're going to brag about reinventing so many
       | wheels, they need to not come off the cart as often.
        
       | travisgriggs wrote:
       | Don't a bunch of Elixir/Erlang guys work at fly.io? It's weird to
       | me that that hallmark of reliability is associated with something
       | that the public sees as unreliable. What gives with that
       | association?
        
       | Huppie wrote:
       | It's interesting to see this discussion about fly.io's
       | reliability on a day that (after _over three days of downtime_)
       | Microsoft Azure finally decided the update of Azure Static Web
       | Apps they deployed last Friday is indeed broken for customers
       | using specific authentication settings...
       | 
       | ...with not a single status update from Microsoft in sight.
        
       ___________________________________________________________________
       (page generated 2024-11-26 23:01 UTC)