[HN Gopher] Fly.io has GPUs now
       ___________________________________________________________________
        
       Fly.io has GPUs now
        
       Author : andes314
       Score  : 561 points
       Date   : 2024-02-13 22:06 UTC (1 day ago)
        
 (HTM) web link (fly.io)
 (TXT) w3m dump (fly.io)
        
       | iambateman wrote:
       | It's cool to see that they can handle scaling down to zero.
       | Especially for working on experimental sites that don't have the
       | users to justify even modest server costs.
       | 
        | I would love an example of how much time a request gets
        | charged. Obviously it will vary, but is it 2 seconds, or
        | "minimum 60 seconds per spin-up"?
        
         | mrkurt wrote:
         | We charge from the time you boot a machine until it stops.
         | There's no enforced minimum, but in general it's difficult to
         | get much out of a machine in less than 5 seconds. For GPU
         | machines, depending on data size for whatever is going into GPU
         | memory, it could need 30s of runtime to be useful.
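          | 
          | (Illustrative math, assuming an A100 40GB at the announced
          | $2.50/hr: a 30-second invocation works out to 30/3600 x
          | $2.50 ~= $0.02 per request.)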
        
           | andes314 wrote:
           | Do you offer some sort of keep_warm parameter that removes
           | this latency (for a greater cost)?
        
             | mrkurt wrote:
             | You control machine lifecycles. To scale down, you just set
             | the appropriate restart policy, then exit(0).
             | 
             | You can also opt to let our proxy stop machines for you,
             | but the most granular option is to just do it in code.
             | 
             | So yes, kind of. You just wait before you exit.
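              | 
              | A minimal sketch of that pattern (illustrative Python; the
              | port, idle limit, and restart-policy behavior are
              | assumptions, not Fly.io's API):
              | 
              |     # Keep-warm worker: serve requests, then exit once idle
              |     # long enough. With a "don't restart" policy, the clean
              |     # exit stops the machine (and the billing).
              |     import os, time, threading
              |     from http.server import BaseHTTPRequestHandler, HTTPServer
              | 
              |     IDLE_LIMIT = 60  # seconds to stay warm after the last request
              |     last_request = time.monotonic()
              | 
              |     class Handler(BaseHTTPRequestHandler):
              |         def do_GET(self):
              |             global last_request
              |             last_request = time.monotonic()
              |             self.send_response(200)
              |             self.end_headers()
              |             self.wfile.write(b"ok\n")
              | 
              |     def reaper():
              |         while True:
              |             time.sleep(5)
              |             if time.monotonic() - last_request > IDLE_LIMIT:
              |                 os._exit(0)  # exits the whole process from a thread
              | 
              |     threading.Thread(target=reaper, daemon=True).start()
              |     HTTPServer(("", 8080), Handler).serve_forever()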
        
               | Aeolun wrote:
               | So just to confirm, for these workloads, it'd start a
               | machine when the request comes in, and then shut it down
               | immediately after the request is finished (with some
               | 30-60s in between I suppose)? Is there some way to keep
               | it up if additional requests are in the queue?
               | 
               | Edit: Found my answer elsewhere (yes).
        
           | sodality2 wrote:
            | How long does model loading take? Loading 19GB into a machine
            | can't be instantaneous (especially if the model is on a
            | network share).
        
             | loloquwowndueo wrote:
              | There are no "network shares". The typical way to store
              | model data would be in a volume, which is basically local
              | NVMe storage.
        
               | xena wrote:
               | Wellllllll, technically there is LSVD which would let you
               | store model weights in S3.
               | 
               | God that's a horrible idea. Blog time!
        
             | carl_dr wrote:
              | It takes about 7s to load a 9GB model on Beam (they claim,
              | and it tested as about right). I imagine it is similar with
              | Fly - I've not had any performance issues with Fly.
        
           | bbkane wrote:
            | I see the whisper transcription article. Is there an easy way
            | to limit it to, say, $100 worth of transcription a month and
            | then stop till next month? I want to transcribe a bunch of
            | speeches, but I want to spread the cost over time.
        
             | IanCal wrote:
              | Probably available elsewhere, but you could set up an
              | account with a monthly spend limit with OpenAI and use
              | their API until you hit errors.
              | 
              | $100/mo is about 10 days of speeches a month; how much data
              | do you have?
              | 
              | edit - if the pricing seems reasonable, you can just limit
              | how many minutes you send. AssemblyAI is another provider
              | at about the same cost.
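              | 
              | (Worked out, assuming OpenAI's published Whisper rate of
              | $0.006/minute: $100 / $0.006 ~= 16,700 minutes ~= 278
              | hours, i.e. roughly 11-12 days of continuous audio.)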
        
               | bbkane wrote:
               | Thanks! Maybe 50hr of speeches. It's a hobby idea so I'll
               | check these out when I get some time
        
               | xena wrote:
               | Email xe@fly.io, I'm intrigued.
        
               | IanCal wrote:
               | I can probably just run these through whisper locally for
               | you if you want and are able to share. Email is in my bio
               | (ignore the pricing, I'm obv not charging)
        
       | holoduke wrote:
        | Does anybody have experience with the performance? At first
        | glance they seem quite expensive compared to, for example,
        | Hetzner (CPU machines).
        
         | impulser_ wrote:
          | I'm not sure about others, but you can get A100s with 90GB of
          | RAM from DigitalOcean for $1.15 an hour. So about 1/3 the
          | price.
         | 
         | You can even get H100s for cheaper than these prices at $2.24
         | an hour.
         | 
         | So these do seem a bit expensive, but this might be because
         | there is high demand for them from customers and they don't
         | have the supply.
        
           | skrtskrt wrote:
            | Getting supply is super hard right now; DigitalOcean just
            | straight up bought Paperspace to get access to those GPUs.
            | 
            | The whole reason CoreWeave is on a fat growth trajectory
            | right now is they used their VC money to buy a ton of GPUs at
            | the right time.
        
           | treesciencebot wrote:
           | Just to correct the record, both $1.15 per A100 and $2.24 per
           | H100 require a 3-year-commitment. On-demand prices are 2.5X
           | that.
        
             | Aeolun wrote:
              | > _$2.24/hour pricing is for a 3-year commitment. On-
              | demand pricing for H100 is $5.95/hour under our special
              | promo price. $1.15/hour pricing is for a 3-year
              | commitment._
             | 
             | Wow, that's some spectacularly false advertising.
        
           | dathinab wrote:
            | The company I work for has had problems multiple times with
            | not being able to allocate any GPUs from some larger cloud
            | providers (with the region restrictions we have, which still
            | include all of the EU as regions).
            | 
            | (I'm not sure which of them it was; we are currently
            | evaluating multiple providers and I'm not really involved in
            | that process.)
        
       | andes314 wrote:
        | Has anyone who has used Beam.Cloud compared that service to this
        | one?
        
       | Havoc wrote:
        | How fast is the spin up/down on this scale-to-zero? If it is
        | fast, this could be pretty interesting.
        
         | amanda99 wrote:
         | I think the bigger question is how long it takes to load any
         | meaningful model onto the GPU.
        
           | fideloper wrote:
            | That's exactly right.
            | 
            | GPU-friendly base images tend to be larger (1-3GB+), so that
            | takes time (in the 30s - 2m range) to create a new Machine
            | (VM).
            | 
            | Then there's the "spin up time" of your software -
            | downloading model files adds however long it takes to pull
            | gigabytes of model data.
           | 
           | Models (and pip dependencies!) can generally be "cached" if
           | you (re)use volumes.
           | 
           | Attaching volumes to gpu machines dynamically created via the
           | API takes a bit of management on your end (in that you'd need
           | to keep track of your volumes, what region they're in, and
           | what to do if you need more volumes than you have)
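            | 
            | A sketch of that volume-caching pattern (the paths and
            | download URL are placeholders):
            | 
            |     # Cache model weights on an attached volume so only the
            |     # first (cold) start pays the download cost.
            |     import os
            |     import urllib.request
            | 
            |     MODEL_URL = "https://example.com/weights/model.bin"  # placeholder
            |     CACHE_PATH = "/data/models/model.bin"  # volume mount point
            | 
            |     def ensure_weights() -> str:
            |         if not os.path.exists(CACHE_PATH):
            |             os.makedirs(os.path.dirname(CACHE_PATH), exist_ok=True)
            |             tmp = CACHE_PATH + ".part"
            |             urllib.request.urlretrieve(MODEL_URL, tmp)
            |             os.replace(tmp, CACHE_PATH)  # rename only when complete
            |         return CACHE_PATH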
        
             | dathinab wrote:
              | I know it's not common in research and often makes little
              | sense there.
              | 
              | But at least in theory, for deployments you should generate
              | deployment images. I.e. no pip included in the image(!),
              | all dependencies preloaded, unnecessary parts stripped,
              | etc. (See the sketch below.)
              | 
              | Models might also be bundled, but not always.
              | 
              | Still large images, but depending on what they are for, the
              | same image might be reused often, so it can be cached by
              | the provider to some degree.
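              | 
              | A minimal multi-stage Dockerfile sketch of that idea (base
              | images and paths are illustrative; a stricter build would
              | strip pip from the runtime image too):
              | 
              |     # Build stage: pip lives here and installs everything
              |     # into a virtualenv.
              |     FROM python:3.11-slim AS build
              |     RUN python -m venv /venv
              |     COPY requirements.txt .
              |     RUN /venv/bin/pip install --no-cache-dir -r requirements.txt
              | 
              |     # Runtime stage: only the venv and the app come along;
              |     # no build tools, no package caches.
              |     FROM python:3.11-slim
              |     COPY --from=build /venv /venv
              |     COPY app/ /app/
              |     CMD ["/venv/bin/python", "/app/main.py"]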
        
       | nextworddev wrote:
       | Somehow cheaper than AWS?
        
         | reactordev wrote:
         | AWS isn't the cheapest so how is that a surprise? They are a
         | business and know how to turn the right knobs to increase cash
         | flow. GPUs for AI is one major knob right now.
        
         | CGamesPlay wrote:
         | AWS is one of the most expensive infrastructure providers out
         | there (especially anything beyond the "basic" services like
         | EC2). And even though AWS still has some globally-notable
         | uptime issues, "nobody ever got fired for picking AWS".
        
           | dathinab wrote:
            | I mean, from hearsay from people who have had to work with
            | AWS & Google Cloud & Microsoft Azure, it seems to me that the
            | other two are in practice worse, to the point that they would
            | always pick AWS over them even though they hate the AWS UX.
            | 
            | And if it's the best of the big 3 providers, then it can't be
            | that bad, right ..... right? /s
        
         | andersa wrote:
         | It would be absurd if it wasn't.
        
         | seabrookmx wrote:
          | They're a "real" cloud provider (with their own hardware) and
          | not a reseller like Vercel or Netlify, so this isn't _that_
          | surprising. AWS economies of scale do allow them to make
          | certain services cheap, but only if they choose. A lot of the
          | time they choose to make money!
        
         | patmorgan23 wrote:
         | They run their own data centers.
        
           | tptacek wrote:
           | We run our own hardware, but not our own data centers.
        
             | huydotnet wrote:
                | Is there any write-up on how Fly.io runs its
                | infrastructure? The "not our own data centers" fact makes
                | me a little bit interested.
        
               | tptacek wrote:
               | We should write that up! We lease space in data centers
               | like Equinix.
        
               | rxyz wrote:
               | It's just renting space in a big server room. Every mid-
               | to-large city has companies providing that kind of
               | service
        
         | Sohcahtoa82 wrote:
         | Genuine question...why are you surprised?
        
         | dathinab wrote:
          | As a person working at a startup which used AWS for a while:
          | 
          | *AWS is expensive, always, except if magic*
          | 
          | Where magic means very clever optimizations (often deeply
          | affecting your project architecture/code design) which require
          | the right amount of knowledge/insight into a very confusing
          | UI/UX and enough time to evaluate all aspects. I.e. it might
          | simply not be viable for startups, and is expensive in its own
          | way.
          | 
          | Though most cheaper alternatives have their own huge bag of
          | issues.
          | 
          | Most importantly, fly.io is its own cloud provider, not just an
          | easier way to use AWS. I mean, while I don't know if they have
          | their own data centers in every region, they do have their own
          | servers.
        
       | nakovet wrote:
        | About Fly but not about the GPU announcement: I wish they had an
        | S3 replacement. They suggest a GNU Affero project, which is a
        | dealbreaker for any business, and needing to leave Fly to store
        | user assets was a dealbreaker for us using Fly on our next
        | project. Sad, because I love the simplicity, the value for
        | money, and the built-in VPN.
        
         | benatkin wrote:
         | This looks promising https://github.com/seaweedfs/seaweedfs
        
           | candiddevmike wrote:
           | Seaweed requires a separate coordination setup which may
           | simplify the architecture but complicates the deployment.
        
         | JoshTriplett wrote:
         | > I wish they had a S3 replacement, they suggest a GNU Affero
         | project that is a dealbreaker for any business
         | 
         | AGPL does not mean you have to share everything you've built
         | atop a service, just everything you've linked to it and any
         | changes you've made to it. If you're accessing an S3-like
         | service using only an HTTPS API, that isn't going to make your
         | code subject to the AGPL.
        
           | bradfitz wrote:
           | Regardless, some companies have a blanket thou-shalt-not-use-
           | AGPL-anything policy.
        
             | trollian wrote:
             | Lawyercats are the worst cats.
        
             | hiharryhere wrote:
              | Some companies, including Google.
              | 
              | I've sold enterprise SaaS to Google and we had to attest
              | that we have no AGPL code servicing them. This was for a
              | CRM-like app.
        
             | anonzzzies wrote:
              | Yep, our lawyers say not to use it, and we have to check
              | the components and libs we use too. People are really
              | shooting themselves in the foot with that license.
        
               | aragilar wrote:
               | You assume that people want you to use their project. For
               | MinIO, the AGPL seems to be a way to get people into
               | their ecosystem so they can sell exceptions. Others might
               | want you to contribute code back.
        
               | anonzzzies wrote:
                | I have no problem with contributing back: we do that all
                | the time on MIT / BSD projects even if we don't have to.
                | AGPL just restricts the use-cases, and (apparently) there
                | is limited legal precedent in my region on whether we
                | would have to give away everything that merely uses it,
                | even if otherwise unrelated, so the lawyers (I am not a
                | lawyer, so I cannot provide more details) say to avoid it
                | completely. Just to be safe. And I am sure it hurts a lot
                | of projects... There are many modern projects that do the
                | same thing, but they don't share code because the code is
                | AGPL.
        
               | corobo wrote:
               | Sounds more like the license is doing its job as
               | intended, and businesses that can afford lawyers but not
               | bespoke licenses are shooting themselves in the foot with
               | that policy
        
           | RcouF1uZ4gsC wrote:
           | > AGPL does not mean you have to share everything you've
           | built atop a service, just everything you've linked to it and
           | any changes you've made to it. If you're accessing an S3-like
           | service using only an HTTPS API, that isn't going to make
           | your code subject to the AGPL.
           | 
            | I am not so sure about that. Otherwise, you could trivially
            | get around the AGPL by using HTTPS services to launder your
            | proprietary changes.
            | 
            | There is not enough case law to say how a case that used only
            | HTTP services provided by AGPL software to run a proprietary
            | service would turn out, and it is not worth betting your
            | business on it.
        
             | xcdzvyn wrote:
             | > you could trivially get around the AGPL by using https
             | services to launder your proprietary changes.
             | 
             | This is a very interesting proposition that makes me
             | reconsider my opinion of AGPL.
        
               | mbreese wrote:
               | Anything "clever" in a legal sense is a red flag for
               | me... Computer people tend to think of the law as a black
               | and white set of rules, but it is and it isn't. It's
               | interpreted by people and "one clever trick" doesn't
               | sound like something I'd put a lot of faith in. Intent
               | can matter a lot.
               | 
               | (Regardless of how you see the AGPL)
        
               | internetter wrote:
               | > Computer people tend to think of the law as a black and
               | white set of rules
               | 
                | I've never seen someone put this into words, but it makes
                | a lot of sense. I mean, ideally computers are
                | deterministic, whereas the law is not (by design), yet
                | there exist many parallels between the two. For
                | instance, the lawbook has strong parallels to the
                | documentation for software. So it makes sense that
                | programmers might assume the law is also mostly
                | deterministic, even if this is false.
        
               | ozr wrote:
               | I'm an engineer with a passing interest in the law. I've
               | frequently had to explain to otherwise smart and capable
               | people that their _one weird trick_ will just get them a
               | contempt charge.
        
               | Dylan16807 wrote:
               | On the other hand the AGPL itself is trying to be one
               | clever trick in the first place, so maybe it's
               | appropriate here.
        
               | xcdzvyn wrote:
               | Even if that wasn't directly targeted at me, I'll
               | elaborate on my concern:
               | 
               | That it's possible to interpret the AGPL both ways (that
               | the prior hack is legal, and that it is not), and that
               | the project author could very well believe either one,
               | suggests to me that the AGPL's terms aren't rigidly
               | binding, but ultimately a kind of "don't do what the
               | author thinks the license says, whatever that is".
        
             | c0balt wrote:
             | > > AGPL does not mean you have to share everything you've
             | built atop a service, just everything you've linked to it
             | and any changes you've made to it. If you're accessing an
             | S3-like service using only an HTTPS API, that isn't going
             | to make your code subject to the AGPL.
             | 
              | Correct. This is a known caveat that's also covered a bit
              | more in the GNU article about the AGPL when discussing
              | Service as a Software Substitute, ref:
              | https://www.gnu.org/licenses/why-affero-gpl.html.en
        
         | benhoyt wrote:
         | They're about to get an S3 replacement, called Tigris (it's a
         | separate company but integrated into flyctl and runs on Fly.io
         | infra): https://benhoyt.com/writings/flyio-and-tigris/
        
         | benbjohnson wrote:
          | We have a region-aware S3 replacement that's in beta right
         | now: https://community.fly.io/t/global-caching-object-storage-
         | on-...
        
         | simonw wrote:
         | Sounds like you might be interested in the Tigris preview:
         | 
         | - https://www.tigrisdata.com/
         | 
         | - https://benhoyt.com/writings/flyio-and-tigris/ (discussed
         | here: https://news.ycombinator.com/item?id=39360870)
         | 
         | - https://fly.io/docs/reference/tigris/
        
         | martylamb wrote:
         | Funny you should mention that:
         | https://news.ycombinator.com/item?id=39360870
        
         | tptacek wrote:
         | Give us a minute.
        
         | itake wrote:
         | The dealbreaker should be their uptime and support. They
         | deleted my database and have many uptime issues.
        
       | riquito wrote:
        | Is there any configuration to keep the machine alive for X
        | seconds after a request has been served, instead of scaling down
        | to zero immediately? I couldn't find it skimming the docs.
        
         | mrkurt wrote:
         | Machines are both dumber and more powerful than you'd think.
         | Scaling down means just exit(0) if you have the right restart
         | policy set. So you can implement any kind of keep-warm logic
         | you want.
        
           | Aeolun wrote:
            | Oh! I hadn't thought of it like that. That makes sense.
        
         | kylemclaren wrote:
         | you might also be looking for `kill_signal` and `kill_timeout`:
         | https://fly.io/docs/reference/configuration/#runtime-options
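          | 
          | For example, in fly.toml (values are illustrative, not the
          | defaults):
          | 
          |     kill_signal = "SIGTERM"  # signal sent when a stop is requested
          |     kill_timeout = 120       # seconds allowed for graceful shutdown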
        
       | xena wrote:
        | Hi, author of the post and Fly.io devrel here in case anyone has
        | any questions. GPUs went GA yesterday; you can experiment with
        | them to your heart's content, should the fraud algorithm machine
        | god smile upon you. I'm mostly surprised my signal post about
        | what the "GPUs" really are didn't land well here:
        | https://fly.io/blog/what-are-these-gpus-really/
       | 
       | If anyone has any questions, fire away!
        
         | qeternity wrote:
         | I posted further down before seeing your comment. First,
         | congrats on the launch!
         | 
         | But who is the target user of this service? Is this mostly just
         | for existing fly.io customers who want to keep within the
         | fly.io sandbox?
        
           | subarctic wrote:
           | Commenters like this, for one thing:
           | https://news.ycombinator.com/item?id=34242767
        
           | xena wrote:
           | Part of it is for people that want to do GPU things on their
           | fly.io networks. One of the big things I do personally is I
           | made Arsene (https://arsene.fly.dev) a while back as an
           | exploration of the "dead internet" theory. Every 12 hours it
           | pokes two GPUs on Fly.io to generate article prose and key
           | art with Mixtral (via Ollama) and an anime-tuned Stable
           | Diffusion XL model named Kohaku-XL.
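            | 
            | A rough sketch of what one of those generation "pokes" could
            | look like (not Arsene's actual code; the hostname is a
            | placeholder for a Fly private-network address):
            | 
            |     # Ask a Mixtral model served by Ollama to generate prose.
            |     import json, urllib.request
            | 
            |     req = urllib.request.Request(
            |         "http://ollama-app.internal:11434/api/generate",  # placeholder
            |         data=json.dumps({
            |             "model": "mixtral",
            |             "prompt": "Write a short horoscope for Aries.",
            |             "stream": False,
            |         }).encode(),
            |         headers={"Content-Type": "application/json"},
            |     )
            |     with urllib.request.urlopen(req) as resp:
            |         print(json.load(resp)["response"])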
           | 
           | Frankly, I also see the other part of it as a way to ride the
           | AI hype train to victory. Having powerful GPUs available to
           | everyone makes it easy to experiment, which would open Fly.io
           | as an option for more developers. I think "bring your own
           | weights" is going to be a compelling story as things advance.
        
             | gooseyman wrote:
             | https://en.m.wikipedia.org/wiki/Dead_Internet_theory
             | 
             | What have you learned from the exploration?
        
               | xena wrote:
               | Enough that I'd probably need to write a blogpost about
               | it and answer some questions that I have about it. The
               | biggest one I want to do is a sentiment analysis of these
               | horoscopes vs market results to see if they are
               | "correct".
        
             | cosmojg wrote:
             | Interesting setup! What's the monthly cost of running
             | Arsene on fly.io?
        
               | xena wrote:
               | Because I have secret magical powers that you probably
               | don't, it's basically free for me. Here's the breakdown
               | though:
               | 
               | The application server uses Deno and Fresh
               | (https://fresh.deno.dev) and requires a shared-1x CPU at
               | 512 MB of ram. That's $3.19 per month as-is. It also uses
               | 2GB of disk volume, which would cost $0.30 per month.
               | 
               | As far as post generation goes: when I first set it up it
               | used GPT-3.5 Turbo to generate prose. That cost me
               | rounding error per month (maybe like $0.05?). At some
               | point I upgraded it to GPT-4 Turbo for free-because-I-
               | got-OpenAI-credits-on-the-drama-day reasons. The prose
               | level increase wasn't significant.
               | 
               | With the GPU it has now, a cold load of the model and
               | prose generation run takes about 1.5 minutes. If I didn't
               | have reasons to keep that machine pinned to a GPU
               | (involving other ridiculous ventures), it would probably
               | cost about 5 minutes per day (increased the time to make
               | the math easier) of GPU time with a 40 GB volume (I now
               | use Nous Hermes Mixtral at Q5_K_M precision, so about 32
               | GB of weights), so something like $6 per month for the
               | volume and 2.5 hours of GPU time, or about $6.25 per
               | month on an L40s.
               | 
                | In total it's probably something like $15.75 per month.
                | That's a fair bit on paper, but I have certain
                | arrangements that make it significantly cheaper for me. I
                | could re-architect Arsene to not have to be online 24/7,
                | but it's frankly not worth it when the big cost is the
                | GPU time and weights volume. I don't know of a way to
                | make that better without sacrificing model quality more
                | than I have to.
               | 
                | For a shitpost though, I think it's totally worth it to
                | pay that much. It's kinda hilarious, and I feel like it
                | makes for a decent display of how bad things could get if
                | we go full "AI replaces writers" like some people seem to
                | want, for some reason I can't even begin to understand.
               | 
                | I still think it's funny that I have to explicitly tell
                | people not to take financial advice from it, because if I
                | didn't then they would.
        
           | tptacek wrote:
            | This isn't the target user, but the boy's been using it at
            | the soil bacteria lab he works in to do basecalling for
            | FAST5 data from a nanopore sequencer.
        
             | yard2010 wrote:
             | Can you please elaborate?
        
               | tptacek wrote:
               | I am nowhere within a million miles smart enough to
               | elaborate on this one.
        
         | bl4kers wrote:
          | How difficult would it be to set up Folding@home on these?
          | https://foldingathome.org
        
           | xena wrote:
            | I'm not sure; the more it uses CUDA, the easier, I'd bet. I
            | don't know if it would be fiscally worth it though.
        
         | yla92 wrote:
          | Not a question, but the link "Lovelace L40s are coming soon
          | (pricing TBD)" is a 404.
        
           | xena wrote:
           | Uhhhh that's not ideal. I'll go edit that after dinner.
           | Thanks!
        
           | thangngoc89 wrote:
           | If it's a link to nvidia.com then it's expected to be broken.
           | Seriously, I've never seen a valid link to nvidia.com
        
         | Nevin1901 wrote:
          | How fast are cold starts, and how do you compare against other
          | GPU providers (RunPod, Modal, etc.)?
        
           | xena wrote:
            | The slowest part is loading weights into VRAM, in my
            | experience. I haven't done benchmarking on that. What kind of
            | benchmark would you like to see?
        
             | ipsum2 wrote:
             | I would like to see time to first inference for typical
             | models (llama-7b first token, SDXL 1 step, etc)
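              | 
              | A rough way to measure that (assumes the Hugging Face
              | transformers library and a CUDA GPU; the model name is a
              | placeholder):
              | 
              |     # Time-to-first-token: cold model load plus one token.
              |     import time
              |     import torch
              |     from transformers import AutoModelForCausalLM, AutoTokenizer
              | 
              |     MODEL = "meta-llama/Llama-2-7b-hf"  # placeholder
              |     t0 = time.monotonic()
              |     tok = AutoTokenizer.from_pretrained(MODEL)
              |     model = AutoModelForCausalLM.from_pretrained(
              |         MODEL, torch_dtype=torch.float16).to("cuda")
              |     load_s = time.monotonic() - t0
              | 
              |     inputs = tok("Hello", return_tensors="pt").to("cuda")
              |     t1 = time.monotonic()
              |     model.generate(**inputs, max_new_tokens=1)  # first token only
              |     print(f"load {load_s:.1f}s, "
              |           f"first token {time.monotonic() - t1:.1f}s")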
        
         | thangngoc89 wrote:
          | This is right on time. I'm evaluating "serverless" GPU services
          | for my upcoming project. I see in the announcement that pricing
          | is per hour. Is scaling to zero priced by the minute/second?
          | For my workflow, medical image segmentation, one file takes
          | about 5 minutes.
        
         | benreesman wrote:
          | I'd be fascinated to hear your thoughts on Apple hardware for
          | inference in particular. I spend a lot of time tuning up
          | inference to run locally for people with Apple Silicon on-prem
          | or even on-desk, and I estimate there's a lot of headroom left
          | even with all the work that's gone into e.g. GGUF.
          | 
          | Do you think the process node advantage and SoC/HBM-first
          | design will hold up long enough for the software to catch up?
          | High-end Metal gear looks expensive until you compare it to
          | NVIDIA with 64GB+ of reasonably high memory bandwidth attached
          | to dedicated FP vector units :)
         | 
         | One imagines that being able to move inference workloads on and
         | off device with a platform like `fly.io` would represent a lot
         | of degrees of freedom for edge-heavy applications.
        
           | xena wrote:
           | Well, let me put it this way. I have a MacBook with 64 GB of
           | vram so I can experiment with making an old-fashioned x.ai
           | clone (the meeting scheduling one, not the "woke chatgpt"
           | one) amongst other things now. I love how Apple Silicon makes
           | things vroomy on my laptop.
           | 
            | I do know that getting those working in a cloud provider
            | setup is a "pain in the ass" (according to ex-AWS friends),
            | so I don't personally have hope of seeing that happen in
            | production.
           | 
           | However, the premise makes me laugh so much, so who knows? :)
        
       | niz4ts wrote:
        | As far as I know, Fly uses Firecracker for their VMs. I've been
        | following Firecracker for a while now (even using it in a
        | project), and it doesn't support GPUs out of the box (and there
        | are no plans to support them [1]).
       | 
        | I'm curious to know how Fly figured out their own GPU support
        | with Firecracker. In the past they had some very detailed
        | technical posts on how they achieved certain things, so I'm
        | hoping we'll see one on their GPU support in the future!
       | 
        | [1]: https://github.com/firecracker-microvm/firecracker/issues/11...
        
         | mrkurt wrote:
         | The simple spoiler is that the GPU machines use Cloud
         | Hypervisor, not Firecracker.
        
           | niz4ts wrote:
           | Way simpler than what I was expecting! Any notes to share
           | about Cloud Hypervisor vs Firecracker operationally? I'm
           | assuming the bulkier Cloud Hypervisor doesn't matter much
           | compared to the latency of most GPU workloads.
        
             | tptacek wrote:
             | They are operationally pretty much identical. In both
             | cases, we drive them through a wrapper API server that's
             | part of our orchestrator. Building the cloud-hypervisor
             | wrapper took me all of about 2 hours.
        
       | qeternity wrote:
       | Who is the target market for this? Small/unproven apps that need
       | to run some AI model, but won't/can't use hosted offerings by the
       | literally dozens of race-to-zero startups offering OSS models?
       | 
       | We run plenty of our own models and hardware, so I get wanting to
       | have control over the metal. I'm just trying to figure out who
       | _this_ is targeted at.
        
         | mrkurt wrote:
         | We have some ideas but there's no clear answer yet. Probably
         | people building hosting platforms. Maybe not obvious hosting
         | platforms, but hosting platforms.
        
         | KTibow wrote:
         | Fly is an edge network - in theory, if your GPUs are next to
         | your servers and your servers are next to your users, your app
         | will be very fast, as highlighted in the article. In practice
         | this might not matter much since inference takes a long time
         | anyway.
        
           | tptacek wrote:
           | We're really a couple things; the edge stuff was where we got
           | started in 2020, but "fast booting VMs" is just as important
           | to us now, and that's something that's useful whether or not
           | you're doing edge stuff.
        
           | joshxyz wrote:
           | this is crazy, this move alone cements fly as an edge player
           | for the next 3 / 5 / 10 years.
        
         | dathinab wrote:
         | TL;DR: (skip to last paragraph)
         | 
          | - having the GPU compute in the same data center, or at least
          | from the same cloud provider, can be a huge plus
          | 
          | - it's not that rare for the various providers we have tried
          | out to run out of available A100 GPUs; even with large
          | providers we had issues like that multiple times (less of an
          | issue if you aren't locked to specific regions)
          | 
          | - not all providers provide a usable scale-down-to-zero "on
          | demand" model; idk how well it works with fly long term, but
          | that could be another point
          | 
          | - race-to-zero startups have a tendency not to last; it's kind
          | of by design that out of a hundred of them only a very few
          | survive
          | 
          | - if you are already on fly and write a non-public tech demo
          | which just gets evaluated a few times, their GPU offering can
          | act as a default don't-think-much-about-it solution (though
          | using e.g. Huggingface services would often be more likely)
          | 
          | - a lot of companies can't run their own hardware for various
          | reasons; at best they can rent a rack in another datacenter,
          | but for small use-cases this isn't always worth it. Similarly,
          | there are use cases which might need A100s but only run them
          | rarely (e.g. on weekly analytics data), potentially less than
          | 1h/week, in which case race-to-zero pricing might not look
          | interesting at all
          | 
          | To sum up, I think there are many small reasons why some
          | companies, not just startups, might have an interest in fly
          | GPUs, especially if they are already on fly. But there is no
          | single "that's why" argument, especially if you are already
          | deploying to another cloud.
        
           | qeternity wrote:
            | It's not like Fly has GPUs in every PoP... so there goes all
            | the same-datacenter stuff (unless you just want to be in a
            | PoP with GPUs, in which case...)
           | 
           | But none of this answers my question.
           | 
           | I'm trying to understand the intersection of things like
           | "people who need GPU compute" and "people who need to scale
           | down to zero".
           | 
           | This can't be a very big market.
        
         | DreamGen wrote:
         | I am not seeing any race-to-zero in the hosted offering space.
         | Most charge multiples of what you would pay on GCP, and the
         | public prices on GCP are already several times what you would
         | pay as an enterprise customer.
        
           | qeternity wrote:
            | I don't know what you think I'm talking about, or who is
            | charging multiples of GCP. But I'm talking about hosted
            | inference, where many startups are offering Mistral models
            | cheaper than Mistral itself does.
        
       | dcsan wrote:
        | Can fly run cog files like Replicate uses? It would be nice to
        | take those pre-packaged models and run them here with the same
        | prediction API.
        | 
        | Maybe because it's Replicate's, they might be hesitant to adopt
        | it, but it does seem to make things a lot smoother. Even with
        | Lambda Labs' Lambda Stack I still hit CUDA hell.
        | https://github.com/replicate/cog
        
       | UncleOxidant wrote:
        | I don't want to deploy an app, I just want to play around with
        | LLMs and don't want to go out and buy an expensive PC with a
        | high-end GPU just now. Is Fly.io a good way to go? What about
        | alternatives?
        
         | leourbina wrote:
          | Paperspace is a great way to go for this. You can start by just
          | using their notebook product (similar to Colab), and you get
          | to pick which type of machine/GPU it runs on. Once you have the
          | code you want to run, you can rent machines on demand:
         | https://www.paperspace.com/notebooks
        
           | janalsncm wrote:
            | I used Paperspace for a while. Pretty cheap for mid-tier GPU
            | access (an A6000, for example). There were a few things that
            | annoyed me though. For one, I couldn't access free GPUs with
            | my team account. So I ended up quitting and buying a 4090
            | lol.
        
         | mrkurt wrote:
          | You might actually be better off building a gaming rig and
          | using that. The datacenter GPUs are silly expensive, because
          | this is how NVIDIA price discriminates. The consumer gaming
          | GPUs work really well, and you can buy them for almost as cheap
          | as you can lease datacenter ones.
        
         | mrcwinn wrote:
         | https://ollama.com/ - Easy setup, run locally, free.
        
           | UncleOxidant wrote:
           | Yeah, but I've got an RTX1070 in my circa 2017 PC. How well
           | is that going to work?
        
             | thangngoc89 wrote:
              | It's slow but still decent since it has 8GB of VRAM.
        
             | jeswin wrote:
             | You mean GTX 1070. There's no RTX 1070.
        
         | nojs wrote:
         | I can recommend runpod.io after a few months of usage - very
         | easy to spin up different GPU configurations for testing and
         | the pricing is simple and transparent. Using TheBloke docker
         | images you can get most local models up and running in a few
         | minutes.
        
         | ignoramous wrote:
         | > _What about alternatives?_
         | 
         | Custom models? Apart from the Big 3 (in no particular order):
         | 
         | - https://together.ai/
         | 
         | - https://replicate.com/
         | 
         | - https://anyscale.com/
         | 
         | - https://baseten.co/
         | 
         | - https://modal.com/
         | 
         | - https://banana.dev/
         | 
         | - https://runpod.io/
         | 
         | - https://bentoml.com/
         | 
         | - https://brev.dev/
         | 
         | - https://octo.ai/
         | 
         | - https://cerebrium.ai/
         | 
         | ...
        
           | ayewo wrote:
           | > Apart from the Big 3 ...
           | 
           | Who are the big 3 in this context?
        
             | gk1 wrote:
             | OpenAI, Anthropic, Cohere
        
         | mrb wrote:
         | Use https://vast.ai and rent a machine for as long as you need
         | (minutes, hours, days). You pick the OS image, and you get a
         | root shell to play with. An RTX 4090 currently costs $0.50 per
         | hour. It literally took me less than 15 minutes to sign up for
         | the first time a few weeks ago.
         | 
          | For comparison, the first-time experience on Amazon EC2 is
          | much worse. I had tried to get a GPU instance on EC2 but
          | couldn't reserve it (cryptic error message). Then I realized
          | that as a first-time EC2 user my default quota simply doesn't
          | allow any GPU instances. After contacting support and waiting
          | 4-5 days I eventually got a response saying my quota was
          | increased, but I still can't launch a GPU instance...
          | apparently my quota is still zero. At that point I gave up and
          | found vast.ai. I don't know if Amazon realizes how FRUSTRATING
          | their useless default quotas are for first-time EC2 users.
        
           | janalsncm wrote:
            | Pretty much had the same experience with EC2 GPUs. No
            | permission, had to contact support. Got permission a day
            | later. I wanted to run on A100s ($30/hour, 8-GPU minimum) but
            | they were out of them that night. I tried again the next day,
            | same thing. So I gave up and used RunPod.io.
        
         | dathinab wrote:
          | Main question: do you need an A100?
          | 
          | Some use cases do.
          | 
          | But if not, there are much cheaper consumer-GPU-based choices.
          | 
          | But then maybe you only use it for 1-2 hours in total anyway,
          | in which case the price difference might just not matter.
        
       | k8svet wrote:
        | Does it have other basic stuff functioning? I am _shocked_ at
        | how our production usage of Fly has gone. Even basic stuff like
        | support not being able to just... look up internal platform
        | issues. Cryptic/non-existent error messages. I'm not impressed.
        | It feels like it's compelling to those scared of or ignorant of
        | Kubernetes. I thought I was over Kubernetes, but Fly makes me
        | miss it.
        
         | chachra wrote:
         | Been on it 7 months, 0 issues. Feel like you're alone on this
         | potentially.
        
           | weird-eye-issue wrote:
           | Alone? _Every_ thread about Fly has complaints about
           | reliability and people complain about it on Twitter too
        
             | nixgeek wrote:
              | That hasn't been my experience with Fly, but I'm sorry to
              | hear it seems to have been others' :(
        
             | chachra wrote:
              | OK, possibly not alone; maybe the issues happened before I
              | started using them extensively. I've had ~no downtime that
              | affected me in 7 months.
              | 
              | I do wish they had some features I need, but their support
              | and responses are top notch. And I've lost much less hair
              | and time than I would have going full-blown AWS or another
              | cloud provider.
        
             | jokethrowaway wrote:
              | To be fair, most hosting providers come with plenty of
              | public complaints about downtime. The big ones do way
              | better; the best one is AWS, then GC, and last Azure. They
              | cost stupid money though.
              | 
              | DigitalOcean has been terrible for me; some regions just
              | go down every month and I lose thousands of requests,
              | increasing my churn rate.
              | 
              | Fly.io had tons of weird issues, but it got better in the
              | last few months. It's still very incomplete in terms of
              | functionality, and figuring out how to deploy the first
              | time is a massive pain.
              | 
              | My plan is to add Hetzner and load balance with BunnyCDN
              | across DO and H.
        
             | loloquwowndueo wrote:
             | Every thread on the Internet about any product or service
             | has complaints.
        
               | weird-eye-issue wrote:
               | Not to this extent, it has always stood out to me in
               | particular
        
               | weird-eye-issue wrote:
                | Actually, here is a good example: Cloudflare. Sure,
                | people complain a ton about privacy, but I haven't seen a
                | single complaint about the reliability of Cloudflare
                | Workers or similar products in the dozens of threads I've
                | seen on HN.
        
             | jrockway wrote:
             | It's hard to tell how meaningful the reviews are. I have
              | used AWS, GCP, DigitalOcean, and Linode throughout my
             | career. Every single one of these, through no fault of
             | myself or my team, messed up and caused downtime. Like, you
             | can get most SRE types in a room to laugh if you blurt out
             | "us-east-1", because it's known to be so unreliable. And
             | yet, it's where every Fortune 500 puts every service; we
             | laugh about the reliability and it's literally powering the
             | economy just fine.
             | 
             | So yes, a lot of people on HN complain about fly's
             | reliability. fly posts to HN a lot and gives them the
             | opportunity. Is it actually meaningful compared to the
             | alternatives? It's very hard to tell.
        
               | tptacek wrote:
               | Hoo boy.
               | 
               | First: this is 100% a "live by the sword, die by the
               | sword" situation for us. We're as aware as anybody about
               | our weird HN darling status (this is a post from two
               | months ago, about an announcement from many months ago,
               | that spent like 12 hours plastered to the front page; we
               | have no idea why it hit today, and it actually stepped on
               | another thing we wanted to post today so don't think we
               | secretly orchestrated any of this!). We've allowed
                | ourselves to be ultra-visible here, and threads like this
                | are a natural consequence.
               | 
               | Moreover: a lot of this criticism is well warranted! I
               | can cough up a litany of mitigating factors (the guy who
               | stored his database in ephemeral instance storage instead
               | of a volume, for instance), but I mean, come on. The
               | single most highly upvoted and trafficked thing we've
               | ever written was a post a year ago owning up to
               | reliability issues on the platform. People have
               | definitely had issues!
               | 
               | A fun cop-out answer here is to note all the times people
               | compare us to AWS or Cloudflare, as if we were a
               | hyperscaler public cloud. More fun still is to search HN
               | for stories about us-east-1. We certainly do that to
                | self-soothe internally! And: also? If your only
               | consideration for picking a place to host an application
               | is platform reliability? You're hosting on AWS anyways.
               | But it's still a cop-out.
               | 
               | So I guess I'd sum all this up as: we've picked a hard
               | problem to work on. Things are mathematically guaranteed
               | to go wrong even if we're perfect, and we are not that.
               | People should take criticisms of us on these threads
               | seriously. We do. This is a tough crowd (the threads, if
               | not the vote scores on our blog post) and there's value
               | in that. Over the last year, and through this upcoming
               | year, staffing for infra reliability has been the single
               | biggest driver of hiring at Fly.io, I think that's the
               | right call, and I think the fact that we occasionally get
               | mauled on threads is part of what enabled us to make that
               | call.
               | 
               | (Ordinarily I'd shut up about this stuff and let the
               | thread die out itself, but some dearly loved user of ours
               | took a stand and said they'd never had any problems on
               | us, which: you can imagine the "ohhhhh nooooooo" montage
               | that took place in my brain when I read that someone had
               | essentially dared the thread to come up with times when
               | we'd sucked for some user, so I guess all bets are off.
               | Go easy on Xe, though: they really are just an ultra-
               | helpful uncynical person, and kind of walked into a
               | buzzsaw here).
        
               | jrockway wrote:
               | I also don't know why HN is so upset about people willing
               | to help out in the threads. The way I see it is, if you
               | talk about your product on HN, inevitably someone will
               | remember they have a support inquiry while HN is open,
               | and ask it there instead of over email. Since employees
               | are probably reading HN, they are naturally going to want
               | to answer or say they escalated there. I don't think it's
               | some sort of scam, just what any reasonable person would
               | do.
        
               | tptacek wrote:
               | It's become a YC cliche, that the way to get support for
               | any issue is to get a complaint upvoted to the top of a
               | thread. People used to talk about "Collison installs",
               | which are real-use product demos that are so slick your
               | company founder (in this case Stripe's 'pc) can just
               | wander around installing your product for people to
               | evangelize it; there should be another Collison term for
               | decisively resolving customer support issues by having
               | the founder drop into a thread, and I think that's the
               | vibe people are reacting to here.
        
           | uo21tp5hoyg wrote:
           | https://community.fly.io/t/reliability-its-not-great/11253
        
           | heeton wrote:
            | Not alone; I've been part of two teams who have evaluated
            | Fly, hit weird reliability or stability issues, and deemed it
            | not ready yet.
        
           | yawnxyz wrote:
            | This is what I thought, until I once spent two days trying
            | to publish a new, trivial code change to my Fly.io-hosted API
            | -- it just wouldn't update! And every time I tried to
            | re-publish it'd give me a slightly different error.
            | 
            | When it works, it's brilliant. The problem is that it hasn't
            | worked too well in the last few months.
        
         | xena wrote:
         | Can you email the first two letters of my username at fly.io
         | with more details? I'd love to find out what you've been having
         | trouble with so I can help make the situation better any way I
         | can. Thanks!
        
           | bongobingo1 wrote:
           | Another support.flycombinator.com classic.
        
             | azinman2 wrote:
             | Would you rather them be unresponsive?
        
               | lostemptations5 wrote:
                | It's HN -- if the company proved responsive it might
                | invalidate his OP and everyone who bandwagons on it.
        
             | zmgsabst wrote:
             | Why would you care about customer problems if they don't
             | embarrass you in public?
             | 
             | /s
        
               | keeganpoppen wrote:
               | the only thing easier than them responding in this thread
               | is someone making this comment in this thread...
        
           | throwaway220033 wrote:
            | ...as if it's one person who had issues! I thought it was
            | just incompetence. But now it looks like theatre,
            | pretending.
        
             | ignoramous wrote:
             | I've been a paying Fly.io customer for 3 years now, and for
             | the past 18 months, I've had no real issue with any of my
             | apps. In fact, I don't even monitor our Fly.io servers any
             | more than I monitor S3 buckets; the kind of _zero devops_ I
             | expect from it is already a reality.
             | 
              | > _it's one person who had issues_
             | 
             | Issues specific to an application or one particular account
             | _have_ to be addressed as special cases (like any
             | _NewCloud_ platform, Fly.io has its own idiosyncrasies).
              | The first step anyway is figuring out just what you're
              | dealing with (special vs. common failure).
             | 
             | > _looks like a theatre_
             | 
             | I have had the Fly.io CEO do customer service. Some may
             | call it theatre, but this isn't uncommon for smaller
             | upstarts, and indicative of their commitment, if anything.
        
               | throwaway220033 wrote:
                | You're right, we have been quite unfair to Fly.io. All
                | these people talking badly about Fly.io, like those who
                | lost their database, or those who spent weekends trying
                | to get their product up and running while Fly.io wasn't
                | even communicating but was busy with its public image;
                | we're just some bad people talking badly about Fly.io.
                | Your one personal experience invalidates all the data.
                | 
                | Maybe we should all switch to using your personal
                | account, as everything works great for you.
        
         | pech0rin wrote:
          | Yep, they have terrible reliability and support. I couldn't
          | deploy for 2 days once and they actually told me to use another
          | company. Unmanaged DBs masquerading as managed. Random
          | downtime. I could go on, but it's not a production-ready
          | service and I moved off of it months ago.
        
           | biorach wrote:
           | > Unmanaged dbs masquerading as managed
           | 
           | Are you talking about fly postgres? Because I use it and feel
           | they've been pretty clear that it's unmanaged.
        
             | andy_ppp wrote:
              | Seriously! That's crazy. I need to set up Terraform and
              | move to AWS before launching, I guess.
        
               | biorach wrote:
               | > Seriously! That's crazy
               | 
               | huh? it does what it says on the tin. nothing crazy about
               | it.
               | 
               | They spell out for you in detail what they offer:
               | https://fly.io/docs/postgres/getting-started/what-you-
               | should...
               | 
               | And suggest external providers if you need managed
               | postgres: https://fly.io/docs/postgres/getting-
               | started/what-you-should...
        
               | andy_ppp wrote:
               | I was shocked because I didn't realise it wasn't managed.
               | Even Digital Ocean offer managed Postgres.
               | 
               | If you are offering a service like Fly I think the
               | database should be managed personally, the whole point of
               | Fly.io is to provide abstractions to make production
               | simpler.
               | 
               | Do you think the type of user who is using fly.io is
               | interested in or capable of managing their own Postgres
               | database? I'd rather just trust RDS or another provider.
        
               | corobo wrote:
               | > Do you think the type of user who is using fly.io is
               | interested in or capable of managing their own Postgres
               | database?
               | 
               | Honestly.. kinda, yeah
               | 
               | At least I'm projecting my weird "I want to love you for
               | some reason, Fly" plus my skillset onto anyone else that
               | wants to love Fly too haha
               | 
               | They feel very developer/nerd/HN/tinkerer targeted
        
           | benzible wrote:
            | The header at the top of their Getting Started is "This Is
            | Not Managed Postgres" [1]
           | 
           | and they have a managed offering [2] in private beta now...
           | 
           | > Supabase now offers their excellent managed Postgres
           | service on Fly.io infrastructure. Provisioning Supabase via
           | flyctl ensures secure, low-latency database access from
           | applications hosted on Fly.io.
           | 
           | [1] https://fly.io/docs/postgres/getting-started/what-you-
           | should...
           | 
           | [2] https://fly.io/docs/reference/supabase/
        
         | awestroke wrote:
         | I have run several services on Fly for almost a year now, have
         | not had any issues.
        
         | parhamn wrote:
         | I was hoping to migrate to Fly.io, and during my testing I
         | found that simple deploys would drop connections for a few
         | seconds during a deploy switch-over. Try a `watch -n 2 curl
         | <serviceipv4>` during a deploy to see for yourself (try any
         | one of the strategies documented, including blue-green). I
         | wonder how many people know this?
         | 
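         | A minimal standalone probe makes the drops visible; something
         | like this sketch (same placeholder address as above, and the
         | timeout is arbitrary):
         | 
         |   # log every request that fails or times out while a deploy
         |   # is in flight; curl reports 000 for a dropped or refused
         |   # connection
         |   while true; do
         |     code=$(curl -s -o /dev/null -w '%{http_code}' \
         |       --max-time 2 "http://<serviceipv4>/")
         |     [ "$code" = "200" ] || echo "$(date -u +%T) got: $code"
         |     sleep 1
         |   done
         | 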
         | When I tested it, I was hoping for, at worst, early
         | termination of old connections with no dropped new
         | connections, and at best I expected them to gracefully wait
         | for old connections to finish. But nope, just a full-downtime
         | switch-over every time. And when you think about the network
         | topology described in their blog posts, you realize there's no
         | way it could've been done correctly to begin with.
         | 
         | It's very rare for me to comment negatively on a service, but
         | the fact that this was the case, paired with the way support
         | acted like we were crazy when we sent video evidence of it,
         | definitely irked me, given the standards expected of an
         | infrastructure company. I wouldn't recommend it for anything
         | beyond toy applications now.
         | 
         | > It feels like it's compelling to those scared of or ignorant
         | of Kubernetes
         | 
         | I've written pretty large deployment systems for Kubernetes.
         | This isn't it. There's a real space for Heroku-like deploys
         | done properly, and no one is really doing it well (or at least
         | not without ridiculously thin or expensive compute resources).
        
           | asaddhamani wrote:
           | Yeah, I had a similar experience where my builds were frozen
           | for a couple of days, such that I was not able to release
           | any updates. When I emailed their support, I got an auto-
           | response asking me to post in the forum. Pretty much all
           | hosts are expected to offer a ticket system, even for their
           | unmanaged services, if it's a problem on their side. I just
           | moved all my stuff over to Render.com; it's more expensive,
           | but it's been reliable so far.
        
             | loloquwowndueo wrote:
             | The first (pinned) post in the fly.io forum explains it:
             | 
             | https://community.fly.io/t/fly-io-support-community-vs-
             | email...
        
               | malfist wrote:
               | That forum post just says what OP said: that they will
               | ignore all tickets from unmanaged customers. Which is a
               | pretty shitty thing to do to your customers.
        
           | sofixa wrote:
           | > I've written pretty large deployment systems for
           | kubernetes. This isn't it. Theres a real space for heroku-
           | like deploys done properly and no one is really doing it well
           | (or at least without ridiculously thin or expensive compute
           | resources)
           | 
           | Have you tried Google Cloud Run (based on Knative)? I've
           | never used it in production, but on paper it seems to fit
           | the bill.
        
             | parhamn wrote:
             | Yeah, we're mostly hosted there now. The CPU/virtualization
             | feels slow, but I haven't had time to confirm it (we had
             | to offload even super small ffmpeg operations).
             | 
             | It's in a weird place between Heroku and Lambda. If your
             | container has a bad startup time, like one of our Python
             | services, autoscaling can't be used, as latency becomes a
             | pain. It's also common to deploy services there that need
             | things like health checks (unlike functions, which you
             | assume are alive); that implies at least one instance in
             | sustained use, assuming you run health checks every
             | minute. Their domain-mapping service is also really,
             | really bad and can take hours to issue a cert for a
             | domain, so you have to be very careful about putting an
             | LB in front of it for hostname migrations.
             | 
             | I don't care right now, but the fact that we're paying 5x
             | for compute is starting to bother me a bit. An 8-core/16GB
             | 'node' is ~$500/month ($100 on DO), assuming you don't
             | scale to zero (which you probably won't). Plus, I'm pretty
             | sure the 8 cores reported aren't a meaty 8 cores.
             | 
             | But it's been pretty stable and nice to use otherwise!
        
               | jetbalsa wrote:
               | A 6c/12t dedicated server with 32GB of RAM is $65 a
               | month with OVH.
               | 
               | I do get that it's a bare server, but if you deploy even
               | just bare containers to it, you'd save a good bit of
               | money and get better performance out of it.
        
               | doctorpangloss wrote:
               | Another interpretation is that the so-called dedicated
               | servers are too good to be true.
        
               | jrockway wrote:
               | It depends on what the 6 cores are. Like, I have an
               | 8C/8T dedicated server sitting in my closet that costs
               | $65 per the number of times you buy it. (Usually once.)
               | The cores are not as fast as the highest-end Epyc cores,
               | however ;)
        
               | ac29 wrote:
               | At the $65/month level for an OVH dedicated server, you
               | get a 6-core CPU from 2018 and a 500Mbps public network
               | limit. Doesn't even seem like that good a deal.
               | 
               | There is also a $63/month option that is significantly
               | worse.
        
             | dig1 wrote:
             | I have yet to have a positive experience with Cloud Run.
             | I have one project on it, and Cloud Run is very
             | unpredictable with autoscaling. Sometimes it starts
             | spinning containers up and down for no apparent reason,
             | and after chasing Google support for months, they said it
             | is "expected behavior". Good luck trying to debug this
             | independently, because you don't have access to the
             | Knative logs.
             | 
             | Starting containers on Cloud Run is weirdly slow, and oh
             | boy, how expensive that thing is. I'm getting the
             | impression that pure VMs + Nomad would be a far better
             | option.
        
               | parhamn wrote:
               | > Starting containers on Cloud Run is weirdly slow
               | 
               | What is this about? I assumed a highly throttled CPU or
               | terrible disk performance. A Python process that would
               | start in 4 seconds locally could easily take 30 seconds
               | there.
        
               | JoshTriplett wrote:
               | Last I checked, Cloud Run isn't actually running real
               | Linux; it's emulating Linux syscalls.
        
               | sofixa wrote:
               | > I'm getting the impression that pure VMs + Nomad would
               | be a way better option
               | 
               | As a long time Nomad fan (disclaimer: now I work at
               | HashiCorp), I would certainly agree. You lose some on the
               | maintenance side because there's stuff for you to deal
               | with that Google could abstract for you, but the added
               | flexibility is _probably_ worth it.
        
               | jonatron wrote:
               | I just use AWS EC2, a load balancer, and auto scaling
               | groups. The user_data pulls and runs a Docker image. To
               | deploy, I do an instance refresh, which has no downtime.
               | The obvious downside is more configuration than with
               | more managed services.
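               | 
               | The user_data itself only needs a few lines. A rough
               | sketch, assuming an AMI that already has Docker
               | installed (the image name and ports are placeholders):
               | 
               |   #!/bin/bash
               |   # runs on first boot of each instance in the ASG:
               |   # pull the app image and start it on port 80
               |   docker pull myorg/myapp:latest
               |   docker run -d --restart=always -p 80:8080 \
               |     myorg/myapp:latest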
        
             | giovannibonetti wrote:
             | I have been using Google Cloud Run in production for a few
             | years and have had a very good experience. It has the
             | fastest auto scaler I have ever seen, except only for FaaS,
             | which are not a good option for client-facing web services.
        
               | davidspiess wrote:
               | Same experience here, using it for years in production
               | for our critical api services without issues.
        
           | rollcat wrote:
           | > Try a `watch -n 2 curl <serviceipv4>` during a deploy
           | 
           | You need blackbox HTTP monitoring right now; don't _ever_
           | wait for your customers to tell you that your service is
           | down.
           | 
           | I use Prometheus (& Grafana), but you can also get a hosted
           | service like Pingdom or whatever.
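           | 
           | Even a dumb cron job gets you most of the way while you set
           | that up; a sketch (the URL and the notify command are
           | placeholders for whatever you use):
           | 
           |   #!/bin/sh
           |   # run from cron every minute; alert if the endpoint
           |   # doesn't answer 200 within 5 seconds
           |   code=$(curl -s -o /dev/null -w '%{http_code}' \
           |     --max-time 5 "https://example.com/healthz")
           |   [ "$code" = "200" ] || notify "site down (got $code)"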
        
         | morgante wrote:
         | Unfortunately this is a pretty common story. Half the people I
         | know who adopted Fly migrated off it.
         | 
         | I was very excited about Fly originally, and built an entire
         | orchestrator on top of Fly machines--until they had a multi-day
         | outage where it took days to even get a response.
         | 
         | Kubernetes can be complex, but at least that complexity is (a)
         | controllable and (b) fairly well-trodden.
        
           | loloquwowndueo wrote:
           | Fly.io is not comparable to Kubernetes. It's a bit like
           | comparing AWS to Terraform.
           | 
           | Or, to clarify your comment: Kubernetes on which cloud?
           | Amazon? Google? Linode?
        
             | jrockway wrote:
             | Kubernetes on AWS, GCP, and Linode are all controllable
             | and well-trodden.
             | 
             | I definitely understand the comparison between Kubernetes
             | and Fly. You have a couple of apps that are totally
             | unrelated, managed by separate teams, and you want to
             | figure out how the two teams can avoid duplicating
             | effort. One option is to use something like fly.io, where
             | you get a command line you run to build your project and
             | push the binary to a server. Another option is to self-
             | host infrastructure like Kubernetes and eventually get
             | that down to one command to build and push (or have your
             | CI system do it).
             | 
             | The end result organizations are aiming for is similar:
             | developers write the code, and then the code runs in
             | production. Frankly, a lot of toil and human effort is
             | spent on this task, and everyone is aiming to make it
             | take less effort. fly.io is an approach. Kubernetes is an
             | approach. Terraform on AWS is an approach.
        
               | loloquwowndueo wrote:
               | Maybe you're comparing flyctl with Kubernetes?
               | 
               | That'd be a slightly more valid comparison, although
               | flyctl is much less ambitious by choice and design. That
               | said, using flyctl to orchestrate your deployments is
               | not the only way to Fly. Example:
               | 
               | https://fly.io/blog/fks/
        
             | morgante wrote:
             | > Fly.io is not comparable to Kubernetes.
             | 
             | The Fly team has worked on solving problems similar to
             | those Kubernetes solves. E.g.,
             | https://fly.io/blog/carving-the-scheduler-out-of-our-
             | orchestrator/
             | 
             | Of course, Fly _also_ provides the underlying
             | infrastructure stack. If you want to be pedantic, you can
             | compare it to GKE/AKS/EKS.
             | 
             | Kubernetes on any major cloud platform is more mature,
             | controllable, and reliable than Fly.
        
         | throwaway220033 wrote:
         | I switched to Kamal and Hetzner. It's the sweet spot.
        
         | rmbyrro wrote:
         | I find it amazing how much negativity fly.io gets here.
         | 
         | It comes across as worse than AWS or Azure to me.
         | 
         | Never used the service, but based on what I hear, I'll never
         | try it...
        
       | m3kw9 wrote:
       | Having GPUs is news now?
        
       | Mikejames wrote:
       | Anyone know if this is PCI passthrough for a full A100? Or some
       | fancy clever vGPU thing?
        
         | mrkurt wrote:
         | Passthrough, yes.
        
         | tptacek wrote:
         | Do not get me started on the fancy vGPU stuff.
        
           | mgliwka wrote:
           | I'll bite :-) What are your experiences with that?
        
             | tptacek wrote:
             | Bad.
        
       | dvrp wrote:
       | too expensive
        
       | ec109685 wrote:
       | The recipe example, or any LLM use case, seems like a very poor
       | way of highlighting "inference at the edge", given that the
       | extra few hundred ms of round trip won't matter.
        
         | manishsharan wrote:
         | This. I cannot think of a business case for running LLMs on the
         | edge. Is this a Pets.com moment for the AI industry?
        
         | unraveller wrote:
         | The better use case is obviously a voice assistant at the
         | edge: voice to text, then search/GPT, then a voice-generated
         | response. That is where milliseconds matter, but it is also a
         | high-abuse angle no one wants to be associated with just yet.
         | My guess is they are going to do this in another post, and if
         | so, they should make their own Perplexity-style online GPT.
         | For now, by keeping the introduction boring, they just wanted
         | to see what else people can think up.
        
           | ec109685 wrote:
           | There are three options for inference: 1) on-device
           | inference, 2) inference "on the edge", 3) inference in a
           | data center.
           | 
           | Given that Fly is deployed in Equinix data centers just
           | like everyone else, there fundamentally isn't much
           | difference between #2 and #3.
        
       | bugbuddy wrote:
       | This is amazing, and it shows that Nvidia should be the most
       | valuable stock in the world. Every company, country, city,
       | town, village, large enterprise, medium and small business, AI
       | bro, crypto bro, gamer bro, big tech, small tech, old tech, new
       | tech, and startup wants Nvidia GPUs. Nvidia GPUs will become the
       | new green oil of the 21st century. I am all in, and nothing
       | short of a margin call will change my mind.
        
       | isoprophlex wrote:
       | Almost half the price of Modal! Very nice!
        
       | pgt wrote:
       | I was an early adopter of Fly.io. It is not production-ready.
       | They should fix their basic features before adding new ones.
        
         | urduntupu wrote:
         | Unfortunately true. I also jumped the Fly.io ship after
         | initial high excitement about their offering and moved back
         | to DigitalOcean's App Platform. A bit more config effort and
         | significantly pricier, but we need stability in production. I
         | can't have my customers calling me because of service
         | interruptions.
        
         | throwaway220033 wrote:
         | +1 - It's the most unreliable hosting service I've ever used
         | in my life, with "nice-looking" packaging. There were
         | frequently multiple things broken at the same time; the
         | status page would always be green while my meetings and
         | weekends were ruined. Software can break, but Fly handles
         | incidents with an unprofessional, immature attitude.
         | Basically, you pay 10x more money for an unreliable service
         | that just looks "nice". I'm paying 4x less for much better
         | hardware with Hetzner + Kamal; it works reliably, pricing is
         | predictable, and I don't pay 25% more for the same usage next
         | month.
         | 
         | https://news.ycombinator.com/item?id=36808296
        
         | ecmascript wrote:
         | Comments like these are just sad to see on HN. It is not
         | constructive. What are these basic features that need fixing
         | you're speaking about, and what are the fixes required?
        
           | cschmatzler wrote:
           | Reliability and support. Having even "the entire node went
           | down" tickets get an auto-response of "please go fuck off
           | into the community forum" is insane. What is the community
           | forum going to do about your reliability issues? I can get
           | a 4 EUR/mo server at Hetzner and have actual people in the
           | datacenter respond to my technical inquiries within minutes.
        
       | DreamGen wrote:
       | Great, more competition for price-gouging platforms like
       | Replicate and Modal is needed. As always with these, I would be
       | curious about the cold-start time -- are you doing anything
       | smart about being able to start (load models into VRAM)
       | quickly? Most platforms that I tested are completely naive in
       | their implementation, often downloading the Docker image just-
       | in-time instead of having it ready to deploy on multiple
       | machines.
        
       | wslh wrote:
       | Interesting. We have been discussing this kind of service
       | (offloading training) over the last several days [1] [2] [3],
       | thinking about the opportunity to compete with top cloud
       | services such as Google Cloud, AWS, and Azure.
       | 
       | [1] https://news.ycombinator.com/item?id=39353663
       | 
       | [2] https://news.ycombinator.com/item?id=39329764
       | 
       | [3] https://news.ycombinator.com/item?id=39263422
        
       | unixhero wrote:
       | I use the Fly.io free tier to run uptime monitoring with Uptime
       | Kuma. It works insanely well, and I'm a really happy camper.
        
         | rozenmd wrote:
         | What do you use to let you know Uptime Kuma went down?
        
           | unixhero wrote:
           | It doesn't
        
       | faust201 wrote:
       | > The speed of light is only so fast
       | 
       | This is the title of one of the sections. Why? I think the IT
       | sector needs to stop using titles like this.
        
       | jimnotgym wrote:
       | It is a bit odd that we still call GPUs "GPUs" when their main
       | use now seems to have little to do with graphics!
        
       ___________________________________________________________________
       (page generated 2024-02-14 23:01 UTC)