[HN Gopher] Rethinking serverless with FLAME
___________________________________________________________________
Rethinking serverless with FLAME
Author : kiwicopple
Score : 304 points
Date : 2023-12-06 12:03 UTC (10 hours ago)
(HTM) web link (fly.io)
(TXT) w3m dump (fly.io)
| 8organicbits wrote:
| > With FLAME, your dev and test runners simply run on the local
| backend.
|
| Serverless with a good local dev story. Nice!
| victorbjorklund wrote:
| Totally. One reason I don't like serverless is because the
| local dev exp is so much worse compared to running a monolith.
| sergiomattei wrote:
| This is incredible. Great work.
| amatheus wrote:
| > Imagine if you could auto scale simply by wrapping any existing
| app code in a
|
| > function and have that block of code run in a temporary copy of
| your app.
|
| That's interesting, sounds like what fork does but for
| serverless. Great work
| abdellah123 wrote:
| Wow, this is amazing. Great work.
|
| One could really spin up a whole Hetzner/OVH server and create a
| KVM for the workload on the fly!!
| MoOmer wrote:
| WELL, considering the time delay in provisioning on
| Hetzner/OVH, maybe Equinix Metal would work better? But, if
| you're provisioning + maybe running some configuration, and
| speed is a concern, probably using Fly or Hetzner Cloud, etc.
| still makes sense.
| chrismccord wrote:
| Author here. I'm excited to get this out and happy to answer any
| questions. Hopefully I sufficiently nerd sniped some folks to
| implement the FLAME pattern in js, go, and other langs :)
| ryanjshaw wrote:
| This looks great. Hopefully Microsoft are paying attention
| because Azure Functions are way too complicated to secure and
| deploy, and have weird assumptions about what kind of code you
| want to run.
| bbkane wrote:
| I had a lot of problems trying to set up Azure Functions with
| Terraform a couple of years ago. Wonder if it's gotten
| better?
|
| https://www.bbkane.com/blog/azure-functions-with-terraform/
| orochimaaru wrote:
| I used them with Python. Simple enough but opinionated. I
| didn't play around with durable functions.
|
| Don't have strong feelings there. It worked. I did have
| some issues with upgrading the functions but found the
| workarounds.
| bob1029 wrote:
| > weird assumptions about what kind of code you want to run
|
| Those "weird assumptions" are what makes the experience
| wonderful for the happy path. If you use the C#/v4 model, I
| can't imagine you'd have a hard time. Azure even sets up the
| CI/CD for you automatically if your functions are hosted in
| Github.
|
| If your functions need to talk to SQL, you should be using
| Managed Identity authentication between these resources. We
| don't have any shared secrets in our connection strings
| today. We use Microsoft Auth to authenticate access to our
| HttpTrigger functions. We take a dep on IClaimsPrincipal
| right in the request and everything we need to know about the
| user's claims is trivially available.
|
| I have zero experience using Azure Functions outside of the
| walled garden. If you are trying to deploy python or rust to
| Az Functions, I can imagine things wouldn't be as smooth.
| Especially, as you get into things like tracing, Application
| Insights, etc.
|
| I feel like you should only use Microsoft tech if you intend
| to drink a large amount of their koolaid. The moment you
| start using their tooling with non C#/.NET stacks, things go
| a bit sideways. You might be better off in a different cloud
| if you want to use their FaaS runners in a more "open" way.
| If you _can_ figure out how to dose yourself appropriately
| with M$ tech, I'd argue the dev experience is unbeatable.
|
| Much of the Microsoft hate looks to me like a stick-in-bike-
| wheels meme. You can't dunk on the experience until you've
| tried the one the chef actually intended. Dissecting your
| burger and only eating a bit of the lettuce is not a thorough
| review of the cuisine on offer.
| jorams wrote:
| > You can't dunk on the experience until you've tried the
| one the chef actually intended. Dissecting your burger and
| only eating a bit of the lettuce is not a thorough review
| of the cuisine on offer.
|
| But Microsoft isn't selling burgers that people are taking
| a bit of lettuce from. They're selling lettuce, and if that
| lettuce sucks in any context that isn't the burger that
| they're _also_ selling, then complaining about the quality
| of their lettuce is valid.
| jabradoodle wrote:
| A cloud vendor where using some of the most popular
| languages in the world makes your life harder is a genuine
| reason to dislike something.
| kapilvt wrote:
| Azure Functions don't fit the common definition of serverless.
| I've had a few convos with them over several years, but there's
| a real mismatch owing to the product's origins at Azure and a
| real lack of understanding of the space: it was originally
| built on top of Web Apps, i.e. a hack to try and enter the
| serverless market. How many websites do you need to run? You
| can't run more than 50 functions, and there's a 16-cell table
| of different runtime options (i.e. provision servers for your
| serverless). The consumption plan is better, but the Web Apps
| origin means it's just a different product.. hey, every
| function has a URL by default :shrug: Azure needs a radical
| rethink of what serverless is, and I haven't seen any evidence
| they got the memo. In AWS, Lambda originated out of S3, i.e.
| bringing compute to storage.
| danielskogly wrote:
| Great article and video, and very exciting concept! Looking
| forward to a JS implementation, but that looks like a challenge
| to get done.
|
| And now I feel (a tiny bit) bad for sniping ffmpeg.fly.dev :)
| tlivolsi wrote:
| On an unrelated note, what syntax highlighting theme did you
| use for the code? I love it.
| willsmith72 wrote:
| Pretty cool idea, and that api is awesome.
|
| > CPU bound work like video transcoding can quickly bring our
| entire service to a halt in production
|
| Couldn't you just autoscale your app based on cpu though?
| quaunaut wrote:
| Yes and no: Maybe the rest of your workloads don't require much
| CPU -- you only need this kind of power for one or two
| workloads, and you don't want them potentially getting crowded
| out by other work.
|
| Or they require a GPU.
|
| Or your core service only needs 1-2 servers, but you need to
| scale up to dozens/hundreds/thousands on demand, for work that
| only happens maybe once a day.
| willsmith72 wrote:
| fair enough.
|
| i think it's cool tech, but none of those things are "hair on
| fire" problems for me. i'm sure they are for some people.
| chrismccord wrote:
| Thanks! I try to address this thought in the opening. The issue
| with this approach is you are scaling at the wrong level of
| operation. You're scaling your entire app, ie webserver, in
| order to service specific hot operations. Instead what we want
| (and often reach for FaaS for) is _granular_ elastic scale. The
| idea here is we can do this kind of granular scale for our
| existing app code rather than smashing the webserver/workers
| scale buttons and hoping for the best. Make sense?
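| Roughly, the shape of it (pool and function names here are just
| illustrative, not the exact code from the post):
|
|     defmodule MyApp.Thumbnailer do
|       # existing app code that normally runs in the webserver
|       def generate_thumbnails(video, interval) do
|         # ... CPU-heavy ffmpeg work using video and interval ...
|         {:ok, video, interval}
|       end
|
|       # same code, but the block now runs on a temporary copy of
|       # the app; the closed-over video/interval ship along with it
|       def generate_thumbnails_async(video, interval) do
|         FLAME.call(MyApp.FFMpegRunner, fn ->
|           generate_thumbnails(video, interval)
|         end)
|       end
|     end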
| stuartaxelowen wrote:
| If you autoscale based on CPU consumption, doesn't the macro
| level scaling achieve the same thing? Is the worry scaling
| small scale services where marginal scaling is a higher
| multiple, e.g. waste from unused capacity?
| sofixa wrote:
| Very interesting concept, however it's a bit soured by the fact
| that Container-based FaaS is never mentioned, and it removes a
| decent chunk of the negatives around FaaS. Yeah you still need to
| deal with the communication layer (probably with managed services
| such as SQS or Pub/Sub), but there's no proprietary runtime
| needed, no rewrites needed between local/remote runtime
| environments.
| willsmith72 wrote:
| what are some examples of container-based faas? like you put
| your docker image onto lambda?
| sofixa wrote:
| * Google Cloud Run -
| https://cloud.google.com/run/docs/deploying#command-line
|
| * OpenFaaS - https://www.openfaas.com/blog/porting-existing-
| containers-to...
|
| * AWS Lambda - https://docs.aws.amazon.com/prescriptive-
| guidance/latest/pat...
|
| * Scaleway Serverless Containers -
| https://www.scaleway.com/en/serverless-containers/
|
| * Azure Container Instances - https://learn.microsoft.com/en-
| us/azure/container-instances/...
|
| Probably others too, those are just the ones I know off the
| top of my head. I see very little reason to use traditional
| Function-based FaaS, which forces you into a special, locked-
| in framework, instead of using containers that work
| everywhere.
| willsmith72 wrote:
| ok yeah so like an image on lambda, totally agree, a lot of
| the pros of serverless without a lot of the cons
| dprotaso wrote:
| https://knative.dev/ - (CloudRun API is based on this OSS
| project)
| chrismccord wrote:
| Bring-your-own-container is certainly better than proprietary
| js runtimes, but as you said it carries every other negative I
| talk about in the post. You get to run your language of choice,
| but you're still doing all the nonsense. And you need to reach
| for the mound of proprietary services to actually ship
| features. This doesn't move the needle for me, but I would be
| happy to have it if forced to use FaaS.
| agundy wrote:
| Looks like a great integrated take on carving out serverless
| work. Curious to see how it handles the server parts of
| serverless like environment variables, db connection counts, etc.
|
| One potential gotcha I'm curious if there is a good story for is
| if it can guard against code that depends on other processes in
| the local supervision tree. I'm assuming since it's talking about
| Ecto inserts it brings over and starts the whole app's supervision
| tree on the function executor but that may or may not be desired
| for various reasons.
| chrismccord wrote:
| It starts your whole app, including the whole supervision tree,
| but you can turn on/off services based on whatever logic you
| want. I talk a bit about this in the screencast. For example,
| no need to start the phoenix endpoint (webserver) since we
| aren't serving web traffic. For the DB pool, you'd set a lower
| pool size or single connection in your runtime configuration
| based on the presence of a FLAME parent or not.
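| A sketch of what that can look like (assuming the runner is
| detectable via FLAME.Parent.get/0; exact names may differ):
|
|     # lib/my_app/application.ex -- illustrative only
|     flame_child? = not is_nil(FLAME.Parent.get())
|
|     children =
|       [
|         # runtime config can shrink the Repo pool on runners
|         MyApp.Repo,
|         # runners serve no web traffic, so skip the endpoint
|         !flame_child? && MyAppWeb.Endpoint
|       ]
|       |> Enum.filter(& &1)
|
|     Supervisor.start_link(children,
|       strategy: :one_for_one, name: MyApp.Supervisor)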
| agundy wrote:
| Oh cool! Thanks for the reply, haven't had time to watch the
| screencast yet. Looking forward to it.
| OJFord wrote:
| This is one reason I really don't like US headline casing as
| enforced by HN - it looks like Serverless, as in the capital-S
| company, serverless.com, is what's being rethought, not the
| small-s principle.
|
| (Aside: I _wish_ someone would rethink Serverless, heh.)
| davidjfelix wrote:
| This is a very neat approach and I agree with the premise that we
| need a framework that unifies some of the architecture of cloud -
| shuttle.rs has some thoughts here. I do take issue with this
| framing:
|
| - Trigger the lambda via HTTP endpoint, S3, or API gateway ($)
| * Pretending that starting a fly machine doesn't cost the same as
| triggering via s3 seems disingenuous.
|
| - Write the bespoke lambda to transcode the video ($)
| * In go this would be about as difficult as flame -- you'd have
| to build a different entrypoint that would be 1 line of code but
| it could be the same codebase. In Node it would depend on bundling
| but in theory you could do the same -- it's just a promise that
| takes an S3 event, that doesn't seem much different.
|
| - Place the thumbnail results into SQS ($) * I
| wouldn't do this at all. There's no reason the results need to be
| queued. Put them in a deterministically named s3 bucket where
| they'll live and be served from. Period.
|
| - Write the SQS consumer in our app (dev $) *
| Again -- this is totally unnecessary. Your application *should
| forget* it dispatched work. That's the point of dispatching it.
| If you need subscribers to notice it or do some additional work
| I'd do it differently rather than chaining lambdas.
|
| - Persist to DB and figure out how to get events back to active
| subscribers that may well be connected to other instances than
| the SQS consumer (dev $) * Your lambda really
| should be doing the DB work not your main application. If you've
| got subscribers waiting to be informed the lambda can fire an SNS
| notification and all subscribed applications will see "job 1234
| complete"
|
| So really the issue is:
|
| * s3 is our image database
|
| * our app needs to deploy an s3 hook for lambda
|
| * our codebase needs to deploy that lambda
|
| * we might need to listen to SNS
|
| which is still some complexity, but it's not the same and it's
| not using the wrong technology like some chain of SQS nonsense.
| chrismccord wrote:
| Thanks for the thoughts - hopefully I can make this more clear:
|
| > * Pretending that starting a fly machine doesn't cost the
| same as triggering via s3 seems disingenuous.
|
| You're going to be paying for resources wherever you decide to
| run your code. I don't think this needs to be spelled out. The
| point about costs is rather than paying to run "my app", I'm
| paying at multiple layers to run a full solution to my problem.
| Lambda gateway requests, S3 put, SQS insert, each have their
| own separate costs. You pay a toll at every step instead of a
| single step on Fly or wherever you host your app.
|
| > * I wouldn't do this at all. There's no reason the results
| need to be queued. Put them in a deterministically named s3
| bucket where they'll live and be served from. Period. This is
| totally unnecessary. Your application _should forget_ it
| dispatched work. That's the point of dispatching it. If you
| need subscribers to notice it or do some additional work I'd do
| it differently rather than chaining lambdas.
|
| You still need to tell your app about the generated thumbnails
| if you want to persist the fact they exist where you placed
| them in S3, how many exist, where you left off, etc.
|
| > * Your lambda really should be doing the DB work not your
| main application. If you've got subscribers waiting to be
| informed the lambda can fire an SNS notification and all
| subscribed applications will see "job 1234 complete"
|
| This is _exactly_ my point. You bolt on ever more Serverless
| offerings to accomplish any actual goal of your application.
| SNS notifications is exactly the kind of thing I don't want to
| think about, code around, and pay for. I have
| Phoenix.PubSub.broadcast and I continue shipping features. It's
| already running on all my nodes and I pay nothing for it
| because it's already baked into the price of what I'm running -
| my app.
| davidjfelix wrote:
| > This is exactly my point. You bolt on ever more Serverless
| offerings to accomplish any actual goal of your application.
| SNS notifications is exactly the kind of thing I don't want
| to think about, code around, and pay for. I have
| Phoenix.PubSub.broadcast and I continue shipping features.
| It's already running on all my nodes and I pay nothing for it
| because it's already baked into the price of what I'm running
| - my app.
|
| I think this is fine if and only if you have an application
| that can subscribe to PubSub.broadcast. The problem is that
| not everything is Elixir/Erlang or even the same language
| internally to the org that runs it. The solution
| (unfortunately) seems to be reinventing everything that made
| Erlang good but for many general purpose languages at once.
|
| I see this more as a mechanism to signal the runtime
| (combination of fly machines and erlang nodes running on
| those machines) you'd like to scale out for some scoped
| duration, but I'm not convinced that this needs to be
| initiated from inside the runtime for erlang in most cases --
| why couldn't something like this be achieved externally
| by noticing a high watermark of usage and adding nodes, much
| like a kubernetes horizontal pod autoscaler?
|
| Is there something specific about CPU bound tasks that makes
| this hard for erlang that I'm missing?
|
| Also, not trying to be combative -- I love Phoenix framework
| and the work y'all are doing at fly, especially you Chris,
| just wondering if/how this abstraction leaves the walls of
| Elixir/Erlang which already has it significantly better than
| the rest of us for distributed abstractions.
| tonyhb wrote:
| You're literally describing what we've built at
| https://www.inngest.com/. I don't want to talk about us
| much in this post, but it's _so relevant_ it's hard not to
| bring it up. (Huge disclaimer here, I'm the co-founder).
|
| In this case, we give you global event streams with a
| durable workflow engine that any language (currently
| Typescript, Python, Go, Elixir) can hook into. Each step
| (or invocation) is backed by a lightweight queue, so queues
| are cheap and are basically a 1LOC wrapper around your
| existing code. Steps run as atomic "transactions" which
| must commit or be retried within a function, and are as
| close to exactly once as you could get.
| ekojs wrote:
| I don't know if I agree with the argument regarding durability vs
| elastic execution. If I can get both (with a nice API/DX) via
| something like Temporal (https://github.com/temporalio/temporal),
| what's the drawback here?
| bovermyer wrote:
| As an alternative to Lambdas I can see this being useful.
|
| However, the overhead concerns me. This would only make sense in
| a situation where the function in question takes long enough that
| the startup overhead doesn't matter or where the main application
| is running on hardware that can't handle the resource load of
| many instances of the function in question.
|
| I'm still, I think, in the camp of "monoliths are best in most
| cases." It's nice to have this in the toolbox, though, for those
| edge cases.
| cchance wrote:
| He commented in another post that they use pooling so you don't
| really pay the cold start penalty as often as you'd think so
| maybe not an issue?
| freedomben wrote:
| I don't think this goes against "monoliths are best in most
| cases" at all. In fact it supports that by letting you code
| like it's all one monolith, but behind-the-scenes it spins up
| the instance.
|
| Resource-wise if you had a ton of unbounded concurrency then
| that would be a concern as you could quickly hit instance
| limits in the backend, but the pooling strategy discussed lower
| in the post addresses that pretty well, and gives you a good
| monitoring point as well.
| tonyhb wrote:
| This is great! It reminds me of a (very lightweight) Elixir
| specific version of what we built at https://www.inngest.com/.
|
| That is, we both make your existing code available to serverless
| functions by wrapping with something that, essentially, makes the
| code callable via remote RPC.
|
| Some things to consider, which are called out in the blog post:
|
| Often code like this runs in a series of imperative steps. Each
| of these steps can run in series or parallel as additional
| lambdas. However, there's implicit state captured in variables
| between steps. This means that functions become _workflows_. In
| the Inngest model, Inngest captures this state and injects it
| back into the function so that things are durable.
|
| On the note of durability, these processes should also be backed
| by a queue. The good thing about this model is that queues are
| cheap. When you make queues cheap (eg. one line of code)
| _everything becomes easy_: any developer can write reliable code
| without worrying about infra.
|
| Monitoring and observability, as called out, is critical. Dead
| letter queues suck absolute major heaving amounts of nauseous
| air, and being able to manage and replay failing functions or
| steps is critical.
|
| A couple differences wrt. FLAME and Inngest. Inngest is queue
| backed, event-driven, and servable via HTTP across any language.
| Because Inngest backs your state externally, you can write a
| workflow in Elixir, rewrite it in Typescript, redeploy, and
| running functions live migrate across backend languages, similar
| to CRIU.
|
| Being event-driven allows you to manage flow control: everything
| from debounce to batching to throttling to fan-out, across any
| runtime or language (eg. one Elixir app on Fly can send an event
| over to run functions on TypeScript + Lambda).
|
| I'm excited where FLAME goes. I think there are similar goals!
| chrismccord wrote:
| Inngest looks like an awesome service! I talk about job
| processors/durability/retries in the post. For Elixir
| specifically for durability, retries, and workflows we reach
| for Oban, which we'd continue to do here. The Oban job would
| call into FLAME to handle the elastic execution.
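| Something like this, if you want Oban's durability plus FLAME's
| elasticity (worker and pool names here are made up):
|
|     defmodule MyApp.Workers.Thumbnailer do
|       use Oban.Worker, queue: :media, max_attempts: 3
|
|       @impl Oban.Worker
|       def perform(%Oban.Job{args: %{"video_id" => id}}) do
|         video = MyApp.Videos.get!(id)
|
|         # Oban gives retries/durability; FLAME runs the heavy
|         # part on a temporary copy of the app
|         FLAME.call(MyApp.FFMpegRunner, fn ->
|           MyApp.Videos.generate_thumbnails(video)
|         end)
|
|         :ok
|       end
|     end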
| darwin67 wrote:
| FYI: there's an Elixir SDK for Inngest as well. Haven't fully
| announced it yet, but plan to post it in ElixirForum some
| time soon.
|
| https://github.com/inngest/ex_inngest
| gitgud wrote:
| > _It then finds or boots a new copy of our entire application
| and runs the function there._
|
| So for each "Flame.call" it begins a whole new app process and
| copies the execution context in?
|
| A very simple solution to scaling, but I'd imagine this would
| have some disadvantages...
|
| Adding 10ms to the app startup time adds 10ms to every
| "Flame.call" part of the application too... same with memory I
| suppose
|
| I guess these concerns just need to be considered when using
| this system.
| chrismccord wrote:
| The FLAME.Pool discussed later in the post addresses this.
| Runners are pooled and remain hot for a configurable time
| before idling down. Under load you are rarely paying the cold
| start time because the pool is already hot. We are also adding
| more sophisticated pool growth techniques to the Elixir library
| next so you also avoid hitting an at-capacity runner and cold
| starting a new one.
|
| For hot runners, the only overhead is the latency between the
| parent and child, which should be in the same datacenter, so
| 1ms or sub-1ms.
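| For reference, a pool is declared in the supervision tree
| roughly like this (exact option names and numbers here are
| illustrative):
|
|     {FLAME.Pool,
|      name: MyApp.FFMpegRunner,
|      min: 0,                      # scale to zero when idle
|      max: 10,                     # cap on concurrent runners
|      max_concurrency: 5,          # calls served per runner
|      idle_shutdown_after: 30_000} # ms before idling down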
| bo0tzz wrote:
| Currently the per-runner concurrency is limited by a fixed
| number. Have you thought about approaches that instead base
| this on resource usage, so that runners can be used
| optimally?
| chrismccord wrote:
| Yes, more sophisticated pool growth options are something I
| want longer term. We can also provide knobs that will let
| you drive the pool growth logic yourself if needed.
| solatic wrote:
| Cold start time is _the_ issue with most serverless runtimes.
|
| Your own mission statement states: "We want on-demand,
| granular elastic scale of specific parts of our app code."
| Doing that _correctly_ is _fundamentally_ a question of how
| long you need to wait for cold starts, because if you have a
| traffic spike, the spiked part of the traffic is simply not
| being served until the cold start period elapses. If you 're
| running hot runners with no load, or if you have incoming
| load without runners (immediately) serving them, then you're
| not really delivering on your goal here. AWS EC2 has had
| autoscaling groups for more than a decade, and of course, a
| VM is essentially a more elaborate wrapper for any kind of
| application code you can write, and one with a longer cold-
| start time.
|
| > Under load you are rarely paying the cold start time
| because the pool is already hot.
|
| My spiky workloads beg to differ.
| bo0tzz wrote:
| Depending of course on the workload and request volume, I
| imagine you could apply a strategy where code is run
| locally while waiting for a remote node to start up, so you
| can still serve the requests on time?
| solatic wrote:
| No, because then you're dividing the resources allocated
| to the function among the existing run + the new run. If
| you over-allocate ahead of time to accommodate for this,
| you might as well just run ordinary VMs, which always
| have excess allocation locally; the core idea of scaling
| granularly is that you only allocate the resources you
| need for that single execution (paying a premium compared
| to a VM but less overall for spiky workloads since less
| overhead will be wasted).
| conradfr wrote:
| In an Elixir/Phoenix app I don't think this will really be
| used for web traffic, but more for background/async jobs.
| tardismechanic wrote:
| > FLAME - Fleeting Lambda Application for Modular Execution
|
| Reminds me of 12-factor app (https://12factor.net/) especially
| "VI. Processes" and "IX. Disposability"
| rubenfiszel wrote:
| That's great. I agree with the whole thesis.
|
| We took an alternative approach with https://www.windmill.dev
| which is to consider the unit of abstraction to be at the source
| code level rather than the container level. We then parse the
| main function, and imports to extract the args and dependencies,
| and then run the code as is in the desired runtime (typescript,
| python, go, bash). Then all the secret sauce is to manage the
| cache efficiently so that the workers are always hot regardless
| of your imports.
|
| It's not as integrated in the codebase as this, but the audience
| is different, our users build complex workflows from scratch,
| cron jobs, or just one-off scripts with the auto-generated UI.
| Indeed the whole context in FLAME seems to be snapshotted and
| then rehydrated on the target VM. Another approach would be to
| introduce syntax to specify what is required context from what is
| not and only loading the minimally required. That's what we are
| currently exploring to better integrate Windmill with existing
| codebases instead of having to rely on HTTP calls.
| bo0tzz wrote:
| > Indeed the whole context in FLAME seems to be snapshotted and
| then rehydrated on the target VM. Another approach would be to
| introduce syntax to specify what is required context from what
| is not and only loading the minimally required.
|
| This isn't strictly what is happening. FLAME just uses the
| BEAM's built-in clustering features to call a function on a
| remote node. That implicitly handles transferring only the
| context that is necessary. From the article:
|
| > FLAME.call accepts the name of a runner pool, and a function.
| It then finds or boots a new copy of our entire application and
| runs the function there. Any variables the function closes over
| (like our %Video{} struct and interval) are passed along
| automatically.
| rubenfiszel wrote:
| Fair point, TIL about another incredible capability of the
| BEAM. As long as you're willing to write Elixir, this is
| clearly a superior scheme for deferred tasks/background jobs.
|
| One issue I see with this scheme still is that you have to be
| careful of what you do at initialization of the app since now
| all your background jobs are gonna run that. For instance,
| maybe your task doesn't need to be connected to the db and as
| per the article it will if your app does. They mention having
| hot-modules, but what if you want to run 1M of those jobs on
| 100 workers? You now have 100 unnecessary apps. It's
| probably a non-issue, the number of things done at
| initialization could be kept minimal, and FLAME could just
| have some checks to skip initialization code when in a flame
| context.
| chrismccord wrote:
| This is actually a feature. If you watch the screencast, I
| talk about Elixir supervision trees and how all Elixir
| programs carefully specify the order their services stop
| and stop in. So if your flame functions need DB access, you
| start your Ecto.Repo with a small or single DB connection
| pool. If not, you flip it off.
|
| > It's probably a non-issue, the number of things done at
| initialization could be kept minimal, and FLAME could just
| have some checks to skip initialization code when in a
| flame context.
|
| Exactly :)
| jrmiii wrote:
| So, Chris, how do you envision the FLAME child
| understanding what OTP children it needs to start on
| boot, because this could be FLAME.call dependent if you
| have multiple types of calls as described above. Is there
| a way to pass along that data or for it to be pulled from
| the parent?
|
| Acknowledging this is brand new; just curious what your
| thinking is.
|
| EDIT: Would it go in the pool config, and a runner as a
| member of the pool has access to that?
| chrismccord wrote:
| Good question. The pools themselves in your app will be
| per use case, and you can reference the named pool you are
| a part of inside the runner, ie by looking in system env
| passed as pool options. That said, we should probably
| just encode the pool name along with the other parent
| info in the `%FLAME.Parent{}` for easier lookup
| jrmiii wrote:
| Ah, that makes a lot of sense - I think the
| FLAME.Parent{} approach may enable backends that wouldn't
| be possible otherwise.
|
| For example, if I used the heroku api to do the
| equivalent of ps:scale to boot up more nodes - those new
| nodes (dynos in heroku parlance) could see what kind of
| pool members they are. I don't think there is a way to do
| dyno specific env vars - they apply at the app level.
|
| If anyone tries to do a Heroku backend before I do, an
| alternative might be to use distinct process types in the
| Procfile for each named pool and ps:scale those to 0 or
| more.
|
| Also, might need something like Supabase's
| libcluster_postgres[1] to fully pull it off.
|
| EDIT2: So the heroku backend would be a challenge. You'd
| maybe have to use something like the formation api[2] to
| spawn the pool, but even then you can't idle runners down
| because Heroku will try to start them back up (i.e. there's
| no `restart: false` from what I can tell from the docs).
| Alternatively you could use the dyno api[3] with a timeout
| set up front (no idle awareness).
|
| [1] https://github.com/supabase/libcluster_postgres
|
| [2] https://devcenter.heroku.com/articles/platform-api-
| reference...
|
| [3] https://devcenter.heroku.com/articles/platform-api-
| reference...
| Nezteb wrote:
| Oops you've got an extra w, here is the URL for anyone looking:
| https://www.windmill.dev/
|
| I love the project's goals; I'm really hoping Windmill becomes
| a superior open-source Retool/Airtable alternative!
| rubenfiszel wrote:
| Thanks, fixed! (and thanks)
| AlchemistCamp wrote:
| This looks fantastic! At my last gig we had exactly the "nuts"
| FaaS setup described in the article for generating thumbnails and
| alternate versions of images and it was a source of unnecessary
| complexity.
| RcouF1uZ4gsC wrote:
| This seems a lot like the "Map" part of map-reduce.
| neoecos wrote:
| Awesome work, let's see how long it takes to get the Kubernetes
| backend.
| anonyfox wrote:
| Amazing addition to Elixir for even more scalability options!
| Love it!
| isoprophlex wrote:
| Whoa, great idea, explained nicely!
|
| Elixir looks ridiculously powerful. How's the job market for
| Elixir -- could one expect to have a chance at making money
| writing Elixir?
| ed wrote:
| Yep! Elixir is ridiculously powerful. Best place to look for
| work is the phoenix discord which has a pretty active job
| channel.
| anonyfox wrote:
| It's indeed very powerful and there are jobs out there. Besides
| being an excellent modern toolbox for lots of problems
| (scaling, performance, maintenance) and having arguably the
| best frontend tech in the industry (LiveView), the Phoenix
| framework is also the most loved web framework and Elixir
| itself the 2nd most loved language according to the
| Stack Overflow survey.
|
| It's still a more exotic choice of tech stack, and IMO it's
| best suited for when you have fewer but more senior devs
| around; that is where it really shines. But I also found that
| a Phoenix codebase survived being "tortured" by a dozen
| juniors over years quite well.
|
| I basically make my money solely with Elixir and have been for
| ~5 years now, interrupted only by gigs as a devops for the
| usual JS nightmares including serverless (where the cure always
| has been rewriting to Elixir/Phoenix at the end).
| imafish wrote:
| Having dealt with the pain and complexity of a 100+ lambda
| function app for the last 4 years, I must say this post
| definitely hits the spot wrt. the downsides of FaaS serverless
| architectures.
|
| When starting out, these downsides are not really that visible.
| On the contrary, there is a very clear upside, which is that
| everything is free when you have low usage, and you have little
| to no maintenance.
|
| It is only later, when you have built a hot mess of lambda
| workflows, which become more and more rigid due to
| interdependencies, that you wish you had just gone the monolith
| route and spent the few extra hundreds on something self-managed.
| (Or even less now, e.g. on fly.io)
|
| A question for the author: what if you're not using Elixir?
| chrismccord wrote:
| I talk about FLAME outside Elixir in one of the sections in
| the blog. The tl;dr is it's a generally applicable pattern for
| languages with a reasonable concurrency model. You likely won't
| get all the ergonomics that we get for free like functions with
| captured variable serialization, but you can probably get 90%
| of the way there in something like js, where you can move your
| modular execution to a new file rather than wrapping it in a
| closure. Someone implementing a flame library will also need to
| write the pooling, monitoring, and remote communication bits.
| We get a lot for free in Elixir on the distributed messaging
| and monitoring side. The process placement stuff is also really
| only applicable to Elixir. Hope that helps!
| jrmiii wrote:
| > functions with captured variable serialization
|
| Can't wait for the deep dive on how that works
| hinkley wrote:
| A pattern I see over and over, which has graduated to somewhere
| between a theorem and a law, is that motivated developers can
| make just about any process or architecture work for about 18
| months.
|
| By the time things get bad, it's almost time to find a new job,
| especially if the process was something you introduced a year
| or more into your tenure and are now regretting. I've seen it
| with a handful of bad bosses, at least half a dozen times with
| (shitty) 'unit testing', scrum, you name it.
|
| But what I don't know is how many people are mentally aware of
| the sources of discomfort they feel at work, instead of a more
| nebulous "it's time to move on". I certainly get a lot of
| pushback trying to name uncomfortable things (and have a lot
| less bad feelings about it now that I've read Good to Great).
| Nobody wants to say, "Oh look, the consequences of my actions."
|
| The people materially responsible for the Rube Goldberg machine
| I help maintain were among the first to leave. The captain of
| that ship asked a coworker of mine if he thought it would be a
| good idea to open source our engine. He responded that nobody
| would want to use our system when the wheels it reinvented
| already exist (and are better). That guy was gone within three
| to four months, under his own steam.
| antod wrote:
| That's why I'm always wary of people who hardly ever seem to
| stay anywhere more than a couple of years.
|
| There's valuable learning (and empathy too) in having to see
| your own decisions and creations through their whole
| lifecycle. Understanding how tech debt comes to be, what
| tradeoffs were involved and how they came to bite later.
| Which ideas turned out to be bad in hindsight through the
| lens of the people making them at the time.
|
| Rather than just painting the previous crowd as incompetent
| while simultaneously making worse decisions you'll never
| experience the consequences of.
|
| Moving on every 18-24 months leaves you with a potentially
| false impression of your own skills/wisdom.
| katzgrau wrote:
| And don't forget that the developer fought like hell to use
| that new process, architecture, pattern, framework, etc
| icedchai wrote:
| I couldn't even stand having a dozen lambdas. The app was
| originally built by someone who didn't think much about
| maintenance or deployment. Code was copy-pasted all over the
| place. Eventually, we moved to a "fat lambda" monolith where a
| single lambda serves multiple endpoints.
| viraptor wrote:
| > that you wish you had just gone the monolith route
|
| Going from hundreds of lambdas to a monolith is overreacting to
| one extreme by going to the other. There's a whole spectrum of
| possible ways to split a project in useful ways, which simplify
| development and maintenance.
| p10jkle wrote:
| I'm working on something that I think might solve the problem
| in any language (currently have an sdk for typescript, and java
| in the works). You can avoid splitting an application into 100s
| of small short-running chunks if you can write normal service-
| orientated code, where lambdas can call each other. But this
| isn't possible without paying for all that time waiting around.
| If the Lambdas can pause execution while they are blocked on
| IO, it solves the problem. So I think durable execution might
| be the answer!
|
| I've been working on a blog post to show this off for the last
| couple of weeks:
|
| https://restate.dev/blog/suspendable-functions-make-lambda-t...
| solardev wrote:
| Superficially, this sounds similar to how Google App Engine and
| Cloud Run already work
| (https://cloud.google.com/appengine/migration-
| center/run/comp...). Both are auto-scaling containers that can
| run a monolith inside.
|
| Is that a fair comparison?
| chrismccord wrote:
| They handle scaling at only the highest level, similar to spinning
| up more dynos/workers/webservers like I talk about in the
| intro. FLAME is about elastically scaling individual hot
| operations of your app code. App Engine and such are about
| scaling at the level of your entire app/container. Splitting
| your operations into containers then breaks the monolith into
| microservice pieces and introduces all the downsides I talk
| about in the post. Also, while it's your code/language, you
| still need to interface with the mound of proprietary offerings
| to actually accomplish your needs.
| hq1 wrote:
| So how does it work if there are workers in flight and you
| redeploy the main application?
| bo0tzz wrote:
| The workers get terminated. If the work they were doing is
| important, it should be getting called from your job queue and
| so it should just get started up again.
| chrismccord wrote:
| If you're talking about inflight work that is running on the
| runner, there is a Terminator process on the runner that will
| see the parent go away, then block on application shutdown for
| the configured `:shutdown_timeout` as long as active work is
| being done. So active processes/calls/casts are given a
| configurable amount of time to finish and no more work is
| accepted by the runner.
|
| If you're talking about a FLAME.call at app shutdown that
| hasn't yet reached the runner, it will follow the same app
| shutdown flows of the rest of your code and eventually drop
| into the ether like any other code path you have. If you want
| durability you'd reach for your job queue (like Oban in Elixir)
| under the same considerations as regular app code. Make sense?
| arianvanp wrote:
| One thing I'm not following how this would work with IAM etc. The
| power of Lambda to me is that it's also easy to deal with
| authorization to a whole bunch of AWS services. If I fire off a
| flame to a worker in a pool and it depends on say accessing
| DynamoDB, how do I make sure that that unit of work has the right
| IAM role to do what it needs to do?
|
| Similarly how does authorization/authentication/encryption work
| between the host and the forked-off work? How is this all secured
| with minimal permissions?
| xavriley wrote:
| > how does authorization between the host and the forked work?
|
| On fly.io you get a private network between machines so comms
| are already secure. For machines outside of fly.io it's
| technically possible to connect them using something like
| Tailscale, but that isn't the happy path.
|
| > how do I make sure that the unit of work has the right IAM
|
| As shown in the demo, you can customise what gets loaded on
| boot - I can imagine that you'd use specific creds for services
| as part of that boot process based on the node's role.
| timenova wrote:
| I have a question about distributed apps with FLAME. Let's say
| the app is running in 3 Fly regions, and each region has 2
| "parent" servers with LiveViews and everything else.
|
| In that case, what should the Flame pools look like? Do they
| communicate in the same region and share the pools? Or are Flame
| pools strictly children of each individual parent? Does it make a
| difference in pricing or anything else to run on hot workers
| instead of starting up per parent?
|
| What would you recommend the setup be in such a case?
|
| Aside: I really liked the idea of Flame with Fly. It's a really
| neat implementation for a neat platform!
| chrismccord wrote:
| > Or are Flame pools strictly children of each individual
| parent?
|
| Confirm. Each parent node runs its own pool. There is no global
| coordination by design.
|
| > Does it make a difference in pricing or anything else to run
| on hot workers instead of starting up per parent?
|
| A lot would depend on what you are doing, the size of runner
| machines you decide to start in your pools (which can be
| different sizes from the app or other pools), etc. In general
| Elixir scales well enough that you aren't going to be running
| your app in every possible region. You'll be in a handful of
| regions servicing traffic in those regions and the load each
| region has. You _could_ build in your own global coordination
| on top, ie try to find processes running on the cluster already
| (which could be running in a FLAME runner), but you're in
| distributed systems land and it All Depends(tm) on what you're
| building and the tradeoffs you want.
| timenova wrote:
| Thanks for the reply!
|
| Can I suggest adding some docs to Fly to run Flame apps? To
| cover the more complex aspects of integrating with Fly, such
| as running Flame machines with a different size compared to
| the parent nodes, what kind of fly.toml config works and
| doesn't work with Flame, such as the auto_start and auto_stop
| configurations on the parent based on the number of requests,
| and anything else particularly important to remember with
| Fly.
| hinkley wrote:
| > Also thanks to Fly infrastructure, we can guarantee the FLAME
| runners are started in the same region as the parent.
|
| If customers think this is a feature and not a bug, then I have a
| very different understanding about what serverless/FaaS is meant
| to be used for. My division is pretty much only looking at edge
| networking scenarios. Can I redirect you to a CDN asset in Boston
| instead of going clear across the country to us-west-1? We would
| definitely NOT run Lamba out of us-west-1 for this work.
|
| There are a number of common ways that people who don't
| understand concurrency think they can 'easily' or 'efficiently'
| solve a problem that provably do not work, and sometimes
| tragicomically so. This feels very similar and I worry that fly
| is Enabling people here.
|
| Particularly in Elixir, where splitting off services is already
| partially handled for you.
| aidos wrote:
| I used a service years ago that did effectively this. PiCloud
| were sadly absorbed into Dropbox but before that they had exactly
| this model of fanning out tasks to workers transparently. They
| would effectively bundle your code and execute it on a worker.
|
| There's an example here. You'll see it's exactly the same model.
|
| https://github.com/picloud/basic-examples/blob/master/exampl...
|
| I've not worked with Elixir but I used Erlang a couple of decades
| back and it appears BEAM hasn't changed much (fundamentally). My
| suspicion is that it's much better suited for this work since
| it's a core part of the design. Still, not a totally free lunch
| because presumably there's a chance the primary process crashes
| while waiting?
| thefourthchime wrote:
| I created something similar at my work, which I call "Long
| Lambda": the idea being, what if a lambda could run for more
| than 15 minutes? Then do everything in a Lambda. An advantage
| of our system is you can also run everything locally and debug
| it. I didn't see that with FLAME but maybe I missed it.
|
| We use it for our media supply chain which processes a few
| hundred videos daily using various systems.
|
| Most other teams drank the AWS Step Koolaid and have thousands of
| lambdas deployed, with insane development friction and
| surprisingly high costs. I just found out today that we spend
| 6k a month on "Step Transitions", really?!
| jrmiii wrote:
| > you can also run everything locally and debug it. I didn't
| see that with the FLAME but maybe I missed it.
|
| He mentioned this:
|
| > With FLAME, your dev and test runners simply run on the local
| backend.
|
| and this
|
| > by default, FLAME ships with a LocalBackend
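| which I think boils down to config along these lines (module
| and key names are my guess from the post, so treat this as a
| sketch):
|
|     # config/config.exs -- dev/test run FLAME calls in-process
|     config :flame, :backend, FLAME.LocalBackend
|
|     # config/runtime.exs -- prod boots runners as Fly machines
|     if config_env() == :prod do
|       config :flame, :backend, FLAME.FlyBackend
|       config :flame, FLAME.FlyBackend,
|         token: System.fetch_env!("FLY_API_TOKEN")
|     end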
| seabrookmx wrote:
| I'm firmly in the "I prefer explicit lambda functions for off-
| request work" camp, with the recognition that you need a lot of
| operational and organizational maturity to keep a fleet of
| functions maintainable. I get that isn't everyone's cup of tea or
| a good fit for every org.
|
| That said, I don't understand this bit:
|
| > Leaning on your worker queue purely for offloaded execution
| means writing all the glue code to get the data into and out of
| the job, and back to the caller or end-user's device somehow
|
| I assumed by "worker queue" they were talking about something
| akin to Celery in python land, but it actually does handle all
| this glue. As far as I can tell, Celery provides a very similar
| developer experience to FLAME but has the added benefit that if
| you do want durability those knobs are there. The only real
| downside seems to be that you need Redis or RabbitMQ to
| facilitate it? I don't have any experience with them but I'd
| assume it's the same story
| with other languages/frameworks (eg ruby+sidekiq)?
|
| Maybe I'm missing something.
| jrmiii wrote:
| Yeah, I think this was more focused inward on things like
| `Oban` in elixir land.
|
| He's made the distinction in the article that those tools are
| great when you need durability, but this gives you a lower
| ceremony way to make it Just Work(tm) when all you're after is
| passing off the work.
| josevalim wrote:
| Wouldn't you lose, for example, streaming capabilities once you
| use Celery? You would have to first upload the whole video,
| then enqueue the job, and then figure out a mechanism to send
| the thumbnails back to that client, while with FLAME you get a
| better user experience by streaming thumbnails as soon as the
| upload starts.
|
| I believe the main point though is that background workers and
| FLAME are orthogonal concepts. You can use FLAME for
| autoscaling, you can use Celery for durability, and you could
| use Celery with FLAME to autoscale your background workers
| based on queue size. So being able to use these components
| individually will enable different patterns and use cases.
___________________________________________________________________
(page generated 2023-12-06 23:00 UTC)