[HN Gopher] Ask HN: Options for handling state at the edge?
___________________________________________________________________
Ask HN: Options for handling state at the edge?
With Cloudflare Workers now callable within single-digit ms of
customers on much of the planet, I wonder how I can keep state as
close to the workers / lambdas as possible. What are the options
we have for handling state at the edge? What do you use in your
business or service?
Author : CaptainJustin
Score : 58 points
Date : 2022-05-11 16:42 UTC (6 hours ago)
| powersurge360 wrote:
| I haven't done this, but I've been thinking about it lately.
| Fly.io has had some very interesting ideas on this if you want to
| use a relational database. There was an article about Litestream
| that lets you replicate your SQLite database to an arbitrary
| number of nodes, which means every application server has a
| SQLite file sitting on it for read queries. You then capture
| write queries, forward them to a write leader, and keep that user
| talking to that server until the write replicates across your
| application servers.
|
| You can do basically the same idea with any relational database,
| have a write leader... somewhere and a bunch of read replicas
| that live close to the edge.
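|
| A rough sketch of that read-locally / forward-writes split (the
| leader URL and its /sql endpoint are made up, not anything
| Litestream provides):
|
|     // Reads hit the local Litestream-replicated SQLite file;
|     // writes are forwarded to the single write leader.
|     import Database from "better-sqlite3";
|
|     const db = new Database("/data/replica.db", { readonly: true });
|     const WRITE_LEADER_URL = process.env.WRITE_LEADER_URL!;
|
|     export function readTodos(userId: string) {
|       // Reads never leave the node.
|       return db
|         .prepare("SELECT * FROM todos WHERE user_id = ?")
|         .all(userId);
|     }
|
|     export async function createTodo(userId: string, title: string) {
|       // Writes go to the leader and replicate back asynchronously.
|       const res = await fetch(`${WRITE_LEADER_URL}/sql`, {
|         method: "POST",
|         headers: { "content-type": "application/json" },
|         body: JSON.stringify({
|           sql: "INSERT INTO todos (user_id, title) VALUES (?, ?)",
|           params: [userId, title],
|         }),
|       });
|       if (!res.ok) throw new Error(`write failed: ${res.status}`);
|     }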
|
| There are also what you'd call cloud-native data stores that
| purport to solve the same issue, but I don't know much about how
| they work because I much prefer working w/ relational databases
| and most of those are NoSQL. And I haven't had to actually solve
| the problem for work yet, so I also haven't made any compromises
| yet in how I explore it.
|
| Another interesting way to go might be CockroachDB. It's wire
| compatible w/ PostgreSQL and supposedly clusters automatically and
| distributes data across the cluster. I don't know very much about
| it, but it seems to be getting more and more popular, and many
| ORMs seem to have an adapter for it. It may also be worth looking
| into because, if it works as advertised, you get an RDBMS that
| you can deploy to an arbitrary number of places and configure to
| talk to one another, without having to worry about replicating
| the data or routing writes to the right leader yourself.
|
| And again, I'm technical, but I haven't solved these problems so
| consider the above to be a jumping off point and take nothing as
| gospel.
| adam_arthur wrote:
| Depends on your product, but I'm able to do everything via
| Cloudflare Workers, KV, and DurableObjects, with JSON files
| stored in the Cloudflare CDN as the source of truth (hosted for
| free, btw).
|
| Cloudflare KV can store most of what you need in JSON form, while
| DurableObjects let you model updates with transactional
| guarantees.
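|
| A minimal sketch of that split, assuming a KV binding called
| PRODUCTS and a Durable Object binding called COUNTER (both names
| are mine, not anything standard):
|
|     // KV for read-heavy JSON, a Durable Object for transactional
|     // updates. Types come from @cloudflare/workers-types.
|     export class Counter {
|       constructor(private state: DurableObjectState) {}
|
|       async fetch(_req: Request): Promise<Response> {
|         // Storage operations inside a Durable Object are
|         // transactional and serialized per object.
|         const n = ((await this.state.storage.get<number>("n")) ?? 0) + 1;
|         await this.state.storage.put("n", n);
|         return new Response(String(n));
|       }
|     }
|
|     export default {
|       async fetch(req: Request, env: any): Promise<Response> {
|         const url = new URL(req.url);
|         if (url.pathname === "/hit") {
|           const id = env.COUNTER.idFromName("global");
|           return env.COUNTER.get(id).fetch(req); // consistent update
|         }
|         // Read path: JSON straight out of KV (eventually consistent).
|         const data = await env.PRODUCTS.get("catalog", "json");
|         return Response.json(data ?? {});
|       },
|     };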
|
| My app is particularly read heavy though, and backing data is
| mostly static (but gets updated daily).
|
| Honestly, after using Cloudflare I feel like they will easily
| become the go-to cloud for building small/quick apps. Everything
| is integrated much better than on AWS, and it's way more user
| friendly from a docs and dev experience perspective. Also, their
| dev velocity on new features is pretty insane.
|
| Honestly, I didn't think that much of them until I started
| digging into these things.
| don-code wrote:
| I recently had an opportunity to build an application on top of
| Lambda@Edge (AWS's equivalent of Cloudflare workers). The
| prevailing wisdom there was to make use of regional services,
| like S3 and DynamoDB, from the edge. That, of course, makes my
| edge application depend on calls to a larger, further away point
| of presence.
|
| While it's possible to distribute state to many AWS regions and
| select the closest one, I ended up going a different route:
| packaging state alongside the application. Most of the
| application's state was read-only, so I ended up packaging it as
| JSON alongside the deployment bundle. At startup, it reads the
| JSON into memory - this performance penalty only happens at
| startup, and as long as the Lambda functions are being called
| often (in our case they are), requests are as fast as a memory
| read.
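|
| The shape of it is roughly this (the state.json file and the
| response handling are illustrative):
|
|     // Read-only state bundled with the deployment package and
|     // loaded once per container at cold start.
|     import { readFileSync } from "fs";
|     import { join } from "path";
|
|     // Module scope runs once per container init, not per request.
|     const state: Record<string, unknown> = JSON.parse(
|       readFileSync(join(__dirname, "state.json"), "utf8")
|     );
|
|     export const handler = async (event: any) => {
|       const request = event.Records[0].cf.request; // Lambda@Edge event
|       const value = state[request.uri];
|       return {
|         status: value ? "200" : "404",
|         body: JSON.stringify(value ?? { error: "not found" }),
|       };
|     };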
|
| When the state does need to get updated, I just redeploy the
| application with the new state.
|
| That strategy obviously won't work if you need "fast" turnaround
| on your state being in sync at all points of presence, or if
| users can update that state as part of your application's
| workflow.
| geewee wrote:
| We do something similar in Climatiq on Fastly's Compute@Edge.
| When building the application we load in a big chunk of read-
| only data in-memory, and serialize that memory to a file. When
| we spin up our instance, all we have to do is load that file
| into memory and then we have tons of read-only data in just a
| few ms.
| chucky_z wrote:
| I think this is a really clear winner for something like
| Litestream, where you can keep state far away but sync it locally
| on a schedule, if you can live with a 'small wait on startup' and
| 'periodic state updates'.
| powersurge360 wrote:
| Isn't this effectively the same as using a static site
| generator? Could you potentially freeze or pre-bake your
| generated files and then just serve that?
| Elof wrote:
| Check out Macrometa, a data platform that uses CRDTs to manage
| state at N number of PoPs and also does real-time event
| processing. - https://macrometa.com (full disclosure, I work at
| Macrometa)
| rad_gruchalski wrote:
| Cloudflare Workers with k/v store, R2 and their new D1 database.
| crawdog wrote:
| I have used card database files before with success.
| https://cr.yp.to/cdb.html
|
| Have your process regularly update the CDB file from a blob store
| like S3. Any deltas can be pulled from S3 or you can use a
| message bus if the changes are small. Every so often pull the
| latest CDB down and start aggregating deltas again.
|
| CDB performs great and can scale to multiple GBs.
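|
| A sketch of that refresh loop (the cdb reader interface is a
| stand-in, not a particular library's API, and the paths are
| illustrative):
|
|     import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
|     import { writeFileSync } from "fs";
|
|     // Any cdb library with open/get semantics can be plugged in.
|     export interface CdbReader { get(key: string): Buffer | undefined; }
|
|     const s3 = new S3Client({});
|     let reader: CdbReader | undefined;
|
|     export async function refreshSnapshot(
|       bucket: string,
|       key: string,
|       open: (path: string) => CdbReader
|     ) {
|       const res = await s3.send(
|         new GetObjectCommand({ Bucket: bucket, Key: key })
|       );
|       writeFileSync("/tmp/state.cdb", await res.Body!.transformToByteArray());
|       reader = open("/tmp/state.cdb"); // swap in the new snapshot
|     }
|
|     export function lookup(key: string): Buffer | undefined {
|       return reader?.get(key);
|     }
|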
| jhgb wrote:
| I thought it was "constant database"? Is it indeed meant to
| mean "card database"?
| crawdog wrote:
| my typo thanks for catching that
| kevsim wrote:
| CloudFlare just announced their own relational DB for workers
| today: https://blog.cloudflare.com/introducing-d1
|
| On HN: https://news.ycombinator.com/item?id=31339299
| bentlegen wrote:
| The convenience of this announcement makes me feel like the
| original post is an astroturfed marketing effort by Cloudflare.
| I'd love to know (I'm genuinely curious!), but unless OP admits
| as much, I don't know that anyone would put their hand up.
| jFriedensreich wrote:
| It really depends on the type of state:
|
| Cloudflare's KV store is great if its supported write pattern
| fits.
|
| If you need something with more consistency between PoPs, Durable
| Objects should be on your radar.
|
| I also found that Cloudant/CouchDB is a perfect fit for a lot of
| use cases, with heavy caching in the CF worker. It's also
| possible to have multi-master replication with each CouchDB
| cluster close to its local users, so you don't have to wait for
| writes to reach a single master on the other side of the world.
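|
| As a rough sketch, reads can be cached per PoP with the Workers
| Cache API while writes pass straight through to the nearest
| CouchDB cluster (COUCH_URL and the cache TTL are made up):
|
|     export default {
|       async fetch(req: Request, env: any, ctx: ExecutionContext) {
|         const upstream = new Request(
|           `${env.COUCH_URL}${new URL(req.url).pathname}`,
|           req
|         );
|         // Writes go straight to the nearest CouchDB cluster and
|         // replicate multi-master from there.
|         if (req.method !== "GET") return fetch(upstream);
|
|         const cache = caches.default;
|         let res = await cache.match(upstream);
|         if (!res) {
|           res = await fetch(upstream);
|           res = new Response(res.body, res); // make headers mutable
|           res.headers.set("Cache-Control", "max-age=60");
|           ctx.waitUntil(cache.put(upstream, res.clone()));
|         }
|         return res;
|       },
|     };
|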
| tra3 wrote:
| I never thought of this, but now I have a lot of questions. Do
| you have an application in mind where this would be useful? Most
| of my experience is with traditional webapps/SaaS, so I'd love to
| see an example.
| deckard1 wrote:
| Doesn't Cloudflare have a cache API and/or cache fetch calls for
| workers?
|
| A number of people are talking about Lambda or loading files,
| SQLite, etc. These aren't likely to work on CF. CF uses isolated
| JavaScript sandboxes. You're not guaranteed to have two workers
| accessing the same memory space.
|
| This is, in general, the problem with serverless. The model of
| computing is proprietary and very much about the fine print
| details.
|
| edit: CF just announced their SQLite worker service/API today:
| https://blog.cloudflare.com/introducing-d1/
| ccouzens wrote:
| I've got a Fastly compute@edge service. My state is relatively
| small (less than a MB of JSON) and only changes every few hours.
| So I compile the state into the binary and deploy that.
|
| I can share a blog post about this if there is interest.
|
| It gives us very good performance (p95 under 1ms) as the function
| doesn't need to call an external service.
| anildash wrote:
| would love to hear more about how you're doing this, been
| poking at the idea but haven't seen a running example
| documented
| ccouzens wrote:
| https://medium.com/p/302f83a362a3
|
| There's a heading "This is how we made it fast" about 1/3 of
| the way down if you'd like to skip the introduction and
| background.
| asdf1asdf wrote:
| You just developed your application from the cache inwards,
| instead of the application outwards.
|
| Now on to develop the actual application that will host/serve
| your data to said cache layer.
|
| If you learn basic application architecture concepts, you won't
| be fooled by sales person lies again.
| fwsgonzo wrote:
| Just build a tiny application alongside an open source Varnish
| instance, and use it as a local backend. It's "free" if you have
| decent latency to the area of Internet you care about. For
| example, my latency is just fine to all of Europe so I host
| things myself.
|
| If you want to go one step further you can build a VMOD for
| Varnish to run your workloads inside Varnish, even with Rust:
| https://github.com/gquintard/vmod_rs_template
| F117-DK wrote:
| R2, KV, D1 and Durable Objects. Many options in the Cloudflare
| Suite.
| efitz wrote:
| AWS Lambda functions have a local temp directory. I have
| successfully used that in the past to store state.
|
| In my application, I had a central worker process that would
| ingest state updates and would periodically serialize the data to
| a MySQL database file, adding indexes and so forth and then
| uploading a versioned file to S3.
|
| My Lambda workers would check for updates to the database,
| downloading the latest version to the local temp directory if
| there was not a local copy or if the local copy was out of date.
|
| Then the work of checking state was just a database query.
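|
| The version check boils down to something like this (the
| bucket/key names and the ETag-based staleness test are
| illustrative):
|
|     // Keep the latest snapshot in /tmp across warm invocations;
|     // re-download only when the S3 object's ETag has changed.
|     import {
|       S3Client,
|       GetObjectCommand,
|       HeadObjectCommand,
|     } from "@aws-sdk/client-s3";
|     import { existsSync, writeFileSync } from "fs";
|
|     const s3 = new S3Client({});
|     const LOCAL_PATH = "/tmp/state.db";
|     let cachedETag: string | undefined;
|
|     export async function ensureFreshSnapshot(bucket: string, key: string) {
|       const head = await s3.send(
|         new HeadObjectCommand({ Bucket: bucket, Key: key })
|       );
|       if (existsSync(LOCAL_PATH) && head.ETag === cachedETag) {
|         return LOCAL_PATH; // local copy is current
|       }
|       const obj = await s3.send(
|         new GetObjectCommand({ Bucket: bucket, Key: key })
|       );
|       writeFileSync(LOCAL_PATH, await obj.Body!.transformToByteArray());
|       cachedETag = head.ETag;
|       return LOCAL_PATH; // query this local database file as usual
|     }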
|
| You can tune timings etc to whatever your app can tolerate.
|
| In my case the problem was fairly easy since state updates only
| occurred centrally; I could publish and pull updates at my
| leisure.
|
| If I had needed distributed state updates, I would have just made
| the change locally without bumping the version, then sent a
| message (SNS or SQS) to the central state maintainer for commit,
| and let the publication process handle versioning and
| distribution.
| michaellperry71 wrote:
| There are many technical solutions to this problem, as others
| have pointed out. What I would add is that data at the edge
| should be considered immutable.
|
| If records are allowed to change, then you end up in situations
| where changes don't converge. But if you instead collect a
| history of unchanging events, then you can untangle these
| scenarios.
|
| Event Sourcing is the most popular implementation of a history of
| immutable events. But I have found that a different model works
| better for data at the edge. An event store tends to be
| centralized within your architecture. That is necessary because the
| event store determines the one true order of events. But if you
| relax that constraint and allow events to be partially ordered,
| then you can have a history at the edge. If you follow a few
| simple rules, then those histories are guaranteed to converge.
|
| Rule number 1: A record is immutable. It cannot be modified or
| deleted.
|
| Rule number 2: A record refers to its predecessors. If the order
| between events matters, then it is made explicit with this
| predecessor relationship. If there is no predecessor
| relationship, then the order doesn't matter. No timestamps.
|
| Rule number 3: A record is identified only by its type, contents,
| and set of predecessors. If two records have the same stuff in
| them, then they are the same record. No surrogate keys.
|
| Following these rules, analyze your problem domain and build up a
| model. The immutable records in that model form a directed
| acyclic graph, with arrows pointing toward the predecessors. Send
| those records to the edge nodes and let them make those
| millisecond decisions based only on the records that they have on
| hand. Record their decisions as new records in this graph, and
| send those records back.
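|
| A minimal sketch of those three rules (the record shape and
| hashing scheme are illustrative, not a particular library's):
|
|     // Rule 1: records are immutable. Rule 2: order is explicit via
|     // predecessor references. Rule 3: identity is a hash of type +
|     // contents + predecessors (no surrogate keys).
|     import { createHash } from "crypto";
|
|     interface EdgeRecord {
|       type: string;
|       fields: Record<string, unknown>;
|       predecessors: string[]; // hashes of prior records
|     }
|
|     function recordId(r: EdgeRecord): string {
|       // Same type + fields + predecessors => same id => same record.
|       // (A real implementation needs canonical field ordering.)
|       const canonical = JSON.stringify({
|         type: r.type,
|         fields: r.fields,
|         predecessors: [...r.predecessors].sort(),
|       });
|       return createHash("sha256").update(canonical).digest("hex");
|     }
|
|     // Histories from different edge nodes can be unioned safely
|     // because nothing is ever modified or deleted.
|     const history = new Map<string, EdgeRecord>();
|
|     function append(r: EdgeRecord): string {
|       const id = recordId(r);
|       if (!history.has(id)) history.set(id, r);
|       return id;
|     }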
|
| Jeff Doolittle and I talk about this system on a recent episode
| of Software Engineering Radio: https://www.se-
| radio.net/2021/02/episode-447-michael-perry-o...
|
| No matter how you store it, treat data at the edge as if you
| could not update or delete records. Instead, accrue new records
| over time. Make decisions at the edge with autonomy, knowing that
| they will be honored within the growing partially-ordered
| history.
| weatherlight wrote:
| Fly.io
| lewisl9029 wrote:
| A lot of great info here already, but I just wanted to add my 2c
| as someone who's been chasing the fast writes everywhere dream
| for https://reflame.app.
|
| Most of the approaches mentioned here will give you fast reads
| everywhere, but writes are only fast if you're close to some
| arbitrarily chosen primary region.
|
| A few technologies I've experimented with for doing fast,
| eventually consistently replicated writes: DynamoDB Global
| Tables, CosmosDB, Macrometa, KeyDB.
|
| None of them are perfect, but in terms of write latency, active-
| active replicated KeyDB in my fly.io cluster has everything else
| beat. It's the only solution that offered _reliable_ sub-5ms
| latency writes (most are close to 1-2ms). Dynamo and Cosmos
| advertise sub-10ms, but in practice, while _most_ writes fall in
| that range, I've seen them fluctuate wildly to over 200ms (Cosmos
| was much worse than Dynamo IME), which is to be expected on the
| public internet with noisy neighbors.
|
| Unfortunately, I got too wary of the operational complexity of
| running my own global persistent KeyDB cluster with potentially
| unbounded memory/storage requirements, and eventually migrated
| most app state over to use Dynamo as the source of truth, with
| the KeyDB cluster as an auto-replicating caching layer so I don't
| have to deal with perf/memory/storage scaling and backup. So far
| that has been working well, but I'm still pre-launch so it's not
| anywhere close to battle tested.
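|
| For reference, that read/write split looks roughly like this
| (table name, key scheme, and client wiring are all illustrative):
|
|     // Reads try the nearby KeyDB/Redis replica first; writes go to
|     // DynamoDB as the source of truth and are pushed into the cache.
|     import Redis from "ioredis";
|     import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
|     import {
|       DynamoDBDocumentClient,
|       GetCommand,
|       PutCommand,
|     } from "@aws-sdk/lib-dynamodb";
|
|     const cache = new Redis(process.env.KEYDB_URL!); // nearest replica
|     const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));
|
|     export async function readItem(id: string) {
|       const hit = await cache.get(`item:${id}`);
|       if (hit) return JSON.parse(hit); // fast local read
|       const res = await ddb.send(
|         new GetCommand({ TableName: "items", Key: { id } })
|       );
|       if (res.Item) await cache.set(`item:${id}`, JSON.stringify(res.Item));
|       return res.Item ?? null;
|     }
|
|     export async function writeItem(item: { id: string }) {
|       // Durable write first; KeyDB's active-active replication (or
|       // this explicit set) propagates it to the other replicas.
|       await ddb.send(new PutCommand({ TableName: "items", Item: item }));
|       await cache.set(`item:${item.id}`, JSON.stringify(item));
|     }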
|
| Would love to hear stories from other folks building systems with
| similar requirements/ambitions!
| lewisl9029 wrote:
| One thing I forgot to mention: Reflame requires fast writes
| globally only for a small subset of use cases. For everything
| else, it only needs fast reads globally, and for those I've
| been really liking FaunaDB.
|
| It's not SQL, but it offers strongly consistent global writes
| that allow me to reason about the data as if it lived in a
| regular strongly-consistent non-replicated DB. This has been
| incredibly powerful since I don't have to worry at all about
| reading stale data like I would with an eventually consistent
| read-replicated DB.
|
| It comes at the cost of write latency of ~200ms, which is still
| perfectly serviceable for everything I'm using it for.
| rektide wrote:
| I don't have a whole lot to say on this right now (very WIP), but
| I have a strong belief that git is a core tool we should be using
| for data.
|
| Most data formats are thick formats that pack data into a single
| file. Part of the effort in switching to git would be a shift
| toward unpacking our data, to really make use of the file system
| to store fine-grained pieces of data.
|
| It's been around for a while, but Irmin[1] (written in OCaml) is
| a decent-enough almost-example of these kinds of practices. It
| lacks the version control aspect, but 9p is certainly another
| inspiration, as it encouraged state of all kinds to be held &
| stored in fine-grained files. Git, I think, is a superpower, but
| just as much: having data which can be scripted, which speaks the
| lingua franca of computing - that too is a superpower.
|
| [1] https://irmin.org/
| https://news.ycombinator.com/item?id=8053687 (147 points, 8 years
| ago, 25 comments)
| richardwhiuk wrote:
| You really want to use CRDTs, not data types subject to human
| resolved merge conflicts.
| rektide wrote:
| I feel like CRDTs are sold as a panacea. I can easily imagine
| users making conflicting changes, so I don't really see or
| understand what the real value or weaknesses of CRDTs are.
|
| I'm also used to seeing them used for online synchronization,
| & far fewer examples of distributed CRDTs, which is, to me,
| highly important.
|
| Git by contrast has straightforward & good merge strategies.
| At this point, I feel like the problems are complex & that we
| need complex tools that leave users & devs in charge &
| steering. I'm so ready to be wrong, but I don't feel like these
| problems are outsmartable; CRDTs have always felt like they
| try to define a too-limited world. For now, I feel like tools
| for managing files between different filesystems are more
| complex, but the minimum level of capability we need.
| weego wrote:
| _I have a strong belief that git is a core tool we should be
| using for data_
|
| It isn't, we shouldn't, and you're not the first and won't be
| the last person to put time into this. It's neither a
| compelling solution nor even a particularly good one.
___________________________________________________________________
(page generated 2022-05-11 23:01 UTC)