[HN Gopher] Introducing ReadySet
___________________________________________________________________
Introducing ReadySet
Author : alanamarzoev4
Score : 107 points
Date : 2022-04-05 17:32 UTC (5 hours ago)
(HTM) web link (blog.readyset.io)
(TXT) w3m dump (blog.readyset.io)
| dkhenry wrote:
| I am really excited to see where ReadySet can take this
| technology. If deployment is as simple as they say, this sounds
| like an instant win for any high-throughput service.
|
| I am curious how they handle queries that would overflow local
| main memory. If I just had a PK lookup on a 10TB table, you
| obviously can't store all that in RAM, and you would still need
| some form of cache invalidation.
| Jonhoo wrote:
| The trick is "partial view materialization"
| (https://jon.thesquareplanet.com/papers/phd-thesis.pdf).
| Basically, you only materialize results for commonly-accessed
| keys, and compute other keys on-demand.
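A minimal sketch of partial materialization, assuming a single in-memory view keyed by a lookup key. `PartialView`, `compute`, `on_write`, and `evict` are hypothetical names for illustration, not ReadySet's or Noria's API; the real system does this inside a dataflow graph rather than in one object. The point is that memory holds only keys someone has asked for, while every other key stays a "hole" that is computed on demand.

```python
from typing import Callable, Dict, Hashable, List, Set


class PartialView:
    """Toy partially materialized view (hypothetical, not ReadySet's API).

    Only keys that have been read are stored; every other key is a "hole"
    that gets computed on demand from the base data.
    """

    def __init__(self, compute: Callable[[Hashable], List[tuple]]) -> None:
        self._compute = compute  # recomputes one key's rows from base data
        self._state: Dict[Hashable, List[tuple]] = {}  # materialized keys only
        self.misses = 0

    def read(self, key: Hashable) -> List[tuple]:
        if key not in self._state:
            # Miss: compute this one key on demand, then keep it materialized.
            self.misses += 1
            self._state[key] = list(self._compute(key))
        return self._state[key]

    def on_write(self, key: Hashable, row: tuple) -> None:
        # Updates are only applied to keys that are already materialized;
        # a hole will pick up the new row the next time it is computed.
        if key in self._state:
            self._state[key].append(row)

    def evict(self, key: Hashable) -> None:
        # Turning a key back into a hole frees its memory.
        self._state.pop(key, None)

    def materialized_keys(self) -> Set[Hashable]:
        return set(self._state)


# Usage: a dict stands in for a (possibly huge) base table.
base = {1: [(1, "a")], 2: [(2, "b")]}
view = PartialView(lambda k: base.get(k, []))
view.read(1)                # miss: key 1 is computed and materialized
view.read(1)                # hit: served from memory
base[2].append((2, "c"))
view.on_write(2, (2, "c"))  # key 2 is a hole, so nothing is stored
```

Dropping writes for unmaterialized keys is what keeps memory bounded: a 10TB table only costs RAM for the keys that are actually hot.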
| dkhenry wrote:
| Is there a way to federate which keys are commonly accessed?
| For example, if I commonly access the entire table, can I
| direct inbound traffic to different application servers and
| have them access different caches, so that each cache pulls
| only a subset of the data and doesn't have to worry about
| things like which keys are being written globally?
| glittershark wrote:
| We've thought about that, actually! We have an experimental
| mode where multiple copies of the same query can be created
| (actually just multiple copies of the leaf node in the
| dataflow graph, so intermediate state is reused) with
| different subsets of keys materialized - the idea is then
| that these separate readers would be run on different
| regions, so e.g. the reader in the EU region gets keys for EU
| users, and the reader in the NA region gets keys for NA
| users.
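A rough sketch of that region-partitioned setup, assuming users are already tagged with a region and that all readers share one upstream state. `Reader`, `route`, and the region labels are hypothetical; the sketch only shows how each reader ends up materializing a disjoint subset of keys while the intermediate state is shared.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Reader:
    """One copy of a query's leaf node, materializing keys for one region (hypothetical)."""

    region: str
    upstream: Dict[int, List[str]]  # shared intermediate state, reused by all readers
    cache: Dict[int, List[str]] = field(default_factory=dict)

    def read(self, user_id: int) -> List[str]:
        if user_id not in self.cache:
            # Fill only this reader's cache; other regions' readers are untouched.
            self.cache[user_id] = list(self.upstream.get(user_id, []))
        return self.cache[user_id]


def route(readers: Dict[str, Reader], user_region: str, user_id: int) -> List[str]:
    """Send each lookup to the reader for the user's region."""
    return readers[user_region].read(user_id)


shared = {1: ["eu-order"], 2: ["na-order"]}
readers = {r: Reader(r, shared) for r in ("EU", "NA")}
route(readers, "EU", 1)
route(readers, "NA", 2)
```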
| ko27 wrote:
| Sounds great, until you realize you are switching to eventual
| consistency for your whole application.
| sedev wrote:
| Digging down a couple layers of links from this, the underlying
| paper, "Partial State in Dataflow-Based Materialized Views"
| https://jon.thesquareplanet.com/papers/phd-thesis.pdf is pretty
| intriguing. It sounds like a potential free lunch in specific
| performance areas, which means it also sounds too good to be
| true, but if it turns out to be a metaphorical 90%-off lunch
| that's still very promising.
| glittershark wrote:
| The remaining 10% of the 90%-off free lunch is pretty much just
| eventual consistency: it can occasionally be the case that you
| write something to the DB and an immediate subsequent read
| doesn't see it. That said, there are escape hatches there
| (we'll proxy queries that happen inside of a transaction to the
| upstream MySQL/PostgreSQL database, and there's an experimental
| implementation of opt-in Read-Your-Writes consistency), and I'd
| wager that the vast majority of "traditional" web applications
| can tolerate slightly stale reads.
|
| Our official docs also have an aptly titled "what's the
| catch?" section:
| https://docs.readyset.io/concepts/overview#whats-the-catch
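The escape hatches described in that comment amount to a per-statement routing decision. A minimal sketch, assuming the proxy can compare the log position of a session's last write with how far the cache has replicated; `Session`, `choose_backend`, and the position-comparison approach are assumptions for illustration, not a description of ReadySet's experimental Read-Your-Writes implementation.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Per-connection state a proxy might track (hypothetical fields)."""

    in_transaction: bool = False
    read_your_writes: bool = False  # opt-in flag
    last_write_pos: int = 0  # log position of this session's latest write


def choose_backend(session: Session, is_write: bool, cache_applied_pos: int) -> str:
    """Return "upstream" or "cache" for one statement (hypothetical policy)."""
    if is_write or session.in_transaction:
        # Writes, and anything inside a transaction, go to the real database.
        return "upstream"
    if session.read_your_writes and cache_applied_pos < session.last_write_pos:
        # The cache hasn't replicated this session's latest write yet,
        # so a cached read could miss it.
        return "upstream"
    # Otherwise accept a possibly slightly stale cached result.
    return "cache"
```

Sessions that do not opt in always take the cheap path, which is where the "slightly stale reads" trade-off comes from.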
| Jonhoo wrote:
| Oh hey, that's my thesis! Happy to answer any questions you may
| have about it :) There's also the OSDI'18 paper here which may
| be of interest: https://jon.tsp.io/papers/osdi18-noria.pdf
| adamgordonbell wrote:
| This is super exciting. Ever since I talked to you about
| Noria I've been telling people about this concept. I'm
| excited to see a production-ready implementation of it.
| BenoitP wrote:
| Big fan here!
|
| I've been following the space for a while, and I must say
| it's exciting. To me this is the future of apps where the
| Truth lives server-side and everything reacts from there,
| with partial state evaluation lowering resource consumption
| to a minimum.
|
| Kafka Streams and Apache Flink seem to be focused on
| real-time analytics, and I wish they'd get there to stimulate
| the space.
|
| Are you affiliated with ReadySet?
| educaysean wrote:
| According to the linked article, Jon appears to be one of
| the co-founders.
| Jonhoo wrote:
| I'm pretty excited about it too! I remember when I
| initially started the research I was amazed that this
| didn't already exist.
|
| Some context:
| https://twitter.com/jonhoo/status/1511401461669720068
|
| Basically, I co-founded the company around the time I
| graduated, but had had my fill of database research after
| six years of PhD. So I joined AWS to work on Rust while
| Alana (the CEO) took on leading ReadySet.
| msvan wrote:
| I've used both of the suggested methods under "Current standards
| for scaling out databases" so I see where this is coming from.
| But I peeked at the AWS reference architecture, and it places a
| Consul and ReadySet deployment in my environment for me to run
| and maintain. I feel like any sales pitch for this really needs
| to convince me that having these things in my environments is
| going to be worth the hassle in terms of milliseconds and
| dollars, as opposed to just using RDS read replicas and paying a
| bit more. Then again, I can see this being an obvious choice if
| you're growing very quickly or have tight latency requirements.
|
| With that said, it looks like cool tech, and I read Jon's Rust for
| Rustaceans, which serves as a stamp of quality for this even if I
| haven't tried it yet!
| zeroonetwothree wrote:
| This feels like the sort of thing where it works great 90% of
| the time, but as soon as you want to do something more
| complicated/nontrivial it doesn't handle it properly and is
| impossible to debug/improve since you are using an opaque
| solution.
|
| At least with scaling replicas or having a dumb cache layer it's
| easy to understand the system.
| skyde wrote:
| Hi Jon Gjengset, I am so happy to see an attempt to make Noria
| usable in legacy applications. Am I right in assuming this needs
| to consume the replication log from the primary database? Or
| will write requests going through the ReadySet proxy produce
| their own change feed?
| thecompilr wrote:
| Hi, ReadySet engineer here. It does replicate using the
| replication log.
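To make "replicate using the replication log" concrete: a cache that follows the primary's log applies ordered change events incrementally instead of re-running queries. A toy sketch, assuming a simplified event shape of `(position, op, key)`; real MySQL binlog or PostgreSQL logical replication events carry full row images and much more, and `CountByKey` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable, Tuple

# (log_position, op, key): a hypothetical, heavily simplified change event.
Event = Tuple[int, str, str]


class CountByKey:
    """A cached `COUNT(*) ... GROUP BY key` kept fresh from a change log."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()
        self.applied_pos = 0  # last log position applied

    def apply(self, events: Iterable[Event]) -> None:
        for pos, op, key in events:
            if pos <= self.applied_pos:
                continue  # already applied, so replaying the log is harmless
            if op == "insert":
                self.counts[key] += 1
            elif op == "delete":
                self.counts[key] -= 1
                if self.counts[key] == 0:
                    del self.counts[key]
            else:
                raise ValueError(f"unknown op: {op}")
            self.applied_pos = pos


log = [(1, "insert", "a"), (2, "insert", "a"), (3, "insert", "b"), (4, "delete", "a")]
view = CountByKey()
view.apply(log)
view.apply(log)  # replay: every event is skipped, counts unchanged
```

Each event costs O(1) work regardless of table size, which is why an append-heavy workload does not force a full recomputation on every read.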
| zomglings wrote:
| This seems like it could be a good fit for my team (and we have
| been discussing this kind of caching).
|
| We frequently (~ once per second) run queries over relations that
| are increasing at a rate of ~100 rows per second (append only, no
| updates).
|
| Could this cause any performance concerns for ReadySet? How much
| control do we have over the frequency of reconstruction of cached
| data based on the flow graph?
| vinay_ys wrote:
| I can't think of real-world examples of apps that have a
| read-path scaling problem that this tech would solve well. It
| would be great if the authors could catalogue a bunch of
| real-world use-cases (real-world customer case studies would be
| even better).
|
| Today, machines are huge (in terms of compute cores, memory,
| and IOPS for storage and network), and a single MySQL or
| PostgreSQL database can do a lot of work. This makes it much
| easier to build apps that don't have Internet-scale user counts,
| which is pretty much all enterprise apps, without resorting to
| distributed databases.
|
| In Internet scale consumer app domains like e-commerce/delivery
| or fintech where relational databases are used heavily, most
| queries would have strict correctness requirements and won't
| tolerate staleness. Also, most query results would be highly
| specific to each user and won't get many cache hits. Also, apps
| in general are increasingly personalised and have fast-changing
| content.
|
| In terms of technology evolution, I see people moving from single
| large-machine databases to distributed SQL databases as their
| use-cases scale.
|
| And as distributed SQL databases mature, I expect they will gain
| built-in capabilities to generate user-defined materialised views,
| with the flexibility to manage their placement w.r.t. the class and
| number of machines used to compute and serve them, etc.
| staticassertion wrote:
| My company is building a realtime analytics service for
| security. You have a lot of reads that can often be answered
| with stale data (thanks to our data modeling, which provides
| ACID 2.0 semantics).
|
| I think a lot of applications could probably benefit from this
| _if_ they were built with a data model in mind that leverages
| it properly. But if you 1:1 migrate your code that relies on
| ACID transactions over to something with Strong Eventual
| Consistency... yeah, that's gonna be a bad time.
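ACID 2.0 is Pat Helland's shorthand for Associative, Commutative, Idempotent, and Distributed. A minimal sketch of why such a data model tolerates stale or out-of-order reads: if updates are combined with a merge that has those properties (here a hypothetical per-key max of timestamps, chosen for illustration), every delivery order, including duplicated deliveries, converges to the same state.

```python
from functools import reduce
from itertools import permutations
from typing import Dict, List

# Hypothetical state: latest event timestamp seen per host.
State = Dict[str, int]


def merge(a: State, b: State) -> State:
    """Per-key max: associative, commutative, and idempotent."""
    out = dict(a)
    for key, ts in b.items():
        out[key] = max(out.get(key, ts), ts)
    return out


updates: List[State] = [{"h1": 3}, {"h2": 5}, {"h1": 7, "h2": 1}]
# Deliver the updates in every order, with one duplicated, and merge them.
results = {
    tuple(sorted(reduce(merge, order, {}).items()))
    for order in permutations(updates + [updates[0]])
}
print(results)  # {(('h1', 7), ('h2', 5))}
```

A 1:1 port of code that relies on ACID transactions has no such merge function, which is why it fares badly on an eventually consistent cache.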
| js4ever wrote:
| Is it open source?
| staticassertion wrote:
| Very cool tech here, excited for the future. Congrats on
| fundraising!
| sulam wrote:
| I was prepared to be super skeptical here, but this actually
| looks like it could be really good, without all the gotchas I was
| expecting. I think the only potential hole I see is if you do a
| lot of DB-side code as stored procedures and whatnot. I can't
| tell from the write-up if they can keep those consistent.
___________________________________________________________________
(page generated 2022-04-05 23:00 UTC)