[HN Gopher] How PlanetScale Boost serves SQL queries faster
___________________________________________________________________
How PlanetScale Boost serves SQL queries faster
Author : mrbbk
Score : 113 points
Date : 2022-11-15 16:58 UTC (6 hours ago)
(HTM) web link (planetscale.com)
(TXT) w3m dump (planetscale.com)
| bsnnkv wrote:
| PlanetScale is such a cool name, fits really well for a database
| company. Just goes to show that even these days when I think that
| naming something new is impossible, there is still a lot of room
| to be creative.
| bearjaws wrote:
| They probably paid a pretty penny for it.
|
| My first startup job was a skunks work project and we had
| around 128 noun-adjective pairs we wanted to find a .com domain
| for. All of them were taken.
|
| We had to settle on a .io domain, and this was 7 years ago.
|
| 2 year in we came up with a better name and managed to get a
| .com... with a dash in the URL.
| emptysea wrote:
| Startup I worked at paid like 100k + equity for a two word
| .com domain name and it wasn't anywhere near as nice of name
| as planet scale is
| swozey wrote:
| I worked at a 4 letter .com startup and we paid $800k for the
| domain. We never made revenue anywhere near our domain and
| when the company eventually shuttered the most revenue we'd
| ever made was selling the domain again.
| laristine wrote:
| I deal in domains and stories like this still never cease
| to amaze me. In any case, I'm glad your company was able to
| sell the domain for a good sum back.
| ushakov wrote:
| there are still plenty of great .com available, you just
| gotta be more creative
| vyrotek wrote:
| This reminds me a little of "materialized views". But essentially
| every query is potentially a view you can materialize (cache).
| And with this being managed at the DB level it knows when new
| data invalidated the previous results.
|
| Traditionally, other materialized view implementations have very
| strict query requirements though. The queries had to be
| deterministic. No left joins, dates, etc. This is required in
| order to properly detect when data changes "impact" the view. I
| wonder how they get around it.
|
| Update: Ah, ok! Here's a write up on how it works a bit. My last
| startup built a system like this specifically to power a
| gamification engine. Would have been nice to have this 10 years
| ago.
|
| https://planetscale.com/blog/how-planetscale-boost-serves-yo...
|
| > The Boost cluster lives alongside your database's shards and
| continuously processes the events relayed by the VStream. The
| result of processing these events is a partially materialized
| view that can be accessed by the database's edge tier. This view
| contains some, but not all, of the rows that could be queried.
| dang wrote:
| (We've merged the threads so that writeup is now the URL at the
| top)
| obviyus wrote:
| Has anyone who has used PlanetScale in production comment about
| their experience? I was evaluating a few options a couple of
| weeks ago but ended up going with just RDS due to lack of
| feedback for PlanetScale here on HN.
| gtCameron wrote:
| We have been running PlanetScale as our production database for
| about 6 months, migrated from Aurora Serverless. I love it,
| their query insights tool has been a game changer for us and
| has allowed us to optimize a ton of queries in our application.
| Their support is always available and highly technical.
|
| For a sense of scale, we have ~150gb of data running around 5
| trillion row reads + 500 million row writes per month
| revicon wrote:
| We're you using the Aurora Serverless data APIs? Curious if
| there is something equivalent on PlanetScale.
| samlambert wrote:
| https://github.com/planetscale/database-js
| gtCameron wrote:
| I was not, we are a Laravel PHP backend, using the standard
| PHP stuff for connection management
| wolfhumble wrote:
| From what I understand your webserver and php
| implementation is stored on different servers from
| PlanetScale's DBs(?)
|
| Just wonder: How are the DB queries from your php
| implementation to the Planetscale DBs affected by network
| latency (hops and length between servers) as well as
| bandwidth (query results returned by PlanetScale DBs)?
|
| Thanks! :-)
| [deleted]
| mythrwy wrote:
| We looked at it, but it was a little "different" and we didn't
| want the learning curve, so we went with ScaleGrid instead.
|
| This caching does look cool, perhaps I'll revisit PlanetScale
| later on my own time.
| joshstrange wrote:
| I left Aurora serverless (v1, v2 pricing was insane) for PS and
| I've been extremely happy. It was way cheaper and less of a
| headache than RDS and my actual usage was way less than I
| anticipated (since it's hard to think in terms of row
| reads/writes when working at a high level). With PS I get a
| dev/qa/staging/prod DB for $30/mo vs spinning up multiple RDS
| instances. Even with Aurora Serverless (v1) where you can spin
| down to 0 it was cheaper to go with PS. 1 DB unit on Aurora
| Serverless (v1) cost like $45/mo (for my 1 prod instance) so
| for $15 less I got all my other environments without having to
| wait for them to spin up after they went to sleep.
|
| My usage is way under some of my sibling comments but it's a
| been a joy to use and $360/yr to not have to worry about
| scaling my DB, backups, schema migrations, and now caching is a
| steal for me. Could I run my DB on a $5/mo DO box (or similar)?
| Probably, though I'd probably want at least the $10/$15 size
| box for when my software gets a little more load. Even if I
| knew for sure I could run on the $5 box I'd still rather pay
| $30/mo to never worry about my DB and the schema migration tool
| is awesome.
| xmorse wrote:
| Query memoization with optimistic updates
| dianfishekqi wrote:
| It looks like it uses the same ideas as Noria
|
| https://www.youtube.com/watch?v=s19G6n0UjsM
|
| https://github.com/mit-pdos/noria
| wwarner wrote:
| Yes and good for Planetscale to build it out!
| dang wrote:
| Discussed in a few small past threads:
|
| _Noria: Dynamic, partially-stateful data-flow for high-perf
| web applications_ -
| https://news.ycombinator.com/item?id=29615085 - Dec 2021 (10
| comments)
|
| _Noria: dynamic, partially-stateful data-flow for high-
| performance web apps_ -
| https://news.ycombinator.com/item?id=18330477 - Oct 2018 (1
| comment)
|
| _Noria: dynamic, partially-stateful data-flow for high-
| performance web apps_ -
| https://news.ycombinator.com/item?id=18176135 - Oct 2018 (1
| comment)
| stalluri wrote:
| Vstream looks super cool. Can we also use it create subscriptions
| that can bind with ReactHooks on the front-end ? I think
| PlanetScale can easily deliver amazing or better than firebase
| subscriptions. All we need is React and NextJs SDKs to get
| started with :-)
| httgp wrote:
| Supabase does real-time subscriptions really well!
|
| And it does have great guides for use with React and Next.js
| p10jkle wrote:
| See also https://readyset.io/ for generic SQL support (not just
| Planetscale)
| hotdamnson wrote:
| Why do these new big thing databases make SQL look like some
| witchcraft?
|
| Here is some proper SQL query:
|
| SELECT DISTINCT r.id,
| r.owner_id, r.name,
| COUNT(r.id) OVER (PARTITION BY r.id) AS COUNT FROM
| repository r JOIN star s ON s.repository_id = r.id
|
| ORDER BY 4 DESC;
| jakewins wrote:
| This is not what the query in the post is doing.
|
| You are counting all stars of all repos, they are counting
| stars of one (parameterized) repo id.
| hotdamnson wrote:
| I just posted the essence of the query, add
|
| Where r.id = :repo
|
| and you will have the same thing.
| Jonhoo wrote:
| :wave: Author of the paper this work is based on here.
|
| I'm so excited to see dynamic, partially-stateful data-flow for
| incremental materialized view maintenance becoming more wide-
| spread! I continue to think it's a _great_ idea, and the speed-
| ups (and complexity reduction) it can yield are pretty immense,
| so seeing more folks building on the idea makes me very happy.
|
| The PlanetScale blog post references my original "Noria" OSDI
| paper (https://pdos.csail.mit.edu/papers/noria:osdi18.pdf), but
| I'd actually recommend my PhD thesis instead
| (https://jon.thesquareplanet.com/papers/phd-thesis.pdf), as it
| goes much deeper about some of the technical challenges and
| solutions involved. It also has a chapter (Appendix A) that
| covers how it all works by analogy, which the less-technical
| among the audience may appreciate :) A recording of my thesis
| defense on this, which may be more digestible than the thesis
| itself, is also online at
| https://www.youtube.com/watch?v=GctxvSPIfr8, as well as a shorter
| talk from a few years earlier at
| https://www.youtube.com/watch?v=s19G6n0UjsM. And the Noria
| research prototype (written in Rust) is on GitHub:
| https://github.com/mit-pdos/noria.
|
| As others have already mentioned in the comments, I co-founded
| ReadySet (https://readyset.io/) shortly after graduating
| specifically to build off of Noria, and they're doing amazing
| work to provide these kinds of speed-ups for general-purpose
| relational databases. If you're using one of those, it's worth
| giving ReadySet a look to get these kinds of speedups there! It's
| also source-available @ https://github.com/readysettech/readyset
| if you're curious.
| exabrial wrote:
| for readyset: Is there a deb package available or something
| lighter weight than docker, kubernets, etc? I'd just like to
| run it as a regular unix process and start/stop it with
| systemd.
| greg-m wrote:
| yes! shoot me an email - greg@readyset.io - we're in the
| process of building binaries for more platforms, lmk which
| you need.
| exabrial wrote:
| I mean just the standard x86/ubuntu 22.04 would be nice.
| It'd reduce a lot of friction to people try to evaluate
| your product!
| brancz wrote:
| I don't really know either very well, but how does Noria
| compare to Naiad? Are they comparable at all?
|
| I already had Naiad on my reading list, definitely adding Noria
| as well! Thank you very much for your work!
| Nican wrote:
| Awesome! I have seen PlanetScale hype up this release for weeks,
| and glad to finally be reading about it.
|
| My initial thoughts after reading the blog post, just to poke
| holes in their new product:
|
| 1. Costs. This can save time on read, but it is also introducing
| additional writes to the database, that can be pretty expensive.
| PlanetScale can scale horizontally, but have to watch out how
| much it is going to be paying for the extra machines. (Albeit-
| machines are usually always cheaper than developers)
|
| 2. Consistency. It was not clear if it is going to make
| committing transactions slower to keep all the views up to date,
| or if the materialized view is running slightly behind real-time.
|
| 2a. And how does the materialized view handle large/slow
| transactions? Is there going to be any kind of serialization
| locks? Are the views correct inside of the transaction?
|
| 3. Predictability. Query planning is a necessary hell, and
| different queries might have different patterns that might
| introduce slightly different materialized views, that could have
| been maybe served under the same view. Increasing the cost.
|
| 3a. SQL Server took a slightly different route lately for
| performance, in which queries will have different plans depended
| on the table statistics. I wonder how such a feature would play
| with Boost, and if slightly different query plans might generate
| different materialized views.
| mwarkentin wrote:
| The docs indicated the cache may be behind by a few hundred MS:
| https://planetscale.com/docs/concepts/query-caching-with-pla...
|
| > There is a small delay between when these changes are
| committed to the database and when the cache has been updated.
| This delay is typically measured in hundreds of milliseconds.
| Those of you familiar with MySQL replication can think of it as
| reading from a replica. Typically we've found that most use
| cases work perfectly fine, even when returning results that may
| be slightly out-of-date.
| tanoku wrote:
| Hey Nican! Thanks for the feedback. It wasn't clear from the
| blog post, but as the sibling poster points out, the system has
| full eventual consistency: it behaves like a replica, but it
| replicates a whole cluster of MySQL instances simultaneously
| (i.e. your full PlanetScale database). Because of this design,
| we never lock or affect the performance of writes to the main
| database.
|
| As for predictability, we're working on some interesting
| optimizations that allow similar queries to reuse the internal
| state of each other, so the system becomes more efficient the
| more queries it's caching. Stay tuned!
| joshstrange wrote:
| > As rows are inserted, updated, and deleted in the database, the
| cache is kept up-to-date in real-time, just like a read replica.
| No TTLs, no invalidation logic, and no caching infrastructure to
| maintain.
|
| This is so freaking neat. Caching is one of the harder things to
| get consistently right and even if this was a tool that had
| TTLs+API to invalidate it would be cool but not even having to
| worry about that is even better.
|
| PlanetScale continues to be an awesome service that lets you not
| worry about your DB and instead focus on your application.
|
| My only wish for PlanetScale would be a few more (lower) tiers.
| Their free tier is very generous but has a few little things
| (like more than 1 dev/prod branch) that aren't supported and I
| always feel antsy about not having a prod-like DB for qa/staging.
| I normally use 3 branches and the free plan only supports 2,
| which I think changed, I thought I used more than 1 dev branch
| before I started paying.
|
| I have a very burst-y application (it's for events, so it ramps
| up a few months before the event, then is crazy for 2-7 days
| during, then usage drops to pretty much 0 for the next ~9
| months), I'd love to lower my costs for those 9 months (I could
| look into downgrading to the free plan but I'd rather pay just a
| little less and have my quotas drop accordingly). In the end
| PlanetScale is still worth it for me at $360/yr so I'm not
| complaining too much. For smaller projects I just worry about
| using the PS free tier since if I go over those limits the jump
| is steep ($0->$30/mo), that said I might be overthinking it.
| datalopers wrote:
| they don't say but I assume this is an implementation of
| differential dataflow (edit: changed to a better link) [1]
|
| [1] https://www.microsoft.com/en-us/research/wp-
| content/uploads/...
| killjoywashere wrote:
| link is unresponsive. HN death hug?
| rbranson wrote:
| alternate: https://github.com/TimelyDataflow/differential-
| dataflow/blob...
| killjoywashere wrote:
| Also available here: https://www.microsoft.com/en-
| us/research/publication/differe...
| ignoramous wrote:
| See also https://readyset.io/ and https://materialize.com/
|
| There's also the exotic https://dynimize.com/ (unsure of
| their current state).
| rorymalcolm wrote:
| They acknowledge it is based off Noria[1] in the longer
| blogpost
|
| [1] https://pdos.csail.mit.edu/papers/noria:osdi18.pdf
| juancampa wrote:
| It's awesome to see Jon Gjengset's work being used in
| production. He has one of the best Rust youtube channels
| IMO and goes deep into Rust nuances. Highly recommended for
| anyone trying to learn Rust or deepen their understanding.
| dang wrote:
| The longer blogpost is now the URL at the top (we merged
| the threads)
| hinkley wrote:
| If there is one thing I'd like to have in this decade (well,
| last decade really but that ship has sailed) it would be a
| standardized format for WAL. There's been a whole bunch of tech
| over the last ten years that was in large part plastering over
| this hole. Having a reporting database that directly consumed
| WAL data from an OLTP database for instance.
| kevinburke wrote:
| Seems neat, but why is this better than Hadoop?
| LewisJEllis wrote:
| Hadoop isn't a database, they don't do anything close to the
| same thing. Nobody is cross-shopping PlanetScale vs Hadoop.
|
| The cross-shop is PlanetScale vs Amazon RDS, Amazon Aurora,
| Google Cloud SQL, Firebase, Supabase, self-hosting Vitess or
| MySQL, etc.
| gigatexal wrote:
| Because Hadoop is super duper slow? Isn't that why the industry
| moved away from it years ago?
| marzoevam wrote:
| It's super exciting to see Noria-based partially materialized
| views get this well-deserved airtime! Eliminating error-prone
| caching logic without any code or infrastructure changes in the
| context of _any_ database is our core mission over at ReadySet,
| and is the reason why Jon Gjengset and I spun the company out of
| MIT research on Noria back in 2020. You can read more in our
| initial announcement here: https://readyset.io/blog/introducing-
| readyset
|
| If you're reading this announcement post and want to play around
| with instant query caching ala Noria in your existing Postgres or
| MySQL database, shoot me a me an email and we'll bump you up on
| our cloud waitlist :) alana@readyset.io
| saybar wrote:
| At PolyScale [1], we agree that eliminating error-prone caching
| logic without any code or infrastructure changes is a worthy
| goal. However, we have taken a different approach to caching,
| zero configuration, fully automated.
|
| You can get connected today in a few minutes, without code or
| configuration. PolyScale supports Postgres, MySQL, MariaDB and
| SQL Server, with GraphQL with others coming soon. You can also
| try the live demo [2].
|
| [1] https://www.polyscale.ai/ [2]
| https://playground.polyscale.ai/
| vyrotek wrote:
| This looks really great. Happy to see SQL Server support
| there.
| emptysea wrote:
| I'm really curious how this works and how it's implementation
| compares to something like materialize -- I wonder if there are
| any caveats around consistency
| giovannibonetti wrote:
| It seems similar to MIT's Noria [1]
|
| > Noria is a new streaming data-flow system designed to act as
| a fast storage backend for read-heavy web applications based on
| Jon Gjengset's Phd Thesis, as well as this paper from OSDI'18.
| It acts like a database, but precomputes and caches relational
| query results so that reads are blazingly fast. Noria
| automatically keeps cached results up-to-date as the underlying
| data, stored in persistent base tables, change. Noria uses
| partially-stateful data-flow to reduce memory overhead, and
| supports dynamic, runtime data-flow and query change.
|
| [1] https://github.com/mit-pdos/noria
| nickvanw wrote:
| Great question! We have a technical blog post about how
| PlanetScale Boost is implemented:
| https://planetscale.com/blog/how-planetscale-boost-serves-yo...
|
| In short, it can be compared in consistency to an up-to-date
| read replica; PlanetScale Boost uses Vitess' VStream to process
| events as they happen and keep itself up to date. The blog has
| much more information if you're curious.
| [deleted]
| dang wrote:
| (We've merged the threads so that blog post is now the URL at
| the top)
| nickvanw wrote:
| Thank you as always for everything that you do @dang!
| kerblang wrote:
| It appears the catch is that you have to use their managed
| service; no DIY installation.
| https://planetscale.com/docs/concepts/deployment-options
|
| Acceptable for some, maybe not others
| Eclyps wrote:
| I just started using Planetscale for small projects here and
| there. More and more of my projects are FE-heavy and don't
| require a big dedicated database (NextJS apps, mostly hardcoded
| designs or headless CMS like Sanity). There are times where I
| need to store just small bits of data, maybe contact form
| submissions or something. It's been super great to be able to
| quickly hook up planetscale to a nextjs api function and have
| that data persisted within a matter of minutes.
|
| I've yet to use it on anything large-scale, though, so I can't
| speak to performance when you're really pushing it.
| aantix wrote:
| Didn't MySQL implement query level caching a while back?
| ryanisnan wrote:
| I assume this is a bit of a joke, but query caching at least
| was not good in 5.5-5.7, so it would often be disabled. I don't
| know how 8 performs.
| rbranson wrote:
| It was removed from MySQL in 8.0 because it wasn't very useful.
| MySQL query caching does exact matching on the query string and
| any row update to a table used for a given cached query nukes
| the entire cache. So it's only useful for a small set of niche
| use cases where tables are essentially static.
| CharlesW wrote:
| Dupe: https://news.ycombinator.com/item?id=33610996
| endisneigh wrote:
| Anyone compare this and cockroachdb?
| capableweb wrote:
| Slightly off-topic but trying to understand something from the
| landing page:
|
| > Powered by open source tech - Built at Google to scale
| YouTube.com to billions of users
|
| Is this a Google project/business owned by Alphabet? The text
| seems to indicate so, but I find no information about it when
| doing some quick searching or browsing through the website.
| aarondf wrote:
| Nope! That part you quoted is referring to Vitess, which was
| built at Google to scale YouTube.
|
| See more: https://planetscale.com/vitess
| capableweb wrote:
| Aha, I see. Thanks for explaining!
|
| So I'm guessing PlanetScale now helps maintain Vitess and
| PlanetScale is somewhat of a hosted Vitess for people who
| don't want to self-host?
| aarondf wrote:
| Yup! Vitess is at the core of PlanetScale and enables us to
| add lots of cool stuff on top (branching, Boost, etc) but
| Vitess itself is still open source!
| _ben_ wrote:
| For database caching outside of PlanetScale, PolyScale.ai [1]
| provides a serverless database edge cache that is compatible with
| Postgres, MySQL, MariaDB and MS SQL Server. Requires zero
| configuration or sizing etc.
|
| 1.https://www.polyscale.ai/
| rbranson wrote:
| I tried to use PolyScale in the past but had issues with
| performance because updating a row would invalidate the entire
| cache. I wonder if that has improved?
| _ben_ wrote:
| Yes, in the early versions of the automated invalidation, the
| logic cleared all cached data based on tables. That is no
| longer the case. The invalidations only remove the affected
| data from the cache, globally. You can read more here:
| https://docs.polyscale.ai/how-does-it-work#smart-
| invalidatio...
| rbranson wrote:
| It didn't impact everything, I think I was hitting this
| case:
|
| > When a query is deemed too complex to determine what
| specific cached data may have become invalidated, a
| fallback to a simple but effective table level invalidation
| occurs.
| saybar wrote:
| We've made a lot of changes - give it a try again or feel
| free to reach out to support@polyscale.ai and we'd be
| happy to assist you.
| edmundsauto wrote:
| I just started a small hobby project and selected supabase for my
| db provider. Anyone with experience in both Supa and PlanetScale
| care to comment about the differences?
|
| To me, it looks like supabase is designed to take full advantage
| of postgres features. plpgsql triggers + RLS + clientside auth +
| streaming changes to subscribers (including via web hooks) are my
| favorite features. (They also have js edge functions, but I use
| lambda instead b/c I prefer python)
|
| Supabase feels like the scrappy company with amazing focus, akin
| to an early MailChimp (circa 2007). PlanetBase feels more like
| early Snowflake - massive scale, focus on performance, can match
| anything feature-by-feature. One is a master of their craft, the
| other is a gorilla at scale.
|
| Curious what others think. I haven't used PlanetBase extensively
| so don't have much to go on except their marketing.
| Jarwain wrote:
| That sounds about right from my understanding. Supabase was
| made as an alternative to firebase, acting as a data layer with
| a lot of features simplifying application development.
|
| Planet Base feels like Snowflake, or some aspects of fly.io, or
| timescale's managed cloud offering; their focus is on the core
| database tech and delivering that in a scalable manner.
| greg-m wrote:
| ReadySet (readyset.io) supports the same style of caching and
| works with Supabase, if you want to check us out :)
|
| I have a few extra cloud invites: greg@readyset.io
| edmundsauto wrote:
| Much appreciated, I will check you out for my next project.
| Right now I'm not able to migrate as I'm trying to get an MVP
| up and running and have spent a few days deeply integrating
| w/ Supabase.
|
| What are your core value prop differences between your
| service and sb? Just curious how I should think about your
| offering compared to what I'm familiar with.
| greg-m wrote:
| We work with supabase - so your DB still deploys out to
| them, but we cache queries in memory so they're
| significantly lower latency. Since we're offloading reads,
| we also help handling traffic spikes, lower costs, etc.
___________________________________________________________________
(page generated 2022-11-15 23:00 UTC)