[HN Gopher] How we built a serverless SQL database
       ___________________________________________________________________
        
       How we built a serverless SQL database
        
       Author : TheresaBraccio
       Score  : 179 points
       Date   : 2021-10-26 15:04 UTC (7 hours ago)
        
 (HTM) web link (www.cockroachlabs.com)
 (TXT) w3m dump (www.cockroachlabs.com)
        
       | boynamedsue wrote:
       | Any plans to expand the region availability for AWS beyond us-
       | west-2 in the US? I am interested in us-east-2.
        
         | andydb wrote:
         | Yes, definitely, we'll be expanding to more regions soon.
        
       | ed25519FUUU wrote:
       | > _And you'll never be surprised by a bill, because you can set a
       | guaranteed monthly spend limit._
       | 
        | It's amazing that this is a killer feature rather than a
        | standard one, but here we are!
        
       | geoduck14 wrote:
       | How does this compare to Snowflake?
        
         | reilly3000 wrote:
          | Snowflake is OLAP, billed by usage time + storage. This looks
          | like a regular OLTP SQL DB, billed per request.
        
       | phoboslab wrote:
       | > If you've created a database before, you probably had to
       | estimate how many servers to use based on the expected traffic.
       | 
       | The answer is "one". If you have less than 10k req/s you
       | shouldn't even start to think about multiple DB servers or
       | migrating from bog-standard MySQL/MariaDB or Postgres.
       | 
       | I will never understand this obsession with "scaling". Modern web
       | dev seriously over-complicates so many things, it's not even
       | funny anymore.
        
         | [deleted]
        
         | brokencode wrote:
         | The question isn't whether you have 10k req/s now, but whether
         | you expect to in the future. If you are designing a blog, then
         | yeah, you probably don't need to worry about it. If you are
          | starting a social network or SaaS business application, then
         | you probably do.
        
           | GuB-42 wrote:
           | The future will be different.
           | 
           | A lot of successful businesses start with things that are not
           | scalable, and it is a strength, not a weakness. If you start
           | a social network for instance, you can't beat Facebook at its
           | own game. You have to do something that Facebook can't do
           | because it is too big. Scalability problems will be tackled
           | as you grow.
           | 
            | Among the many things Facebook can't do is run its service
            | on a single database. That makes things much harder for
            | them; thankfully, you are much smaller than Facebook, so
            | you can. Take that advantage.
        
         | wyager wrote:
         | Alternatively, people really _do_ need to put a lot of thought
         | into scaling, but only because they did something like write
         | some core web service in an interpreted language framework that
         | maxes out at 100 requests per second.
        
         | bitwize wrote:
         | (robot voice) Does /dev/null support sharding? Sharding is the
         | secret ingredient in the web scale sauce.
        
           | zkldi wrote:
           | it's impressive how well that video has aged with modern web
           | dev.
        
         | lumost wrote:
         | The answer is unfortunately less clear cut. Particularly if you
         | assume that whoever is tasked with scaling this hypothetical DB
          | doesn't know what they are doing a priori.
         | 
         | The following questions are likely to come up
         | 
         | 1) My t3.xl DB is down, how much bigger can I make it?
         | 
         | 2) My r3.24xl DB can only handle 100 TPS and now my site is
         | down, what can I do?
         | 
          | 3) My 2x r3.24xl DB cluster costs a lot of money. Are other
         | solutions cheaper?
         | 
         | 4) My latency is high, are other solutions faster?
         | 
         | For someone who hasn't dealt with these questions before, these
         | will become long and painful lessons with massive material
         | impacts to the business.
         | 
         | It's appealing to use Dynamo as it takes the magic out of
         | scaling. It's appealing to use serverless RDBMS as you don't
         | have to think about it anymore unless it has high
         | costs/latency.
        
         | mjb wrote:
         | What happens when that database fails? Are you OK losing some
         | data, or do you want the data to be synchronously replicated
         | off the machine and be available somewhere else after failure?
         | Distribution isn't only about scale, it's also about
         | availability.
         | 
         | What happens when that database loses some data? Do you want an
         | up-to-the second backup, or point-in-time recovery? Or are you
         | OK restoring last night's backup? Distribution isn't only about
         | scale, it's also about durability.
         | 
         | What happens when you need to run an expensive business process
         | ad-hoc? Do you want it to be easy to scale out reads, or to
         | export that data to an analytics system? Or are you OK building
         | something else to handle that case? Distribution isn't only
         | about scale, it's also about flexibility.
         | 
         | What happens when you want to serve customers in one market,
         | and make sure that their data stays local for regulatory
         | compliance reasons or latency? Are you OK with having separate
         | databases? Distribution isn't only about scale, it's also about
         | locality.
        
           | phoboslab wrote:
           | Fair points. I would argue that for most people a simple
           | master-slave setup with manual failover will produce far
           | fewer headaches than a "serverless" architecture.
           | 
           | When you are big enough to worry about the other issues, you
           | surely are big enough to handle the requirements in-house. I
           | see the dependence on some specific companies as the bigger
           | threat to reliability.
        
             | mjibson wrote:
             | The setup you describe is very much not simple. I worked at
             | a place with very good DBAs and our replication setup
             | caused us more downtime than anything else. Cockroach and
             | Spanner exist because many programmers observed that what
             | you describe is hard.
        
               | Groxx wrote:
               | As a counter-anecdote: multiple startup projects I've
               | worked on with separate MySQL setups where each had just
               | a single master + two slaves (one warm for fast failover
               | in case of hardware failure or upgrades, one cold for
               | slow analytics-style queries) did just fine with millions
               | (to tens of millions) of users. No downtime at all for
               | years on end.
               | 
                | MySQL and Postgres are _massively_ more widely used
                | than Cockroach and Spanner, broadly very successfully.
                | It's entirely feasible to run them with high uptime.
        
               | dboreham wrote:
               | Very few deployments experience actual failures. Could be
               | some fridge-door/light situation going on.
        
         | dcposch wrote:
          | This is a good argument against running a k8s cluster when you
         | don't need one, but not a good argument against this new
         | serverless Cockroach product.
         | 
         | Serverless is not just about auto scaling up from 1 to n, it's
         | about autoscaling down from 1 to 0.
         | 
         | If Cockroach provides a robust SQL DB at a marginal cost of
         | ~$0/mo for small request volumes, that is a real value add over
         | running your own pg server.
         | 
         | Not having to deal with administration or backups is another
         | big value add.
         | 
          | This offering looks like it compares very nicely to, say,
          | running an RDS instance with auto backups enabled.
        
         | psadri wrote:
          | Unfortunately, it's easy to hit that with, say, GraphQL,
          | where each client request can resolve to dozens of DB selects
          | vs. a single hand-written/tuned SQL select.
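The fan-out psadri describes can be sketched with a toy example (SQLite, with invented tables and data, purely illustrative): a resolver-per-parent-row style issues N+1 queries, while one hand-written JOIN returns the same data in a single round trip.

```python
import sqlite3

# Toy schema standing in for a GraphQL "authors { posts }" query.
# All table names and rows here are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER,
                        title TEXT);
    INSERT INTO authors VALUES (1, 'ada'), (2, 'alan');
    INSERT INTO posts VALUES (1, 1, 'p1'), (2, 1, 'p2'), (3, 2, 'p3');
""")

def fetch_n_plus_one(conn):
    """Resolver-per-field shape: 1 query for authors + 1 per author."""
    queries, out = 1, {}
    for aid, name in conn.execute("SELECT id, name FROM authors").fetchall():
        posts = conn.execute(
            "SELECT title FROM posts WHERE author_id = ?", (aid,)).fetchall()
        queries += 1
        out[name] = [t for (t,) in posts]
    return out, queries

def fetch_joined(conn):
    """Hand-written JOIN: same data, one round trip."""
    out = {}
    for name, title in conn.execute(
            "SELECT a.name, p.title FROM authors a "
            "JOIN posts p ON p.author_id = a.id ORDER BY p.id"):
        out.setdefault(name, []).append(title)
    return out, 1

naive, n_queries = fetch_n_plus_one(conn)
joined, j_queries = fetch_joined(conn)
print(naive == joined, n_queries, j_queries)  # True 3 1
```

With two authors the resolver style already issues 3 queries instead of 1; on a per-request-billed database, that multiplier lands directly on the bill.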
        
       | babelfish wrote:
       | Really great post, thanks for sharing. I spent a lot of time a
       | couple months ago researching 'DBaaS' offerings (for fun, not
       | business) and found it difficult to find any posts outlining the
       | architecture of a DBaaS. Really cool to see CRDB putting this out
       | in the open.
        
         | andydb wrote:
         | It's been something we've done since the start and plan to
         | continue doing. If you read back over our engineering blog,
         | you'll find a surprisingly thorough description of the entire
         | CockroachDB stack, from the lowest storage layer to distributed
         | transactions, Raft consensus, SQL => key-value mapping, online
         | schema changes, cost-based SQL optimizer, Kubernetes usage, and
         | so on. In fact, when we onboard new engineers, we typically
         | point them to a stack of external blog entries to read in order
         | to get up-to-speed on how CockroachDB works.
         | 
         | Being open on how we solve hard problems is the way to build
         | our collective knowledge as a developer community. Certainly
         | CockroachDB itself has benefited enormously from all that has
         | gone before and been published in the open.
        
       | rabaut wrote:
        | Does CockroachDB Serverless expose an HTTP API? This sounds
        | like a great fit for use with Cloudflare Workers, but that
        | requires an HTTP API.
        
         | sorenbs wrote:
          | Prisma will launch a data proxy, as well as support for
          | CockroachDB and Cloudflare Workers, early next year.
        
         | sharps_xp wrote:
         | Why do you prefer an HTTP API versus a DB connection? Isn't the
         | former going to inherently have the overhead cost of creating
         | the connection + TLS handshakes?
         | 
         | My question is similar, which is, is CockroachDB going to have
         | an equivalent RDS proxy so that apps can handle traffic spikes
          | and not have to deal with problems with DB connection pools?
        
           | pistoriusp wrote:
            | I don't think we'll be getting socket connections in some
            | of the wasm-powered JS runtime engines anytime soon. Using
            | HTTP solves that, plus a bunch of caching issues.
        
             | estambar wrote:
              | There are some great projects that present an HTTP API
              | for Postgres that you could use with CRDB, I think. I'm
              | thinking of something like
              | https://github.com/pramsey/pgsql-http
        
               | pistoriusp wrote:
               | Now it would be awesome if we could use Prisma against an
               | open standard like this.
        
         | zaidf wrote:
         | I kind of assumed this came with an HTTP api, woops :) I've
         | been using Airtable with Workers for tiny side projects. This
         | seemed like a nice alternative to Airtable's lack of support
         | for vanilla SQL.
        
         | andydb wrote:
         | Great question. We recognize how important this is and are
         | actively working on it.
        
       | andydb wrote:
       | To any who might see this, I'm the author of the blog post, and
       | led the engineering team that built CockroachDB Serverless. I'll
       | be monitoring this thread in case there are any questions you'd
       | like to ask me about it.
        
         | dilyevsky wrote:
          | I haven't had time to read the whole thing yet, so sorry if
          | it's already answered, but is it possible to run this SQL
          | pod/storage pod separation setup yourself with CRDB
          | community/enterprise? We run enterprise CRDB, but it's all in
          | one process (with replicas).
        
           | andydb wrote:
           | It's not currently possible, partly because it complicates
           | the deployment model quite a bit. Dynamically bringing SQL
           | pods up and down requires a technology like Kubernetes. It
           | takes some serious operational know-how to keep it running
           | smoothly, which is why we thought it would be perfect for a
           | managed Cloud service.
           | 
           | What would be your company's reasons for wanting this
           | available in self-hosted CRDB? What kinds of use cases would
           | it address for you?
        
             | nonameiguess wrote:
                | It's understandably not something a lot of vendors
                | consider, especially early-stage products, but
                | classified systems don't have Internet access and can't
                | use managed cloud services (unless offered by AWS via
                | C2S, and possibly in the future by whoever wins JEDI if
                | they resurrect that), leaving no choice but to
                | self-host everything. That's the most extreme example,
                | but lots of enterprises airgap part of their network
                | for other reasons, and private companies that don't
                | handle officially classified data may still be
                | restricted in where they can host data by PII/PHI
                | regulations, and may not be able to use your data
                | center.
        
               | dilyevsky wrote:
               | In addition to security/legal stuff there are also
               | concerns around slo/business continuity and costs
        
               | andydb wrote:
               | Sure, those are all good points. We'll definitely keep an
               | eye on customer demand in these areas so we can make good
               | prioritization decisions. It can be frustrating when
               | there are so many things that we'd like to do, but we can
               | only pick a small subset to actually do.
        
             | dilyevsky wrote:
              | I'm actually thinking of a side project (I'm very
              | familiar with running production CRDB in k8s via my
              | current job) - the use case is to bring up an isolated
              | "virtual" CRDB cluster per customer to save on the
              | operational overhead of managing many CRDB clusters in
              | Kubernetes (which is doable but requires extra
              | automation). That's not how we handle customer
              | multitenancy at my current job, but I'm sure this could
              | be a common use case. Now I could use your cloud, but I
              | sure am interested in a DIY option as well.
        
         | vegasje wrote:
         | I'm quite confused by Request Units, and trying to predict how
         | many would be used by queries/operations.
         | 
         | I launched a test cluster, and the RUs are continuously
         | increasing without me having connected to the cluster yet. At
         | this rate of RU climb, the cluster would use over 8mil of the
         | available 10mil RUs in a month without me touching it.
         | 
         | Coming from AWS, one of the most difficult aspects of using
         | Aurora is guessing how many I/Os will be used for different
         | workloads. It would be a shame to introduce this complexity for
         | CockroachDB Serverless, especially if the RUs are impacted by
         | internal cluster operations that aren't initiated by the user.
        
           | sebbul wrote:
           | I haven't connected to my cluster but my RUs keep going up.
           | Extrapolating I'll be at 20M RUs over 30 days without using
           | it.
        
           | andydb wrote:
           | You've run into a "rough edge" of the beta release that will
           | be fixed soon. When you keep the Cluster Overview page open,
           | it runs queries against your cluster so that it can display
           | information like "# databases in the cluster". Unfortunately,
           | those queries run every 10 seconds in the background, and are
           | consuming RUs, which is why you see RU usage without having
           | connected to the cluster yet. But never fear, we'll get that
           | fixed.
           | 
           | One thing that may not be clear - you get 10M RUs for free,
           | up front, but you also get a constant accumulation of 100
           | RU/s for free throughout the month. That adds up to >250M
           | free RUs per month. This ensures that your cluster is always
           | accessible, and that you never truly "run out" of RUs - at
           | most you get throttled to 100 RU/s.
           | 
           | I hear you on the difficulty of understanding how your
           | queries map to RUs. SQL queries can be enormously complex and
           | differ by multiple orders of magnitude from one another in
           | terms of their compute cost. That's why we built a near real-
           | time dashboard that shows you how quickly you're consuming
           | RUs. You can run your workload for a few minutes and then
           | check back on the dashboard to see how many RUs that workload
           | consumed.
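A quick back-of-the-envelope check of the accrual figures in this reply (the 100 RU/s rate and 10M upfront grant come from the comment; the 30-day month is an assumption for the arithmetic):

```python
# Sanity-check of the free-tier math described above. The 100 RU/s
# accrual and the 10M upfront grant are taken from the comment; the
# 30-day month is an assumption.
ru_per_second = 100
seconds_per_month = 30 * 24 * 60 * 60   # 2,592,000 seconds
accrued = ru_per_second * seconds_per_month
total_free = accrued + 10_000_000       # plus the upfront grant
print(accrued, total_free)  # 259200000 269200000
```

So the steady accrual alone is about 259.2M RUs over a 30-day month, which matches the ">250M free RUs per month" figure.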
        
         | corentin88 wrote:
         | Do you have a getting started guide on CockroachDB Serverless?
         | I couldn't find one in your docs [1]. This looks very
         | interesting.
         | 
         | [1] https://www.cockroachlabs.com/docs/v21.1/example-apps.html
        
           | shampeon wrote:
           | The Quickstart [1] is what you're looking for. Examples for
           | NodeJS, Python, Go, and Java.
           | 
           | [1]https://www.cockroachlabs.com/docs/cockroachcloud/quicksta
           | rt...
        
         | jawns wrote:
         | What's your elevator pitch for why my organization should use
         | CockroachDB Serverless vs. something like AWS Aurora
         | Serverless, particularly if we're already relatively invested
         | in the AWS ecosystem?
        
           | andydb wrote:
           | Oh boy, I'm an engineer, but I'll do my best to pretend I'm
           | on the sales or marketing team for a minute...
           | 
           | First of all, CockroachDB Serverless is available on AWS, and
           | should integrate quite well with that ecosystem, including
           | with Serverless functions offered by AWS Lambda.
           | 
           | Here are a few advantages of CockroachDB Serverless that
           | Aurora will struggle to match (note that we're still working
           | on Serverless multi-region support):
           | 
           | 1. Free-forever tier. We offer a generous "free forever" tier
           | that doesn't end after a month or a year. As the blog post
           | outlines, our architecture is custom-built to make this
           | economical.
           | 
           | 2. No ceiling on write scalability. Even non-Serverless
           | Aurora runs into increasing trouble as the number of writes /
           | second increases past what a single machine can handle.
           | CockroachDB just keeps going. We've had multiple high-scale
           | customers who hit Aurora limits and had to move over to
           | Cockroach to support business growth.
           | 
           | 3. True multi-region support. Aurora only allows read-only,
           | stale replicas in other regions, while CRDB allows full ACID
           | SQL transactions. If you want to move into other regions of
           | the world and have latency concerns or GDPR concerns, CRDB is
           | custom-built to make the full SQL experience possible.
           | 
            | 4. No cloud lock-in. Perhaps this is not a concern for your
            | company, but many companies don't like getting completely
           | locked in to a single Cloud provider. CockroachDB works on
           | multiple cloud providers and doesn't have a monetary interest
           | in locking you in to just one.
           | 
           | 5. Online schema changes. CockroachDB supports operations
           | like adding/removing columns, renaming tables, and adding
           | constraints without any downtime. You can perform arbitrary
           | schema changes without disturbing your running application
           | workloads. SQL DDL "just works".
           | 
           | 6. Cold start in an instant. CockroachDB clusters
           | automatically "scale to zero" when they're not in use. When
           | traffic arrives, they resume in a fraction of a second.
           | Compare that to Aurora, where you need to either have a
           | minimum compute reservation, or you need to endure multi-
           | second cold starts.
           | 
           | 7. Great support. We've got a friendly Slack room where you
           | can get free support and rub shoulders with fellow
           | CockroachDB users, as well as CockroachDB folks like myself.
           | We also have 24/7 paid support for deeper problems you might
           | encounter.
           | 
           | Taken altogether, CockroachDB can go wherever your business
           | needs it to go, without all the constraints that traditional
           | SQL databases usually have. Do you want thousands of clusters
           | for testing/development/tiny apps at a reasonable cost? Could
           | your business take off and need the scale that CRDB offers?
           | Could your business need to expand into multiple geographic
           | regions? Are some of your workloads erratic or periodic, but
           | still should start up instantly when needed? It's not just
           | about what you need now, but what you _may need_ in the
           | future. It makes sense to plan ahead and go with a database
            | that has "got you covered" wherever you need to go.
        
             | zaphar wrote:
             | I'm sold. What's your migration story from decades old
             | MySQL Database over to CockroachDB?
             | 
             | I'm only half joking there.
        
               | estambar wrote:
               | not sure if this is working yet for Serverless but...
               | https://www.cockroachlabs.com/docs/v21.1/migrate-from-
               | mysql....
        
             | the_arun wrote:
              | You would do great in a sales role as well. Thanks for
              | this write-up!
        
             | 9dev wrote:
              | That's a nice pitch, and it certainly piqued my curiosity!
             | Going to check out CRDB.
        
           | qaq wrote:
            | Not a CRDB employee, but CRDB scales beyond what Aurora
            | can support.
        
       | kendru wrote:
       | The lack of a serverless option was the only reason that I did
       | not use Cockroach on a recent project. I am excited to see the
       | serverless offering now, and the architectural details in the
       | post are awesome. Nice work!
        
       | timwis wrote:
       | This sounds great! I've wanted to create an open data portal for
       | a while that lets you spin up a (ephemeral, read-only) Postgres
       | database of a dataset and run queries on it, maybe with a
       | notebook. Sounds like this might be perfect!
        
         | Shelnutt2 wrote:
         | [Disclaimer: I work for TileDB, Inc]
         | 
          | We have developed just this[1], except using a MariaDB
          | storage engine (MyTile) we've written. You can run serverless
          | queries against TileDB arrays without spinning up a MariaDB
          | instance. You can run any type of query MariaDB supports
          | (joins, aggregates, CTEs, etc.). I've linked the basic
          | documentation and an example notebook below[2]. You can run
          | SQL queries from Python/R or even JS or curl. We support a
          | number of data return formats, i.e. Arrow and JSON, to
          | facilitate use cases.
         | 
         | I'll also mention that we have a number of public example
         | dataset[3] in TileDB cloud, such as the NYC taxi data used in
         | this notebook[4], which you can explore!
         | 
         | [1] https://docs.tiledb.com/cloud/api-reference/serverless-sql
         | 
         | [2] https://cloud.tiledb.com/notebooks/details/TileDB-
         | Inc/Quicks...
         | 
         | [3] https://cloud.tiledb.com/explore/arrays
         | 
         | [4] https://cloud.tiledb.com/notebooks/details/TileDB-
         | Inc/tutori...
        
         | chatmasta wrote:
         | You might like what we're building at Splitgraph:
         | https://www.splitgraph.com/connect
        
           | timwis wrote:
            | Oh wow, very relevant indeed! I guess I thought ephemeral
            | DBs would be better so that a user's expensive query
            | wouldn't bog down the DB for other users. And rather than
            | just limiting them, it would let them run whatever queries
            | they could with pg running locally.
        
       | pachico wrote:
       | I really see the greatness of the serverless option.
       | Congratulations!
       | 
       | What I can't really understand is why would someone use the
       | dedicated cluster in AWS at that price.
        
         | qaq wrote:
         | pricing
        
         | dmitriid wrote:
        | - Pricing
        | 
        | - Storage and DB access is still mostly through hoops
        | 
        | - Long-running processes
        | 
        | - Painful to develop, test, and debug
        
       | estambar wrote:
       | One of the first apps to migrate from CockroachDB Dedicated to
       | Serverless is live now if someone wants to try it out.
       | https://web.flightchop.com/dashboard - posting for a friend.
        
       ___________________________________________________________________
       (page generated 2021-10-26 23:01 UTC)