[HN Gopher] Databases in 2022: A Year in Review
___________________________________________________________________
Databases in 2022: A Year in Review
Author : gavinray
Score : 171 points
Date : 2023-01-02 17:17 UTC (5 hours ago)
(HTM) web link (ottertune.com)
(TXT) w3m dump (ottertune.com)
| friedman23 wrote:
| Happy to find this blog, wish there was more talk about NewSQL
| databases. I've personally seen the amount of cost that goes into
| migrating off of traditional RDBMs systems to sharded RDBMs
| systems and was wondering why people don't go with these scalable
| systems from the start now that they exist.
| pm90 wrote:
| Probably because there is a lot more operational understanding
| of "traditional" RDBMs and they're sufficient in early stages.
| Once your product takes off it can be difficult to scale but by
| that time you probably have enough money to attract engineers
| that can do the migration.
| friedman23 wrote:
| Yes, that's the common refrain, just migrate when you hit
| scale. Just pay people to do the migration. My experience at
| multiple, billion dollar companies has been that it's never
| that easy.
|
| People become afraid of doing schema migrations and tools
| like dynamodb become the default datastore choice for all new
| feature development.
|
| Every single startup I've worked at, technical cruft that
| existed from company founding never disappears despite the
| founders intent that it was never supposed to stay that way
| and it slows down feature development to a crawl.
|
| If NewSQL databases exist that support 90% of the features of
| postgres but give support for cross shard transactions then
| I'd rather use them than technology that has been proven to
| not scale. Unfortunately I'm not an expert on distributed
| databases and there is not much written about these databases
| in practice outside of the startups creating the databases
| themselves.
| likeabbas wrote:
| +1. Recently, Yugabyte has seemed intriguing to me. (It seems
| like) You can launch it as a standalone instance and use it
| just like Postgres early in your company, and then when you
| need to distribute, you can do that.
| brnewd wrote:
| I think many love the promise of a distributed database, but
| in practice there is a real speed and feature set compromise
| to make.
| zX41ZdbW wrote:
| > he claimed that because MonetDB focuses on read-only OLAP
| workloads, then MMAP was good enough
|
| ClickHouse has switchable IO engines: - read; - pread; - mmap; -
| pread_threadpool;
|
| The best performance is achieved with the combined method, named
| 'pread_threadpool'. It uses the `preadv2` syscall to check if the
| file is in the page cache.
|
| More details here:
| https://github.com/ClickHouse/ClickHouse/pull/26791
|
| There was also big research of performance on HDD, SSD, and
| Optane, comparing various methods:
| https://clickhouse.com/blog/a-journey-to-io_uring-aio-and-mo...
|
| And there is a pull request adding io_uring:
| https://github.com/ClickHouse/ClickHouse/pull/38456 -
| Unfortunately, it's unfinished and cannot be merged because the
| CI has found bugs. Nevertheless, the advantages of io_uring for
| analytical databases are negligible.
|
| The `mmap` method is still useful. For example, for data import.
| See here: https://github.com/ClickHouse/ClickHouse/pull/43927
| nikita wrote:
| Neon CEO here.
|
| There are a number of ideas in the database space that the
| industry is adopting across the board:
|
| - Separation of storage and compute (Neon, AlloyDB, Aurora).
| Every cloud database should built one. It's a big undertaking,
| but benefits are undeniable.
|
| - Good query processor for analytics (Snowflake, Velox,
| Singlestore)
|
| - Open source. Especially in OLTP open source == trust
|
| - HTAP. Can run mixed workload (Singlestore, Unistore): both OLTP
| (apps) and OLAP (reporting). This has always been a dream, but we
| still live in the world of dedicated systems: E.g. Snowflake and
| Postgres.
|
| - Shared nothing sharding (Vitess). This is the most
| controversial as you lose compatibility with the mothership
| (MySQL for Vitess). So it's unclear this will be the dominant
| architecture in the future. I think the world may get to "dynamic
| sharding" where storage stays separate and compute can be
| multinode and the user can easily and instantly change the number
| of nodes.
| emptysea wrote:
| I'm not well versed in the tech but is Spanner the sort of
| dynamic sharding approach your talking about?
| gavinray wrote:
| Scylla also uses a shared-nothing sharding approach, they
| have an open-source "asynchronous I/O engine" toolkit that is
| used inside of ScyllaDB called Seastar
|
| They have some good reading on this approach in the Seastar
| docs if you're curious, it's a great general introduction to
| Shared Nothing:
|
| https://seastar.io/shared-
| nothing/#:~:text=The%20Seastar%20M....
|
| https://www.linuxfoundation.org/webinars/under-the-hood-
| of-a...
|
| This is usually generalized to some variation of "Physical-
| OS-thread-per-CPU-core" architecture, where each CPU core
| gets it's own dedicated chunk of data. You use hashing or
| some other mechanism to route requests for data to the right
| core.
|
| Each CPU core can have user-land concurrency or "virtual
| threads" (fibers/coroutines/etc) as well, so you're not
| limited to sequential code necessarily.
|
| A combination of thread-per-core x userland-coroutines-per-
| thread is how modern query processing approaches like
| "Morsel-Driven Parallelism" work. CPU-1-OS-
| THREAD-1 COROUTINE_1 COROUTINE_2
| CPU-2-OS-THREAD-2 COROUTINE_35 ...
| COROUTINE_102
|
| https://db.in.tum.de/~leis/papers/morsels.pdf
| swyx wrote:
| since you were also a founder of Singlestore, just wondering -
| what will be Neon's take on HTAP workloads?
|
| trying to see how much you mean it that the industry is
| "adopting (HTAP) across the board"
| nikita wrote:
| I wouldn't say HTAP is an across the board trend. The need is
| mostly in the enterprise where we see high value mixed
| workloads.
|
| Neon will take a Postgres centric POV. I consider a few
| approaches, but not committing to any particular at this
| point. The reason to not commit is that there is a lot more
| opportunity for Neon in serverless, devtools, and edge. And
| it makes a ton more sense to partner for OLAP workloads. So
| what is Postgres centric POV:
|
| - Plugins like Timescale and Citus
|
| - Seamless integration with sending CDC to Snowflake,
| Singlestore, Databricks, Clickhouse, and Motherduck
|
| - FDW for integration with Snowflake, Singlestore,
| Clickhouse, Databricks, and Motherduck
|
| - We looked at putting DuckDB on Neon storage with Jordan
| Tigani, but it was too big of a lift as DuckDB doesn't push
| all data through the WAL. Maybe in the future.
| zX41ZdbW wrote:
| The most developed analytical query engine as of today is
| ClickHouse [1].
|
| It is open-source, has the separation of storage and compute
| and the best performance.
|
| ClickHouse is a source of inspiration for many emerging
| database engines: Velox, Doris, DuckDB, Datafusion...
|
| [1] https://github.com/ClickHouse/ClickHouse
|
| Disclaimer: I work on ClickHouse.
| ddorian43 wrote:
| > has the separation of storage and compute and the best
| performance.
|
| This part is not yet production ready though. And it's
| unclear when it will be. Though no other open source OLAP
| does it either AFAIK.
| jandrewrogers wrote:
| The separation of storage and compute is controversial, though
| it is definitely convenient for cloud operators. It is
| literally going backward in terms of the available storage
| bandwidth, and many modern database architectures can very
| effectively use all the storage bandwidth available in modern
| systems. There is not enough network bandwidth, and won't be
| for the foreseeable future, for running storage over networks
| to be efficient. Database workloads are also starting to move
| to edge environments, which make it even worse. The resource
| mismatch currently leaves a lot of compute sitting idle in
| practice.
|
| Another issue with this model is that it tends to offer poor
| ratios of bandwidth to storage size. This is at odds with the
| desire to run mixed workloads in a single system.
| glogla wrote:
| The separation of compute and storage is most of all
| convenient for whoever operates the database.
|
| Need more compute, or scale it down when not in use? You can
| add it in minutes, no re-balancing, no careful moving of
| data. Need more storage? You have it on tap. Want to give
| some group of users their dedicated compute? Or run
| differently configured computes on top of one data, including
| even using different compute engines for batch and for
| interactive? You can.
|
| Compare Snowflake "create warehouse", which can give you
| compute that can immediately access any data, with a dance
| you need to do to resize Clickhouse and decide which tables
| are replicated in what way.
|
| At the same time, everything you say is true, and that is why
| system without storage-compute separation can give you order
| of magnitude more performance for the same money than a
| separated system can. If someone could solve the fragility
| and hassle of it, it would be quite a winner.
| nikita wrote:
| > The resource mismatch currently leaves a lot of compute
| sitting idle in practice.
|
| When separate storage is multi tenant and compute is elastic
| you can really squeeze more efficiency. With that you can
| keep utilization extremely high, b/c storage is effectively
| spread across the whole fleet. We really see this now running
| 10K databases on a single (single per region really) storage
| deployment.
|
| Compute is also elastic, so you again can drive utilization
| up. So my argument is that separation of storage and compute
| is not necessarily for performance but rather for
| utilization, convenience to operate, and additional
| capabilities (branching, serverless).
| AdamProut wrote:
| There are hybrid designs for separation of storage and
| compute that are aimed at mixed workloads[1]. They avoid
| writes to remote storage on transaction commit (i.e., act
| like a shared nothing databases for commits, but still push
| data asynchronously to a shared remote disk that can be used
| for scale up/point in time restores/branching).
|
| [1] https://www.singlestore.com/blog/separating-storage-and-
| comp...
|
| (disclosure: SingleStoreDB CTO)
| nikita wrote:
| Adam, you still synchronously replicate each log record to
| 2 places, right? Technically this should be roughly
| equivalent to writing into a consensus. And use write
| through cache for reads.
| AdamProut wrote:
| Yep, for writes network bandwidth usage is independent of
| separation of storage of compute in some sense. Any
| database that provides high availability is writing over
| the network somewhere before it acks a transaction
| committed. It matters who is on the other end of that
| network write though. Take the typical Cloud DW (i.e.,
| Snowflake) design of forcing writes to the blob storage
| (shared remote disks) before commit. That is a much
| higher latency write then what a high performance
| transaction log replication protocol will do to replicate
| a write to another host.
| gavinray wrote:
| > The separation of storage and compute is controversial
|
| I don't think it's a controversial topic, it's an emerging
| trend. See for example this slide from the recent RocksDB
| meetup on their vision for the next 3 years of RocksDB's
| future:
|
| https://twitter.com/GavinRayDev/status/1600666127025156096
|
| Disaggregated storage is becoming quite a hot topic, and
| there are several technologies fueling this.
|
| There are two major advances happening in terms of disk I/O
| right now (that aren't to do with async IO uring stuff)
|
| 1. NVMe Zones + NVMe FDP specs, which are ways for userland
| applications to control placement of writes on drive (like a
| software Flash-Translation-Layer)
|
| 2. NVMe-over-Fabrics or similar technologies (with rdma/tcp)
| enables the access of a remote disk with 'comparable' latency
| to that of a local one.
|
| E.g. if local disk access is 100us, AWS can give you 110us
| with NVMe-OF.
|
| These two advances support the growing trend of disaggregated
| storage
|
| ---
|
| Full credit to Arun George from the Samsung Memory group for
| explaining this to me.
| jandrewrogers wrote:
| The storage-over-fabric latency improvements allow it to
| work well for software that was not particularly optimized
| for storage performance, which has obvious benefits that I
| am not arguing. There is a good reason storage-over-fabric
| has been a popular database deployment model for decades,
| it does have advantages, but this emerging trend is nothing
| new nor are the issues that it has.
|
| For systems that _are_ highly optimized for storage
| performance, storage bandwidth tends to be more important
| than storage latency and that is what is being sacrificed
| for the sake of disaggregating storage. I see integer
| factor reductions in workload throughput when deployed on
| disaggregated storage versus the usually NVMe JBOD, and
| that factor tracks real-world storage bandwidth pretty
| well, particularly at high storage densities. At scale,
| these economics start to matter quite a lot.
|
| To be clear, disaggregated storage has no effect on
| database design generally, seeing as how it has been widely
| used for decades. It does put an artificial floor on
| achievable throughput for many workloads due to the
| bandwidth bottleneck. When people realize this then the
| pendulum swings the other way (again), so I wouldn't want
| to overfit for disaggregated storage.
| gavinray wrote:
| I didn't realize who I was talking to, just checked the
| username
|
| Would encourage other folks to do the same/read post
| history
| sitkack wrote:
| The truthiness of your comments shouldn't change upon who
| hears them?
| gavinray wrote:
| No, I still stand by my opinion that I don't find
| disaggregated storage controversial
|
| But if you're going to take an opinion on databases from
| someone, jandrewrogers is probably a good person to take
| them from. Me, on the other hand, I'm just a
| hobbyist/enthusiast, I've no accolades under my belt and
| my experience on the engineering end is shy of a year.
| gavinray wrote:
| Really enjoyed the talk from your engineer Heikki at Andy's CMU
| "Databases!" seminar series earlier this year, great stuff =)
|
| https://www.youtube.com/watch?v=rES0yzeERns
|
| Also, it's been neat to have mostly worked our replacing our
| old Heroku integration at Hasura with you folks.
| Alifatisk wrote:
| Is there a site that lists every db in every category?
|
| Like Timescale, Clickhouse for time series & Mongodb for
| documents etc
| [deleted]
| jsumrall wrote:
| Check out https://db-engines.com/en/ranking
|
| Note that you can select categories on the left side to limit
| rankings to that category.
| boberoni wrote:
| Dbdb.io has a catalog of database tech. It is maintained by the
| CMU database research group.
|
| https://dbdb.io
| apavlo wrote:
| It's maintained by me (also the author of this article).
| yingjunwu wrote:
| I'm the founder of RisingWave Labs. Glad to know that our company
| was mentioned in the blog.
|
| Technology wise, agreed with Nikita that storage-compute-
| decoupling is the trend. Severless will also be the main focus.
|
| Market wise, I observed several trends:
|
| * streaming. streamlining the data pipeline is a new trend.
| RisingWave and Materialize are both addresssing this market. They
| will probably need to figure out how to compete against
| established solutions like KsqlDB and Apache Flink.
|
| * real-time. There are probably 20+ vendors selling real-time
| OLAP databases. They all claim that they are cheaper, faster,
| better than Snowflake/Redshift. Just name a few: Druid, Pinot,
| ClickHouse, StarRocks etc. Let's see who will be the winner in
| 2023.
|
| * small data. The biggest winner in this domain is Motherduck, a
| startup commercializing DuckDB. Redshift/Snowflake are a overkill
| for most companies who do not own large amount of data.
| Motherduck may be the right choice.
|
| * hybrid. Several companies have started selling hybrid databases
| (mainly HTAP for now): Google AlloyDB, Snowflake Unistore,
| SingleStore, PingCAP TiDB+TiFlash, etc.
|
| In 2023, we will also watch how Databricks challenges Snowflake
| in the data warehouse domain.
|
| Best luck!
| danielvaughn wrote:
| Druid is indeed a bit faster than Snowflake, but a previous
| team I worked on was able to achieve similar latencies under
| heavy load using some clever techniques with materialized
| views.
| lioeters wrote:
| Seems like Supabase deserves a mention, with their contributions
| to the Postgres ecosystem.
|
| https://supabase.com/
|
| pg_graphql - https://supabase.com/blog/pg-graphql-v1
|
| pg_jsonschema - https://supabase.com/blog/pg-jsonschema-a-
| postgres-extension...
|
| Postgres WASM - https://supabase.com/blog/postgres-wasm
|
| ..And other open-source features.
| gavinray wrote:
| Supabase didn't do Postgres WASM first, wasmer did FYI (3 years
| ago)
|
| https://github.com/wasmerio/wasmer-postgres
| kiwicopple wrote:
| (supabase ceo)
|
| these are two different things - wasmer runs WASM code
| _inside_ postgres (as an extension), whereas our
| implementation runs an entire Postgres server as WASM (eg,
| you can run it in a browser: wasm.supabase.com)
|
| We weren't the first to do this though - CrunchyData was.
| Theirs wasn't open source, so we worked with Snaplet to build
| an open source version and added networking:
| https://news.ycombinator.com/item?id=33067962
| gavinray wrote:
| My mistake, I fully eat crow, and this is what I get for
| not reading the thing I argued against!
| apavlo wrote:
| We want to write a paper on the vast extensions ecosystem for
| Postgres at some point. It's on my TODO list for 2023.
|
| Having so many plug-ins/add-ons available is unique to Postgres
| and due to Stonebraker's original vision of including
| extensibility as a first-class design principle in the DBMS.
| dang wrote:
| I didn't know that and it would make for an interesting
| historical article in its own right.
| danielvaughn wrote:
| Also Planetscale
|
| edit: I'm okay with being downvoted, but I'm honestly asking
| why, not complaining. Is Planetscale not an interesting
| database?
| mritchie712 wrote:
| fwiw, I don't get the downvotes either
| viraptor wrote:
| Possibly lack of context. Without the edit, I would assume
| you're going for a version of "webscale" meme about postgres.
| (Haven't heard of planetscale before)
| danielvaughn wrote:
| Ah interesting. Nope, definitely talking about a real
| database. I think cloud native is interesting, and two
| solutions I'm aware of in the space are CockroachDB and
| Planetscale.
| mythhouse wrote:
| > Timescale's $110m series C, Voltron Data's $110m seed + series
| A, and Dbt Labs's $222m series D.
|
| weird that dbt is mentioned in 'big database' funding. Also
| suprised to see the huge investment in dbt.
| rorymalcolm wrote:
| I think it makes sense in the context of the poential of plain
| old SQL (and additions such as dbt) winning out vs Python or
| other general purpose programming solutions as the analytics
| workbench (although this is controversial, there's a school of
| thought that backs it).
| eximius wrote:
| What I want is a new database that defines tables in terms of
| _rich_ structures like proto, like Spanner.
|
| If that doesn't exist in a couple years, maybe I'll try to build
| one.
| lukeramsden wrote:
| > What I want is a new database that defines tables in terms of
| rich structures like proto, like Spanner.
|
| Do you mean like https://www.edgedb.com/? Or something more...
| rich?
| eximius wrote:
| Well, that's certainly something. I haven't stumbled across
| this before.
|
| It's very interesting, but one reason I want it to be proto
| is so you can re-use that for bindings with your
| applications.
| hardwaresofton wrote:
| You might be interested in Project M36:
|
| https://github.com/agentm/project-m36
| [deleted]
| saurabhnanda wrote:
| What an amazingly written piece. I was learning interesting stuff
| and chuckling alongside, as well.
|
| Didn't get the closing remarks about Larry Ellison, though. Any
| explanation?
| SPBS wrote:
| Oh great, it seems like this year he made the sarcasm about
| Larry Ellison a lot easier to pick up. Frankly I was still
| unsure how he really felt about Larry after his review from
| last year[1]. After reading this, it confirms to me that he
| doesn't like Larry :).
|
| [1] https://news.ycombinator.com/item?id=29733729
| friedman23 wrote:
| The Larry Ellison remarks are written with a lot of sarcasm.
| mildbyte wrote:
| I mentioned it recently[0], but this looks like a very good topic
| to plug our new database, Seafowl, that we released last year
| [1]. It also uses Apache DataFusion (like IOx) and separates
| storage and compute (like Neon, Snowflake etc) but is designed
| for client-side Web apps to run analytical SQL queries over HTTP
| (using semantics that make the query results cacheable by browser
| caches and CDNs). This makes it really useful for things like
| interactive visualizations or dashboards.
|
| We're currently doing a lot of work at Splitgraph to reposition
| the product around this "analytics at the edge" use case, with
| usage-based billing, eventually moving our query execution from
| PostgreSQL to Seafowl.
|
| [0] https://news.ycombinator.com/item?id=34175545
|
| [1] https://seafowl.io
| ddorian43 wrote:
| Your first link is not the correct one (click the timestamp of
| the comment to get the exact link)
| mildbyte wrote:
| Thanks for the catch! Fixed.
| getcrunk wrote:
| I see the author links to his youtube playlist for his CMU
| Databases class.
| https://www.youtube.com/playlist?list=PLSE8ODhjZXjaKScG3l0nu...
|
| Are there any good recommendations for a lecture or series of
| lectures that go over Databases from a theoretical perspective
| but also do a practical survey of the various DB systems and
| technologies out there?
| marsupialtail_2 wrote:
| I am writing a Python based SQL query engine:
| https://github.com/marsupialtail/quokka. My personal goal is to
| get Andy Pavlo to mention it in his year-end blogs.
|
| I agree with many of the points made in the blog by Andy. Writing
| a distributed database has become way easier due to open source
| components like Ray, Arrow, Velox, DuckDB, SQLGlot etc.
|
| I personally believe we will see a switch from JVM based
| technologies to Rust/C based with Python wrapper
| apavlo wrote:
| I saw your engine a week or so ago. Good stuff. I wanted to add
| it to dbdb.io but I couldn't find an SVG version of this:
|
| https://github.com/marsupialtail/quokka/blob/master/docs/doc...
|
| Can you email it to me?
| marsupialtail_2 wrote:
| Just did.
| bullen wrote:
| I came back to MariaDB after leaving MySQL about a decade ago.
|
| The JDBC driver is 10x as big and times-out after less than 12
| hours default.
|
| Things are not going in the right direction, but on the upside I
| now learned how to install MariaDB manually so installers can't
| brick my OS like MySQL did back in 2014 (MySQL bricked itself and
| no way to unbrick it except reinstalling Windows).
| KronisLV wrote:
| > Things are not going in the right direction, but on the
| upside I now learned how to install MariaDB manually so
| installers can't brick my OS like MySQL did back in 2014 (MySQL
| bricked itself and no way to unbrick it except reinstalling
| Windows).
|
| For local development and testing, running a database in a
| container is generally a good idea:
| https://hub.docker.com/_/mariadb/
|
| You know, if you ever want to be able to create various
| throwaway environments with resource limits or port mappings
| easily, for any number of versions you might need to run in
| each of your projects.
|
| This works similarly well for something like PostgreSQL, too!
| And if you need persistence, you can use either volumes or bind
| mounts.
| gavinray wrote:
| Can we also start a general "Andy Pavlo" appreciation comment
| chain?
|
| Please comment below if Andy has in any way positively improved
| your life or helped you further your education/career.
|
| I can start:
|
| - Early in 2022, I decided I wanted to learn how databases like
| Postgres work and build one from scratch. If it wasn't for the
| CMU lectures available to the public on YouTube, I don't think I
| would have been able to give myself anywhere near a decent
| education.
|
| - This year, Andy even made the homework-grading platform
| available to the public, so you could submit solutions to the
| database coursework and get graded as if you were a student
| attending CMU.
|
| - Not only does Andy take the time to record, edit, and upload
| his lectures to the public, but he hosts a yearly seminar series
| where he interviews speakers from all over the database industry
| (https://db.cs.cmu.edu/seminar2022)
|
| - Andy does all he does (Professorship) on top of running
| OtterTune, I have no idea how he functions. He truly is the
| realest of all gangsters, and a secret member of the Wu-Tang
| Clan. Andy makes databases fun.
| [deleted]
| polskibus wrote:
| Can you point me to the homework-grading platform ? Can one use
| it outside of standard course schedule ? It would be great to
| do the task in my own pace.
| gavinray wrote:
| The homework solutions are submitted to a platform called
| "Gradescope", see this tweet and the tweets above it in the
| thread for more details:
|
| https://twitter.com/andy_pavlo/status/1564305764730167303
|
| Also, I should mention there is a very large (unofficial)
| Discord server (+1,000 people) where some of the TA's from
| this DB course and myself hang out if anyone is interested:
|
| https://discord.gg/7HPufafdsW
|
| And if you find database engineering interesting in general,
| I might also invite you to come hang out in Phil Eaton's
| hacker-space Discord, which is quite good:
|
| https://discord.gg/SAqUyrPkY7
| aanfhn wrote:
| What are people's and the market take on Starburst? I haven't
| kept up with databases and the general market, but recently heard
| about them. Is there a sizeable demand and market for them? Any
| other big players to watch for?
___________________________________________________________________
(page generated 2023-01-02 23:00 UTC)