[HN Gopher] Databases in 2024: A Year in Review
___________________________________________________________________
Databases in 2024: A Year in Review
Author : avinassh
Score : 315 points
Date : 2025-01-01 14:27 UTC (8 hours ago)
(HTM) web link (www.cs.cmu.edu)
(TXT) w3m dump (www.cs.cmu.edu)
| mihirrd wrote:
| Quite informative
| badindentation wrote:
| The section on Larry Ellison is amusing.
| epolanski wrote:
| I couldn't understand if it was satire or something.
|
| Do they believe the guy marries a 30-year-old because she loves
| him?
|
| In any case, who cares, how was that relevant..
| masklinn wrote:
| It's definitely satire, how would you take this sentence as
| serious:
|
| > I told Larry this especially means a lot to me because my
| former #1 ranked Ph.D. student is now a professor in
| Michigan's Computer Science department with their famous
| Database Group.
| leeoniya wrote:
| "Do not fall into the trap of anthropomorphising Larry
| Ellison."
|
| https://m.youtube.com/watch?t=33m1s&v=-zRN7XLCRhc&feature=yo...
| avinassh wrote:
| Larry always makes appearances in his reviews, lol
|
| > But the real big news in 2023 was how Elon Musk personally
| helped reset Larry's Twitter password after he invested $1b in
| Musk's takeover of the social media company. And with this $1b
| password reset, we were graced in October 2023 with Larry's
| second-ever tweet and his first new one in over a decade.
|
| https://www.cs.cmu.edu/~pavlo/blog/2024/01/2023-databases-re...
|
| > These journalists made it sound like Larry was doing
| something nefarious or indecent, like the time he made his
| pregnant third wife sign a prenup two hours before their
| wedding. I can assure you that Larry was only trying to use his
| vast wealth as the 7th richest person in the world to help his
| country. His participation in this call is admirable and should
| be lauded. Free and fair elections are not a trivial affair,
| like a boat race where sometimes shenanigans are okay as long
| as you win. Larry has done other great things with his money
| that are overlooked, like spending $370m on anti-aging research
| so that he can live forever
|
| https://www.cs.cmu.edu/~pavlo/blog/2022/12/2022-databases-re...
| samanthasu wrote:
| would love to see Andy's take on GreptimeDB
| https://github.com/GreptimeTeam/greptimedb
| leeoniya wrote:
| made same comment recently in
| https://news.ycombinator.com/item?id=42330055#42331927
| samanthasu wrote:
| looking forward to the bonus content!
| rozenmd wrote:
| I loved OtterTune, it's a shame it died the way it did.
| memhole wrote:
| Love the style! CMU making databases cool. Sorry to hear about
| OtterTune.
| Beefin wrote:
| TL;DR SQL is king
| m_ke wrote:
| Andy is a treasure, if only we had more professors like him
| antirez wrote:
| Wow, the reasons why Redis commands API suck in Andy's video
| (linked in the post) are the weakest ever. It is possible to make
| a case against the Redis API (I would not agree of course but...
| it's totally legitimate), but you gotta have stronger arguments
| than those, particularly if you are a teacher of some kind.
| Especially: you need to be somewhat fluent in Redis and how
| developers use Redis in order to understand why so many people
| like it, and then elaborate what is wrong with it (if you
| believe there is something wrong). The video shows a general
| feeling of "I don't really use / know this, but I don't like how
| NON-SQL it is".
| nojito wrote:
| SQL is king, and history has shown that non-SQL languages are not
| good, which causes many non-SQL DBMSs to adopt SQL eventually.
| antirez wrote:
| Many non-SQL DBs had query languages that were broken
| Javascript-ish versions of SQL. Of course, this is wrong, and
| people will eventually adopt SQL instead. But if your _data
| model_ isn't anything like relational DBs, non-SQL makes a
| ton of sense. OP seems to miss exactly this, that the Redis
| query language is shaped on the Redis data model, that is
| basically alien to the relational model.
|
| The idea behind the Redis data model is that "describe data" then
| "query those data in random ways" is conceptually nice but in
| practice will not model many use cases very well. SQL
| databases plagued tech with performance issues for decades
| because of that. So Redis says instead: you need to store your
| data while thinking about fundamental things like data structures
| and access times and _the way you'll need those data back_.
| And the API reflects this.
|
| You don't have to automatically agree with that. But you have
| to understand that, then provide your "I'm against"
| arguments. Especially if you are in front of young people
| listening to you.
| im_down_w_otp wrote:
| Agreed. Many noSQL-boom-era databases eventually bolted on
| a SQL-esque layer, but that was also because they were
| mostly targeting "enterprise database" use cases
| and customers who both expected that and whose use cases
| largely fit with it. So, there was a lot of pressure to
| conform to norms when the advantage of not doing so wasn't
| immediately self-evident.
|
| We have a database [1] and query language [2] that's
| tailored to storing & querying trace/telemetry data
| produced by different layers and components of cyber-
| physical systems for systems engineers to analyze, verify,
| and validate what a complex system is doing. It's not quite
| a traditional relational problem. It's not quite a
| traditional time series problem. It's not quite a
| traditional graph problem.
|
| Addressing the way that systems engineers think about their
| domain in an effective way required coming up with
| something different. Are there caveats and rough edges?
| Sure. But, they're a lot less pernicious and onerous than
| the alternative of trying to leverage a bunch of ill-
| fitting menageries of different solutions.
|
| Redis is fit-for-purpose. So, it makes sense that its query
| interface would also express that.
|
| [1] https://docs.auxon.io/modality/
|
| [2] https://docs.auxon.io/speqtr/
| nojito wrote:
| >But if your data model isn't anything like relational DBs,
| non-SQL makes a ton of sense. OP seems to miss exactly
| this, that the Redis query language is shaped on the Redis
| data model, that is basically alien to the relational
| model.
|
| Sure...but all roads lead back to SQL eventually. Another
| recent example also mentioned in the OP is BigTable
| adopting SQL.
| threeseed wrote:
| > but all roads lead back to SQL eventually
|
| No, they don't. SQL is designed for relational databases.
|
| Other forms (e.g. JSON, graph, key/value) all use
| other query languages.
| anovick wrote:
| SQL (and RDBMS in general) has its limitations, particularly
| with regard to recursive operations.
|
| An extended Datalog[1] can provide performance optimizations
| not available to RDBMS.
|
| [1]: https://dl.acm.org/doi/10.1145/3639271
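For context on the recursion point above, here is a minimal sketch of what a recursive query looks like in plain SQL, run through Python's built-in `sqlite3` module. The `edge` table and the `reachable` helper are hypothetical names chosen for illustration, and this is not the extended Datalog from the linked paper; it only shows the baseline that such systems are compared against.

```python
import sqlite3


def reachable(edges, start):
    """Return the set of nodes reachable from `start`, using WITH RECURSIVE."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT ?                       -- base case: the start node
            UNION                          -- UNION (not UNION ALL) drops
                                           -- duplicates, so cycles terminate
            SELECT edge.dst
            FROM edge JOIN reach ON edge.src = reach.node
        )
        SELECT node FROM reach
        """,
        (start,),
    ).fetchall()
    conn.close()
    return {row[0] for row in rows}
```

In Datalog the same query is usually written as two short rules (a base rule and a recursive rule), which is part of why it is often suggested for recursive workloads.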
| fforflo wrote:
| I've been working on something Redis-y over the holidays, and
| it has reinforced my view that it's the epitome of a 20%-80%
| tool. I've always used the 20%, but anything beyond that sounds
| useless unless you've encountered the requirement in a
| production environment. The challenges Redis has been solving
| for years never really touched the research/academic community
| (even the 20%).
|
| Even in the various taxonomies of DBS in the research
| literature, Redis was mentioned with a wave of the hand as an
| "in-memory" database, which undersells the important (for me)
| part of the "data structure" server.
|
| Putting the "database" after Redis could be a marketing
| misstep. Because it puts you in the is-it-sql territory.
|
| TL;DR: Redis is mostly appreciated by practitioners, i.e. (web)
| developers. Academics find it lacking a theoretical foundation,
| so... meh.
| Tanjreeve wrote:
| Developers know its limits. Or you have developers with
| vague "scaling issues" or "buggy caching" who don't
| understand why they have them or suddenly start suffering
| from them at inconvenient moments.
| nicoritschel wrote:
| With all due respect, the linked video was pretty fair. It
| didn't imply not to use Redis, just not as a primary datastore.
|
| I don't think folks work with Redis out of fondness for the
| model, but because it's the least worst datastore for caching,
| lightweight message broker, and simple realtime things like
| counters.
| antirez wrote:
| Talking about the broken API argument here. Also, Redis is
| particularly useful in exactly the situations that OP leaves
| out. Leaderboard-style use cases with sorted sets are killer
| applications (super hard to model with SQL) of the data
| structure server thing. Apparently OP does not
| understand this and says "simple GET/SET" is what you should
| use Redis for.
|
| Redis has probabilistic data structures, the ability to
| implement complex queueing patterns, and so forth. That's
| where the value is. Otherwise we would still be just with
| Memcached without caring about Redis. Another killer app was
| Twitter's initial use case (then they used it for pretty much
| everything): to cache the latest N Tweets, using capped lists. I
| could continue forever.
|
| So OP's argument is flawed IMHO and, for the reasons above, not
| fair. When you talk to students you need to do your
| homework: really understand the system you are talking about and
| provide a realistic picture of it. Then, yes, if you want,
| criticize it as much as you want, with grounded arguments.
|
| You know what? I re-read this comment and it's embarrassing I
| even have to write this, because after 15 years of Redis
| history at such scale and popularity, pretty much everybody
| that was seriously exposed to Redis knows this stuff. Is
| tech culture really degraded so much that we have to restate
| the obvious? Do I really need to explain GET/SET is not
| exactly where Redis shines after 15 years of half the
| Internet using all kinds of Redis patterns?
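The "capped list" pattern mentioned above (keep only the latest N items) can be sketched in a few lines. This is an in-process model, not a Redis client: in Redis the pattern is usually described as an LPUSH followed by an LTRIM, and here a `collections.deque` with `maxlen` stands in for that pair. The class and method names are hypothetical.

```python
from collections import deque


class CappedList:
    """Keeps only the most recent `cap` items, newest first."""

    def __init__(self, cap):
        # A deque with maxlen silently drops items from the far end when full.
        self._items = deque(maxlen=cap)

    def push(self, item):
        # Roughly "LPUSH key item" followed by "LTRIM key 0 cap-1".
        self._items.appendleft(item)

    def latest(self, n=None):
        # Roughly "LRANGE key 0 n-1"; with n=None, return everything kept.
        items = list(self._items)
        return items if n is None else items[:n]
```

For example, with `cap=3`, pushing five tweets leaves only the last three, newest first. Both the write and the read cost is bounded by the cap rather than by the total history.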
| memhole wrote:
| What are your thoughts about Rails switching to SQLite from
| Redis? I've only used Redis to store session data and cache
| app data. So my opinions are pretty limited and mostly
| positive.
|
| https://rubyonrails.org/2024/11/7/rails-8-no-paas-required
| antirez wrote:
| My feeling is that for their use case, it makes sense to
| have something vertical that just covers the needs of
| Rails. AFAIK SQLite has a RAM backend, so still you are
| not going to hit disk. Seems like a good idea to reduce
| system complexity, to me.
| sureglymop wrote:
| Maybe this is a weird question but, knowing only some math
| and not redis, what is a sorted set and how is it different
| than a list/tuple?
| antirez wrote:
| Sorted sets are abstract data structures where you insert
| elements into a set, but every element is associated with
| a floating point score. Elements are kept ordered inside
| the sorted set, so you can ask for ranges, or a specific
| element rank (position), and so forth. It sounds like the
| (many) cases where Redis is the best idea to get started
| and deliver (see for instance the Instagram case, that
| used Redis for years while becoming bigger and bigger).
| Then as you understand you are at scale and need just
| XYZ, you may choose to implement XYZ inside your system
| in other ways and that's it.
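To make the sorted set description above concrete, here is a toy in-memory version: unique members, each with a float score, kept in score order so that rank and range queries are cheap. It is only a sketch of the abstract behaviour, assuming non-negative rank indices; `ToySortedSet` and its method names are hypothetical and this is not how Redis implements the type internally.

```python
import bisect


class ToySortedSet:
    """Unique members ordered by (score, member)."""

    def __init__(self):
        self._score = {}   # member -> current score
        self._order = []   # sorted list of (score, member) pairs

    def add(self, member, score):
        """Insert a member, or move it if it already has a score."""
        if member in self._score:
            old = (self._score[member], member)
            # The pair is unique, so bisect_left lands exactly on it.
            del self._order[bisect.bisect_left(self._order, old)]
        self._score[member] = score
        bisect.insort(self._order, (score, member))

    def rank(self, member):
        """0-based position in ascending score order, or None if absent."""
        if member not in self._score:
            return None
        return bisect.bisect_left(self._order, (self._score[member], member))

    def range_by_rank(self, start, stop):
        """Members at ranks start..stop inclusive (non-negative indices only)."""
        return [member for _, member in self._order[start:stop + 1]]
```

The difference from a plain list or tuple is that the structure maintains the ordering itself: updating a member's score moves it, and a member can appear only once. That is what makes the leaderboard use case a natural fit.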
| nicoritschel wrote:
| I am grateful for Redis and I agree you pioneered a lot of
| data access patterns in production for a lot of people,
| myself included. I've used Redis for 10 years, at times for
| use cases as you mention, for real time feature engineering
| for ML as well.
|
| The API is just different compared to SQL, which is a
| downside for many. There are modern advancements in the space
| with IVM and more databases are supporting probabilistic
| data structures.
| daneel_w wrote:
| _> Is tech culture really degraded so much that we have to
| restate the obvious?_
|
| Maybe, though the author of the article is known to be a
| little bit too opinionated, and unfortunately has a habit of
| expressing himself in a bombastic manner. The piece reads
| like a dramatic recap of the past year's sporting events,
| littered with irrelevant and disconnected references to
| lyrics and drama in the world of rap and hip hop. A "quirky
| and fun" journalistic abortion.
| cloverich wrote:
| Redis is stable, powerful, widely supported, and has been
| running strong... over a decade now? I've never heard it
| recommended as a primary datastore... why would someone do
| that? I've seen it used at scale for numerous businesses now
| and it's caused problems exactly never. People understand how
| to use it because it's relatively simple and provides the
| first things you need beyond the database. Do people complain
| about redis commonly? News to me.
| nicoritschel wrote:
| Adtech/ML
| tayo42 wrote:
| > Andy's video (linked in the post)
|
| Is there a "too long; didn't watch" summary anyone knows of? I
| hate videos, but am curious lol
| apavlo wrote:
| > Wow, the reasons why Redis commands API suck in Andy's video
| (linked in the post) are the weakest ever.
|
| In my example, the API on a key changes based on its value
| type. And the same collection can have different value types
| mixed together. You've recreated the worst parts of IBM IMS
| from the 1960s. However, the original version of IMS only
| changed the API when a collection's backing data structure
| changed. Redis can change it on every key!
|
| We didn't get into the semantics of Redis' MULTI...EXEC, which
| the documentation mischaracterizes as "Transactions". I'm happy
| that at least you didn't use BEGIN...COMMIT.
| antirez wrote:
| You totally miss that Redis is more like a remote interpreter
| with a DSL that manipulates data structures stored at global
| variables (keys): you (hopefully) would never complain about
| languages having this semantics.
|
| I don't think you understood how Redis collections work. The
| items are just strings; they can't be a mix of integers and
| strings or whatever, nor can collections be nested.
|
| The Redis commands do type checking to ensure the application
| is performing the right operation.
|
| In your example, GET against a list, does not make sense
| because:
|
| 1. GET is the retrieve-the-key-of-string-type operation.
|
| 2. Having GET doing something like LRANGE 0 -1 would have
| many side effects. Getting a huge list by mistake and
| returning a huge data set without any reason, creating
| latency issues. Also having options for GET to provide ranges
| (SQL-like query languages horror story). And so forth.
|
| So each "verb" should do a specific action in a given data
| type. Are you absolutely sure you were exposed enough to the
| Redis API, how it works, and so forth?
|
| About MULTI/EXEC, when AOF with fsync configured correctly is
| used, MULTI/EXEC provides some of the transactional guarantees
| you think of when you hear "transaction", but in general the
| concept refers to the fact that commands inside MULTI/EXEC
| have an atomic effect from the point of view of an external
| observer AND point-in-time RDB files (and AOF as well). MULTI
| / INCR a / INCR a / EXEC will always result in the observer
| seeing either 2, 4, 6, 8, and so forth, and never 3 or 5.
|
| Anyway, I believe you didn't put enough effort into
| understanding how Redis really works. Yet you criticized it
| with weak arguments in front of the most precious good we
| have: students. This is the sole reason why I wrote my first
| comment; I believe this to be a wrong teaching approach.
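The observer-visible atomicity described above for MULTI / INCR a / INCR a / EXEC can be modelled with a small sketch. This is a toy, not Redis: a single lock stands in for the server running one queued batch at a time, and it says nothing about durability or rollback. `ToyStore`, `multi_exec` and `observe_during_transactions` are hypothetical names.

```python
import threading


class ToyStore:
    """Integer counters; a queued batch is applied as one atomic unit."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def get(self, key):
        with self._lock:
            return self._data.get(key, 0)

    def multi_exec(self, commands):
        """Apply a list of ("INCR", key) commands with no interleaved reads."""
        for op, _ in commands:
            if op != "INCR":  # validate before touching any state
                raise ValueError(f"unsupported command: {op}")
        with self._lock:
            results = []
            for _, key in commands:
                self._data[key] = self._data.get(key, 0) + 1
                results.append(self._data[key])
            return results


def observe_during_transactions(rounds):
    """Run a writer and a reader concurrently; return (values seen, final)."""
    store = ToyStore()
    seen = []

    def writer():
        for _ in range(rounds):
            store.multi_exec([("INCR", "a"), ("INCR", "a")])

    def reader():
        for _ in range(rounds):
            seen.append(store.get("a"))

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen, store.get("a")
```

Every value the reader sees is even, because it can never observe the state between the two increments. Whether that alone deserves the name "transaction" is exactly what the thread is arguing about.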
| zzzeek wrote:
| > You totally miss that Redis is more like a remote
| interpreter with a DSL that manipulates data structures
| stored at global variables (keys):
|
| I think he makes the point that these "global variables"
| are dynamically typed; you can have "listX" and then write
| a non-list into that same name; statically typed systems
| would not allow this. He makes the fairly non-controversial
| point that a statically typed system (SQL, other than that
| of SQLite) adds a level of type safety that can guard
| against software bugs.
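Both sides of the typing argument above can be seen in one small model: a key can be rebound to a value of a different type (the dynamic typing point), while each command checks the type it operates on and refuses a mismatch (the type checking point). This is a simplified sketch of the documented behaviour of SET, GET, RPUSH and LRANGE, not a Redis client; `TinyStore` and `WrongTypeError` are hypothetical names.

```python
class WrongTypeError(Exception):
    """Raised when a command is used against a key holding another type."""


class TinyStore:
    def __init__(self):
        self._data = {}  # key -> str (string type) or list of str (list type)

    def set(self, key, value):
        # SET overwrites whatever the key held before, regardless of type.
        self._data[key] = str(value)

    def get(self, key):
        value = self._data.get(key)
        if value is not None and not isinstance(value, str):
            raise WrongTypeError(f"GET on non-string key {key!r}")
        return value

    def rpush(self, key, *items):
        value = self._data.setdefault(key, [])
        if not isinstance(value, list):
            raise WrongTypeError(f"RPUSH on non-list key {key!r}")
        value.extend(str(item) for item in items)
        return len(value)

    def lrange(self, key, start, stop):
        value = self._data.get(key, [])
        if not isinstance(value, list):
            raise WrongTypeError(f"LRANGE on non-list key {key!r}")
        n = len(value)
        if start < 0:
            start = max(n + start, 0)
        if stop < 0:
            stop = n + stop
        if stop < 0:
            return []
        return value[start:stop + 1]  # stop is inclusive, as in LRANGE
```

So "listX" can hold a list, reject GET with a type error, and later be overwritten by SET with a plain string. A statically typed schema would reject that last step up front; here the mismatch only surfaces when a command touches the key.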
| jsnell wrote:
| > 1. GET is the retrieve-the-key-of-string-type operation.
|
| That's a tautological argument. The question isn't what the
| definition of GET is, but whether the design is good.
|
| > 2. Having GET doing something like LRANGE 0 -1 would have
| many side effects. Getting a huge list by mistake and
| returning a huge data set without any reason, creating
| latency issues.
|
| If this really were the reason, you'd have separate
| operations for tiny strings and huge strings. After all, by
| analogy having GET return a huge string "without any
| reason" would create latency issues.
|
| But that's not how Redis works, right?
| antirez wrote:
| The examples I made are just a subset of the protection
| that this provides. Similarly you can't LRANGE a set
| type, and so forth. So this in general makes certain
| errors evident ASAP (command mismatch with the key type).
|
| This does not mean that Redis would not work having
| generic LEN, INSERT, RANGE commands. But such commands
| would also end up having type-specific options, which I
| feel is not very clean. Anyway these are design
| tastes, but I don't think they dramatically change what
| Redis is or isn't. The interesting part is the data
| model, the idea of commands operating on abstract data
| structures, the memory-disk duality, and so forth. If one
| wants to analyze Redis, and understand merits and issues,
| a serious analysis should hardly focus on these kinds of
| small choices.
| DrBenCarson wrote:
| Just because there are reasons for why Redis sucks doesn't
| mean it doesn't suck
| osigurdson wrote:
| >> stored at global variables
|
| This is an interesting (and correct) perspective. Global
| variables scare us in software but we are ok with it when
| it comes to application state stored in a db.
| brightball wrote:
| Yea, I always use Redis for very specialized purposes.
|
| Like offloading a shared data structure between threads /
| processes / machines so that I don't have to deal with thread
| safety issues.
| Spivak wrote:
| I understand machines but _threads_?! Why introduce IPC
| overhead on the fastest/easiest way to share data? This is
| beyond a solved problem and your language probably has
| multiple ready-made battle tested solutions.
|
| In Python you don't even need a lib, dict is thread safe even
| in nogil.
| pdhborges wrote:
| > In Python you don't even need a lib, dict is thread safe
| even in nogil.
|
| Is it? https://google.github.io/styleguide/pyguide.html#218
| -threadi...
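The disagreement above is easier to see with a concrete case. Even where a single dict operation behaves atomically, a compound update such as `d[k] += 1` is a read, an add and a write, so concurrent threads can lose updates unless something serialises them. A minimal sketch, with the hypothetical helper `count_with_lock`, that guards the compound update with a lock:

```python
import threading


def count_with_lock(n_threads, n_increments):
    """Increment one shared dict entry from several threads, safely."""
    counts = {"hits": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(n_increments):
            # Read-modify-write on shared state: hold the lock for all of it.
            with lock:
                counts["hits"] += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts["hits"]
```

With the lock, `count_with_lock(4, 1000)` always returns 4000. Removing the lock makes the result depend on interpreter and scheduling details, which is the kind of reliance the linked style guide warns against.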
| codeulike wrote:
| Weird how SQL Server and its Azure variants get no mention. It
| dominates in certain sectors. DB-Engines ranks it third most
| popular overall https://db-engines.com/en/ranking
| patja wrote:
| The fawning over Larry Ellison is also weird.
| cactusfrog wrote:
| The joke is that his greed/unwillingness to squeeze margins
| has made the entire database company ecosystem possible.
| RadiozRadioz wrote:
| Lots of people deliberately avoid Microsoft technologies and
| their whole ecosystem. There's of course interesting stuff
| happening there, but not enough for those outside the ecosystem
| to care.
|
| It's more a cultural thing than anything else. HN for example
| largely leans away from MS. It's quite interesting how little
| overlap there is between the two worlds sometimes.
|
| Speaking as one of those people, it's just not my thing, so
| it's not on my radar at all. There's enough stuff happening
| outside MS to keep me busy forever.
| teej wrote:
| Are people choosing SQL Server independently of the Microsoft
| ecosystem? My understanding is that you typically use it because
| you're forced to choose a MS product.
| rawgabbit wrote:
| SQL Server is a terrific product. And I detest most things
| Microsoft.
| Tostino wrote:
| Agreed with the other person. It's a great database. I
| wouldn't choose it for a startup over Postgres, but it is
| extremely capable.
| olavgg wrote:
| I would use it if it supported backup/restore over unix
| pipes / ssh.
| stackskipton wrote:
| SRE who deals with some .NET stuff that uses MSSQL but is
| converting to MySQL, so I feel somewhat qualified to talk about
| MSSQL. TL;DR: Nothing interesting going on.
|
| There is nothing to talk about here. It's a boring database
| engine that powers boring business applications. It's pretty
| efficient and can scale vertically pretty well. With the state of
| modern hardware, that vertical limit is high enough most people
| won't encounter it.
|
| It's also going the way of Windows Server, which is to say, it's
| being sold but not a ton of work is being done on it. Companies
| that are still invested in it likely either don't care about
| cost ultimately or find the cost of switching too high to
| greenlight the switch.
|
| Anyone who does care about cost, like my current company, has
| switched to OSS solutions like
| Postgres/MySQL/$RandomNoSQLOSSOption. My company switched away
| when it turned into a SaaS business and those MSSQL server costs
| ate into the bottom line.
|
| This has been happening throughout the ecosystem. ProGet, which
| is THE solution for .NET artifacts, is switching to Postgres:
| https://blog.inedo.com/inedo/so-long-sql-server-thanks-for-a...
|
| Also, I saw this article from Brent Ozar, who I see as an MSSQL
| smart person, which basically said if you have the option, just
| go with Postgres:
| https://www.brentozar.com/archive/2023/11/the-real-problem-w...
|
| It's also worth noting that Microsoft even bought a Postgres
| scaling solution called Citus, so they read the writing on the
| wall: https://blogs.microsoft.com/blog/2019/01/24/microsoft-
| acquir...
| rawgabbit wrote:
| I was a big proponent of MSSQL. It is still a good product
| but I see Microsoft constantly fumbling with new OLAP tools.
| It is a shame but it seems Microsoft is abandoning MSSQL.
| mbreese wrote:
| _> It's a boring database engine that powers boring business
| applications_
|
| I'm taking that as a positive thing... it's boring and does
| its job with little fanfare. That's pretty much what I want
| out of a RDBMS. So long as it is "fast-enough" with enough
| features for the applications that use it, that seems like a
| good place for an RDBMS to be.
|
| One could still argue about Windows and licensing fees, but
| from a technical point of view, for business customers,
| boring isn't necessarily a bad thing.
| FridgeSeal wrote:
| There's other boring databases that also reliably fill that
| job, and they also cost far less.
|
| It can also be a bit of a pain outside the C# ecosystem,
| whereas every language ever has nice Postgres drivers that
| don't require us to download and set up ODBC. It runs on
| Linux as of a few years ago, but I also wouldn't be
| surprised if many people didn't realise that.
| stackskipton wrote:
| I've run into MSSQL on Linux. Most DBAs know but their
| entire ecosystem is Windows Server so what's another
| Windows Server is their thinking.
| sigbottle wrote:
| These year in review posts are really neat, I really liked the AI
| in review posts.
|
| Maybe algorithms review or TCS review or some specific math topic
| review next?
| CT4u8798 wrote:
| I love SQL. I'm not a full-time developer but always use SQL over
| other abstractions, which I find extremely confusing and way more
| complicated than plain SQL.
| skeeter2020 wrote:
| I'm now firmly into management but the one skill I use very
| regularly is SQL. By far the best investment I made in my
| entire career was a little bit of relational algebra, some
| casual study of DBMS internals and a lot of hands-on SQL. The
| quasi-standards have also made it the easiest transfer across
| specific DBs and their flavours over the years.
|
| PSA: Hi kids, here's a dinosaur with yet more free advice: put
| the tiniest bit of effort into SQL early on and watch the
| compound interest add up.
| threeseed wrote:
| That's just because it is what you are comfortable with.
|
| Many developers will jump straight for ORMs when given the
| chance.
| Tostino wrote:
| Which for certain types of applications ORMs absolutely have
| their use.
| wahnfrieden wrote:
| 2024 was also the year that Realm died
| travisgriggs wrote:
| The funding section had me thinking "one of these is not like the
| others". Both the amount and count of successive rounds.
| ksec wrote:
| >Six years after MySQL v8 went GA, the team turned v9 out on the
| streets. ......Oracle is putting all its time and energy into its
| proprietary MySQL Heatwave service.
|
| Oracle actually released 9.1 already in 2024. [1] And expect
| another release this month, and every quarter. So I think MySQL
| continues to get new features, bug fixes, and support like it
| used to, contrary to what most people think, that it is all going
| to Heatwave. I just hope Vector will be open sourced later as an
| official part of MySQL rather than staying behind Heatwave.
|
| [1]
| https://dev.mysql.com/doc/relnotes/mysql/9.1/en/news-9-1-0.h...
| the_arun wrote:
| This person started with news on DB - reviewing all prominent DBs
| & finally ended talking about love of Larry Ellison. A perfect
| human in the days of LLMs. Amazing write up.
| RedShift1 wrote:
| I've been using plain postgres for over 5 years now, reading this
| I feel like I'm in the eye of a storm...
| mebcitto wrote:
| A couple of spicy things:
|
| > OtterTune. Dana, Bohan, and I worked on this research project
| and startup for almost a decade. And now it is dead. I am
| disappointed at how a particular company treated us at the end,
| so they are forever banned from recruiting CMU-DB students. They
| know who they are and what they did.
|
| Ouch.
|
| > Lastly, I want to give a shout-out to ByteBase for their
| article Database Tools in 2024: A Year in Review. In previous
| years, they emailed me asking for permission to translate my end-
| of-year database articles into Chinese for their blog. This year,
| they could not wait for me to finish writing this one, so they
| jocked my flow and wrote their own off-brand article with the
| same title and premise.
|
| Also sounds like he's preparing a new company:
|
| > I hope to announce our next start-up soon (hint: it's about
| databases).
| spprashant wrote:
| Anyone know what company he may be talking about?
| iso8859-1 wrote:
| How do they enforce the ban? Do universities have non-compete
| clauses for PhD students?
| mebcitto wrote:
| I assume it's not that kind of ban, but more like he'll
| recommend his students to avoid the company.
| mrtimo wrote:
| Enjoyed his roundup in the "Shoving Ducks into Everything"
| section.
|
| DuckDB is a great tool. In April 2020, the creator of DuckDB gave
| a talk at CMU. In the beginning he makes a convincing argument
| (in 5 minutes) why data scientists don't use RDBMS and how this
| was the genesis of DuckDB. Here is a video that starts 3 minutes
| into the talk (where his argument starts):
| https://youtu.be/PFUZlNQIndo?si=ql9n2QuBlAEuGIqo&t=204
| dig1 wrote:
| > There was no major effort to fork off MongoDB, Neo4j, Kafka, or
| CockroachDB when they announced their license changes.
|
| AFAIK people didn't take MongoDB seriously from the start,
| especially with the "web scale database" joke circulating. The
| Neo4j Community version has been under GPLv3 for quite some time,
| while the Enterprise version has always been somewhat closed,
| regardless of whether the source code was available on GitHub
| (the mentioned license change affected the Enterprise version).
|
| Regarding CockroachDB, I must admit that I've only heard about it
| on HN and don't know anyone who seriously uses it. As for Kafka,
| there are two versions: Apache Kafka, the open-source version
| that almost everyone uses (under the Apache license), and
| Confluent Kafka, which is Apache Kafka enhanced with many
| additional features from Confluent, and the license change
| affected Confluent Kafka. In short, maybe the majority simply
| didn't care about these projects very much, so there is no major
| fork.
|
| > It cannot be because the Redis and Elasticsearch install base
| is so much larger than these other systems, and therefore, there
| were more people upset by the change since the number of MongoDB
| and Kafka installations was equally as large when they switched
| their licenses.
|
| I can't speak for MongoDB, but the Confluent Kafka install base
| is significantly smaller than that of Apache Kafka, Redis and ES.
|
| > Dana, Bohan, and I worked on this research project and startup
| for almost a decade. And now it is dead. I am disappointed at how
| a particular company treated us at the end, so they are forever
| banned from recruiting CMU-DB students. They know who they are
| and what they did.
|
| Call me a skeptic, but I can't see this as a fair approach. If
| your company fails for whatever reasons, you should not recruit
| the university department/group/students against your peers (I
| can't find that CMU-DB was one of the founders of Ottertune).
|
| Wrt Andy, here are [1] somehow interesting views from
| (presumably) previous employees.
|
| [1]
| https://www.reddit.com/r/Database/comments/1dgaazw/comment/l...
| paulddraper wrote:
| There are production uses of MongoDB (Stripe comes to mind).
|
| But it is certainly not a popular choice there.
| threeseed wrote:
| MongoDB does over $2b in revenue (and growing by 20%) each
| year.
|
| There are a lot of production uses.
| apavlo wrote:
| > Wrt Andy, here are [1] somehow interesting views from
| (presumably) previous employees.
|
| I am only seeing this now and I take the complaints about being
| "slightly racist and offensive" very seriously. I am checking
| with investors, former HR people, and co-founders. I was not
| made aware of any issues. If anything, I was overly cautious at
| the company.
|
| I was openly transparent with our employees about every
| direction the company was pursuing up until the very end. The
| complaint that "He thinks he knows everything about business"
| makes me believe this person is just trolling because I was
| always the first to admit in meetings that I was not an expert
| in how to run a business. We had to fire people because of
| inappropriate behavior, but not because I had strong
| disagreements with how to run the company.
| moab wrote:
| FWIW, as someone who had multiple friends who worked closely
| with Andy over 5+ years and at all stages of their career
| (both BS / PhD) those comments reek of someone with an axe to
| grind. All of the many anecdotes I have about Andy paint a
| picture of a great advisor and mentor. I suppose I should say
| "shenanigans aside", but if you can't separate his jokes from
| his academic side you need to develop a sense of humor.
| maeil wrote:
| Good read!
|
| > Postgres' support for extensions and plugins is impressive. One
| of the original design goals of Postgres from the 1980s was to be
| extensible. The intention was to easily support new access
| methods and new data types and operations on those data types
| (i.e., object-relational). Since 2006, Postgres' "hook" API. Our
| research shows that Postgres has the most expansive and diverse
| extension ecosystem compared to every other DBMS.
|
| Greenhorn developers don't even know that there are non-Postgres
| databases which have extensions too - such is the gap! I wouldn't
| be surprised if Postgres had as many as all others combined.
| softwaredoug wrote:
| On the "Amazon can just offer your DB as a service"
|
| Yes this can happen. But a lot of people don't want an AWS-managed
| service. They're like 30% cheaper for 30% less value. They can
| develop a bad reputation and feel like weird forks (Kinesis vs
| Kafka) that have weird undocumented gotchas and edge cases that
| never get fixed. Many teams want to host on k8s anyway, and
| you'll probably have better k8s support from the main project.
| Another example is the success of Flink over hosted Google
| Dataflow. Seems eventually the teams I know trend to the most
| mainstream OSS implementation over time, maybe after early
| prototyping on a managed system.
|
| IMO it might not be the highest growth market anymore. Those who
| want to pay for a managed service will. But many are just
| figuring out a k8s based solution to their infra needs as k8s
| knowledge becomes more ubiquitous.
| wslh wrote:
| Great heads up. I wonder about graph databases. He mentioned
| <https://umbra-db.com/> and <https://cedardb.com/>; both include
| the graph use case, and I wonder how they compare to
| <https://neo4j.com/>.
| atombender wrote:
| The article mentions Greenplum, but it's worth noting that when
| the code was closed, several of the original developers created
| an open-source fork, Cloudberry, which seems to be thriving.
| Cloudberry was accepted into the Apache project this year, and
| has synced with Postgres 14, whereas the closed-source Greenplum
| is still stuck on Postgres 12.
|
| The architecture is quite ancient at this point, but I'm not sure
| it's completely outdated. It's single-master shared-nothing, with
| shards distributed among replicas, similar to Citus. But the
| GPORCA query planner is probably the most advanced distributed
| query planner in the open source world at this point. From what I
| know, Greenplum/Cloudberry can be significantly faster than Citus
| thanks to the planner being smarter about splitting the work
| across shards.
| ledgerdev wrote:
| Thanks for the cloudberry mention, wasn't aware of it.
| PeterZaitsev wrote:
| I think one thing Andy misses about why people were pissed about
| Elastic and Redis, but not as much about MongoDB and some others,
| is their license and the size of their contributor community.
|
| When the original license is as restrictive as AGPL, it is unlikely
| there is much embedded use... so fewer people are impacted in a
| truly catastrophic way.
|
| Also, if there is no contributor community to speak of... who is
| going to do the fork?
|
| I put some thoughts about it in my post about ScyllaDB
| https://peterzaitsev.com/thoughts-on-scylladb-license-change...
| tayo42 wrote:
| > In the case of Redis, I can only think that people perceive
| Redis Ltd. as unfairly profiting off others' work since the
| company's founders were not the system's original creators. An
| analysis of Redis' source code repository also shows that a
| sizable percentage of contributions to the DBMS comes from
| outside the company
|
| He mentions this in "Andy's Take" section btw
| PeterZaitsev wrote:
| Yes. Not the license though.
| kwillets wrote:
| I spent the past year puzzling over the DB market as well, but I
| don't feel like I'm much closer to understanding it.
|
| It appears that a lot of attention is now directed at the folks
| doing 100 MB queries, and the high end has moved past everybody's
| radar. My idea of an exciting product is Ocient, who have skipped
| over Cloud and gone for hyperscale on-prem hardware. Yellowbrick
| is also a contender here.
|
| I have a lot of experience with Vertica, and they seem to have
| gotten stuck in this niche as well, with sales tilted towards big
| accounts, but less traction in smaller shops, and a difficult
| road to get a SaaS or similar easy-start offering.
|
| There's a crossover point where self-managed is cheaper than
| cloud, but nobody seems to have any idea where it is. Snowflake
| will gladly tell you that your sub-$1M Vertica cluster should be
| replaced by $10M of sluggish SaaS, and that you are saving money
| by doing so. These decisions seem more in the realm of psychology
| or political science.
|
| DHH's cloud exit was a refreshing take on the expense issue, even
| if it wasn't strictly in the database space -- the cost per VCPU
| and so forth that he documented is a good start for estimating
| savings, and he debunked a lot of the "hidden costs" that cloud
| maximalists claim.
|
| In the business/financial space the biggest news to me was the
| correction in Snowflake's stock price, which seemed to indicate
| that investors were finally noticing metrics like price-
| performance, but they added a little more AI and went back into
| irrationality.
|
| I'm heavily in favor of DuckDB, Hudi, Iceberg, S3 tables, and the
| like. Mixing high-end and low-end tools seems like the best
| strategy (although settling on one high-end DWH has also worked
| IME), and the low end is getting better and cheaper, squeezing
| out the mid-range SaaS vendors.
|
| In research I found Goetz Graefe's work in offset-value coding
| exciting -- he's wired it into query operators in a way that
| saves a lot of CPU on sorting and joins/aggregation. This is a
| technique that I've applied favorably in string sorting, and it
| was discovered in the DB community decades ago but largely
| forgotten. (This work precedes 2024, but I'm a slow study.)
| bionhoward wrote:
| Redis is slow?
| kermatt wrote:
| I wish there was more context around that statement in his
| post.
|
| Redis, while not having some of the features he mentions in [1]
| (i.e. SQL), is usually not considered "slow" when used for what
| it excels at.
|
| As an in-memory data structure server, it is commonly used to
| offload operations that are slow in a typical RDBMS.
|
| [1] https://youtu.be/fZbwD1gzjLk?t=2018
| ak_111 wrote:
| Wow his database startup that raised 12M died this year after
| only three years.
|
| If anything this shows how insanely difficult it must be to
| succeed as a database startup (when was the most recent startup
| success in this space?), as the founding team is stellar.
|
| On the other hand, I am surprised it died this quickly and am
| interested to know if they did a proper postmortem. Not only did
| they raise way more than is needed to survive for three years but
| the idea is about utilising AI to improve DB performance and I
| find it hard to imagine they couldn't find more investors to lend
| them a lifeline with all the AI hype.
| Tanjreeve wrote:
| No idea about internal workings but as a "DB optimisation"
| startup you're competing with
|
| - most people don't need it
|
| - People who do need it having DBAs/Operations people
|
| - or consultancies
|
| - Database vendors that have automatic optimisation as a
| feature
|
| Ok "AI" in the name but I think for something as specific as DB
| optimisation AI jazz hands probably don't work as well. Writing
| it out it almost seems harder than being an actual DB vendor.
| lvl155 wrote:
| I highly recommend their YouTube series on databases. They have
| great guest speakers.
| based2 wrote:
| https://dbos-project.github.io news
| refset wrote:
| More specifically, DBOS Inc. raised an $8.5 million seed round
| [0] and is backed by Michael Stonebraker (the creator of
| Postgres). I initially assumed Andy was alluding to this when
| he wrote "the most famous database octogenarian splashing cash"
| :)
|
| [0] https://techcrunch.com/2024/03/12/new-startup-from-
| postgres-...
| polishdude20 wrote:
| After interviewing at OtterTune a while back and being
| bombarded with multiple rounds of leetcode questions, I somehow
| knew OtterTune wouldn't make it.
| roark_howard wrote:
| DuckDB dominating over DataFusion could fuel the ongoing language
| war with a great half baked argument!
| gigatexal wrote:
| This take screams of something personal more than of technical
| criticism. "I'll be blunt: I don't care for Redis. It is
| slow, it has fake transactions, and its query syntax is a
| freakshow. Our experiments at CMU found Dragonfly to have much
| more impressive performance numbers (even with a single CPU
| core). In my database course, I use the Redis query language as
| an example of what not to do." (From the article)
|
| Of course it's not to be used as a general-purpose DB; it's keys
| and values, used for caches and things like that. In my
| experience, in real-world scenarios and loads, vanilla
| single-threaded Redis is stable, fast, and nigh bulletproof.
| rednafi wrote:
| Loved the overview. Hated the shade toward Redis. Redis has
| arguably the best key-value query syntax, and there's a reason so
| many people swear by it. True, the decision-makers at Redis Ltd
| are absolute pieces of trash, but Redis itself is a delightful
| piece of engineering.
|
| I don't care about the billion-dollar drama behind a piece of
| tech, but Redis defined the key-value query API for many similar
| databases. Trashing it just because it isn't SQL-like feels
| unjustified.
| quotemstr wrote:
| There's one QOL extension that I haven't seen anyone else
| implement: dimensional analysis. I can declare a column is an
| integer. Why not an integer that expresses feet? Why shouldn't I
| be able to write SELECT 1inch + 1cm and get a correctly computed
| length? Why can't the query parser help me avoid nonsense like
| SELECT 1kg + 1hr? All this stuff is pretty straightforward to add
| and would help avoid avoidable mistakes.
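|
| A minimal sketch of the idea, in Python rather than SQL: each value
| carries a dimension, addition converts compatible units and rejects
| mismatched ones. The Quantity class and the UNITS table are
| hypothetical, made up for illustration, and are not part of any
| DBMS or extension.

```python
from dataclasses import dataclass

# Hypothetical unit table: unit name -> (dimension, factor to the base unit).
UNITS = {
    "m": ("length", 1.0),
    "cm": ("length", 0.01),
    "inch": ("length", 0.0254),
    "kg": ("mass", 1.0),
    "hr": ("time", 3600.0),
}


@dataclass(frozen=True)
class Quantity:
    value: float      # magnitude expressed in the base unit of its dimension
    dimension: str    # e.g. "length", "mass", "time"

    @classmethod
    def of(cls, amount: float, unit: str) -> "Quantity":
        """Build a quantity from an amount in a named unit."""
        dimension, factor = UNITS[unit]
        return cls(amount * factor, dimension)

    def __add__(self, other: "Quantity") -> "Quantity":
        if not isinstance(other, Quantity):
            return NotImplemented
        # The check a query parser could perform at plan time.
        if self.dimension != other.dimension:
            raise TypeError(f"cannot add {self.dimension} to {other.dimension}")
        return Quantity(self.value + other.value, self.dimension)

    def to(self, unit: str) -> float:
        """Return the magnitude converted to the named unit."""
        dimension, factor = UNITS[unit]
        if dimension != self.dimension:
            raise TypeError(f"cannot express {self.dimension} in {unit}")
        return self.value / factor
```

| With this sketch, Quantity.of(1, "inch") + Quantity.of(1, "cm") is a
| valid length, while Quantity.of(1, "kg") + Quantity.of(1, "hr")
| raises a TypeError instead of silently returning 2.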
| Upvoter33 wrote:
| Pretty funny.
|
| One factual issue: "The university had previously announced that
| this player was transferring from Louisiana State to Michigan."
| This is not true. Underwood had committed to LSU but then
| switched his commitment to Michigan. He was still in high school
| at the time, and has never attended LSU.
|
| But, do you really expect a funny database prof to know much
| about football?
| mpbart wrote:
| I never thought I'd see a discussion about the Underwood NIL
| drama on a databases blog post but here we are.
| bcoates wrote:
| "I've never met anybody that used Alteryx"
|
| I have! It's a pretty good no-code/minimal-code graphical
| ELT+Analytics in one tool. It's one of those alternate-universe
| tools that has its own way of doing things from everything else
| in the industry, but it's pragmatic and the people who use it
| tend to love it.
|
| The one thing that makes it viable is that it has/had
| (pre-acquisition) _very_ aggressive compatibility with anything else
| that can hold data, so you can use it as a bolt-on to whatever
| other databases or files your company has.
|
| Despite what the PE press release about the acquisition says, it
| has virtually nothing to do with AI, at least in the modern big
| NN sense.
|
| If you're looking to fix your giant pile of Alteryx workbooks or
| migrate them to something else, hmu
| osigurdson wrote:
| More like: "Database license drama - a year in review".
___________________________________________________________________
(page generated 2025-01-01 23:00 UTC)