[HN Gopher] Databases in 2024: A Year in Review
       ___________________________________________________________________
        
       Databases in 2024: A Year in Review
        
       Author : avinassh
       Score  : 315 points
       Date   : 2025-01-01 14:27 UTC
        
 (HTM) web link (www.cs.cmu.edu)
        
       | mihirrd wrote:
       | Quite informative
        
       | badindentation wrote:
       | The section on Larry Ellison is amusing.
        
         | epolanski wrote:
         | I couldn't tell whether it was satire or something.
         |
         | Do they believe the guy married a 30-year-old because she loves
         | him?
         |
         | In any case, who cares? How was that relevant?
        
           | masklinn wrote:
           | It's definitely satire. How would you take this sentence
           | seriously:
           | 
           | > I told Larry this especially means a lot to me because my
           | former #1 ranked Ph.D. student is now a professor in
           | Michigan's Computer Science department with their famous
           | Database Group.
        
         | leeoniya wrote:
         | "Do not fall into the trap of anthropomorphising Larry
         | Ellison."
         | 
         | https://m.youtube.com/watch?t=33m1s&v=-zRN7XLCRhc&feature=yo...
        
         | avinassh wrote:
         | Larry always makes appearances in his reviews, lol
         | 
         | > But the real big news in 2023 was how Elon Musk personally
         | helped reset Larry's Twitter password after he invested $1b in
         | Musk's takeover of the social media company. And with this $1b
         | password reset, we were graced in October 2023 with Larry's
         | second-ever tweet and his first new one in over a decade.
         | 
         | https://www.cs.cmu.edu/~pavlo/blog/2024/01/2023-databases-re...
         | 
         | > These journalists made it sound like Larry was doing
         | something nefarious or indecent, like the time he made his
         | pregnant third wife sign a prenup two hours before their
         | wedding. I can assure you that Larry was only trying to use his
         | vast wealth as the 7th richest person in the world to help his
         | country. His participation in this call is admirable and should
         | be lauded. Free and fair elections are not a trivial affair,
         | like a boat race where sometimes shenanigans are okay as long
         | as you win. Larry has done other great things with his money
         | that are overlooked, like spending $370m on anti-aging research
         | so that he can live forever
         | 
         | https://www.cs.cmu.edu/~pavlo/blog/2022/12/2022-databases-re...
        
       | samanthasu wrote:
       | Would love to see Andy's take on GreptimeDB
       | https://github.com/GreptimeTeam/greptimedb
        
         | leeoniya wrote:
         | Made the same comment recently in
         | https://news.ycombinator.com/item?id=42330055#42331927
        
           | samanthasu wrote:
           | looking forward to the bonus content!
        
       | rozenmd wrote:
       | I loved ottertune, it's a shame it died the way it did.
        
       | memhole wrote:
       | Love the style! CMU making databases cool. Sorry to hear about
       | OtterTune.
        
       | Beefin wrote:
       | TL;DR SQL is king
        
       | m_ke wrote:
       | Andy is a treasure, if only we had more professors like him
        
       | antirez wrote:
       | Wow, the reasons why Redis commands API suck in Andy's video
       | (linked in the post) are the weakest ever. It is possible to make
       | a case against the Redis API (I would not agree, of course, but
       | it's totally legitimate), but you've got to have stronger
       | arguments than those, particularly if you are a teacher of some
       | kind. Above all, you need to be somewhat fluent in Redis and in
       | how developers use it in order to understand why so many people
       | like it, and then explain what is wrong with it (if you
       | believe there is something wrong). The video conveys a general
       | feeling of "I don't really use / know this, but I don't like how
       | NON-SQL it is".
        
         | nojito wrote:
         | SQL is king, and history has shown that non-SQL languages are
         | not good, which is why many non-SQL DBMSs eventually adopt SQL.
        
           | antirez wrote:
           | Many non-SQL DBs had query languages that were broken
           | Javascript-ish versions of SQL. Of course, this is wrong, and
           | people will eventually adopt SQL instead. But if your _data
           | model_ isn't anything like relational DBs, non-SQL makes a
           | ton of sense. OP seems to miss exactly this, that the Redis
           | query language is shaped on the Redis data model, that is
           | basically alien to the relational model.
           | 
           | The idea behind the Redis data model is that "describe the
           | data, then query it in arbitrary ways" is conceptually nice but
           | in practice does not model many use cases very well. SQL
           | databases plagued tech with performance issues for decades
           | because of that. So Redis says instead: you need to lay out
           | your data thinking about fundamental things like data
           | structures, access times, and _the way you'll need the data
           | back_. And the API reflects this.
           | 
           | You don't have to automatically agree with that. But you have
           | to understand that, then provide your "I'm against"
           | arguments. Especially if you are in front of young people
           | listening to you.
        
             | im_down_w_otp wrote:
             | Agreed. Many NoSQL-boom-era databases eventually bolted on
             | a SQL-esque layer, but that was also because they were
             | mostly targeting "enterprise database" use cases
             | and customers who both expected that and whose use cases
             | largely fit with it. So, there was a lot of pressure to
             | conform to norms when the advantage of not doing so wasn't
             | immediately self-evident.
             | 
             | We have a database [1] and query language [2] that's
             | tailored to storing & querying trace/telemetry data
             | produced by different layers and components of cyber-
             | physical systems for systems engineers to analyze, verify,
             | and validate what a complex system is doing. It's not quite
             | a traditional relational problem. It's not quite a
             | traditional time series problem. It's not quite a
             | traditional graph problem.
             | 
             | Addressing the way that systems engineers think about their
             | domain in an effective way required coming up with
             | something different. Are there caveats and rough edges?
             | Sure. But, they're a lot less pernicious and onerous than
             | the alternative of trying to leverage a bunch of ill-
             | fitting menageries of different solutions.
             | 
             | Redis is fit-for-purpose. So, it makes sense that its query
             | interface would also express that.
             | 
             | [1] https://docs.auxon.io/modality/
             | 
             | [2] https://docs.auxon.io/speqtr/
        
             | nojito wrote:
             | >But if your data model isn't anything like relational DBs,
             | non-SQL makes a ton of sense. OP seems to miss exactly
             | this, that the Redis query language is shaped on the Redis
             | data model, that is basically alien to the relational
             | model.
             | 
             | Sure...but all roads lead back to SQL eventually. Another
             | recent example also mentioned in the OP is BigTable
             | adopting SQL.
        
               | threeseed wrote:
               | > but all roads lead back to SQL eventually
               | 
               | No, they don't. SQL is designed for relational databases.
               |
               | Other data models, e.g. JSON documents, graphs, and
               | key/value stores, all use other query languages.
        
           | anovick wrote:
           | SQL (and RDBMS in general) has its limitations, particularly
           | with regard to recursive operations.
           | 
           | An extended Datalog[1] can provide performance optimizations
           | not available to RDBMS.
           | 
           | [1]: https://dl.acm.org/doi/10.1145/3639271
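For context on the recursion point: standard SQL expresses recursive queries through `WITH RECURSIVE` common table expressions. Below is a minimal sketch using Python's built-in `sqlite3` module that computes reachability (a transitive closure) over a small, hypothetical `edges` table. It only illustrates what recursion looks like in plain SQL; it is not the extended Datalog from the cited paper and says nothing about its optimizations.

```python
import sqlite3


def reachable(edges, start):
    """Return the sorted list of nodes reachable from `start`, via a recursive CTE."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (src TEXT, dst TEXT)")  # hypothetical schema
    conn.executemany("INSERT INTO edges VALUES (?, ?)", edges)
    rows = conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT dst FROM edges WHERE src = ?
            UNION  -- UNION (not UNION ALL) discards duplicate rows, so cycles terminate
            SELECT e.dst FROM edges AS e JOIN reach AS r ON e.src = r.node
        )
        SELECT node FROM reach ORDER BY node
        """,
        (start,),
    ).fetchall()
    conn.close()
    return [node for (node,) in rows]


# a -> b -> c -> a forms a cycle; d -> e is unreachable from a
print(reachable([("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")], "a"))  # ['a', 'b', 'c']
```

Using `UNION` rather than `UNION ALL` is what keeps the cyclic graph from looping forever in this sketch.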
        
         | fforflo wrote:
         | I've been working on something Redis-y over the holidays, and
         | it has reinforced my view that it's the epitome of an 80/20
         | tool. I've always used the 20%, but anything beyond that sounds
         | useless unless you've encountered the requirement in a
         | production environment. The challenges Redis has been solving
         | for years never really registered with the research/academic
         | community (not even the 20%).
         | 
         | Even in the various taxonomies of DBMSs in the research
         | literature, Redis was dismissed with a wave of the hand as an
         | "in-memory" database, which undersells the (for me) important
         | part: the "data structure" server.
         |
         | Putting "database" after Redis could be a marketing misstep,
         | because it puts you in is-it-SQL territory.
         |
         | TL;DR: Redis is mostly appreciated by practitioners, i.e. (web)
         | developers. Academics find it lacking a theoretical foundation,
         | so... meh.
        
           | Tanjreeve wrote:
           | Developers know its limits. Or you have developers with
           | vague "scaling issues" or "buggy caching" who don't
           | understand why they have them or suddenly start suffering
           | from them at inconvenient moments.
        
         | nicoritschel wrote:
         | With all due respect, the linked video was pretty fair. It
         | didn't say not to use Redis, just not as a primary datastore.
         | 
         | I don't think folks work with Redis out of fondness for the
         | model, but because it's the least worst datastore for caching,
         | lightweight message broker, and simple realtime things like
         | counters.
        
           | antirez wrote:
           | I'm talking about the broken-API argument here. Also, Redis
           | is particularly useful in situations other than the ones OP
           | describes. Leaderboard-style use cases with sorted sets are
           | killer applications (super hard to model with SQL) of the
           | data structure server idea. Apparently OP does not
           | understand this and says "simple GET/SET" is what you should
           | use Redis for.
           | 
           | Redis has probabilistic data structures, the ability to
           | implement complex queueing patterns, and so forth. That's
           | where the value is. Otherwise we would still be using just
           | Memcached and not care about Redis. Another killer app was
           | Twitter's initial use case (later they used it for pretty
           | much everything): caching the latest N tweets using capped
           | lists. I could go on forever.
           | 
           | So OP's argument is flawed and not fair IMHO, for the reasons
           | above. When you talk to students you need to do your
           | homework: really understand the system you are talking about
           | and provide a realistic picture of it. Then, yes, if you
           | want, criticize it as much as you want, with grounded arguments.
           | 
           | You know what? I re-read this comment and it's embarrassing
           | that I even have to write this, because after 15 years of
           | Redis history at such scale and popularity, pretty much
           | everybody who was seriously exposed to Redis knows this
           | stuff. Is tech culture really degraded so much that we have to
           | restate the obvious? Do I really need to explain that GET/SET
           | is not exactly where Redis shines, after 15 years of half the
           | Internet using all kinds of Redis patterns?
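The "latest N tweets" capped-list pattern mentioned above is, in Redis, typically an `LPUSH` followed by `LTRIM key 0 N-1`. The sketch below is a pure-Python, in-process stand-in (no Redis server or client library involved) that mimics those semantics with `collections.deque(maxlen=N)`. The `CappedList` class and its method names are hypothetical, chosen only to mirror the Redis commands.

```python
from collections import deque


class CappedList:
    """In-process stand-in for the Redis LPUSH + LTRIM 0 N-1 pattern (hypothetical helper)."""

    def __init__(self, max_len: int):
        # A bounded deque silently discards items from the opposite end when full
        self._items = deque(maxlen=max_len)

    def lpush(self, item: str) -> None:
        # Newest item goes to the head (like LPUSH); the oldest falls off the tail (like LTRIM)
        self._items.appendleft(item)

    def lrange(self, start: int = 0, stop: int = -1) -> list:
        # Like LRANGE, `stop` is inclusive and -1 means "through the last element";
        # `start` is assumed to be non-negative in this sketch
        items = list(self._items)
        stop = stop + len(items) if stop < 0 else stop
        return items[start:stop + 1] if stop >= 0 else []


latest = CappedList(3)
for tweet in ["t1", "t2", "t3", "t4"]:
    latest.lpush(tweet)
print(latest.lrange())  # ['t4', 't3', 't2']
```

In Redis the trim is an explicit second command; here the bounded deque performs it implicitly on every push, which keeps the list size constant regardless of write volume.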
        
             | memhole wrote:
             | What are your thoughts about Rails switching to SQLite from
             | Redis? I've only used Redis to store session data and cache
             | app data. So my opinions are pretty limited and mostly
             | positive.
             | 
             | https://rubyonrails.org/2024/11/7/rails-8-no-paas-required
        
               | antirez wrote:
               | My feeling is that for their use case, it makes sense to
               | have something vertical that just covers the needs of
               | Rails. AFAIK SQLite has a RAM backend, so you are still
               | not going to hit the disk. Seems like a good idea to
               | reduce system complexity, to me.
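On the "RAM backend" point: SQLite can open a purely in-memory database through the special filename `:memory:`. A minimal sketch of a key/value cache table with expiry, built on Python's `sqlite3`; the schema, the `cache_set`/`cache_get` names, and the TTL handling are illustrative assumptions, not a description of what Rails 8 actually ships.

```python
import sqlite3
import time

# Lives only in RAM; the data disappears when the connection is closed
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cache (key TEXT PRIMARY KEY, value TEXT NOT NULL, expires_at REAL NOT NULL)"
)


def cache_set(key: str, value: str, ttl_seconds: float) -> None:
    # INSERT OR REPLACE gives SET-like overwrite semantics on the primary key
    conn.execute(
        "INSERT OR REPLACE INTO cache (key, value, expires_at) VALUES (?, ?, ?)",
        (key, value, time.time() + ttl_seconds),
    )


def cache_get(key: str):
    # Return the value if present and not yet expired, otherwise None
    row = conn.execute(
        "SELECT value FROM cache WHERE key = ? AND expires_at > ?", (key, time.time())
    ).fetchone()
    return row[0] if row else None


cache_set("session:42", "alice", ttl_seconds=60)
print(cache_get("session:42"))  # alice
print(cache_get("missing"))     # None
```

Expired rows are simply ignored on read here; a real cache would also need a periodic cleanup pass.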
        
             | sureglymop wrote:
             | Maybe this is a weird question, but knowing only some math
             | and not Redis, what is a sorted set and how is it different
             | from a list/tuple?
        
               | antirez wrote:
               | Sorted sets are abstract data structures where you insert
               | elements into a set, but every element is associated with
               | a floating-point score. Elements are kept ordered by
               | score inside the sorted set, so you can ask for ranges,
               | or for a specific element's rank (position), and so
               | forth. This is one of the (many) cases where Redis is the
               | best way to get started and deliver (see for instance the
               | Instagram case: they used Redis for years while growing
               | bigger and bigger). Then, once you understand you are at
               | scale and need just XYZ, you may choose to implement XYZ
               | inside your system in other ways, and that's it.
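A minimal in-process sketch of the sorted-set semantics described above, in plain Python with no Redis server involved: each member carries a float score, members are kept ordered by (score, member), and you can ask for a member's rank or for a range by position. The method names echo Redis's `ZADD`/`ZRANK`/`ZRANGE`/`ZREVRANGE` for readability, but this is not how Redis implements it (for larger sets Redis pairs a hash table with a skip list); a sorted Python list plus `bisect` stands in here, so inserts cost O(n) rather than O(log n).

```python
import bisect


class SortedSet:
    """Toy sorted set: members kept ordered by (score, member), so equal scores sort by member."""

    def __init__(self):
        self._scores = {}   # member -> score, for direct score lookup
        self._ordered = []  # ascending list of (score, member) pairs

    def zadd(self, member: str, score: float) -> None:
        old = self._scores.get(member)
        if old is not None:
            # Drop the stale (score, member) pair before re-inserting with the new score
            del self._ordered[bisect.bisect_left(self._ordered, (old, member))]
        self._scores[member] = score
        bisect.insort(self._ordered, (score, member))

    def zrank(self, member: str):
        # 0-based position in ascending score order, or None if the member is absent
        score = self._scores.get(member)
        if score is None:
            return None
        return bisect.bisect_left(self._ordered, (score, member))

    def zrange(self, start: int, stop: int) -> list:
        # Ascending order; inclusive stop, negative indexes count from the end
        return self._slice([m for _, m in self._ordered], start, stop)

    def zrevrange(self, start: int, stop: int) -> list:
        # Descending order, e.g. the top of a leaderboard
        return self._slice([m for _, m in reversed(self._ordered)], start, stop)

    @staticmethod
    def _slice(items: list, start: int, stop: int) -> list:
        n = len(items)
        start = start + n if start < 0 else start
        stop = stop + n if stop < 0 else stop
        if stop < 0:
            return []
        return items[max(start, 0):stop + 1]


board = SortedSet()
board.zadd("alice", 120)
board.zadd("bob", 95)
board.zadd("carol", 150)
board.zadd("bob", 200)          # updating a score moves the member
print(board.zrevrange(0, 1))    # ['bob', 'carol']
print(board.zrank("alice"))     # 0
```

The leaderboard usage shows why this differs from a list/tuple: the structure re-sorts itself on every score update, and rank and top-N queries fall out directly.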
        
             | nicoritschel wrote:
             | I am grateful for Redis and I agree you pioneered a lot of
             | data access patterns in production for a lot of people,
             | myself included. I've used Redis for 10 years, at times for
             | use cases as you mention, for real time feature engineering
             | for ML as well.
             | 
             | The API is just different compared to SQL, which is a
             | downside for many. There are modern advancements in the
             | space with IVM (incremental view maintenance), and more
             | databases are supporting probabilistic data structures.
        
             | daneel_w wrote:
             | _> Is tech culture really degraded so much that we have to
             | restate the obvious?_
             | 
             | Maybe, though the author of the article is known to be a
             | little bit too opinionated, and unfortunately in the habit
             | of phrasing things in a bombastic manner. The piece reads
             | like a dramatic recap of the past year's sporting events,
             | littered with irrelevant and disconnected references to
             | lyrics and drama in the world of rap and hip hop. A "quirky
             | and fun" journalistic abortion.
        
           | cloverich wrote:
           | Redis is stable, powerful, widely supported, and has been
           | running strong for... over a decade now? I've never heard it
           | recommended as a primary datastore... why would someone do
           | that? I've seen it used at scale at numerous businesses now
           | and it's caused problems exactly never. People understand how
           | to use it because it's relatively simple and provides the
           | first things you need beyond the database. Do people complain
           | about Redis commonly? News to me.
        
             | nicoritschel wrote:
             | Adtech/ML
        
         | tayo42 wrote:
         | > Andy's video (linked in the post)
         | 
         | Is there a "too long; didn't watch" summary anyone knows of? I
         | hate videos, but am curious lol
        
         | apavlo wrote:
         | > Wow, the reasons why Redis commands API suck in Andy's video
         | (linked in the post) are the weakest ever.
         | 
         | In my example, the API on a key changes based on its value
         | type. And the same collection can have different value types
         | mixed together. You've recreated the worst parts of IBM IMS
         | from the 1960s. However, the original version of IMS only
         | changed the API when a collection's backing data structure
         | changed. Redis can change it on every key!
         | 
         | We didn't get into the semantics of Redis' MULTI...EXEC, which
         | the documentation mischaracterizes as "Transactions". I'm happy
         | that at least you didn't use BEGIN...COMMIT.
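To make the disagreement concrete: in Redis each key holds one value type at a time, commands are type-specific, and a mismatched command fails with a `WRONGTYPE` error instead of being coerced, while `SET` overwrites an existing value regardless of its type. The toy model below reproduces that behavior in pure Python (no Redis server; `ToyKeyspace` and `WrongType` are hypothetical names). It shows both readings in this thread: commands are type-checked at call time, yet a key's type is never declared up front and can change over its lifetime.

```python
WRONGTYPE_MSG = "WRONGTYPE Operation against a key holding the wrong kind of value"


class WrongType(Exception):
    pass


class ToyKeyspace:
    """Toy model of a dynamically typed keyspace with type-checked commands."""

    def __init__(self):
        self._data = {}  # key -> str or list[str]

    def set(self, key: str, value: str) -> None:
        # Like Redis SET: overwrites whatever was stored, regardless of its type
        self._data[key] = value

    def get(self, key: str):
        value = self._data.get(key)
        if value is None:
            return None
        if not isinstance(value, str):
            raise WrongType(WRONGTYPE_MSG)  # GET only operates on string values
        return value

    def lpush(self, key: str, *items: str) -> int:
        value = self._data.setdefault(key, [])  # creates a list if the key is missing
        if not isinstance(value, list):
            raise WrongType(WRONGTYPE_MSG)  # LPUSH only operates on list values
        for item in items:
            value.insert(0, item)  # each item is pushed onto the head in turn
        return len(value)


ks = ToyKeyspace()
ks.lpush("k", "a", "b")
try:
    ks.get("k")
except WrongType as err:
    print(err)               # WRONGTYPE Operation against a key holding the wrong kind of value
ks.set("k", "now a string")  # SET replaces the list, whatever its type
print(ks.get("k"))           # now a string
```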
        
           | antirez wrote:
           | You totally miss that Redis is more like a remote interpreter
           | with a DSL that manipulates data structures stored at global
           | variables (keys): you (hopefully) would never complain about
           | languages having these semantics.
           | 
           | I don't think you understood how Redis collections work. The
           | items are just strings; you can't mix integers and strings
           | or whatever, nor can collections be nested.
           | 
           | The Redis commands do type checking to ensure the application
           | is performing the right operation.
           | 
           | In your example, GET against a list does not make sense
           | because:
           | 
           | 1. GET is the retrieve-the-key-of-string-type operation.
           | 
           | 2. Having GET doing something like LRANGE 0 -1 would have
           | many side effects. Getting for error a huge list and
           | returning a huge data set without any reason, creating
           | latency issues. Also, GET would need options to provide ranges
           | (the SQL-like query language horror story). And so forth.
           | 
           | So each "verb" should do a specific action on a given data
           | type. Are you absolutely sure you were exposed enough to the
           | Redis API, how it works, and so forth?
           | 
           | About MULTI/EXEC: when AOF is used with fsync configured
           | correctly, MULTI/EXEC provides some of the transactional
           | guarantees you think of when you hear "transaction", but in
           | general the concept refers to the fact that commands inside
           | MULTI/EXEC have an atomic effect from the point of view of an
           | external observer AND of point-in-time RDB files (and AOF as
           | well). MULTI / INCR a / INCR a / EXEC will always result in
           | the observer seeing 2, 4, 6, 8, and so forth, and never 3 or 5.
           | 
           | Anyway, I believe you didn't put enough effort into
           | understanding how Redis really works. Yet you criticized it
           | with weak arguments in front of the most precious good we
           | have: students. This is the sole reason why I wrote my first
           | comment; I believe this to be a wrong teaching approach.
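The MULTI / INCR a / INCR a / EXEC example above (two increments queued together, so an outside reader only ever sees even values) can be mimicked in-process. The sketch below is an analogy, not Redis: a single `threading.Lock` stands in for Redis executing a queued transaction without interleaving other clients' commands, and a reader thread records whether it ever observes an odd value. `ToyServer` and `run_demo` are hypothetical names, and the sketch models only what readers observe, not durability (which, as the comment notes, depends on AOF/fsync settings).

```python
import threading
import time


class ToyServer:
    """Runs each queued transaction under one lock, so readers never see partial effects."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {"a": 0}

    def exec_transaction(self, commands):
        # Like EXEC: apply every queued command back to back, with no other client in between
        with self._lock:
            for cmd, key in commands:
                if cmd == "INCR":
                    self._data[key] += 1

    def get(self, key):
        with self._lock:
            return self._data[key]


def run_demo(rounds: int = 1000):
    server = ToyServer()
    seen_odd = False
    done = threading.Event()

    def reader():
        nonlocal seen_odd
        while not done.is_set():
            if server.get("a") % 2 == 1:
                seen_odd = True
            time.sleep(0)  # yield so the writer thread gets scheduled

    t = threading.Thread(target=reader)
    t.start()
    for _ in range(rounds):
        server.exec_transaction([("INCR", "a"), ("INCR", "a")])
    done.set()
    t.join()
    return server.get("a"), seen_odd


print(run_demo())  # (2000, False)
```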
        
             | zzzeek wrote:
             | > You totally miss that Redis is more like a remote
             | interpreter with a DSL that manipulates data structures
             | stored at global variables (keys):
             | 
             | I think he makes the point that these "global variables"
             | are dynamically typed; you can have "listX" and then write
             | a non-list into that same name; statically typed systems
             | would not allow this. He makes the fairly non-controversial
             | point that a statically typed system (SQL, other than that
             | of SQLite) adds a level of type safety that can guard
             | against software bugs.
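The SQLite caveat is easy to see with Python's bundled `sqlite3` module. In an ordinary table, a column declared `INTEGER` has integer *affinity* rather than a hard type: a value that converts cleanly is stored as an integer, and one that doesn't is stored as-is instead of being rejected. (Since version 3.37.0, SQLite also offers `STRICT` tables that reject such inserts; the sketch below deliberately uses a plain table.) The table and column names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, points INTEGER)")  # hypothetical table

# '42' converts cleanly, so INTEGER affinity stores it as an integer...
conn.execute("INSERT INTO scores VALUES ('alice', '42')")
# ...but 'lots' does not, so SQLite stores it as TEXT instead of rejecting the row.
conn.execute("INSERT INTO scores VALUES ('bob', 'lots')")

rows = conn.execute(
    "SELECT player, points, typeof(points) FROM scores ORDER BY player"
).fetchall()
print(rows)  # [('alice', 42, 'integer'), ('bob', 'lots', 'text')]
```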
        
             | jsnell wrote:
             | > 1. GET is the retrieve-the-key-of-string-type operation.
             | 
             | That's a tautological argument. The question isn't what the
             | definition of GET is, but whether the design is good.
             | 
             | > 2. Having GET doing something like LRANGE 0 -1 would have
             | many side effects. Getting for error a huge list and
             | returning a huge data set without any reason, creating
             | latency issues.
             | 
             | If this really were the reason, you'd have separate
             | operations for tiny strings and huge strings. After all, by
             | analogy having GET return a huge string "without any
             | reason" would create latency issues.
             | 
             | But that's not how Redis works, right?
        
               | antirez wrote:
               | The examples I gave are just a subset of the protection
               | that this provides. Similarly, you can't LRANGE a set
               | type, and so forth. So in general this makes certain
               | errors evident ASAP (command mismatch with the key type).
               |
               | This does not mean that Redis would not work with
               | generic LEN, INSERT, RANGE commands. But such commands
               | would also end up having type-specific options, which I
               | feel is not very clean. Anyway, these are design
               | tastes, but I don't think they dramatically change what
               | Redis is or isn't. The interesting part is the data
               | model, the idea of commands operating on abstract data
               | structures, the memory-disk duality, and so forth. If one
               | wants to analyze Redis and understand its merits and
               | issues, a serious analysis should hardly focus on these
               | kinds of small choices.
        
             | DrBenCarson wrote:
             | Just because there are reasons why Redis sucks doesn't
             | mean it doesn't suck
        
             | osigurdson wrote:
             | >> stored at global variables
             | 
             | This is an interesting (and correct) perspective. Global
             | variables scare us in software but we are ok with it when
             | it comes to application state stored in a db.
        
         | brightball wrote:
         | Yea, I always use Redis for very specialized purposes.
         | 
         | Like offloading a shared data structure between threads /
         | processes / machines so that I don't have to deal with thread
         | safety issues.
        
           | Spivak wrote:
           | I understand machines, but _threads_?! Why introduce IPC
           | overhead on the fastest/easiest way to share data? This is
           | beyond a solved problem, and your language probably has
           | multiple ready-made, battle-tested solutions.
           | 
           | In Python you don't even need a lib, dict is thread safe even
           | in nogil.
        
             | pdhborges wrote:
             | > In Python you don't even need a lib, dict is thread safe
             | > even in nogil.
             |
             | Is it? https://google.github.io/styleguide/pyguide.html#218-threadi...
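The distinction at stake: in CPython, a single dict operation such as `d[k] = v` won't leave the dict corrupted, but a read-modify-write such as `d[k] = d.get(k, 0) + 1` is a read followed by a separate write, and another thread can run in between and cause a lost update. A minimal sketch of the usual remedy is to guard the compound update with a `threading.Lock`. The counter, key name, and thread counts are illustrative.

```python
import threading

counts = {}
counts_lock = threading.Lock()


def record_hits(key: str, n: int) -> None:
    for _ in range(n):
        # The get-then-set below is two steps; without the lock, two threads can
        # read the same old value and one increment is silently lost.
        with counts_lock:
            counts[key] = counts.get(key, 0) + 1


threads = [threading.Thread(target=record_hits, args=("page", 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counts["page"])  # 40000
```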
        
       | codeulike wrote:
       | Weird how SQL Server and its Azure variants get no mention. It
       | dominates in certain sectors. DB-Engines ranks it the third most
       | popular overall: https://db-engines.com/en/ranking
        
         | patja wrote:
         | The fawning over Larry Ellison is also weird.
        
           | cactusfrog wrote:
           | The joke is that his greed/unwillingness to squeeze margins
           | has made the entire database company ecosystem possible.
        
         | RadiozRadioz wrote:
         | Lots of people deliberately avoid Microsoft technologies and
         | their whole ecosystem. There's of course interesting stuff
         | happening there, but not enough for those outside the ecosystem
         | to care.
         | 
         | It's more a cultural thing than anything else. HN for example
         | largely leans away from MS. It's quite interesting how little
         | overlap there is between the two worlds sometimes.
         | 
         | Speaking as one of those people, it's just not my thing, so
         | it's not on my radar at all. There's enough stuff happening
         | outside MS to keep me busy forever.
        
         | teej wrote:
         | Are people choosing SQL Server independently of the Microsoft
         | ecosystem? My understanding is that you typically use it because
         | you're forced to choose a MS product.
        
           | rawgabbit wrote:
           | SQL Server is a terrific product. And I detest most things
           | Microsoft.
        
           | Tostino wrote:
           | Agreed with the other person. It's a great database. I
           | wouldn't choose it for a startup over Postgres, but it is
           | extremely capable.
        
             | olavgg wrote:
             | I would use it if it supported backup/restore over unix
             | pipes / ssh.
        
         | stackskipton wrote:
         | I'm an SRE who deals with some .NET stuff that uses MSSQL but
         | is being converted to MySQL, so I feel somewhat qualified to
         | talk about MSSQL. TL;DR: Nothing interesting going on.
         | 
         | There is nothing to talk about here. It's boring database
         | engine that powers boring business applications. It's pretty
         | efficient and can scale vertically pretty well. With state of
         | modern hardware, that vertical limit is high enough most people
         | won't encounter it.
         | 
         | It's also going the way of Windows Server, which is to say it's
         | being sold but not a ton of work is being done on it. Companies
         | that are still invested in it likely either don't ultimately
         | care about cost or face switching costs too high to
         | greenlight the switch.
         | 
         | Anyone who does care about cost, like my current company, has
         | switched to OSS solutions like
         | Postgres/MySQL/$RandomNoSQLOSSOption. My company switched away
         | when it turned into a SaaS business and those MSSQL server
         | costs ate into the bottom line.
         | 
         | This has been happening throughout the ecosystem. ProGet, which
         | is THE solution for .NET artifacts, is switching to Postgres:
         | https://blog.inedo.com/inedo/so-long-sql-server-thanks-for-a...
         | 
         | Also, I saw this article from Brent Ozar, whom I see as an
         | MSSQL expert, which basically said if you have the option, just
         | go with Postgres:
         | https://www.brentozar.com/archive/2023/11/the-real-problem-w...
         | 
         | It's also worth noting that Microsoft even bought a Postgres
         | scaling solution called Citus, so they've read the writing on
         | the wall: https://blogs.microsoft.com/blog/2019/01/24/microsoft-acquir...
        
           | rawgabbit wrote:
           | I was a big proponent of MSSQL. It is still a good product
           | but I see Microsoft constantly fumbling with new OLAP tools.
           | It is a shame but it seems Microsoft is abandoning MSSQL.
        
           | mbreese wrote:
           | _> It's boring database engine that powers boring business
           | applications_
           | 
           | I'm taking that as a positive thing... it's boring and does
           | its job with little fanfare. That's pretty much what I want
           | out of a RDBMS. So long as it is "fast-enough" with enough
           | features for the applications that use it, that seems like a
           | good place for an RDBMS to be.
           | 
           | One could still argue about Windows and licensing fees, but
           | from a technical point of view, for business customers,
           | boring isn't necessarily a bad thing.
        
             | FridgeSeal wrote:
             | There's other boring databases that also reliably fill that
             | job, and they also cost far less.
             | 
             | It can also be a bit of a pain outside the C# ecosystem,
             | whereas every language ever has nice Postgres drivers that
             | don't require us to download and set up ODBC. It runs on
             | Linux as of a few years ago, but I also wouldn't be
             | surprised if many people didn't realise that.
        
               | stackskipton wrote:
               | I've run into MSSQL on Linux. Most DBAs know, but their
               | entire ecosystem is Windows Server, so their thinking is:
               | what's one more Windows Server?
        
       | sigbottle wrote:
       | These year-in-review posts are really neat; I really liked the
       | AI year-in-review posts.
       | 
       | Maybe algorithms review or TCS review or some specific math topic
       | review next?
        
       | CT4u8798 wrote:
       | I love SQL. I'm not a full-time developer but always use SQL over
       | other abstractions, which I find extremely confusing and way more
       | complicated than plain SQL.
        
         | skeeter2020 wrote:
         | I'm now firmly into management but the one skill I use very
         | regularly is SQL. By far the best investment I made in my
         | entire career was a little bit of relational algebra, some
         | casual study of DBMS internals and a lot of hands-on SQL. The
         | quasi-standards have also made it the easiest transfer across
         | specific DBs and their flavours over the years.
         | 
         | PSA: Hi kids, here's a dinosaur with yet more free advice: put
         | the tiniest bit of effort into SQL early on and watch the
         | compound interest add up.
        
         | threeseed wrote:
         | That's just because it is what you are comfortable with.
         | 
         | Many developers will jump straight for ORMs when given the
         | chance.
        
           | Tostino wrote:
           | And for certain types of applications, ORMs absolutely have
           | their uses.
        
       | wahnfrieden wrote:
       | 2024 was also the year that Realm died
        
       | travisgriggs wrote:
       | The funding section had me thinking "one of these is not like the
       | others". Both the amount and count of successive rounds.
        
       | ksec wrote:
       | >Six years after MySQL v8 went GA, the team turned v9 out on the
       | streets. ......Oracle is putting all its time and energy into its
       | proprietary MySQL Heatwave service.
       | 
       | Oracle actually already released 9.1 in 2024 [1], and another
       | release is expected this month and every quarter after that. So
       | I think MySQL continues to get new features, bug fixes, and
       | support like it used to, contrary to what most people think,
       | namely that it is all going to Heatwave. I just hope Vector
       | support will later be open-sourced as an official part of MySQL
       | rather than kept behind Heatwave.
       | 
       | [1]
       | https://dev.mysql.com/doc/relnotes/mysql/9.1/en/news-9-1-0.h...
        
       | the_arun wrote:
       | This person started with database news, reviewed all the
       | prominent DBs, and finally ended by talking about his love of
       | Larry Ellison. A perfect human in the days of LLMs. Amazing
       | write-up.
        
       | RedShift1 wrote:
       | I've been using plain postgres for over 5 years now, reading this
       | I feel like I'm in the eye of a storm...
        
       | mebcitto wrote:
       | A couple of spicy things:
       | 
       | > OtterTune. Dana, Bohan, and I worked on this research project
       | and startup for almost a decade. And now it is dead. I am
       | disappointed at how a particular company treated us at the end,
       | so they are forever banned from recruiting CMU-DB students. They
       | know who they are and what they did.
       | 
       | Ouch.
       | 
       | > Lastly, I want to give a shout-out to ByteBase for their
       | article Database Tools in 2024: A Year in Review. In previous
       | years, they emailed me asking for permission to translate my end-
       | of-year database articles into Chinese for their blog. This year,
       | they could not wait for me to finish writing this one, so they
       | jocked my flow and wrote their own off-brand article with the
       | same title and premise.
       | 
       | Also sounds like he's preparing a new company:
       | 
       | > I hope to announce our next start-up soon (hint: it's about
       | databases).
        
         | spprashant wrote:
         | Anyone know what company he may be talking about?
        
         | iso8859-1 wrote:
         | How do they enforce the ban? Do universities have non-compete
         | clauses for PhD students?
        
           | mebcitto wrote:
           | I assume it's not that kind of ban, but more that he'll
           | recommend that his students avoid the company.
        
       | mrtimo wrote:
       | Enjoyed his roundup in the "Shoving Ducks into Everything"
       | section.
       | 
       | DuckDB is a great tool. In April 2020, the creator of DuckDB gave
       | a talk at CMU. In the beginning he makes a convincing argument
       | (in 5 minutes) why data scientists don't use RDBMS and how this
       | was the genesis of DuckDB. Here is a video that starts 3 minutes
       | into the talk (where his argument starts):
       | https://youtu.be/PFUZlNQIndo?si=ql9n2QuBlAEuGIqo&t=204
        
       | dig1 wrote:
       | > There was no major effort to fork off MongoDB, Neo4j, Kafka, or
       | CockroachDB when they announced their license changes.
       | 
       | AFAIK people didn't take MongoDB seriously from the start,
       | especially with the "web scale database" joke circulating. The
       | Neo4j Community version has been under GPLv3 for quite some time,
       | while the Enterprise version has always been somewhat closed,
       | regardless of whether the source code was available on GitHub
       | (the mentioned license change affected the Enterprise version).
       | 
       | Regarding CockroachDB, I must admit that I've only heard about it
       | on HN and don't know anyone who seriously uses it. As for Kafka,
       | there are two versions: Apache Kafka, the open-source version
       | that almost everyone uses (under the Apache license), and
       | Confluent Kafka, which is Apache Kafka enhanced with many
       | additional features from Confluent, and the license change
       | affected Confluent Kafka. In short, maybe the majority simply
       | didn't care about these projects very much, so there is no major
       | fork.
       | 
       | > It cannot be because the Redis and Elasticsearch install base
       | is so much larger than these other systems, and therefore, there
       | were more people upset by the change since the number of MongoDB
       | and Kafka installations was equally as large when they switched
       | their licenses.
       | 
       | I can't speak for MongoDB, but the Confluent Kafka install base
       | is significantly smaller than that of Apache Kafka, Redis and ES.
       | 
       | > Dana, Bohan, and I worked on this research project and startup
       | for almost a decade. And now it is dead. I am disappointed at how
       | a particular company treated us at the end, so they are forever
       | banned from recruiting CMU-DB students. They know who they are
       | and what they did.
       | 
       | Call me a skeptic, but I can't see this as a fair approach. If
       | your company fails for whatever reason, you should not turn the
       | university department/group/students against your peers (I
       | can't find any indication that CMU-DB was one of OtterTune's
       | founders).
       | 
       | Wrt Andy, here are [1] somehow interesting views from
       | (presumably) previous employees.
       | 
       | [1]
       | https://www.reddit.com/r/Database/comments/1dgaazw/comment/l...
        
         | paulddraper wrote:
         | There are production uses of MongoDB (Stripe comes to mind).
         | 
         | But it is certainly not a popular choice there.
        
           | threeseed wrote:
           | MongoDB does over $2b in revenue (and growing by 20%) each
           | year.
           | 
           | There are a lot of production uses.
        
         | apavlo wrote:
         | > Wrt Andy, here are [1] somehow interesting views from
         | (presumably) previous employees.
         | 
         | I am only seeing this now and I take the complaints about being
         | "slightly racist and offensive" very seriously. I am checking
         | with investors, former HR people, and co-founders. I was not
         | made aware of any issues. If anything, I was overly cautious at
         | the company.
         | 
         | I was openly transparent with our employees about every
         | direction the company was pursuing up until the very end. The
         | complaint that "He thinks he knows everything about business"
         | makes me believe this person is just trolling because I was
         | always the first to admit in meetings that I was not an expert
         | in how to run a business. We had to fire people because of
         | inappropriate behavior, but not because I had strong
         | disagreements with how to run the company.
        
           | moab wrote:
           | FWIW, as someone who had multiple friends who worked closely
           | with Andy over 5+ years and at all stages of their career
           | (both BS / PhD) those comments reek of someone with an axe to
           | grind. All of the many anecdotes I have about Andy paint a
           | picture of a great advisor and mentor. I suppose I should say
           | "shenanigans aside", but if you can't separate his jokes from
           | his academic side you need to develop a sense of humor.
        
       | maeil wrote:
       | Good read!
       | 
       | > Postgres' support for extensions and plugins is impressive. One
       | of the original design goals of Postgres from the 1980s was to be
       | extensible. The intention was to easily support new access
       | methods and new data types and operations on those data types
       | (i.e., object-relational). Since 2006, Postgres' "hook" API. Our
       | research shows that Postgres has the most expansive and diverse
       | extension ecosystem compared to every other DBMS.
       | 
       | Greenhorn developers don't even know that there are non-Postgres
       | databases which have extensions too - such is the gap! I wouldn't
       | be surprised if Postgres had as many as all others combined.
        
       | softwaredoug wrote:
       | On the "Amazon can just offer your DB as a service"
       | 
       | Yes, this can happen. But a lot of people don't want an AWS
       | managed service. They're like 30% cheaper for 30% less value.
       | They can develop a bad reputation and feel like weird forks (Kinesis vs
       | Kafka) that have weird undocumented gotchas and edge cases that
       | never get fixed. Many teams want to host on k8s anyway, and
       | you'll probably have better k8s support from the main project.
       | Another example is the success of Flink over hosted Google
       | Dataflow. Seems eventually the teams I know trend to the most
       | mainstream OSS implementation over time, maybe after early
       | prototyping on a managed system.
       | 
       | IMO it might not be the highest growth market anymore. Those who
       | want to pay for a managed service will. But many are just
       | figuring out a k8s based solution to their infra needs as k8s
       | knowledge becomes more ubiquitous.
        
       | wslh wrote:
       | Great heads up. I wonder about graph databases. He mentioned
       | <https://umbra-db.com/> and <https://cedardb.com/> both include
       | the graph use case and I wonder how they compare to
       | <https://neo4j.com/>.
        
       | atombender wrote:
       | The article mentions Greenplum, but it's worth noting that when
       | the code was closed, several of the original developers created
       | an open-source fork, Cloudberry, which seems to be thriving.
       | Cloudberry was accepted into the Apache project this year, and
       | has synced with Postgres 14, whereas the closed-source Greenplum
       | is still stuck on Postgres 12.
       | 
       | The architecture is quite ancient at this point, but I'm not sure
       | it's completely outdated. It's single-master shared-nothing, with
       | shards distributed among replicas, similar to Citus. But the
       | GPORCA query planner is probably the most advanced distributed
       | query planner in the open source world at this point. From what I
       | know, Greenplum/Cloudberry can be significantly faster than Citus
       | thanks to the planner being smarter about splitting the work
       | across shards.
        
         | ledgerdev wrote:
         | Thanks for the cloudberry mention, wasn't aware of it.
        
       | PeterZaitsev wrote:
       | I think one thing Andy misses, in explaining why people were
       | pissed about Elastic and Redis but much less so about MongoDB and
       | some others, is the original license and the size of the
       | contributor community.
       | 
       | When the original license is as restrictive as AGPL, there is
       | unlikely to be much embedded use... so fewer people are impacted
       | in a truly catastrophic way.
       | 
       | Also, if there is no contributor community to speak of... who is
       | going to do the fork?
       | 
       | I put some thoughts about it in my post about ScyllaDB
       | https://peterzaitsev.com/thoughts-on-scylladb-license-change...
        
         | tayo42 wrote:
         | > In the case of Redis, I can only think that people perceive
         | Redis Ltd. as unfairly profiting off others' work since the
         | company's founders were not the system's original creators. An
         | analysis of Redis' source code repository also shows that a
         | sizable percentage of contributions to the DBMS comes from
         | outside the company
         | 
         | He mentions this in "Andy's Take" section btw
        
           | PeterZaitsev wrote:
           | Yes. Not the license though.
        
       | kwillets wrote:
       | I spent the past year puzzling over the DB market as well, but I
       | don't feel like I'm much closer to understanding it.
       | 
       | It appears that a lot of attention is now directed at the folks
       | doing 100 MB queries, and the high end has moved past everybody's
       | radar. My idea of an exciting product is Ocient, who have skipped
       | over Cloud and gone for hyperscale on-prem hardware. Yellowbrick
       | is also a contender here.
       | 
       | I have a lot of experience with Vertica, and they seem to have
       | gotten stuck in this niche as well, with sales tilted towards big
       | accounts, but less traction in smaller shops, and a difficult
       | road to get a SaaS or similar easy-start offering.
       | 
       | There's a crossover point where self-managed is cheaper than
       | cloud, but nobody seems to have any idea where it is. Snowflake
       | will gladly tell you that your sub-$1M Vertica cluster should be
       | replaced by $10M of sluggish SaaS, and that you are saving money
       | by doing so. These decisions seem more in the realm of psychology
       | or political science.
       | 
       | DHH's cloud exit was a refreshing take on the expense issue, even
       | if it wasn't strictly in the database space -- the cost per VCPU
       | and so forth that he documented is a good start for estimating
       | savings, and he debunked a lot of the "hidden costs" that cloud
       | maximalists claim.
       | 
       | In the business/financial space the biggest news to me was the
       | correction in Snowflake's stock price, which seemed to indicate
       | that investors were finally noticing metrics like price-
       | performance, but they added a little more AI and went back into
       | irrationality.
       | 
       | I'm heavily in favor of DuckDB, Hudi, Iceberg, S3 tables, and the
       | like. Mixing high-end and low-end tools seems like the best
       | strategy (although settling on one high-end DWH has also worked
       | IME), and the low end is getting better and cheaper, squeezing
       | out the mid-range SaaS vendors.
       | 
       | In research I found Goetz Graefe's work in offset-value coding
       | exciting -- he's wired it into query operators in a way that
       | saves a lot of CPU on sorting and joins/aggregation. This is a
       | technique that I've applied favorably in string sorting, and it
       | was discovered in the DB community decades ago but largely
       | forgotten. (This work precedes 2024, but I'm a slow study.)
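For readers unfamiliar with offset-value coding, here is a simplified Python sketch of the basic idea; it is not Graefe's implementation, and the names `ovc` and `compare` are illustrative. It assumes rows are fixed-width tuples of integers sorted in ascending order. Each row is coded by the position of the first column where it differs from a base row (for example, its predecessor in a sorted run) plus its value in that column. Two rows coded against the same base can usually be ordered from their small codes alone, and the remaining columns are read only when the codes tie.

```python
from typing import Optional, Tuple

Row = Tuple[int, ...]
# Offset-value code: (index of first column that differs from the base row,
# value of this row in that column). Exact duplicates get (arity, None).
OVC = Tuple[int, Optional[int]]


def ovc(row: Row, base: Optional[Row]) -> OVC:
    """Code of `row` relative to `base`, where base sorts at or before row."""
    if base is None:
        # The first row of a run is coded against an imaginary minimum row.
        return (0, row[0])
    for i, (x, y) in enumerate(zip(row, base)):
        if x != y:
            return (i, x)
    return (len(row), None)


def compare(a: Row, code_a: OVC, b: Row, code_b: OVC) -> int:
    """Ascending comparison of two rows coded against the SAME base row.

    Returns -1, 0 or 1. Column values are read only when the codes tie.
    """
    off_a, val_a = code_a
    off_b, val_b = code_b
    if off_a != off_b:
        # Both rows sort after the base, so the one sharing a longer
        # prefix with the base is the smaller one.
        return -1 if off_a > off_b else 1
    if off_a == len(a):
        return 0  # both rows duplicate the base
    if val_a != val_b:
        return -1 if val_a < val_b else 1
    # Codes tie: resume the column-by-column comparison after the offset.
    for x, y in zip(a[off_a + 1:], b[off_b + 1:]):
        if x != y:
            return -1 if x < y else 1
    return 0


# Codes for a sorted run, each row coded against its predecessor.
run = [(1, 2, 3), (1, 2, 5), (1, 4, 0), (2, 0, 0)]
codes = [ovc(row, prev) for prev, row in zip([None] + run[:-1], run)]
```

Here (1, 2, 5) gets the code (2, 5) because it first differs from (1, 2, 3) in column 2. The intuition behind the CPU savings is that most comparisons in a sort or merge are settled by these two small values instead of a full column-by-column comparison.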
        
       | bionhoward wrote:
       | Redis is slow?
        
         | kermatt wrote:
         | I wish there was more context around that statement in his
         | post.
         | 
         | Redis, while lacking some of the features he mentions in [1]
         | (e.g., SQL), is usually not considered "slow" when used for what
         | it excels at.
         | 
         | As an in-memory data structure server, it is commonly used for
         | operations that are slow in a typical RDBMS.
         | 
         | [1] https://youtu.be/fZbwD1gzjLk?t=2018
        
       | ak_111 wrote:
       | Wow, his database startup that raised $12M died this year after
       | only three years.
       | 
       | If anything this shows how insanely difficult it must be to
       | succeed as a database startup (when was the most recent startup
       | success in this space?), as the founding team is stellar.
       | 
       | On the other hand I am surprised it died this quick and
       | interested to know if they did a proper postmortem. Not only did
       | they raise way more than is needed to survive for three years but
       | the idea is about utilising AI to improve DB performance and I
       | find it hard to imagine they couldn't find more investors to lend
       | them a lifeline with all the AI hype.
        
         | Tanjreeve wrote:
         | No idea about internal workings but as a "DB optimisation"
         | startup you're competing with
         | 
         | - most people don't need it
         | 
         | - People who do need it having DBAs/Operations people
         | 
         | - or consultancies
         | 
         | - Database vendors that have automatic optimisation as a
         | feature
         | 
         | Ok "AI" in the name but I think for something as specific as DB
         | optimisation AI jazz hands probably don't work as well. Writing
         | it out it almost seems harder than being an actual DB vendor.
        
       | lvl155 wrote:
       | I highly recommend their Youtube series on databases. They have
       | great guest speakers.
        
       | based2 wrote:
       | https://dbos-project.github.io news
        
         | refset wrote:
         | More specifically, DBOS Inc. raised an $8.5 million seed round
         | [0] and is backed by Michael Stonebraker (the creator of
         | Postgres). I initially assumed Andy was alluding to this when
         | he wrote "the most famous database octogenarian splashing cash"
         | :)
         | 
         | [0] https://techcrunch.com/2024/03/12/new-startup-from-
         | postgres-...
        
       | polishdude20 wrote:
       | After interviewing at OtterTune a while back and being
       | bombarded with multiple rounds of leetcode questions, I somehow
       | knew OtterTune wouldn't make it.
        
       | roark_howard wrote:
       | DuckDB dominating over DataFusion could fuel the ongoing language
       | war with a great half baked argument!
        
       | gigatexal wrote:
       | This take screams of something personal more than of a technical
       | criticism. "I'll be blunt: I don't care for Redis. It is
       | slow, it has fake transactions, and its query syntax is a
       | freakshow. Our experiments at CMU found Dragonfly to have much
       | more impressive performance numbers (even with a single CPU
       | core). In my database course, I use the Redis query language as
       | an example of what not to do." (From the article)
       | 
       | Of course it's not meant to be used as a general-purpose DB; it's
       | keys and values, used for caches and things like that. In my
       | experience with real-world scenarios and loads, vanilla
       | single-threaded Redis is stable, fast, and nigh bulletproof.
        
       | rednafi wrote:
       | Loved the overview. Hated the shade toward Redis. Redis has
       | arguably the best key-value query syntax, and there's a reason so
       | many people swear by it. True, the decision-makers at Redis Ltd
       | are absolute pieces of trash, but Redis itself is a delightful
       | engineering artifact.
       | 
       | I don't care about the billion-dollar drama behind a piece of
       | tech, but Redis defined the key-value query API for many similar
       | databases. Trashing it just because it isn't SQL-like feels
       | unjustified.
        
       | quotemstr wrote:
       | There's one QOL extension that I haven't seen anyone else
       | implement: dimensional analysis. I can declare that a column is
       | an integer. Why not an integer that expresses feet? Why shouldn't
       | I be able to write SELECT 1inch + 1cm and get a correctly computed
       | length? Why can't the query parser help me avoid nonsense like
       | SELECT 1kg + 1hr? All this stuff is pretty straightforward to add
       | and would help prevent avoidable mistakes.
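To make the idea concrete, here is a minimal Python sketch of the check a unit-aware type system could perform. This is not an existing Postgres or SQL feature; the `Quantity` class and the unit table are hypothetical. Each value carries a magnitude in SI base units plus a vector of dimension exponents. Addition is allowed only when the dimensions match, and converting back to a display unit divides by that unit's factor.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Dimension exponents over (length, mass, time): (1, 0, 0) is a length.
Dim = Tuple[int, int, int]

# Illustrative unit table: factor to SI base units, plus the unit's dimension.
UNITS: Dict[str, Tuple[float, Dim]] = {
    "m": (1.0, (1, 0, 0)),
    "cm": (0.01, (1, 0, 0)),
    "inch": (0.0254, (1, 0, 0)),
    "ft": (0.3048, (1, 0, 0)),
    "kg": (1.0, (0, 1, 0)),
    "s": (1.0, (0, 0, 1)),
    "hr": (3600.0, (0, 0, 1)),
}


@dataclass(frozen=True)
class Quantity:
    si_value: float  # magnitude expressed in SI base units
    dim: Dim

    @classmethod
    def of(cls, value: float, unit: str) -> "Quantity":
        factor, dim = UNITS[unit]
        return cls(value * factor, dim)

    def __add__(self, other: "Quantity") -> "Quantity":
        # The check a unit-aware parser would apply to `+`: dimensions must match.
        if self.dim != other.dim:
            raise TypeError(f"cannot add {self.dim} to {other.dim}")
        return Quantity(self.si_value + other.si_value, self.dim)

    def to(self, unit: str) -> float:
        factor, dim = UNITS[unit]
        if dim != self.dim:
            raise TypeError(f"cannot express {self.dim} in {unit}")
        return self.si_value / factor


length = Quantity.of(1, "inch") + Quantity.of(1, "cm")
print(round(length.to("cm"), 2))  # 3.54
```

The sketch stores a dimension vector rather than a unit name, which is what lets inch and cm combine while kg + hr is rejected with a TypeError. Inside a DBMS, this would presumably hang off the column type and be checked when the query is analyzed.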
        
       | Upvoter33 wrote:
       | Pretty funny.
       | 
       | One factual issue: "The university had previously announced that
       | this player was transferring from Louisiana State to Michigan."
       | This is not true. Underwood had committed to LSU but then
       | switched his commitment to Michigan. He was still in high school
       | at the time, and has never attended LSU.
       | 
       | But, do you really expect a funny database prof to know much
       | about football?
        
         | mpbart wrote:
         | I never thought I'd see a discussion about the Underwood NIL
         | drama on a databases blog post but here we are.
        
       | bcoates wrote:
       | "I've never met anybody that used Alteryx"
       | 
       | I have! It's a pretty good no-code/minimal-code graphical
       | ELT+Analytics in one tool. It's one of those alternate-universe
       | tools that has its own way of doing things, unlike everything else
       | in the industry, but it's pragmatic and the people who use it
       | tend to love it.
       | 
       | The one thing that makes it viable is that it has/had
       | (pre-acquisition) _very_ aggressive compatibility with anything else
       | that can hold data, so you can use it as a bolt-on to whatever
       | other databases or files your company has.
       | 
       | Despite what the PE press release about the acquisition says, it
       | has virtually nothing to do with AI, at least in the modern big
       | NN sense.
       | 
       | If you're looking to fix your giant pile of alteryx workbooks or
       | migrate them to something else, hmu
        
       | osigurdson wrote:
       | More like: "Database license drama - a year in review".
        
       ___________________________________________________________________
       (page generated 2025-01-01 23:00 UTC)