[HN Gopher] MongoDB acquires Voyage AI
       ___________________________________________________________________
        
       MongoDB acquires Voyage AI
        
       Author : marc__1
       Score  : 85 points
       Date   : 2025-02-24 15:37 UTC (7 hours ago)
        
 (HTM) web link (investors.mongodb.com)
 (TXT) w3m dump (investors.mongodb.com)
        
       | schnebbau wrote:
       | How does MongoDB still have that much available to spend?
       | Everyone I know moved off it years ago.
        
         | geodel wrote:
         | Everyone you know put a dollar in the donation basket while
         | moving off. Mongo collected it all and bought Voyage AI.
        
         | porridgeraisin wrote:
         | There are a lot of people still on it, including the place I
         | worked at last.
         | 
         | It was starting to get expensive though, so we were
         | experimenting with other document stores (DynamoDB was being
         | trialled, since we were already on AWS for most things, just
         | around the time I left).
        
         | bithavoc wrote:
         | That's what I thought, but every single candidate I interviewed
         | mentioned MongoDB as their recent reference document database.
         | I asked the last candidate if they were self-hosting; the
         | answer was no, they used MongoDB cloud.
        
           | winrid wrote:
           | I self host a handful of mongodb deployments for personal
           | projects and manage self hosted mongo deployments of almost a
           | hundred nodes for some companies. Atlas can get very
           | expensive if you need good IO.
        
           | slt2021 wrote:
           | if you're a developer you wanna use MongoDB as a database, not
           | be a MongoDB SRE and DBA
           | 
           | that's the reason for using Atlas
        
             | skatanski wrote:
             | Precisely, and if you are an enterprise, you want to have an
             | option to request priority support and have a lot of
             | features out of the box. Also some of the search features
             | are only available in Atlas unfortunately.
        
           | rpep wrote:
           | You can't use the embeddings/vector search stuff this refers
           | to in self hosted anyway, it's only implemented in their
           | Atlas Cloud product. It makes it a real PITA to test locally.
           | The Atlas Dev local container didn't work the same when I
           | tried it earlier in the year.
        
         | isoprophlex wrote:
         | Pretty sure they achieved fiscal nirvana by exploiting
         | enterprise brain rot. You hook em, they accumulate tech debt
         | for years, all their devs leave, now they can't move away & you
         | can start increasing prices. Eventually the empty husk will
         | topple over but that's still years away.
        
           | dimgl wrote:
           | Is it possible that they simply have a good product?
        
             | isoprophlex wrote:
             | Impossible! It's not based on sqlite, postgres or written
             | in rust, so it must be terrible!
        
             | vosper wrote:
             | They do have a good product, but "they accumulate tech debt
             | for years, all their devs leave, now they can't move away"
             | is the story of the place I worked at a few years ago. The
             | database was such a disorganized, inconsistent mess that
             | no-one had the stomach (or budget) to try and get off it.
        
           | xyst wrote:
           | Then they get acquired by BloodMoor and they squeeze every
           | last cent out of the remaining customers.
        
           | axpy906 wrote:
           | Unironically, this.
        
         | DarmokJalad1701 wrote:
         | Because they are web-scale obviously.
        
         | Cshelton wrote:
         | We use it a lot for a specific use-case and it works great.
         | Mongo has come a long long way since the release over a decade
         | ago, and if you keep it in Majority Read and Write, it's very
         | reliable.
         | 
         | Also, on some things, it allows us to pivot much faster. And
         | now with the help of LLMs, writing "Aggregation Pipelines" is
         | very fast.
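
The "Majority Read and Write" setting mentioned above corresponds to MongoDB's
write concern and read concern options. A minimal sketch of a connection string
that requests both, assuming a replica set; the host names, database name, and
replica set name are placeholders, and `majority_uri` is a hypothetical helper,
not part of any driver:

```python
from urllib.parse import urlencode


def majority_uri(hosts, database, replica_set):
    """Build a MongoDB connection string asking for majority read/write concern."""
    options = {
        "replicaSet": replica_set,
        # Writes are acknowledged only after a majority of members have them.
        "w": "majority",
        # Reads return data that a majority of members have acknowledged.
        "readConcernLevel": "majority",
    }
    return f"mongodb://{','.join(hosts)}/{database}?{urlencode(options)}"


# Placeholder hosts and names, for illustration only.
uri = majority_uri(
    ["db1.example.internal:27017", "db2.example.internal:27017"],
    "app",
    "rs0",
)
print(uri)
```

The same options can usually be set per client or per operation in a driver
instead of in the URI; the string form is used here only because it needs no
running server.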
        
           | burningion wrote:
           | I've been using Mongo while developing some analysis /
           | retrieval systems around video, and this is the correct
           | answer. Aggregation pipelines allow me to do really powerful
           | search around amorphous / changing data. Adding a way to
           | automatically update / recalculate embeddings to your
           | database makes even more sense.
        
             | magarnicle wrote:
             | Do you have any tricks for writing and debugging pipelines?
             | I feel like there are so many little hiccups that I spend
             | ages figuring out if that one field name needs a $ or not.
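
On the "$ or not" question above, one rule of thumb that covers the common
cases: stage names and operators start with `$`, a field name used as a key
does not, and a field referenced as a value inside an expression does. The
sketch below shows this with plain Python dicts. The field names (`status`,
`camera_id`, `duration_s`) and the helper `stage_prefixes` are made up for
illustration.

```python
# A pipeline is just a list of stage documents.
pipeline = [
    # Field name as a key in a query: no "$".
    {"$match": {"status": "done"}},
    {"$group": {
        # Field referenced as a value in an expression: needs "$".
        "_id": "$camera_id",
        "total_seconds": {"$sum": "$duration_s"},
        # A literal, not a field reference: counts one per document.
        "clips": {"$sum": 1},
    }},
    # Sort keys are field names again, so no "$".
    {"$sort": {"total_seconds": -1}},
]


def stage_prefixes(stages):
    """Return the pipeline cut after stage 1, after stage 2, and so on.

    Running each prefix in turn shows the intermediate documents and makes
    it easier to see which stage first produces something unexpected.
    """
    return [stages[:i] for i in range(1, len(stages) + 1)]


for prefix in stage_prefixes(pipeline):
    print(len(prefix), "stage(s), last:", list(prefix[-1])[0])
```

With PyMongo, each prefix could be passed to `collection.aggregate(...)` in
turn to see where the documents stop looking right.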
        
           | codr7 wrote:
           | Pretending a pile of json is a database is great for
           | pivoting, not so great for anything else.
           | 
           | Maintaining apps built on MongoDB is soul killing.
        
         | SilasX wrote:
         | Well, it's referred to as a cash-and-stock deal but I can't
         | find any more detail about how much is stock:
         | 
         | https://seekingalpha.com/news/4412466-mongodb-acquires-voyag...
        
         | mgfist wrote:
         | $2.3B in cash as of last quarter
        
         | yfontana wrote:
         | This may be a shock to many HN readers, but MongoDB's revenue
         | has been growing quite fast in the last few years (from 400M in
         | 2020 to 1.7B in 2024). They've been pushing Atlas pretty hard
         | in the Enterprise world. Have no experience with it myself, but
         | I've heard some decently positive things about it (ease of set
         | up and maintenance, reliability).
        
       | thecleaner wrote:
       | Curious - do people migrate due to the price tag, ease of use,
       | something else?
        
       | ChrisArchitect wrote:
       | Voyage AI post:
       | https://blog.voyageai.com/2025/02/24/joining-mongodb/
        
         | BlairCurrey wrote:
         | and the mongo blog post for how it will be used:
         | https://www.mongodb.com/blog/post/redefining-database-ai-why...
        
       | infecto wrote:
       | Only skimmed through the release. I hope they continue supporting
       | the API but it comes with a little higher confidence that the
       | company behind it is not collecting all your data. Voyage has
       | some interesting embedding models that I have been hesitant to
       | fully utilize due to the lack of confidence in the startup behind
       | it.
        
         | kaycebasques wrote:
         | This blog post outlines the new roadmap:
         | https://www.mongodb.com/blog/post/redefining-database-ai-why...
        
           | __jl__ wrote:
           | They commit to supporting the API in step 1 but it's not
           | entirely clear to me whether that commitment continues with
           | step 2-3...
        
       | Beefin wrote:
       | what's the calculus here? if i'm a developer choosing a low-level
       | primitive such as a database, i'm likely quite opinionated on
       | which models i use.
        
         | crowcroft wrote:
         | If I had to guess they might see embedding models become small
         | and optimised enough to the point that they can pull them into
         | the DB layer as a feature instead of being something devs need
         | to actively think about and build into their app.
         | 
         | Or it could just be an expansion to their cloud offering. In a
         | lot of cases embedding models just need to be 'good enough' and
         | cheap and/or convenient is a winning GTM approach.
        
       | cpursley wrote:
       | How is MongoDB still a thing when there's already several ways to
       | handle json in Postgres including Microsoft's new documentdb
       | extension:
       | 
       | https://gist.github.com/cpursley/c8fb81fe8a7e5df038158bdfe0f...
       | 
       | What am I missing? Are Mongo users simply front end folks who
       | didn't have time to learn basic SQL or back end architecture?
        
         | frankfrank13 wrote:
         | Enterprise sales
        
         | amazingamazing wrote:
         | MongoDB is not the same as Postgres and jsonb.
         | 
         | also, I'd challenge your thinking - ultimately the goal is to
         | solve problems. you don't necessarily need SQL, or relations
         | for that matter. that being said, naively modeling your stuff
         | in mongodb (or other things like dynamodb) will cause you
         | severe pain...
         | 
         | what's also true, which people forget, is naively modeling your
         | stuff with a relational database will also cause you pain. as
         | they sometimes say, normalize until it hurts, and then
         | denormalize to scale and make it work
         | 
         | the amount of places I've seen that skip the second part and
         | have extremely normalized databases makes me cringe. it's like
         | people think joins are free...
        
           | pphysch wrote:
           | Then your implementation can be as simple as CREATE TABLE
           | documents (content JSONB);. But I suspect a PK and some
           | metadata columns like timestamps will come in handy.
        
             | amazingamazing wrote:
             | _sigh_ - mongoDB is not the same as creating a table with
             | jsonb. for one, you don't have to deal with handling
             | connections. that being said, postgres is great, but it's
             | not the same.
        
               | pphysch wrote:
               | Postgres has ways to simplify connection management, if
               | that is a blocker for you (pooling, pgbouncer, postgrest,
               | etc)
        
         | pphysch wrote:
         | It's simply not that widespread of knowledge. Modern Postgres
         | users would never suggest Mongo, but a generation of engineers
         | was taught that Mongo is _the_ NoSQL solution, even though
         | it's essentially legacy tech.
         | 
         | I just ran into a greenfield project where the dev reached for
         | Mongo, and didn't have a good technical reason for it beyond
         | "I'm handling documents". Probably wasn't aware of alternatives.
         | FWIW Postgres would've been a great fit for it, they were
         | modeling research publications.
        
         | gddgb wrote:
         | Um because it must be worth 2 billion if this acquisition is
         | worth $220 million. I know there's rules about discussion
         | quality on this site, so I guess we can't question that.
        
         | computerfan494 wrote:
         | I will copy and paste a comment I wrote here previously:
         | 
         | "MongoDB ships with horizontal sharding out-of-the-box, has
         | idiomatic and well-maintained drivers for pretty much every
         | language you could want (no C library re-use), is reasonably
         | vendor-neutral and can be run locally, and the data modeling it
         | encourages is both preferential for some people as well as
         | pushes users to avoid patterns that don't scale very well with
         | other models. Whether these things are important to you is a
         | different question, but there is a lot to like that
         | alternatives may not have answers for. If you currently or plan
         | on spending > 10K per month on your database, I think MongoDB
         | is one of the strongest choices out there."
         | 
         | I have also run Postgres at very large scale. Postgres' JSONB
         | has some serious performance drawbacks that don't matter if you
         | don't plan on spending a lot of money to run your database, but
         | MongoDB does solve those problems. This new documentdb
         | extension from Microsoft may solve some of the pain, but this
         | is some very rough code if you browse around, and Postgres
         | extensions are quite painful to use over the long term.
         | 
         | The reality is that it is not possible to run vanilla Postgres
         | at scale. It's possible to fix its issues with third party
         | solutions or cobbling together your own setup, but it takes a
         | lot of effort and knowledge to ensure you've done things
         | correctly. It's true that many people never reach that scale,
         | but if you do, you're willing to spend a lot of money on
         | something that works well.
        
           | thayne wrote:
           | > MongoDB ships with horizontal sharding out-of-the-box
           | 
           | Maybe it's better than it was, but my experience with Mongodb
           | a decade ago is that horizontal sharding didn't work
           | very well. We constantly ran into data corruption and
           | performance issues with rebalancing the shards. So much so
           | that we had a company party to celebrate moving off of
           | Mongodb.
        
             | threeseed wrote:
             | > my experience with Mongodb a decade ago
             | 
             | So before the Apple Watch was released.
             | 
             | Why is this relevant today ? Technology changes very
             | quickly.
        
         | ecshafer wrote:
         | I have seen a few rather large, production mongodb deployments.
         | I don't understand how so many people chose it as the basis
         | of their applications. There is a non-negligible number of
         | mongodb deployments I have seen that basically treat mongodb as
         | a memory dump, where they then scan from some key and hope for
         | the best. I have never seen a mongodb solution where I thought
         | that it was better than if they just chose any sql server.
         | 
         | SQL or rather just some schema based database has a ton of
         | advantages. Besides speed, there is a huge benefit for
         | developers to be able to look at a schema and see how the
         | relationships in the data work. Mongodb usually involves
         | looking at a de facto schema, but with fewer guarantees on
         | types, relations, or existence, then trawling code for how it's
         | used.
        
         | orochimaaru wrote:
         | We use their atlas offering. It's a bit pricey but we are very
         | happy with it. It's got a bunch of stuff integrated - vectors,
         | json (obviously), search and charting along with excellent
         | support for drivers and very nice out of the box monitoring.
         | 
         | Now I could possibly spend a bunch of time and do the same
         | thing with open source dbs - but why? I have a small team and
         | stuff to deliver. Atlas allows me to do it fast.
        
           | cpursley wrote:
           | There's a ton of hosted Postgres providers that do all of
           | that and more and are just as simple to use. Neon.tech is
           | really easy to set up and if you need more of a baas
           | (firebase alternative), Supabase. Plus, no vendor lock in.
           | I've moved vendors several times, most recently AWS RDS to
           | Neon and it was nearly seamless. Was originally on Heroku
           | Postgres going way back. Try getting off Atlas...
        
             | orochimaaru wrote:
             | Ha - easier said than done in an enterprise, especially
             | when nothing is wrong. Maybe the $$, but at some point the
             | effort involved with supply chain and reengineering dwarfs
             | any "technical" benefit.
             | 
             | This is why startups like to get into a single supply chain
             | contract with an enterprise - it's extremely hard to get it
             | setup, but once done very easy to reuse the template.
        
           | skatanski wrote:
           | Similar here, there are gotchas though. Some versions ago
           | they changed their query optimization engine - some of our
           | "slow aggregations" became "unresponsive aggregations"
           | because suboptimal indexes were suddenly used. We had to use
           | hints to force proper indexing. Their columnar db offering is
           | quite bad - I'd say if there's a need for analytical
           | functionality, it's better to go with a different db. The oplog
           | changes format - and although it's expected, it still hurts me
           | every now and then when I need to check something. Similarly,
           | at some point they changed how nested arrays are updated
           | in change streams, which broke our auditing (it's not
           | recommended to use change streams for auditing, we still did ;)
           | ). We've started using NVMe instances for some of our more
           | heavily used clusters. Well, it turned out recovery of an NVMe
           | cluster is much, much slower than a standard cluster. But all
           | in all I really like mongodb; if there are no relations, it's
           | a good choice. It's also good for prototyping.
        
         | crowcroft wrote:
         | If you can learn Mongo you can learn SQL and 'back end
         | architecture' let's be honest the basics are hardly difficult
         | no matter what tool you're using.
         | 
         | Just because Postgres is good doesn't mean other things can't
         | also be good (and better for some use cases).
        
         | nextworddev wrote:
         | Mongo is Firestore for enterprise
        
       | hartator wrote:
       | I'd rather they focus on performance.
       | 
       | The latest MongoDB is still slower than MongoDB 3.4, an almost
       | 10-year-old release, for both reads and writes.
        
         | amazingamazing wrote:
         | mongodb had consistency issues before v5 if I recall, so take
         | that for what it's worth.
        
         | memco wrote:
         | Can you share more details about the conditions under which it
         | is slow in recent versions? We moved from 3.x to 7 for our main
         | database and after adding a few indexes we were missing we have
         | seen at least an order of magnitude speed up.
        
           | hartator wrote:
           | Most regular inserts and regular selects:
           | https://medium.com/serpapi/mongodb-benchmark-3-4-vs-4-4-vs-5...
           | 
           | We have internally a benchmark with MongoDB 8.x, but same
           | pattern of disappointing results.
        
         | touche_bag wrote:
         | I think 8 was a release purely focused on performance, with
         | some big improvements. Comparing 3.4 is kinda unfair.. You were
         | fast with the tradeoff of half your data missing half the time
        
           | hartator wrote:
           | That _might_ explain the write performance degradation, but
           | not the reads.
        
       | kaycebasques wrote:
       | Bloomberg says it was a $220M cash & stock deal:
       | https://www.bloomberg.com/news/articles/2025-02-24/mongodb-b...
        
       | markus_zhang wrote:
       | Looks like everyone is jumping into the AI game. Is there a
       | bubble?
        
       | htrp wrote:
       | Voyage AI basically builds embedding models for vector search
        
         | crowcroft wrote:
         | You don't hear the big AI providers talk about embeddings much,
         | but I have to believe in the long run that companies building
         | SOTA foundational LLMs are going to ultimately have the best
         | embedding models.
         | 
         | Unless you can get to a point where you can make these models
         | small enough that they basically sit in the DB layer of an
         | application...
        
           | htrp wrote:
           | That and because the embedding models are much easier to
           | improve with at scale usage (hence why everyone has a deep
           | search/research/RAG tool built into their AI web app now).
        
       | connectsnk wrote:
       | I understand the criticisms, but in my experience, MongoDB has
       | come a long way. Many of the earlier issues people mention have
       | been addressed. Features like sharding, built-in replication, and
       | flexible schemas have made scaling large datasets much smoother
       | for me. It's not perfect, but it's a solid choice.
        
         | beoberha wrote:
         | I think the amount of people working on large enterprise
         | systems here is a lot smaller than one would think.
         | 
         | Whenever a fly.io post about sqlite ends up in here, there are
         | a scary amount of comments about using sqlite in way more
         | scenarios than it should be.
        
           | connectsnk wrote:
           | True. I often get the feeling that the enterprise crowd
           | doesn't visit Hacker News.
        
           | koakuma-chan wrote:
           | Why would I use anything other than sqlite?
        
       | moralestapia wrote:
       | 10x exit in a couple years, quite nice on the VC side!
       | 
       | On the tech side ... no idea what Mongo's plan is ... their
       | embedding model is not SOTA, does not even outperform the open
       | ones out there, and reranking is a dead end in 2025.
       | 
       | I think the value is on Voyage's team, their user base and having
       | a vision that aligned with Mongo's.
       | 
       | Congrats!
        
         | touche_bag wrote:
         | Interesting take. Have you benchmarked models on your own data?
         | Cause at this point everything is contaminated so I find it
         | impossible to tell what proper sota is. Also - most folks still
         | just use openai. Last time I checked, reranking _always_
         | performs better than pure vector search. And to my knowledge
         | it's still the superior fusion method for keyword and vector
         | results.
        
           | moralestapia wrote:
           | In my experience, storing RAG chunks with a little bit of
           | context helps a lot when doing the retrieval, then you can
           | skip the whole "rerank" bit and halve your cost and latency.
           | 
           | With embedding/generative models becoming better with time,
           | the need for a rerank step will be optimized away.
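
The "chunks with a little bit of context" idea above can be sketched in a few
lines, assuming the context is simply the document title and section heading
prepended to each chunk before it is embedded. The `Chunk` type, the helper
names, the separator, and the word-count splitting are all illustrative
choices, not anything described in the thread.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_title: str
    section: str
    text: str


def split_section(doc_title, section, body, max_words=50):
    """Split a section body into fixed-size word windows (a deliberately naive splitter)."""
    words = body.split()
    return [
        Chunk(doc_title, section, " ".join(words[i:i + max_words]))
        for i in range(0, len(words), max_words)
    ]


def with_context(chunk):
    """Return the string to embed and store: a short context header plus the chunk text."""
    return f"{chunk.doc_title} > {chunk.section}\n{chunk.text}"


chunks = split_section(
    "Billing FAQ",
    "Refunds",
    "Refunds are issued to the original payment method within ten days",
    max_words=6,
)
for chunk in chunks:
    print(with_context(chunk))
    print("---")
```

Whether this is enough to drop a rerank step depends on the data, so it is
worth measuring retrieval quality both ways on your own queries.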
        
         | hweller wrote:
         | >their embedding model is not SOTA, does not even outperform
         | the open ones out there, and reranking is a dead end in 2025.
         | 
         | Are you referring to the MTEB leaderboard? It's widely believed
         | many of those test datasets are considered during the training
         | of most open-source text embedding models, hence why you see
         | novel + private benchmarks discussed in many launch blogs that
         | don't exclusively refer to MTEB. There are problems there, and
         | it would be great to see more folks in the search benchmark
         | dataset production space like what Marqo AI has done in recent
         | months.
         | 
         | Also what makes you say reranking is dead? Mongo doesn't
         | provide it out of the box but many other search providers like
         | ES, Pinecone, Opensearch do so it must provide some value to
         | their customers? Maybe you're saying it's overrated in terms of
         | how many apps actually need it?
         | 
         | disclosure: I work on vector search at Mongo
        
           | moralestapia wrote:
           | >Maybe you're saying it's overrated in terms of how many apps
           | actually need it?
           | 
           | Yes, my comment leans more towards that, rather than
           | suggesting it is useless.
        
             | redwood wrote:
             | Taking a step back, accuracy/quality of retrieval is
             | critical as input to anything generated b/c your generated
             | output is only as good as your input. And right now folks
             | are struggling to adopt generative use cases due to risk
             | and fear of how to control outputs. Therefore I think this
             | could be bigger than you think.
        
       | lpapez wrote:
       | Is Voyage AI web-scale yet?
        
       | jamesrr39 wrote:
       | Genuine question: I appreciate the comments about MongoDB being
       | much better than it was 10 years ago; but Postgres is also much
       | better today than it was then. In what situations is Mongo
       | better than Postgres? Why choose Mongo in 2025?
        
         | 999900000999 wrote:
         | Simple.
         | 
         | Postgres is hard, you have to learn SQL. SQL is hard and mean.
         | 
         | Mongo means we can just dump everything into a magic box and
         | worry about it later. No tables to create.
         | 
         | But there is little time, we need to ship our CRUD APP NOW! No
         | one on the team knows SQL!
         | 
         | I'm actually using Postgres via Supabase for my current
         | project, but I would probably never use straight up Postgres.
        
           | SEJeff wrote:
           | Postgres supports JSONB natively. It literally speaks mongo
           | line protocol and you can shove unstructured json into it.
           | 
           | It has supported this since 9.4:
           | https://www.postgresql.org/docs/current/datatype-json.html
        
             | 999900000999 wrote:
             | I don't necessarily agree with the above justifications,
             | but in my experience this is basically why teams pick
             | Mongo.
             | 
             | It's easier to get started with.
        
           | chpatrick wrote:
           | Even as a JSON document store I'd rather use postgres with a
           | jsonb column.
        
         | riku_iki wrote:
         | Mongo is a real distributed and scalable DB, while postgres is
         | a single-server DB, so the main consideration could be whether
         | you need to scale beyond a single server.
        
           | throw14082020 wrote:
           | Ahhh, this sounds familiar!
           | https://www.youtube.com/watch?v=b2F-DItXtZs
        
           | threeseed wrote:
           | High availability is more important than scalability for
           | most.
           | 
           | On average an AWS availability zone tends to suffer at least
           | one failure a year. Some are disclosed. Many are not. And so
           | that database you are running on a single instance will die.
           | 
           | Question is do you want to do something about it or just
           | suffer the outage.
        
           | amazingamazing wrote:
           | It's sad that this was downvoted. It's literally true.
           | MongoDB vs. vanilla Postgres is not in Postgres' favor with
           | respect to horizontal scaling. It's the same situation with
           | Postgres vs. MySQL.
           | 
           | That being said there are plenty of ways to shard Postgres
           | that are free, e.g. Citus. It's also questionable whether
           | many need sharding. You can go a long way with simply a
           | replica.
           | 
           | Postgres also has plenty of its own strengths. For one, you
           | can get a managed solution without being locked into MongoDB
           | the company.
        
             | threeseed wrote:
             | Citus is owned by Microsoft.
             | 
             | And history has not been nice to startups like this
             | continuing their products over the long term.
             | 
             | It's why unless it is built-in and supported it's not
             | feasible for most to depend on it.
        
               | amazingamazing wrote:
               | that's fair, but that's true of mongodb itself too. I
               | wouldn't count that against either of them.
        
         | threeseed wrote:
         | a) MongoDB has built-in, supported, proven scalability and high
         | availability features. PostgreSQL does not. If it wasn't for
         | cloud offerings like AWS Aurora providing them no company would
         | even bother with PostgreSQL at all. It's 2025 these features
         | are not-negotiable for most use cases.
         | 
         | b) MongoDB does one thing well. JSON documents. If your domain
         | model is built around that then nothing is faster. Seriously
         | nothing. You can do tuple updates on complex structures at
         | speeds that cripple PostgreSQL in seconds.
         | 
         | c) Nobody who is architecting systems ever thinks this way. It
         | is never MongoDB _or_ PostgreSQL. They specialise in different
         | things and have different strengths. It is far more common to
         | see both deployed.
        
           | jeremycarter wrote:
           | Great response. All arguments are valid and fair.
        
           | scosman wrote:
           | A) Postgres easily scales to billions of rows without
           | breaking a sweat. After that shard. It's definitely
           | negotiable.
        
             | threeseed wrote:
             | So does a text file.
             | 
             | Statements like yours are meaningless when you aren't
             | specific about the operations, schema, access patterns etc.
             | 
             | If you have a single server, relational use case then
             | PostgreSQL is great. But like all technology it's not great
             | at everything.
        
               | scosman wrote:
               | Then use a text file.
               | 
               | In all seriousness, calling Postgres' scalability "not-
               | negotiable for most use cases" is wild.
        
           | delusional wrote:
           | > It's 2025 these features are not-negotiable for most use
           | cases.
           | 
           | Excuse me? I do enterprise apps, along with most of the
           | developers I know. We run like 100 transactions per second
           | and can easily survive hours of planned downtime.
           | 
           | It's 2025, computers are really fast. I barely need a
           | database, but ACID makes transaction processing so much
           | easier.
        
         | beAbU wrote:
         | Mongo is Web scale.
        
         | koakuma-chan wrote:
         | Choose Mongo if you need web scale.
        
       ___________________________________________________________________
       (page generated 2025-02-24 23:01 UTC)