[HN Gopher] MongoDB acquires Voyage AI
___________________________________________________________________
MongoDB acquires Voyage AI
Author : marc__1
Score : 85 points
Date : 2025-02-24 15:37 UTC (7 hours ago)
(HTM) web link (investors.mongodb.com)
(TXT) w3m dump (investors.mongodb.com)
| schnebbau wrote:
| How does MongoDB still have that much available to spend?
| Everyone I know moved off it years ago.
| geodel wrote:
| Everyone you know put a dollar in donation basket while moving
| off. Mongo collected all and bought Voyage AI
| porridgeraisin wrote:
| There are a lot of people still on it, including the place I
| worked at last.
|
| It was starting to get expensive though, so we were
| experimenting with other document stores (dynamodb was being
| trialled, since we were already AWS for most things, just
| around the time I left)
| bithavoc wrote:
| that's what I thought, but every single candidate I interviewed
| mentioned MongoDB as their recent reference document database,
| I asked the last candidate if they were self-hosting, the
| answer is no, they used MongoDB cloud.
| winrid wrote:
| I self host a handful of mongodb deployments for personal
| projects and manage self hosted mongo deployments of almost a
| hundred nodes for some companies. Atlas can get very
| expensive if you need good IO.
| slt2021 wrote:
| if you're a developer you wanna use MongoDB as a database, not be
| a MongoDB SRE and DBA
|
| that's the reason for using Atlas
| skatanski wrote:
| Precisely, and if you are enterprise, you want to have an
| option to request priority support and have a lot of
| features out of the box. Also some of the search features
| are only available in Atlas unfortunately.
| rpep wrote:
| You can't use the embeddings/vector search stuff this refers
| to in self hosted anyway, it's only implemented in their
| Atlas Cloud product. It makes it a real PITA to test locally.
| The Atlas Dev local container didn't work the same when I
| tried it earlier in the year.
| isoprophlex wrote:
| Pretty sure they achieved fiscal nirvana by exploiting
| enterprise brain rot. You hook em, they accumulate tech debt
| for years, all their devs leave, now they can't move away & you
| can start increasing prices. Eventually the empty husk will
| topple over but that's still years away.
| dimgl wrote:
| Is it possible that they simply have a good product?
| isoprophlex wrote:
| Impossible! It's not based on sqlite, postgres or written
| in rust, so it must be terrible!
| vosper wrote:
| They do have a good product, but "they accumulate tech debt
| for years, all their devs leave, now they can't move away"
| is the story of the place I worked at a few years ago. The
| database was such a disorganized, inconsistent mess that
| no-one had the stomach (or budget) to try and get off it.
| xyst wrote:
| Then they get acquired by BloodMoor and they squeeze every
| last cent out of the remaining customers.
| axpy906 wrote:
| Unironically, this.
| DarmokJalad1701 wrote:
| Because they are web-scale obviously.
| Cshelton wrote:
| We use it a lot for a specific use-case and it works great.
| Mongo has come a long long way since the release over a decade
| ago, and if you keep it in Majority Read and Write, it's very
| reliable.
|
| Also, on some things, it allows us to pivot much faster. And
| now with the help of LLMs, writing "Aggregation Pipelines" is
| very fast.
| burningion wrote:
| I've been using Mongo while developing some analysis /
| retrieval systems around video, and this is the correct
| answer. Aggregation pipelines allow me to do really powerful
| search around amorphous / changing data. Adding a way to
| automatically update / recalculate embeddings to your
| database makes even more sense.
| magarnicle wrote:
| Do you have any tricks for writing and debugging pipelines?
| I feel like there are so many little hiccups that I spend
| ages figuring out if that one field name needs a $ or not.
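|
| One rule of thumb that may help with the $ question, sketched as a
| pipeline written as plain Python dicts. The field names (status,
| customer, amount) and the field_paths helper are made up for
| illustration, and nothing here connects to a server.

```python
# Hypothetical fields: status, customer, amount.
pipeline = [
    # In $match, a field name used as a key takes no "$".
    {"$match": {"status": "active"}},
    # In $group, operators ($sum) and field paths used as values
    # ("$customer", "$amount") take a "$"; output names (total) do not.
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
    # In $sort, the key names a field of the previous stage's output: no "$".
    {"$sort": {"total": -1}},
]


def field_paths(node):
    """Collect every string value that starts with "$" (a field path reference)."""
    if isinstance(node, str):
        return [node] if node.startswith("$") else []
    if isinstance(node, dict):
        return [p for v in node.values() for p in field_paths(v)]
    if isinstance(node, list):
        return [p for v in node for p in field_paths(v)]
    return []
```

| Running the pipeline one stage at a time (first stage only, then the
| first two, and so on) is one way to see where a field goes missing.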
| codr7 wrote:
| Pretending a pile of json is a database is great for
| pivoting, not so great for anything else.
|
| Maintaining apps built on MongoDB is soul killing.
| SilasX wrote:
| Well, it's referred to as a cash-and-stock deal but I can't
| find any more detail about how much is stock:
|
| https://seekingalpha.com/news/4412466-mongodb-acquires-voyag...
| mgfist wrote:
| $2.3B in cash as of last quarter
| yfontana wrote:
| This may be a shock to many HN readers, but MongoDB's revenue
| has been growing quite fast in the last few years (from 400M in
| 2020 to 1.7B in 2024). They've been pushing Atlas pretty hard
| in the Enterprise world. Have no experience with it myself, but
| I've heard some decently positive things about it (ease of set
| up and maintenance, reliability).
| thecleaner wrote:
| Curious - do people migrate due to the price tag, ease of use,
| sth else ?
| ChrisArchitect wrote:
| Voyage AI post: https://blog.voyageai.com/2025/02/24/joining-mongodb/
| BlairCurrey wrote:
| and the mongo blog post for how it will be used:
| https://www.mongodb.com/blog/post/redefining-database-ai-why...
| infecto wrote:
| Only skimmed through the release..I hope they continue supporting
| the API but it comes with a little higher confidence that the
| company behind it is not collecting all your data. Voyage has
| some interesting embedding models that I have been hesitant to
| fully utilize due to the lack of confidence in the startup behind
| it.
| kaycebasques wrote:
| This blog post outlines the new roadmap:
| https://www.mongodb.com/blog/post/redefining-database-ai-why...
| __jl__ wrote:
| They commit to supporting the API in step 1 but it's not
| entirely clear to me whether that commitment continues with
| step 2-3...
| Beefin wrote:
| what's the calculus here? if i'm a developer choosing a low-level
| primitive such as a database, i'm likely quite opinionated on
| which models i use.
| crowcroft wrote:
| If I had to guess they might see embedding models become small
| and optimised enough to the point that they can pull them into
| the DB layer as a feature instead of being something devs need
| to actively think about and build into their app.
|
| Or it could just be an expansion to their cloud offering. In a
| lot of cases embedding models just need to be 'good enough' and
| cheap and/or convenient is a winning GTM approach.
| cpursley wrote:
| How is MongoDB still a thing when there's already several ways to
| handle json in Postgres including Microsoft's new documentdb
| extension:
|
| https://gist.github.com/cpursley/c8fb81fe8a7e5df038158bdfe0f...
|
| What am I missing? Are Mongo users simply front end folks who
| didn't have time to learn basic SQL or back end architecture?
| frankfrank13 wrote:
| Enterprise sales
| amazingamazing wrote:
| MongoDB is not the same as Postgres and jsonb.
|
| also, I'd challenge your thinking - ultimately the goal is to
| solve problems. you don't necessarily need SQL, or relations
| for that matter. that being said, naively modeling your stuff
| in mongodb (or other things like dynamodb) will cause you
| severe pain...
|
| what's also true, which people forget, is naively modeling your
| stuff with a relational database will also cause you pain. as
| they sometimes say, normalize until it hurts, and then
| denormalize to scale and make it work
|
| the amount of places I've seen that skip the second part and
| have extremely normalized databases makes me cringe. it's like
| people think joins are free...
| pphysch wrote:
| Then your implementation can be as simple as CREATE TABLE
| documents (content JSONB);. But I suspect a PK and some
| metadata columns like timestamps will come in handy.
| amazingamazing wrote:
| _sigh_ - mongoDB is not the same as creating a table with
| jsonb. for one, you don't have to deal with handling
| connections. that being said, postgres is great, but it's
| not the same.
| pphysch wrote:
| Postgres has ways to simplify connection management, if
| that is a blocker for you (pooling, pgbouncer, postgrest,
| etc)
| pphysch wrote:
| It's simply not that widespread of knowledge. Modern Postgres
| users would never suggest Mongo, but a generation of engineers
| was taught that Mongo is _the_ NoSQL solution, even though
| it's essentially legacy tech.
|
| I just ran into a greenfield project where the dev reached for
| Mongo, and didn't have a good technical reason for it beyond
| "I'm handling documents". Probably wasn't aware of alternatives.
| FWIW Postgres would've been a great fit for it, they were
| modeling research publications.
| gddgb wrote:
| Um because it must be worth 2 billion if this acquisition is
| worth $220 million. I know there's rules about discussion
| quality on this site, so I guess we can't question that.
| computerfan494 wrote:
| I will copy and paste a comment I wrote here previously:
|
| "MongoDB ships with horizontal sharding out-of-the-box, has
| idiomatic and well-maintained drivers for pretty much every
| language you could want (no C library re-use), is reasonably
| vendor-neutral and can be run locally, and the data modeling it
| encourages is both preferential for some people as well as
| pushes users to avoid patterns that don't scale very well with
| other models. Whether these things are important to you is a
| different question, but there is a lot to like that
| alternatives may not have answers for. If you currently or plan
| on spending > 10K per month on your database, I think MongoDB
| is one of the strongest choices out there."
|
| I have also run Postgres at very large scale. Postgres' JSONB
| has some serious performance drawbacks that don't matter if you
| don't plan on spending a lot of money to run your database, but
| MongoDB does solve those problems. This new documentdb
| extension from Microsoft may solve some of the pain, but this
| is some very rough code if you browse around, and Postgres
| extensions are quite painful to use over the long term.
|
| The reality is that it is not possible to run vanilla Postgres
| at scale. It's possible to fix its issues with third party
| solutions or cobbling together your own setup, but it takes a
| lot of effort and knowledge to ensure you've done things
| correctly. It's true that many people never reach that scale,
| but if you do, you're willing to spend a lot of money on
| something that works well.
| thayne wrote:
| > MongoDB ships with horizontal sharding out-of-the-box
|
| Maybe it's better than it was, but my experience with Mongodb
| a decade ago is that that horizontal sharding didn't work
| very well. We constantly ran into data corruption and
| performance issues with rebalancing the shards. So much so
| that we had a company party to celebrate moving off of
| Mongodb.
| threeseed wrote:
| > my experience with Mongodb a decade ago
|
| So before the Apple Watch was released.
|
| Why is this relevant today ? Technology changes very
| quickly.
| ecshafer wrote:
| I have seen a few rather large, production mongodb deployments.
| I don't understand how so many people chose it as their basis
| of their applications. There are a not-negligible amount of
| mongodb deployments I have seen that basically treat mongodb as
| a memory dump, where they then scan from some key and hope for
| the best. I have never seen a mongodb solution where I thought
| that it was better than if they just chose any sql server.
|
| SQL or rather just some schema based database has a ton of
| advantages. Besides speed, there is a huge benefit for
| developers to be able to look at a schema and see how the
| relationships in the data work. Mongodb usually involves
| looking at a de facto schema, but with fewer guarantees on
| types, relations, or existence, then trawling code for how it's
| used.
| orochimaaru wrote:
| We use their atlas offering. It's a bit pricey but we are very
| happy with it. It's got a bunch of stuff integrated - vectors,
| json (obviously), search and charting along with excellent
| support for drivers and very nice out of the box monitoring.
|
| Now I could possibly spend a bunch of time and do the same
| thing with open source dbs - but why? I have a small team and
| stuff to deliver. Atlas allows me to do it fast.
| cpursley wrote:
| There's a ton of hosted Postgres providers that do all of
| that and more and are just as simple to use. Neon.tech is
| really easy to set up and if you need more of a baas
| (firebase alternative), Supabase. Plus, no vendor lock in.
| I've moved vendors several times, most recently AWS RDS to
| Neon and it was nearly seamless. Was originally on Heroku
| Postgres going way back. Try getting off Atlas...
| orochimaaru wrote:
| Ha - easier said than done in an enterprise, especially
| when nothing is wrong. Maybe the $$, but at some point the
| effort involved with supply chain and reengineering dwarfs
| any "technical" benefit.
|
| This is why startups like to get into a single supply chain
| contract with an enterprise - it's extremely hard to get it
| setup, but once done very easy to reuse the template.
| skatanski wrote:
| Similar here, there are gotchas though. Some versions ago
| they've changed their query optimization engine - some of our
| "slow aggregations" became "unresponsive aggregations"
| because suboptimal indexes were suddenly used. We had to use
| hints to force proper indexing. Their columnar db offering is
| quite bad - I'd say if there's need for analytical
| functionality, it's better to go with a different db. Oplog
| changes format - and although its expected, it still hurts me
| every now and then when I need to check something. Similarly
| at some point they've changed how nested arrays are updated
| in changestream, which has broken our auditing (it's not
| recommended to use changestream for auditing, we still did ;)
| ). We've started using NVM instances for some of our more
| heavily used clusters. Well it turned out recovery of an NVM
| cluster is much much slower than a standard cluster. But all
| in all I really like mongodb, if there are no relations - it's
| a good choice. It's also good for prototyping.
| crowcroft wrote:
| If you can learn Mongo you can learn SQL and 'back end
| architecture' let's be honest the basics are hardly difficult
| no matter what tool you're using.
|
| Just because Postgres is good doesn't mean other things can't
| also be good (and better for some use cases).
| nextworddev wrote:
| Mongo is Firestore for enterprise
| hartator wrote:
| I'd rather they focus on performance.
|
| The latest MongoDB is still slower than MongoDB 3.4. An almost 10-year
| old release. For both reads and writes.
| amazingamazing wrote:
| mongodb had consistency issues before v5 if I recall, so take
| that for what it's worth.
| memco wrote:
| Can you share more details about the conditions under which it
| is slow in recent versions? We moved from 3.x to 7 for our main
| database and after adding a few indexes we were missing we have
| seen at least an order of magnitude speed up.
| hartator wrote:
| Most regular inserts and regular selects:
| https://medium.com/serpapi/mongodb-benchmark-3-4-vs-4-4-vs-5...
|
| We have internally a benchmark with MongoDB 8.x, but same
| pattern of disappointing results.
| touche_bag wrote:
| I think 8 was a release purely focused on performance, with
| some big improvements. Comparing 3.4 is kinda unfair.. You were
| fast with the tradeoff of half your data missing half the time
| hartator wrote:
| That _might_ explain the write performance degradation, but
| not the reads.
| kaycebasques wrote:
| Bloomberg says it was a $220M cash & stock deal:
| https://www.bloomberg.com/news/articles/2025-02-24/mongodb-b...
| markus_zhang wrote:
| Looks like everyone is jumping into the AI game. Is there a
| bubble?
| htrp wrote:
| Voyage AI basically builds embedding models for vector search
| crowcroft wrote:
| You don't hear the big AI providers talk about embeddings much,
| but I have to believe in the long run that companies building
| SOTA foundational LLMs are going to ultimately have the best
| embedding models.
|
| Unless you can get to a point where you can make these models
| small enough that they basically sit in the DB layer of an
| application...
| htrp wrote:
| That and because the embedding models are much easier to
| improve with at scale usage (hence why everyone has a deep
| search/research/RAG tool built into their AI web app now).
| connectsnk wrote:
| I understand the criticisms, but in my experience, MongoDB has
| come a long way. Many of the earlier issues people mention have
| been addressed. Features like sharding, built-in replication, and
| flexible schemas have made scaling large datasets much smoother
| for me. It's not perfect, but it's a solid choice.
| beoberha wrote:
| I think the amount of people working on large enterprise
| systems here is a lot smaller than one would think.
|
| Whenever a fly.io post about sqlite ends up in here, there are
| a scary amount of comments about using sqlite in way more
| scenarios than it should be.
| connectsnk wrote:
| True. I have that feeling many times that the enterprise
| crowd doesn't visit Hacker News.
| koakuma-chan wrote:
| Why would I use anything other than sqlite?
| moralestapia wrote:
| 10x exit in a couple years, quite nice on the VC side!
|
| On the tech side ... no idea what Mongo's plan is ... their
| embedding model is not SOTA, does not even outperform the open
| ones out there, and reranking is a dead end in 2025.
|
| I think the value is on Voyage's team, their user base and having
| a vision that aligned with Mongo's.
|
| Congrats!
| touche_bag wrote:
| Interesting take. Have you benchmarked models on your own data?
| Cause at this point everything is contaminated so I find it
| impossible to tell what proper sota is. Also - most folks still
| just use openai. Last time I checked, reranking _always_
| performs better than pure vector search. And to my knowledge
| it's still the superior fusion method for keyword and vector
| results.
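|
| A minimal sketch of reranking used as the fusion step: candidates
| from a keyword search and a vector search are merged, then re-scored
| against the query. The toy_score function is a token-overlap
| stand-in for a real reranker model; it, rerank_fuse, and the sample
| documents are assumptions for illustration only.

```python
def toy_score(query: str, doc: str) -> float:
    """Stand-in for a reranker model: fraction of query tokens found in doc."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0


def rerank_fuse(query, keyword_hits, vector_hits, score=toy_score, top_k=3):
    """Merge two result lists, drop duplicates, and order by reranker score."""
    # dict.fromkeys removes duplicates while keeping first-seen order.
    candidates = list(dict.fromkeys(keyword_hits + vector_hits))
    ranked = sorted(candidates, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:top_k]


# Hypothetical result lists from the two retrieval paths.
keyword_hits = ["mongodb atlas vector search", "postgres jsonb indexing"]
vector_hits = ["embedding models for vector search", "mongodb atlas vector search"]
fused = rerank_fuse("mongodb vector search", keyword_hits, vector_hits)
```

| The reranker sees every candidate from both lists, so the final
| order does not depend on how either retriever scored its own hits.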
| moralestapia wrote:
| In my experience, storing RAG chunks with a little bit of
| context helps a lot when doing the retrieval, then you can
| skip the whole "rerank" bit and halve your cost and latency.
|
| With embedding/generative models becoming better with time,
| the need for a rerank step will be optimized away.
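|
| The chunk-with-context idea might be sketched as follows. The
| splitting rule (fixed word count), the bracketed title prefix, and
| the contextual_chunks name are assumptions rather than a description
| of any particular pipeline; the point is only that each chunk
| carries its document title before it is embedded.

```python
def contextual_chunks(title: str, text: str, words_per_chunk: int = 50):
    """Split text into fixed-size word chunks, each prefixed with the title."""
    words = text.split()
    chunks = []
    for start in range(0, len(words), words_per_chunk):
        body = " ".join(words[start:start + words_per_chunk])
        # The prefix gives the embedding model context the bare chunk lacks.
        chunks.append(f"[{title}] {body}")
    return chunks


# Tiny example with a made-up title and text.
chunks = contextual_chunks("Sharding guide", "one two three four five", words_per_chunk=2)
```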
| hweller wrote:
| >their embedding model is not SOTA, does not even outperform
| the open ones out there, and reranking is a dead end in 2025.
|
| Are you referring to the MTEB leaderboard? It's widely believed
| many of those test datasets are considered during the training
| of most open-source text embedding models, hence why you see
| novel + private benchmarks discussed in many launch blogs that
| don't exclusively refer to MTEB. There are problems there, and
| it would be great to see more folks in the search benchmark
| dataset production space like what Marqo AI has done in recent
| months.
|
| Also what makes you say reranking is dead? Mongo doesn't
| provide it out of the box but many other search providers like
| ES, Pinecone, Opensearch do so it must provide some value to
| their customers? Maybe you're saying it's overrated in terms of
| how many apps actually need it?
|
| disclosure: I work on vector search at Mongo
| moralestapia wrote:
| >Maybe you're saying it's overrated in terms of how many apps
| actually need it?
|
| Yes, my comment leans more towards that, rather than
| suggesting it is useless.
| redwood wrote:
| Taking a step back, accuracy/quality of retrieval is
| critical as input to anything generated b/c your generated
| output is only as good as your input. And right now folks
| are struggling to adopt generative use cases due to risk
| and fear of how to control outputs. Therefore I think this
| could be bigger than you think.
| lpapez wrote:
| Is Voyage AI web-scale yet?
| jamesrr39 wrote:
| Genuine question: I appreciate the comments about MongoDB being
| much better than it was 10 years ago; but Postgres is also much
| better today than then as well. What situations is Mongo better
| than Postgres? Why choose Mongo in 2025?
| 999900000999 wrote:
| Simple.
|
| Postgres is hard, you have to learn SQL. SQL is hard and mean.
|
| Mongo means we can just dump everything into a magic box and
| worry about it later. No tables to create.
|
| But there is little time, we need to ship our CRUD APP NOW! No
| one on the team knows SQL!
|
| I'm actually using Postgres via Supabase for my current
| project, but I would probably never use straight up Postgres.
| SEJeff wrote:
| Postgres supports JSONB natively. It literally speaks mongo
| line protocol and you can shove unstructured json into it.
|
| It has supported this since 9.4:
| https://www.postgresql.org/docs/current/datatype-json.html
| 999900000999 wrote:
| I don't necessarily agree with the above justifications,
| but in my experience this is basically why teams pick
| Mongo.
|
| It's easier to get started with.
| chpatrick wrote:
| Even as a JSON document store I'd rather use postgres with a
| jsonb column.
| riku_iki wrote:
| Mongo is real distributed and scalable DB, while postgres is
| single server DB, so main consideration could be if you need to
| scale beyond single server.
| throw14082020 wrote:
| Ahhh, this sounds familiar!
| https://www.youtube.com/watch?v=b2F-DItXtZs
| threeseed wrote:
| High availability is more important than scalability for
| most.
|
| On average an AWS availability zone tends to suffer at least
| one failure a year. Some are disclosed. Many are not. And so
| that database you are running on a single instance will die.
|
| Question is do you want to do something about it or just
| suffer the outage.
| amazingamazing wrote:
| It's sad that this was downvoted. It's literally true.
| MongoDB vs. vanilla Postgres is not in Postgres' favor with
| respect to horizontal scaling. It's the same situation with
| Postgres vs. MySQL.
|
| That being said there are plenty of ways to shard Postgres
| that are free, e.g. Citus. It's also questionable whether
| many need sharding. You can go a long way with simply a
| replica.
|
| Postgres also has plenty of its own strengths. For one, you
| can get a managed solution without being locked into MongoDB
| the company.
| threeseed wrote:
| Citus is owned by Microsoft.
|
| And history has not been nice to startups like this
| continuing their products over the long term.
|
| It's why unless it is built-in and supported it's not
| feasible for most to depend on it.
| amazingamazing wrote:
| that's fair, but that's true of mongodb itself too. I
| wouldn't count that against either of them.
| threeseed wrote:
| a) MongoDB has built-in, supported, proven scalability and high
| availability features. PostgreSQL does not. If it wasn't for
| cloud offerings like AWS Aurora providing them no company would
| even bother with PostgreSQL at all. It's 2025 these features
| are not-negotiable for most use cases.
|
| b) MongoDB does one thing well. JSON documents. If your domain
| model is built around that then nothing is faster. Seriously
| nothing. You can do tuple updates on complex structures at
| speeds that cripple PostgreSQL in seconds.
|
| c) Nobody who is architecting systems ever thinks this way. It
| is never MongoDB _or_ PostgreSQL. They specialise in different
| things and have different strengths. It is far more common to
| see both deployed.
| jeremycarter wrote:
| Great response. All arguments are valid and fair.
| scosman wrote:
| A) Postgres easily scales to billions of rows without
| breaking a sweat. After that shard. It's definitely
| negotiable.
| threeseed wrote:
| So does a text file.
|
| Statements like yours are meaningless when you aren't
| specific about the operations, schema, access patterns etc.
|
| If you have a single server, relational use case then
| PostgreSQL is great. But like all technology it's not great
| at everything.
| scosman wrote:
| Then use a text file.
|
| In all seriousness, calling Postgres' scalability "not-
| negotiable for most use cases" is wild.
| delusional wrote:
| > It's 2025 these features are not-negotiable for most use
| cases.
|
| Excuse me? I do enterprise apps, along with most of the
| developers I know. We run like 100 transactions per second
| and can easily survive hours of planned downtime.
|
| It's 2025, computers are really fast. I barely need a
| database, but ACID makes transaction processing so much
| easier.
| beAbU wrote:
| Mongo is Web scale.
| koakuma-chan wrote:
| Choose Mongo if you need web scale.
___________________________________________________________________
(page generated 2025-02-24 23:01 UTC)