[HN Gopher] Mongo but on Postgres and with strong consistency be...
       ___________________________________________________________________
        
       Mongo but on Postgres and with strong consistency benefits
        
       Author : oskar_dudycz
       Score  : 150 points
       Date   : 2024-07-07 13:22 UTC (9 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | salomonk_mur wrote:
       | What would be the advantage of using this instead of simple jsonb
       | columns?
        
         | imnotjames wrote:
         | Looks like it matches the mongo node API
        
         | joshmanders wrote:
         | It uses JSONb under the hood. Just gives you a very "mongo"
         | feel to using PostgreSQL. Not sure how I feel about it.
         | CREATE TABLE IF NOT EXISTS %I (_id UUID PRIMARY KEY, data
         | JSONB)
        
           | wood_spirit wrote:
           | Can they make it use uuid7 for ids for better
           | insert_becomes_append performance?
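
To illustrate why time-ordered IDs help: a UUIDv7 puts a 48-bit Unix
millisecond timestamp in its most significant bits, so new keys sort
after older ones and tend to land at the right-hand edge of the
primary-key B-tree index. The sketch below follows the RFC 9562 bit
layout. The uuid7 helper is written here for illustration (it is not
taken from Pongo), and it makes no ordering guarantee within a single
millisecond.

```python
import os
import time
import uuid
from typing import Optional


def uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a UUIDv7: 48-bit timestamp, version, 12 random bits, variant, 62 random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                # top 12 random bits
    rand_b = rand & ((1 << 62) - 1)    # low 62 random bits

    value = (unix_ms & ((1 << 48) - 1)) << 80  # timestamp in the top 48 bits
    value |= 0x7 << 76                         # version 7
    value |= rand_a << 64
    value |= 0b10 << 62                        # RFC variant bits
    value |= rand_b
    return uuid.UUID(int=value)


# IDs created at later timestamps sort after earlier ones.
older = uuid7(1_000)
newer = uuid7(2_000)
print(older < newer)
```

Random UUIDv4 keys, by contrast, scatter inserts across the whole
index.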
        
             | lgas wrote:
             | Yes
        
               | oskar_dudycz wrote:
               | Yes, I'm using JSONB underneath and translating the
               | MongoDB syntax to native queries. As they're not super
               | pleasant to deal with, I thought it'd be nice to offer
               | the MongoDB API that's familiar to many.
               | 
               | Regarding IDs, you can use any UUID-compliant format.
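
A minimal sketch of the kind of translation described above, assuming
the table layout quoted earlier in the thread (a data JSONB column). It
handles only equality filters, and find_to_sql is a hypothetical
helper, not Pongo's actual API. The %s placeholder assumes a
psycopg-style driver.

```python
import json
from typing import List, Tuple


def find_to_sql(collection: str, filter_doc: dict) -> Tuple[str, List[str]]:
    """Translate an equality-only, Mongo-style filter into a parameterized query."""
    # Quote the identifier so the collection name cannot inject SQL.
    table = '"' + collection.replace('"', '""') + '"'
    if not filter_doc:
        return f"SELECT data FROM {table}", []
    # @> is PostgreSQL's JSONB containment operator: it matches rows whose
    # document contains every key/value pair of the right-hand operand.
    return (
        f"SELECT data FROM {table} WHERE data @> %s::jsonb",
        [json.dumps(filter_doc)],
    )


sql, params = find_to_sql("users", {"name": "Anita", "address": {"city": "Wroclaw"}})
print(sql)
print(params)
```

Nested filters work because containment applies recursively to nested
objects. Operators such as $gt would need a different translation (for
example, JSON path).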
        
         | lopatin wrote:
         | jsonb isn't web scale. Mongo is web scale.
        
           | digger495 wrote:
           | I see what you did there
        
       | zulban wrote:
       | Neat. When I migrated a project from mongo to postgres I took a
       | similar approach, except I only implemented the mongo feel I
       | needed within my own project instead of building a proper library
       | as done here. I was surprised how much performance improved
       | despite using a hacky wrapper.
       | 
       | https://blog.stuartspence.ca/2023-05-goodbye-mongo.html
       | 
       | Personally tho, I plan to just drop all similarity to mongo in
       | future projects.
        
         | oskar_dudycz wrote:
         | Yup, I might not reach full compliance, but I will try to
         | follow the Pareto principle. Thanks for the link and kind
         | feedback!
        
       | joeyagreco wrote:
       | Good work! I would like to see a section on the README outlining
       | the benefits of Pongo
        
         | oskar_dudycz wrote:
         | Thanks, I'll try to cover that, good call!
        
       | Squarex wrote:
       | How does it compare with FerretDB[0]?
       | 
       | [0] https://www.ferretdb.com/
        
         | aleksi wrote:
         | (I'm FerretDB co-founder)
         | 
         | As far as I can tell, Pongo provides an API similar to the
         | MongoDB driver for Node that uses PostgreSQL under the hood.
         | FerretDB operates on a different layer - it implements MongoDB
         | network protocol, allowing it to work with any drivers and
         | applications that use MongoDB without modifications.
        
           | Keyframe wrote:
           | Even monstache?
        
             | aleksi wrote:
             | That was the first time I heard about that project. Someone
             | could check it using our guide:
             | https://docs.ferretdb.io/migration/premigration-testing/ Or
             | we will check it ourselves later:
             | https://github.com/FerretDB/FerretDB/issues/4429
        
         | Zambyte wrote:
         | The posted project looks like a client that connects to pg but
         | behaves like Mongo, where Ferret is a server that accepts Mongo
         | client connections and uses pg as backend storage.
        
         | oskar_dudycz wrote:
         | Yes, I'm using MongoDB API in Pongo to keep the muscle memory.
         | So, it's a library that translates the MongoDB syntax to native
         | PostgreSQL JSONB queries.
        
       | ramchip wrote:
       | Have you tried it with CockroachDB?
        
         | oskar_dudycz wrote:
         | I did not, but I'm not using any fancy syntax so far besides
         | JSONB operators. If it won't work, then I'm happy to adjust it
         | to make it compliant.
        
       | pipe_connector wrote:
       | MongoDB has supported the equivalent of Postgres' serializable
       | isolation for many years now. I'm not sure what "with strong
       | consistency benefits" means.
        
         | Izkata wrote:
         | > MongoDB has supported the equivalent of Postgres'
         | serializable isolation for many years now.
         | 
         | That would be the "I" in ACID
         | 
         | > I'm not sure what "with strong consistency benefits" means.
         | 
         | Probably the "C" in ACID: Data integrity, such as constraints
         | and foreign keys.
         | 
         | https://www.bmc.com/blogs/acid-atomic-consistent-isolated-du...
        
         | lkdfjlkdfjlg wrote:
         | > Pongo - Mongo but on Postgres and with strong consistency
         | benefits.
         | 
         | I don't read this as saying it's "MongoDB but with...". I read
         | it as saying that it's Postgres.
        
         | throwup238 wrote:
         | _> I'm not sure what "with strong consistency benefits"
         | means._
         | 
         | "Doesn't use MongoDB" was my first thought.
        
         | zihotki wrote:
         | Or is it? Jepsen reported a number of issues like "read skew,
         | cyclic information flow, duplicate writes, and internal
         | consistency violations. Weak defaults meant that transactions
         | could lose writes and allow dirty reads, even downgrading
         | requested safety levels at the database and collection level.
         | Moreover, the snapshot read concern did not guarantee snapshot
         | unless paired with write concern majority--even for read-only
         | transactions."
         | 
         | That report (1) is 4 years old, many things could have changed.
         | But so far any reviewed version was faulty in regards to
         | consistency.
         | 
         | 1 - https://jepsen.io/analyses/mongodb-4.2.6
        
           | endisneigh wrote:
           | That's been resolved for a long time now (not to say that
           | MongoDB is perfect, though).
        
             | nick_ wrote:
             | I just want to point out that 4 years is not a long time in
             | the context of consistency guarantees of a database engine.
             | 
             | I have listened to Mongo evangelists a few times despite my
             | skepticism and been burned every time. Mongo is way
             | oversold, IMO.
        
           | vorticalbox wrote:
           | That is for Mongo 4.x, but the latest stable is 6.0.7, which
           | lists more resilient operations and additional data security.
           | 
           | https://www.mongodb.com/blog/post/big-reasons-upgrade-
           | mongod...
        
           | pipe_connector wrote:
           | Jepsen found a more concerning consistency bug than the above
           | results when Postgres 12 was evaluated [1]. Relevant text:
           | 
           | We [...] found that transactions executed with serializable
           | isolation on a single PostgreSQL instance were not, in fact,
           | serializable
           | 
           | I have run Postgres and MongoDB at petabyte scale. Both of
           | them are solid databases that occasionally have bugs in their
           | transaction logic. Any distributed database that is receiving
           | significant development will have bugs like this. Yes, even
           | FoundationDB.
           | 
           | I wouldn't not use Postgres because of this problem, just
           | like I wouldn't not use MongoDB because they had bugs in a
           | new feature. In fact, I'm more likely to trust a company that
           | is paying to consistently have their work reviewed in public.
           | 
           | 1. https://jepsen.io/analyses/postgresql-12.3
        
         | jokethrowaway wrote:
         | Have you tried it in production? It's absolute mayhem.
         | 
         | Deadlocks were common; it uses a system of retries if the
         | transaction fails; we had to disable transactions completely.
         | 
         | Next step is either writing a writer queue manually or
         | migrating to postgres.
         | 
         | For now we fly without transaction and fix the occasional
         | concurrency issues.
        
           | pipe_connector wrote:
           | Yes, I have worked on an application that pushed enormous
           | volumes of data through MongoDB's transactions.
           | 
           | Deadlocks are an application issue. If you built your
           | application the same way with Postgres you would have the
           | same problem. Automatic retries of failed transactions with
           | specific error codes are a driver feature you can tune or
           | turn off if you'd like. The same is true for some Postgres
           | drivers.
           | 
           | If you're seeing frequent deadlocks, your transactions are
           | too large. If you model your data differently, deadlocks can
           | be eliminated completely (and this advice applies regardless
           | of the database you're using). I would recommend you engage a
           | third party to review your data access patterns before you
           | migrate and experience the same issues with Postgres.
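
Retrying on specific error codes can be sketched as follows. SQLSTATE
40001 (serialization_failure) and 40P01 (deadlock_detected) are
PostgreSQL's codes. TransactionError and run_with_retry are
hypothetical names standing in for whatever a given driver exposes.

```python
import random
import time

# PostgreSQL SQLSTATEs that are usually safe to retry.
RETRYABLE_SQLSTATES = {"40001", "40P01"}  # serialization_failure, deadlock_detected


class TransactionError(Exception):
    """Hypothetical stand-in for a driver error that carries a SQLSTATE."""

    def __init__(self, sqlstate: str):
        super().__init__(f"transaction failed with SQLSTATE {sqlstate}")
        self.sqlstate = sqlstate


def run_with_retry(txn, max_attempts=3, base_delay=0.05, sleep=time.sleep):
    """Run txn(), retrying only on retryable SQLSTATEs, with jittered backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except TransactionError as exc:
            # Give up on non-retryable errors or once attempts are exhausted.
            if exc.sqlstate not in RETRYABLE_SQLSTATES or attempt == max_attempts:
                raise
            # Exponential backoff with jitter so competing writers spread out.
            sleep(base_delay * 2 ** (attempt - 1) * random.uniform(0.5, 1.5))
```

The whole transaction body has to be re-run on each attempt, so it
should be free of side effects outside the database.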
        
             | akoboldfrying wrote:
             | >Deadlocks are an application issue.
             | 
             | Not necessarily, and not in the very common
             | single-writer-many-reader case. In that case, PostgreSQL's
             | MVCC allows all readers to see consistent snapshots of the
             | data without blocking each other or the writer. TTBOMK, any
             | other mechanism providing this guarantee requires locking
             | (making deadlocks possible).
             | 
             | So: Does Mongo now also implement MVCC? (Last time I
             | checked, it didn't.) If not, how does it guarantee that
             | reads see consistent snapshots without blocking a writer?
        
       | karmakaze wrote:
       | What makes mongo mongo is its distributed nature, without it you
       | could just store json(b) in an RDBMS.
        
         | richwater wrote:
         | > store json(b) in an RDBMS
         | 
         | I actually did this for a small HR application and it worked
         | incredibly well. jsonb GIN indexes are pretty nice once you
         | get the hang of the syntax.
         | 
         | And then, you also have all the features of Postgres as a
         | freebie.
        
           | eddd-ddde wrote:
           | Personally, I like the Postgres JSON syntax much better than
           | whatever Mongo invented.
           | 
           | Big fan of jsonb columns.
        
             | oskar_dudycz wrote:
             | I'm planning to add methods for raw JSON path or, in
             | general, raw SQL syntax to enable such fine-tuning and not
             | need to always use MongoDB API. I agree that for many
             | people, this would be better.
        
         | darby_nine wrote:
         | but then you wouldn't have the joy of using the most awkward
         | query language invented by mankind
        
         | lkdfjlkdfjlg wrote:
         | > What makes mongo mongo is its distributed nature, without it
         | you could just store json(b) in an RDBMS.
         | 
         | Welllllllll I think that's moving the goalposts. Being
         | distributed might be a thing _now_ but I still remember when it
         | was marketed as the thing to have if you wanted to store
         | unstructured documents.
         | 
         | Now that Postgres also does that, you're marketing Mongo as
         | having a different unique feature. Moving the goalposts.
        
           | thfuran wrote:
           | It doesn't really seem reasonable to accuse someone of moving
           | goalposts that you've just brought into the conversation,
           | especially when they were allegedly set by a third party.
        
             | coldtea wrote:
             | Parent didn't "just bring them", they merely referenced
             | the pre-existing goalposts used to advocate for Mongo and
             | the reasons devs adopted it.
        
               | lkdfjlkdfjlg wrote:
               | Exactly this, very eloquent, thank you.
               | 
               | Yes, I'm still bitter because I was one of those tricked
               | into it.
        
         | zihotki wrote:
         | But RDBMS'es are often also distributed. So what is mongo now?
        
           | marcosdumay wrote:
           | People don't usually distribute Postgres (unless you count
           | read replicas and cold HA replicas). But well, people don't
           | usually distribute MongoDB either, so no difference.
           | 
           | In principle, a cluster of something like Mongo can scale
           | much further than Postgres. In practice, Mongo is full of
           | issues even before you replicate it, and you are better with
           | something that abstracts a set of incoherent Postgres (or
           | sqlite) instances.
        
             | zozbot234 wrote:
             | Postgres supports foreign data wrapper (FDW), which is the
             | basic building block for a distributed DB. It doesn't
             | support strong consistency in distributed settings as of
             | yet, although it does provide two-phase commit which could
             | be used for such.
        
               | williamdclt wrote:
               | > strong consistency in distributed settings
               | 
               | I doubt it ever will. The point of distributing a data
               | store is latency and availability, both of which would go
               | down the drain with distributed strong consistency
        
             | hibikir wrote:
             | I think of the Stripe Mongo install, as it was a decade or
             | so ago. It really was sharded quite wide, and relied on all
             | shards having multiple replicas, as to tolerate cycling
             | through them on a regular basis. It worked well enough to
             | run as a source of truth for a financial company, but the
             | database team wasn't small, dedicated to keeping all that
             | machinery working well.
             | 
             | Ultimately anyone doing things at that scale is going to
             | run a small priesthood doing custom things to keep the
             | persistence layer humming, regardless of what the
             | underlying database is. I recall a project abstracting over
             | the Mongo API, as to allow for swapping the storage layer
             | if they ever needed to
        
           | brabel wrote:
           | Often?? In my experience it's really hard to do it and still
           | maintain similar performance, which kind of voids any benefit
           | you may be looking for.
        
         | anonzzzies wrote:
         | So how easy is it to distribute it? I don't have experience
         | with it but the tutorials look terrible compared to, say,
         | Scylla, Yuga, Cockroach, TiDB etc. Again, honest question?
        
           | rad_gruchalski wrote:
           | Pongo seems to be a middleware between your app and Postgres.
           | So it will most certainly work absolutely fine on YugabyteDB,
           | if one's okay with occasional latency issues.
           | 
           | One could optimise it more for a distributed sql by
           | implementing key partition awareness and connecting directly
           | to a tserver storing the data one's after.
        
             | oskar_dudycz wrote:
             | Yes, as long as the database supports JSONB and JSON path
             | syntax (so it's compatible with PG >= 12), you should be
             | good to go :)
        
               | rad_gruchalski wrote:
               | It could work:
               | https://docs.yugabyte.com/preview/explore/ysql-language-
               | feat....
        
           | theteapot wrote:
           | Does "distributed" mean sharded or just replicated? In either
           | case it's a bit quirky but easy enough.
           | 
           | > Scylla, Yuga, Cockroach, TiDB etc.
           | 
           | You have experience "distributing" all these DBs? That's
           | impressive.
        
         | rework wrote:
         | > What makes mongo mongo is its distributed nature
         | 
         | Since when? Mongo was popular because it gave the false
         | perception it was insanely fast until people found out it was
         | only fast if you didn't care about your data, and the moment
         | you ensured the write happened it ended up being slower than an
         | RDB....
        
           | jokethrowaway wrote:
           | Since forever, sharding, distributing postgres / mysql was
           | not easy. There were a few proprietary extensions. Nowadays
           | it's more accessible.
           | 
           | This was typical crap you had to say to pass fang style
           | interview "oh of course I'd use mongo because this use case
           | doesn't have relations and because it's easy to scale", while
           | you know postgres will give you way less problems and allow
           | you to make charts and analytics in 30m when finance comes
           | around.
           | 
           | I made the mistake of picking mongo for my own startup,
           | because of propaganda coming from interviewing materials and
           | I regretted it for the entire duration of the company.
        
       | posix_monad wrote:
       | Does MongoDB have serious market share compared to DynamoDB (and
       | similar clones from Azure, GCP) at this point?
        
         | dudeinjapan wrote:
         | Totally. Many of the biggest tech companies are using it for
         | core use cases. Stripe uses a modified version:
         | https://stripe.com/blog/how-stripes-document-databases-suppo...
         | 
         | We use MongoDB's cloud offering called Atlas as our core DB at
         | TableCheck.
        
       | cpursley wrote:
       | Thanks, just added Pongo to the NoSQL section of my "Postgres Is
       | Enough" gist:
       | 
       | https://gist.github.com/cpursley/c8fb81fe8a7e5df038158bdfe0f...
        
         | oskar_dudycz wrote:
         | Thank you!
        
       | revskill wrote:
       | Genius.
        
         | oskar_dudycz wrote:
         | <3
        
       | hdhshdhshdjd wrote:
       | I use JSONB columns a lot, it has its place. It can fit certain
       | applications, but it does introduce a lot of extra query
       | complexity and you lose out on some ways to speed up query
       | performance that you could get from a relational approach.
       | 
       | Which is to say JSONB is useful, but I wouldn't throw the
       | relational baby out with the bath water.
        
         | oskar_dudycz wrote:
         | I'm planning to add the possibility to use Generated Columns in the
         | future https://www.postgresql.org/docs/current/ddl-generated-
         | column... to allow more optimisations.
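
As a sketch of how a generated column might look for the table layout
quoted earlier in the thread (data JSONB): PostgreSQL 12+ can store a
value extracted from the document as a regular column, which can then
be indexed like any other. The helper name and the example table and
column names are assumptions. How Pongo will expose this is not stated.

```python
from typing import List


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def generated_column_ddl(table: str, column: str, json_key: str) -> List[str]:
    """Return DDL that materialises data ->> json_key as a stored text column."""
    key = json_key.replace("'", "''")  # escape single quotes in the key literal
    return [
        f"ALTER TABLE {quote_ident(table)} ADD COLUMN {quote_ident(column)} text "
        f"GENERATED ALWAYS AS (data ->> '{key}') STORED",
        # A plain B-tree index on the extracted value.
        f"CREATE INDEX ON {quote_ident(table)} ({quote_ident(column)})",
    ]


for statement in generated_column_ddl("users", "email", "email"):
    print(statement)
```

The stored column is recomputed by the database on every write, so the
application keeps writing only the JSON document.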
        
         | doctor_eval wrote:
         | I've been doing some reasonably serious playing with the idea
         | of using jsonb columns as a kind of front end to relational
         | tables. So basically, external interactions with the database
         | are done using JSON, which gives end users some flexibility,
         | but internally we effectively create a realtime materialised
         | view of just those properties we need from the json.
         | 
         | Anyone else tried this approach? Anything I should know about
         | it?
        
           | hdhshdhshdjd wrote:
           | I do something similar, building a lightweight search index
           | over very large relational datasets.
           | 
           | So the tables are much simpler to manage, much more portable,
           | so I can serve search off scalable hardware without
           | disturbing the underlying source of truth.
           | 
           | The downside is queries are more complex and slower.
        
           | Zenzero wrote:
           | My mental model has always been to only use JSONB for column
           | types where the relations within the json object are of no
           | importance to the DB. An example might be text editor markup.
           | I imagine if you start wanting to query within the json
           | object you should consider a more relational model.
        
       | 314156 wrote:
       | https://docs.oracle.com/en/database/oracle/mongodb-api/mgapi...
       | 
       | Oracle database has had a MongoDB compatible API for a few years
       | now.
        
         | cyberpunk wrote:
         | And it only costs 75k a seat per year per developer, with free
         | bi yearly license compliance audits, a million in ops and
         | hardware to get near prod and all the docu is paywalled. What a
         | deal!
        
           | slau wrote:
           | A client had a DB hosted by Oracle. The client was doing most
           | of their compute on AWS, and wanted to have a synchronised
           | copy made available to them on AWS. Oracle quoted them a cool
           | $600k/year to operate that copy, with a 3 year contract.
           | 
           | DMS + Postgres did it for $5k/year.
        
           | 314156 wrote:
           | Maybe you are unaware of this?
           | https://www.oracle.com/cloud/free/
        
       | rework wrote:
       | Looks sort of like MartenDB but trying to mimic mongo api, unsure
       | why anyone would want to do that... mongo api is horrible...
        
         | JanSt wrote:
         | Wouldn't that allow to switch from Mongo to Postgres without
         | having to rewrite all of your app?
        
           | oskar_dudycz wrote:
           | Hint: I'm an ex-Marten maintainer, so the similarity is not
           | accidental ;)
           | 
           | As Op said, not needing to rewrite applications or using the
           | muscle memory from using Mongo is beneficial. I'm not
           | planning to be strict and support only MongoDB API; I will
           | extend it when needed (e.g. to support raw SQL or JSON Path).
           | But I plan to keep shim with compliant API for the above
           | reasons.
           | 
           | MongoDB API has its quirks but is also pretty powerful and
           | widely used.
        
             | rework wrote:
             | Oh, so you are, then we can rest assured this will end up
             | being a solid project!
             | 
             | I personally can't stand mongodb, it's given me a lot of
             | headaches, joined a company and the same week I joined we
             | lost a ton of data and the twat who set it up resigned in
             | the middle of the outage. Got it back online and spent 6m
             | moving to postgresql.
        
       | Tao3300 wrote:
       | Ditch the dalmatian before Disney rips your face off.
        
       | harel wrote:
       | I regularly find the hybrid model is a sweet spot. I keep core
       | fields as regular columns and dynamic data structures as JSONB.
       | It brings the best of both worlds together.
        
         | Waterluvian wrote:
         | I do this too with Postgres and it is just the best of both.
         | 
         | A robot is a record. A sensor calibration is a record. A
         | warehouse robot map with tens of thousands of geojson objects
         | is a single record.
         | 
         | If I made every map entity its own record, my database would be
         | 10_000x more records and I'd get no value out of it. We're not
         | doing spatial relational queries.
        
       | willsmith72 wrote:
       | It's technologically cool, but I would love a "why" section in
       | the README. Is the idea you're a mongo Dev/love the mongo api and
       | want to use it rather than switch to pg apis? Or want to copy
       | some code over from an old project?
       | 
       | I'm sure there are use cases, I'm just struggling to grasp them.
       | Especially if it's about reusing queries from other projects, AI
       | is pretty good at that
        
       ___________________________________________________________________
       (page generated 2024-07-07 23:00 UTC)