[HN Gopher] Database mocks are not worth it
       ___________________________________________________________________
        
       Database mocks are not worth it
        
       Author : shayonj
       Score  : 75 points
       Date   : 2024-12-30 20:12 UTC (2 hours ago)
        
 (HTM) web link (www.shayon.dev)
 (TXT) w3m dump (www.shayon.dev)
        
       | javcasas wrote:
       | For tests, first use a real database. If you can't because the
       | database just doesn't want to (ahem mongo ahem) then use a fake.
        | If you can't use a fake, stop lying and use a mock.
       | 
       | Mocks for databases are extremely brittle and complicated.
        
         | btilly wrote:
         | Ahem MongoDB? I must admit, I've never understood what
         | MongoDB's real use case is supposed to be. Whenever I've looked
         | at it, there has always been a better alternative. When I've
         | had to use it, I've had to debug performance problems that
         | wouldn't have existed with alternative solutions.
         | 
         | It sounds cool. But running software isn't about sounding cool.
         | 
         | A decade ago, it was really clear. As
         | https://aphyr.com/posts/284-call-me-maybe-mongodb explains,
         | MongoDB didn't really work. But they've fixed that. So now it
         | runs acceptably accurately. I just don't know why I'd ever want
         | to.
        
           | karmakurtisaani wrote:
            | In my company they use it to store a large number of
            | varying-sized JSON documents. You can create indexes to make
            | search queries lightning fast. It's also been extremely
            | stable.
           | 
           | I wasn't involved in setting it up tho, so can't say anything
           | about how difficult it is to work with on the technical side.
        
           | dakiol wrote:
           | I don't know. Like any other tool, it depends on how you use
           | it. I worked for a unicorn with presence in multiple
           | countries and they were using Mongo as main db in multiple
           | microservices. Around 1500 engineers. It worked fine. I'm not
           | saying it was justified, but never had perf. issues.
        
             | nerdponx wrote:
             | We had some pretty bad performance with $lookup queries but
             | it all magically went away after adding some indexes. I
             | have a lot of grievances with teams that use MongoDB, but
             | relatively few grievances with MongoDB itself along these
             | lines.
        
           | nerdponx wrote:
           | I used MongoDB at a company where engineering policy was
           | strictly that MongoDB was our only allowable database,
           | including in cases where it was clearly not the best choice.
           | 
           | Were there good aspects? Sure... kind of. It was _super
           | super_ easy to just throw data into the database. Need to add
           | a field? Who cares, just add it. Need to remove a field? Who
           | cares, just remove it -- as long as the calling code is null-
           | safe. And it was likewise super easy to store denormalized
           | data for fast lookups when it made sense to do so, as well as
           | deeply-nested things and arbitrary JSON blobs synced from our
           | CMS. And queries could be stored and manipulated as actual
           | Python dicts instead of opaque strings, whereas you normally
           | need an ORM or query builder to do that in SQL. And you
           | _could_ get the best-of-both-worlds-ish with the ODMantic
           | framework, where you could spit out a warning (but not an
           | error) and recover gracefully if there happened to be bad
           | data in the database.
           | 
            | Basically it allows you to forgo any foresight or serious
           | design, and instead just throw shit together. Which is great
           | for getting a prototype going really fast and giving the
           | appearance of a highly productive team. That is, until you
           | run out of new microservices to prototype and now you have to
           | start adding features to existing code and fixing bugs. IMO
           | it was completely "penny-wise and pound-foolish" with respect
           | to developer time, but I can at least see the appeal if
           | you're a particular type of engineer operating under a
           | particular set of incentives.
           | 
           | As for using MongoDB for things it was _actually_ meant for
           | (writing a ton of schemaless JSON blobs really fast and
           | figuring out reads later), I have no idea because we never
           | used it for that and had nothing really resembling that use
           | case.
        
           | lelandbatey wrote:
            | I worked full time on a pretty seriously-trafficked product
           | based on MongoDB for 3 years and I still don't know of
           | anywhere I'd want to use MongoDB. I'd basically always want
           | _either_ a DB with a schema _or_ a super fast opaque-store
           | style cache.
           | 
           | Also, their hosted offerings (MongoDB Atlas) were not well
           | operated and they took down our company for 2 days. Our
           | MongoDB instance stopped accepting new connections and
           | restarting our instance via their control panel didn't bring
           | it back up and then their support literally said "we don't
           | know why this happened or how to fix it" for like a day and a
           | half, while we were on their highest tier ultra platinum
           | support contract. Ultimately, we had to do 24 hours of
           | research and then tell their own support how to fix the
           | problem using their special shell access. If I recall
           | correctly, it was some issue with WiredTiger (an internal
           | component of the Mongo version we were using).
           | 
           | After that experience, I'd never use anything produced by
           | MongoDB for anything; we moved everything that we possibly
           | could out of Mongo and into more traditional RDBMS
           | (PostgreSQL) and never had to deal with issues like that
           | again.
        
           | computerfan494 wrote:
           | MongoDB ships with horizontal sharding out-of-the-box, has
           | idiomatic and well-maintained drivers for pretty much every
           | language you could want (no C library re-use), is reasonably
            | vendor-neutral and can be run locally, and the data modeling
            | it encourages is preferred by some people and pushes users
            | away from patterns that don't scale well under other models.
            | Whether these things matter to you is a different question,
            | but there is a lot to like that alternatives may not have
            | answers for. If you currently spend, or plan to spend, more
            | than $10K per month on your database, I think MongoDB is one
            | of the strongest choices out there.
           | 
           | Also want to add that you can definitely use MongoDB (or any
           | other database) in a way that doesn't scale well. I have
           | personally run MongoDB at petabyte scale and had a relatively
           | great experience.
        
           | javcasas wrote:
           | Sometimes you cannot choose. Sometimes you are handed down
           | some decisions already made, and reverting them may have a
           | cost the business doesn't want to pay.
        
       | btilly wrote:
       | I've found that replacing the database with in memory SQLite for
        | tests is a sweet spot. It's almost as fast as a mock and catches
        | a lot of database issues. And it's really easy to do if you're
        | using something like Django that makes automatically generating
       | database migrations easy.
       | 
       | It won't help you with database specific differences. But there
       | should be very few of those if you're using a framework that
       | abstracts away the database. Like Django.
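As a sketch, the SQLite swap in Django is just a settings override; the module name and layout below are illustrative, not from the comment:

```python
# test_settings.py -- hypothetical test-only settings module.
# Point Django at an in-memory SQLite database; the test runner
# creates the schema from migrations automatically, so schema-level
# issues (missing columns, bad defaults) still surface.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": ":memory:",
    }
}

# Run with: python manage.py test --settings=test_settings
```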
        
         | rikthevik wrote:
         | > But there should be very few of those if you're using a
         | framework that abstracts away the database.
         | 
         | But I really want that database-specific behaviour. :)
         | PostgreSQL does so many amazing things (recursive CTEs, jsonb,
         | etc) that actively make our system better. If there was a fork
         | of Django that optimized for leveraging advanced postgres
         | features, I'd use it.
        
           | ariofrio wrote:
           | Does something like PGlite work for your use case?
           | https://pglite.dev/
        
             | devoutsalsa wrote:
              | I want to test with the same API as my database, which
              | means using the same database. Setting up an instance of
             | Postgres/MySQL/whatever isn't hard. Maybe harder if you're
             | using an online database that doesn't fit into a container,
             | but that's a different problem.
        
             | smackeyacky wrote:
              | You can just run postgres; why bother with pglite?
             | 
             | postgres installs easily on WSL2 or whatever Linux
             | distribution you're using.
        
             | samwillis wrote:
             | Unit tests work well with PGlite, at least in the TS/JS
             | world where it natively sits at the moment. You can have a
             | unique Postgres instance for each unit test that's up and
             | running in just a few ms.
             | 
             | It's possible to use PGlite from any language using native
              | Postgres clients and pg-gateway, but you lose some of the
              | nice test DX you get when it's embedded directly in the
              | test code.
             | 
              | I'm hopeful that we can bring PGlite to other platforms;
              | it's being actively worked on.
              | 
              | The other thing I hope we can look at at some point is
              | instant forks of in-memory databases; that would make it
              | possible to set up a test db once and then reset it to a
              | known state for each test.
             | 
             | (I work on PGlite)
        
           | zeroonetwothree wrote:
           | Sqlite supports recursive CTE
        
             | rikthevik wrote:
             | I think you're missing the point.
             | 
             | I want to develop on postgres and test on postgres because
             | I run postgres in production and I want to take advantage
             | of all of its features. I don't understand why a person
             | would 1) develop/test on a different database than
             | production or 2) restrict one's self to the lowest common
             | denominator of database features.
             | 
             | Test what you fly, fly what you test.
        
             | maxbond wrote:
             | And JSONB, but you can't insert or update in a CTE like you
             | can with Postgres.
        
         | pdimitar wrote:
         | I don't know what kind of magic fairy dust Django is but I've
         | found the differences between SQLite and PostgreSQL too big to
         | be worth it, in at least 3 other programming languages that are
         | not Python.
         | 
         | Sounded good at first but we were quickly overwhelmed with
         | false positives and just opted for Postgres in a VM (this was
         | before Docker was a thing).
        
           | herpdyderp wrote:
           | I've discovered the same thing: can't use SQLite to mock
           | PostgreSQL. PGLite looks super promising for this, but I
           | haven't tried it yet at scale.
        
           | phendrenad2 wrote:
           | This is a good thing to be aware of when choosing a database,
           | but I think most of the time people just reach for Postgres
           | because it's the "standard".
        
             | pdimitar wrote:
             | PostgreSQL stops many data bugs at the door due to being so
             | strict -- something many programmers start rediscovering is
             | a good thing by choosing stricter programming languages
             | with time.
             | 
             | I love SQLite to bits but the test harness I have to put
             | around my apps with it is a separate project in itself.
        
               | phendrenad2 wrote:
               | Somehow this is my first time hearing about that benefit
               | of Postgres! (despite having googled for explanations of
               | why people were switching to it). Being on HN proves more
               | valuable than google once again...
        
               | pdimitar wrote:
               | Glad to help!
               | 
               | On a philosophical / meta level it's all quite simple: do
               | your damnedest for the computer to do as much of your
               | work for you as possible, really. Nothing much to it.
               | 
               | Strict technologies slap you hard when you inevitably
               | make a mistake so they do in fact do more of your work
               | for you.
        
         | chrisandchris wrote:
         | I don't know...
         | 
          | What if you rely, e.g., on CASCADE and foreign keys, which
          | are not on by default in SQLite? I think then things start
          | getting complicated and testing that layer gets difficult.
        
         | 3pt14159 wrote:
          | I'm surprised you say so btilly. I've found the differences
         | between SQLite and Postgres to be large enough to bring up
         | issues that are hard to find. Timestamp handling, non-trivial
         | indexes, etc.
        
         | DanielHB wrote:
         | The problem with this approach is that SQLite dialect is not
         | the same as most production setups. Even if you use an ORM you
         | often have manual queries or you are using a feature that is
         | only supported in some databases (like geospatial queries)
        
           | isbvhodnvemrwvn wrote:
           | And sqlite is _extremely_ lax by default (and even non-
           | defaults are not great).
        
         | jmmv wrote:
         | Came here to say this: I've found that using SQLite for tests
         | is a good sweet spot, and maintaining dual DB implementations
         | ensures my business logic remains generic. But there is no
          | replacement for running the tests against the real database
         | periodically (maybe only in CI for example).
         | 
         | I wrote about this and some more in
         | https://jmmv.dev/2023/07/unit-testing-a-web-service.html
        
         | MetaWhirledPeas wrote:
         | > I've found that replacing the database with in memory SQLite
         | for tests is a sweet spot
         | 
         | Does that cover the first two examples brought up by the
         | article? Constraint violations and default value handling.
        
       | senex wrote:
       | Does anyone have experience making tests against real databases
       | fast?
       | 
       | I resonate with the sentiment of this article, but have struggled
       | to find an alternative that's fast enough as the test suite
        | grows, isn't flaky in CI, and is able to share the production
       | schema definition for relevant relations.
       | 
       | I'd love to hear more from anyone that's solved for some of these
       | constraints!
        
         | lbriner wrote:
         | A real database should not be slow. Even with our tests running
         | against a hosted SQL Server on a separate server, the database
         | is never the slow part. For other tests, we run with the same
         | database in a local Docker container with Docker Compose and it
         | is fast and isolated/resettable.
         | 
         | Most tests should be unit tests, which are super fast.
          | Integration and UI tests that might use the database should be
          | fewer. If the database is slow, it might be related to your
          | specific application or unoptimized database queries; our
          | database calls are usually < 10ms.
        
         | pdimitar wrote:
         | On my machine -- which is quite middle range at this point, not
         | even high end -- I get by just fine up to 3000 tests, with
         | Elixir at least. When I was at that contract the ~3200 tests
         | ran in something like 40 seconds.
         | 
         | What kinds of troubles do you have with using a real DB for
         | testing?
        
         | shayonj wrote:
          | Doesn't directly answer your question, but at least in Postgres
          | I am curious about UNLOGGED mode and whether it results in
          | faster specs. The trade-off is that crash recovery doesn't
          | work, but that's fine in CI.
          | 
          | There is also something to be said for keeping database
          | transactions atomic (no 3rd-party network calls, etc.) to keep
          | flaky specs to zero. I have some ad hoc thoughts on this and
          | will try to frame them properly in a post.
        
         | TobiasBales wrote:
         | I think one helpful thing is what rails calls transactional
         | tests
         | (https://guides.rubyonrails.org/testing.html#transactions). It
         | basically does the database setup (migrations and seeds) and
         | then executes the tests in a transaction that is rolled back at
         | the end (and thus never committed). This helps with speed and
         | also ensuring that tests don't have as much accidental
         | codependence as they might otherwise.
         | 
          | If you use read replicas in production code this can become
          | tricky, though: since the transactions never commit, they are
          | never visible to the reader, or even to different connections
          | to the same database.
        
         | avg_dev wrote:
         | Don't know what language or database you use, but check this
         | out: https://github.com/peterldowns/pgtestdb
         | 
         | If you happen to use Postgres, the approach is ultimately
         | portable: it uses Pg database templates (also, regarding perf,
         | the author recommends using a ramdisk and turning off fsync on
         | your test DBs; you'll see this in the project readme). But
         | you'll have to write the code yourself.
        
           | thom wrote:
           | This is what I do, it has an overhead of about 10-20ms per
            | test and I've had zero flakiness. An absolute no-brainer from
            | my point of view.
        
         | peibye wrote:
         | Depends on how fast you need them to be. I've had success using
          | Docker Postgres, set up the same as production, where each
          | test creates a connection to its own test-specific database,
         | or is written to run inside a transaction that can clean up
         | after itself using a shared database. In my experience, slower
         | tests that use a real db almost always save net positive dev
         | time.
        
         | jayd16 wrote:
         | Setup tooling to start postgres in a local docker container and
         | point at that. Works very well and there are some libraries
         | that abstract it for you.
         | 
         | I think it was Testcontainers.
        
         | maxmcd wrote:
         | Tests are usually embarrassingly parallel. Instead of creating
         | one test db (app_test) create many (app_test_0, app_test_1,
         | ...). Run tests in many threads/processes, db per thread.
         | 
         | This works in a lot of cases. In some cases this might not
         | address your bottleneck.
         | 
         | Also someone should write a real, performant, in-memory
         | postgres storage driver. Then we can all be happy (with pg at
         | least).
        
         | theogravity wrote:
         | We use Testcontainers (https://testcontainers.com/) in our
         | node.js / vitest / kysely stack. Really fast to spin up a
         | temporary postgres instance, and we use kysely migrations to
         | init and seed the db.
        
         | nerdponx wrote:
         | What I've done is make a clone of the real database, with a
         | sample of data that has enough variety/size to test whatever it
         | is you need to test, but no bigger. It definitely takes some
         | thinking, planning, and writing of code, but it's worth doing.
         | 
         | Unfortunately I maintain an app where the database (read-only)
         | is Snowflake, and being more of a "warehouse" than "database"
         | there's always a lot of overhead in running any query at all.
         | Even just `select 1` can take a few seconds. So there's only so
         | much you can do with that, but setting up your data so that
         | tests can be parallelized helps as well.
         | 
         | However your tests against a proper OLTP database should be
         | plenty fast, unless your app itself is slow or your test
         | fixtures require some really complicated setup.
        
         | fatbird wrote:
         | Not sure if this is still a valid approach, but on a large
         | Django site with a lot of unit tests, the cumulative
          | setup/teardown/reset cycles were killing us. We found that
          | wrapping each test in a transaction that was aborted on
          | teardown caused the per-test cleanup time to drop radically.
          | We also kept a canned database for testing so that
         | running the test suite didn't have a large startup penalty to
         | populate the database with test fixtures. Keeping that ready db
         | between runs also sped things up a lot for devs.
        
         | 3pt14159 wrote:
         | Tests are embarrassingly parallel. If you can split up the load
         | for CI then at the limit the slowest response is your slowest
         | running test. I wish more tooling would expose this out of the
         | box, but with some effort it is possible.
        
         | dinosaurdynasty wrote:
         | https://eradman.com/ephemeralpg/ (for Postgres) plus making
         | sure your test suite is parallel works wonders.
        
       | pdimitar wrote:
       | I thought this was common knowledge and that it became even
       | easier after Docker became a thing?
       | 
       | Mocks are wishful thinking incarnate most of the time, though
       | here and there they are absolutely needed (like 3rd party APIs
        | without sandbox environments, or quite expensive APIs, or most
        | of the time: both).
       | 
       | Just pick a task runner -- I use just[0] -- and make a task that
       | brings up both Docker and your containers, then run your test
       | task, done. Sure it's a bit fiddly the first time around but I've
       | seen juniors conquer that in a day maximum and then your tests
       | actually work with the real world 99% of the time.
       | 
       | Mocks in general are rarely worth it, the DB ones: 10x so.
       | 
       | [0] https://github.com/casey/just
        
         | shayonj wrote:
          | +1 for this being common knowledge, but there is still a
          | decent bit of mocking that happens IME.
        
           | withinboredom wrote:
           | Mocks are great for testing specific code paths, but if you
           | need to do that, there is usually a better way of doing it.
           | Mocks hide contractual incongruities. For example, a function
           | that returns either "fizz" or "buzz" that gets mocked out. If
           | it gets changed to also return "bar", and you forget to
           | update the mock, you've got a time-bomb in your code.
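A minimal sketch of that time bomb using Python's unittest.mock, with all names invented: the real function grows a new return value, the mock keeps encoding the old contract, and the mocked path diverges from production.

```python
from unittest.mock import Mock

# Real implementation: originally returned "fizz" or "buzz", later
# extended to also return "bar" -- the contract changed.
def classify(n: int) -> str:
    if n % 15 == 0:
        return "bar"  # new return value added after the mock was written
    return "fizz" if n % 3 == 0 else "buzz"

def label(n: int, classifier=classify) -> str:
    return classifier(n).upper()

# The stale mock still encodes the old two-value contract and never
# returns "bar", so tests using it keep passing while production
# behaves differently for the same input.
stale_mock = Mock(side_effect=lambda n: "fizz" if n % 3 == 0 else "buzz")
mocked = label(15, classifier=stale_mock)  # what the test suite sees
real = label(15)                           # what production actually does
```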
        
             | pdimitar wrote:
             | Completely agreed, that's why I have advocated for
             | periodically (once a month) revisiting the test suites with
             | them by doing re-captures of real data. There are
             | frameworks that help with this.
             | 
             | Is it a chore? Absolutely. But peace of mind is important.
        
         | fcmgr wrote:
         | I've had good experience with testcontainers
         | (https://testcontainers.com/) to do that sort of thing.
        
           | kkarimi wrote:
            | Yes, it's a shame that this is not leading the discussion.
            | It works very well and is surprisingly efficient in our
            | experience compared to manually setting up Docker containers
            | or Docker Compose.
        
           | donjoe wrote:
            | testcontainers is great. I struggled a bit with it at first
            | due to the one-container-per-test model, which just felt too
            | slow for writing gray/blackbox tests. The startup time for
            | postgres was > 10 seconds. After a bit of
           | experimenting, I am now quite happy with my configuration
           | which allows me to have a snappy, almost instant testing
           | experience.
           | 
           | My current setup:
           | 
            | - generate a new psql testcontainer _or_ reuse an existing
            |   one by using a fixed name for the container
            | - connect to the psql container with no database selected
            | - create a new database using a random database name
            | - connect to the randomly generated database
            | - initialize the project's tables
            | - run a test
            | - drop the database
            | - keep the testcontainer up and running and reuse it with
            |   the next test
            | 
            | With this setup, most tests run sub-second.
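The random-database-per-test step can be factored into small helpers like these; a sketch with invented names, where the caller is assumed to execute the statements on an autocommit connection to the reused container's maintenance database (CREATE DATABASE cannot run inside a transaction block in Postgres):

```python
import uuid

def fresh_database_name(prefix: str = "test") -> str:
    """Random, collision-proof Postgres database name for one test."""
    return f"{prefix}_{uuid.uuid4().hex}"

def create_database_sql(name: str) -> str:
    # Must be executed on an autocommit connection (e.g. to the
    # "postgres" maintenance database of the reused container).
    return f'CREATE DATABASE "{name}"'

def drop_database_sql(name: str) -> str:
    # Run after the test to keep the long-lived container clean.
    return f'DROP DATABASE IF EXISTS "{name}"'
```

Each test then connects to its freshly named database, initializes the tables, runs, and drops it, while the container itself stays up between tests.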
        
         | nerdponx wrote:
         | I've had a nice experience using Docker Compose for this,
         | although you might still want a task runner wrapper around it.
         | Podman Compose should work fine as well.
        
           | pdimitar wrote:
            | Exactly what I do. Having a local `docker-compose.yml` that
            | helps bring up dev and test environments is one of my first
            | action items when hired. It has paid huge dividends, many
            | times.
        
       | Deukhoofd wrote:
        | Yeah, I ran into very much the same issues. Setting up a database
       | for unit tests can be even more of a bother though (especially
       | for CI pipelines, I don't want to have to run a full database
       | there), so I took a middle road, and use my ORM to throw up a
       | temporary in-memory SQLite database that is mostly similar to our
       | actual database. Each unit tests scaffolds its own database, and
       | deletes it when it's done.
       | 
       | That allows me to somewhat mock the database within unit tests,
       | while still being able to test database-specific errors that
       | might occur, like foreign key errors.
        
         | pdimitar wrote:
         | Why a separate DB for each test? Just have only one DB and each
         | test opens a transaction and then rolls it back when it's done.
         | That way you also achieve isolation of tests so they don't
         | interfere with each other.
        
           | Deukhoofd wrote:
           | The code being tested also uses transactions internally at
           | times, so it'd mean additional complexity in the code being
           | tested to allow for unit testing, which is not great. In my
           | experience throwing up a database including all tables in an
           | in-memory SQLite db is extremely fast, so it's not really a
           | major concern.
        
             | pdimitar wrote:
             | OK, but is your production DB also SQLite? If not, I would
             | not. I found the differences between it and PostgreSQL too
             | big and was getting too many false positives.
             | 
             | Also code complexity on a few levels of recursion of
             | transactions is an easy thing to abstract away with almost
             | zero performance penalty -- depending on your programming
             | language of choice.
        
               | Deukhoofd wrote:
               | The production DB is MSSQL, and we scaffold it in SQLite
               | through EF Core. The resulting SQLite DB is close enough
               | to our production DB that we are able to catch invalid
               | defaults, missing foreign keys, etc. in unit tests
               | instead of later on in our testing pipeline, which helps
               | massively in accelerating our development. It could be
               | even better if SQLite would actually tell you which
               | foreign key constraint failed instead of its somewhat
               | unhelpful 'foreign key constraint failed' error, but as
               | it is we at least know _something_ is wrong in our code.
               | 
               | And sure, we could probably refactor it to use
                | transactions to shave a few seconds off running our test
               | suite, but it'd add some additional mental complexity to
               | our codebase to do so. In general, I prefer to keep the
               | actual running code as simple as possible, and have any
               | complexity that is required for unit tests be handled by
               | the unit tests. By just recreating the database the unit
               | tests currently handle all that complexity.
               | 
               | The idea is definitely not a bad one though, and if your
               | scaffolds are big enough to actually cause performance
               | issues with your unit tests it might definitely be a
               | consideration to not recreate the database every time.
        
               | pdimitar wrote:
               | Well, if you found a productive workflow then who am I to
               | judge, right?
               | 
                | However, if I were hired onto your team tomorrow you'd
                | have to fiercely fight with me over this:
               | 
                | > _And sure, we could probably refactor it to use
                | transactions to shave a few seconds off running our test
                | suite, but it'd add some additional mental complexity to
                | our codebase to do so._
               | 
               | Various languages and frameworks demonstrate that
               | abstracting this away and using convenient wrappers is
               | more or less trivial, and definitely a solved problem.
               | And the resulting code is simple; you are (mostly) none
               | the wiser that you are actually using a temporary
               | transaction that would be ultimately rolled back.
               | 
               | ...Though that is extremely sensitive to which
               | programming language and DB library you are using, of
               | course.
               | 
               | Your approach works, so keep using it but IMO it's a
               | temporary one and it could stop being fit-for-purpose
               | Soon(tm). You yourself are already aware of its
               | limitations so just keep an eye out for this moment and
               | be decisive when the time comes to migrate away from it.
               | 
               | My $0.02.
        
       | worik wrote:
       | This risks unleashing a bit of a "religious" war.
       | 
        | I agree that tests should occur in the environment they are to
        | be run in, whenever possible. Otherwise not...
       | 
        | There is no science in testing. There are no standardised
        | definitions. We agree it is good, and we often ignore the downsides
        | (they exist - there are costs to tests).
       | 
       | I have struck examples where the testing frameworks of a language
       | (Dart in this case) ruled out using a live HTTPS client to test.
       | I kept getting 404 errors running it against my (test) site. I
       | made a bug report. The good folks at Google (who I am sure are
       | very smart - if a little stupid) had decided that the HTTPS
       | client would always return 404 when run as part of the testing
       | framework.
       | 
        | I argued, but apparently I am not so smart (I am less stupid,
        | IMO). The HTTPS client testing had to either occur outside the
        | test framework, or I'd have to rewrite the HTTPS client, or write
        | a mock-up of some sort - all things I would have to pay for with
        | time and effort, for no gain.
       | 
       | I get the point about mocking. But I am in business. I am not
       | interested in always testing in isolation, (sometimes, sure), I
       | want my tests to depend on the other parts of my business.
       | 
        | If the database goes wrong and my GUI tool test discovers it,
        | that is a good thing, not a bad thing.
        
       | nerdponx wrote:
       | This is one of many reasons why "isolating" I/O within your
       | application is so important, whether that means pushing it to the
       | edges or otherwise carefully corralling it in known-critical
       | areas where it can't be removed.
       | 
       | A static type system that assists with that design style is nice
       | as well. It's really useful to be able to look at a function and
       | know right away what I/O it performs, if any. This is probably my
       | #1 missing feature in Python, now that static type hints work
       | well enough for most purposes.
       | 
       | That said, setting up and maintaining a database test harness in
       | the developer environment can be really annoying and time
        | consuming. Even more so when your database is a proprietary cloud
       | thing (e.g. Snowflake). But I have never ever regretted spending
       | the time to set it up, whereas I have definitely regretted _not_
       | having it.
       | 
       | Sometimes it's unavoidable that your "unit" tests need to access
       | the database to be able to properly work out the "unit" under
       | test. Watching those tests go from failing to passing is such a
       | sweet feeling, it makes the pain of maintaining the test harness
       | feel worthwhile. And then of course when you hit the actual
       | integration tests in CI, everything almost always passes, because
       | your unit tests are that good.
        
       | andy_ppp wrote:
       | As per usual Elixir does this correctly and even has a setup that
       | allows all tests to run against the database in a pristine way in
       | parallel:
       | https://hexdocs.pm/ecto_sql/Ecto.Adapters.SQL.Sandbox.html
        
         | napsterbr wrote:
         | A hobby project of mine (in Elixir) uses SQLite as primary
         | database. Each test runs in its own fully isolated SQLite
         | database. No mocking (or transaction rolling back) needed. Most
         | of these tests take less than 1ms to run (and when they take
         | longer, it's because of something else).
         | 
         | This kind of setup makes the usual Ecto Sandbox approach feel
         | slow, but I do agree that the way Elixir approaches this is
         | great!
        
         | brightball wrote:
         | Yea, I've been working on a side project for a while and I keep
         | building it with Rails because there's so much about Rails that
         | speeds up this particular type of project...
         | 
         | But the testing setup in Elixir is just exemplary compared to
         | everything else I've worked with. I fight myself daily on just
         | rebuilding the entire project in Elixir.
        
       | mrkeen wrote:
       | Two points:
       | 
       | 1) The subtle issues with constraint violations, default values
       | and indexes are part of the RDBMS, not the testing strategy. The
       | article suggests stepping on these rakes in test so that you
       | hopefully don't step on them in prod. Another way to avoid them
       | in prod (and to also regain the speed/reliability/simplicity of
       | your test suite) is to not use an RDBMS in prod.
       | 
       | 2) If you want to persist something, ask yourself if what you
       | really have is tabular data. Do you have a lot of nulls and/or
       | default values? Catch-all json fields? You probably have objects,
       | not relations. Postgres may have a lot of wonderful features that
       | relate to relations, and an ORM might have a lot of wonderful
       | features (which appear to save you time). But if you stop
       | requiring your objects to be relations at rest, you don't need
       | the features, which is even faster.
        
         | Volundr wrote:
         | > constraint violations
         | 
         | You call these rakes and suggest removing them from prod, I
         | call these vital safeguards. Defining constraints at the
         | database layer gives me confidence that these will truly be
         | invariants, and I won't one day discover that I've got millions
         | of rows in my database missing critical data, with no way of
         | knowing what should be there. Been there, done that, no desire
         | to revisit it.
        
           | mrkeen wrote:
           | My coworkers call it "data integrity". At best it's a local
           | maximum for consistency. Right now it's how we reject true
           | facts about the business. Our partner bank tells us a
           | customer deactivated? Our database says "no", someone gets a
           | 500, and we get a support ticket when the customer can't sign
           | up again (because they're still in our system).
        
       | seeknotfind wrote:
       | This is not only about databases. Mocks are almost always a
       | compromise.
        
       | Lvl999Noob wrote:
       | I faced problems with flaky unit tests that used a common unit
       | test database for the whole project. Since all the tests ran at
       | once, the tests would sometimes fail because of concurrency
       | issues. I never got the time to look too deeply into it.
       | 
       | I wonder if there's a database that works sorta like git, giving
       | one "commit" point at the start of every test and then "branching
       | off" so each test can do its own thing, make transactions, commit
       | them, whatever, and finally be able to roll the whole thing back.
       | 
       | If that can be done then each unit test can use a full database
       | clone with actual valid data and not just test data that was fast
       | enough to set up before the test ran.
        
         | shayonj wrote:
          | Starting each test in its own transaction and rolling back at
          | the end, while ensuring no 3rd-party network calls happen in
          | application logic, has worked with a lot of success in my
          | experience with Rails and Postgres apps.
        
         | whatnotests2 wrote:
         | Transaction per test is how that works in ruby on rails. Also,
         | the rspec parallel gem will create N test databases for running
         | tests - one per process.
        
           | masklinn wrote:
            | That doesn't work if you need to commit for one reason or
            | another. And while in some cases you might be able to
           | substitute subtransactions via some sort of test mode, it's
           | not always an option depending what you're doing / testing
           | exactly.
        
         | masklinn wrote:
         | Dupe the database. That's what I do in one of my test suites
         | (though that's in part because the tests happen across 4
         | different processes so it's just not possible to keep it within
         | one transaction).
         | 
         | Creating a db from a template is not free, but if there's not
         | too much stuff in the db it's not horrendous either. And if
         | you're using sqlite, you might be able to backup an on-disk
         | template to an in-memory test db.
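A minimal sketch of that SQLite route, using the stdlib `Connection.backup` API (table and helper names are illustrative):

```python
import os
import sqlite3
import tempfile

# Build an on-disk "template" database once: schema plus seed data.
fd, template_path = tempfile.mkstemp(suffix=".db")
os.close(fd)
template = sqlite3.connect(template_path)
template.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
template.executemany("INSERT INTO items (name) VALUES (?)", [("a",), ("b",)])
template.commit()

def fresh_test_db():
    # Each test gets its own in-memory copy of the template; no
    # rollback bookkeeping is needed, and tests can commit freely.
    db = sqlite3.connect(":memory:")
    template.backup(db)
    return db

db1 = fresh_test_db()
db1.execute("DELETE FROM items")  # one test trashes its copy...
db1.commit()
db2 = fresh_test_db()             # ...the next still sees the seed data
seeded = db2.execute("SELECT COUNT(*) FROM items").fetchone()[0]
template.close()
os.unlink(template_path)
```

For Postgres the analogous trick is `CREATE DATABASE test_n TEMPLATE mytemplate`, which is heavier but works across processes.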
        
         | Volundr wrote:
         | Datomic more or less works like this. I wasn't terribly
         | impressed in general, but it did make tests a breeze.
         | 
         | Unless you use Clojure though it's probably not a real option.
        
       | kerblang wrote:
       | Was dealing with mocking voodoo garbage this morning as a result
       | of fixing a bug. What a horrible mess just to claim a few lines
       | of "test coverage" while ignoring critical db functionality. I
       | wouldn't mind if the mocking frameworks weren't absurdly complex
       | in their own right.
       | 
       | Unfortunately our "integration" tests are even worse, corrupting
       | data all over and flaky as hell.
        
       | isotropy wrote:
       | I've seen a db mock work when 1) there was a small subteam in
       | charge of the OR-mapping, schema structure, and included a DBA;
       | and 2) also a design policy from the architect that all objects
       | had to come out of factories. Under those specific circumstances,
       | having the mock - used solely as a per-developer object cache
       | imitating the factory interface - was critical for unblocking the
       | people working on business logic and front-end.
       | 
       | I wouldn't structure a team that way now.
        
       | astine wrote:
       | Testing against a real database is an example of integration
       | testing. Using mocks is for unit testing. Ideally, you want to do
       | both. Unit testing entails isolating particular components for
        | testing, which is important because it lets you be more precise
       | in what exactly you're testing. Using mocks also makes it easier
       | to run automated tests because it means you don't need to have a
       | database or credentials handy during the build process.
       | 
       | Integration testing is also important because the closer you get
       | to running things in a production environment, the more likely
       | you are to detect issues. Some things just won't be apparent
       | until you start running code in production or near production.
       | The thing to understand though, is that you want to do both if
       | you can, not either-or.
        
         | nwatson wrote:
         | Then just use a database for unit testing as well.
        
       | theogravity wrote:
       | Just use Testcontainers (https://testcontainers.com/). We use it
       | for quickly spinning up a temporary postgres instance to run our
       | db tests against.
        
       | diddid wrote:
       | The whole point of the mocking the database is to not test the
       | database! If you need to test the database then test the
        | database! Just like mocking an API can hide issues with the
        | API... which is again the exact point of mocking the API.
       | This article should really be named "mocking your database isn't
       | testing your database" which seems like it should be obvious.
        
         | liontwist wrote:
         | Which is why any testing involving mocking should have low
         | purity.
         | 
          | You get 90% of the benefit from two kinds of tests:
          | 
          | - verifying components work (unit tests)
          | 
          | - verifying the system works from its interface (automation)
        
         | bccdee wrote:
         | The point of mocking the database is to avoid the hassle of
         | spinning up a database to test against. If we ignore that, why
          | _not_ test our use of the database? Are we uninterested in
          | the schema and query errors we might catch?
         | 
         | Unit testing is useful because (a) it's easier to test all the
         | code paths in a unit when that unit is isolated, and (b) unit
         | tests don't need to be modified in response to changes in
         | unrelated modules. But (a) is irrelevant since we can easily
         | put dummy data in the DB, and (b) isn't a concern since the
         | relevant DB schemas and queries are tightly coupled to the
         | module in question anyway.
        
           | wesselbindt wrote:
           | > why not test our use of the database
           | 
           | Primarily slowness and brittleness. Tests with side effects
           | are much more likely to be flaky, and flaky tests are the
           | worst. Especially if they're slow and hence hard to debug.
           | 
           | Of course, you do test your use of the database, but in a
           | much smaller test suite. For the business logic tests you
           | just spin up a copy of your app with fake adapters, and
           | you're done. Quick, deterministic tests.
        
       | kapperchino wrote:
       | Just mock the repository which calls the db, and to test the
       | queries themselves, use the real db as a docker container, or
       | integration test with an existing db
        
       | perrygeo wrote:
       | Mocking the database can be done, but only if you use the
       | database as a generic storage layer. It's not trivial to keep the
       | mocks updated in most applications.
       | 
       | Some ORMs and frameworks provide magic to do this, but as soon as
       | you deviate from the framework it gets complex. You can build
       | geospatial applications, time series, document, and full text
       | search - iow leverage advanced features of Postgres extensions
       | that most ORMs can't touch. Their behaviors can be complex and
       | are part of the application, not simply a storage layer.
        
       | selimnairb wrote:
       | At least in the Python world, it's trivial to bring up an
       | isolated DB instance using a docker compose file and a simple
       | pytest plugin. Hardly seems worth the effort to mock the DB in
       | cases like this.
        
       | fcmgr wrote:
       | I've had some good experience with a mix of those approaches,
       | maybe not using mocks per se, but an "in-memory database
       | implementation" (just a wrapper around the hash map that
       | implements the same behaviors as a repository that deals with a
       | real database) on one hand, and testcontainers on the other.
       | (Still, using an in-memory db is way better than mocking, the
       | tests are not coupled to the implementation details/the
       | underlying model).
       | 
       | For simple use cases where I mostly just read/write from the
       | database and don't expect any of those issues mentioned in the
       | article (constraint violations or concurrency issues, because the
       | application is quite simple tbh, plus I already have some
       | testcontainers based integration tests for the components that
       | deal with a real db and I reuse them) writing tests with an in-
       | memory db implementation is quite nice - the setup is simple and
       | running the whole test suite is instantaneous (literally
       | something like 1-2s for a couple thousand tests, I don't need any
       | framework for those kind of tests).
       | 
       | And on the other hand if I'm relying on something like an
        | optimistic/pessimistic lock or a framework feature, I will write
       | a test with the real thing, using test containers. Also have a
       | pretty good coverage with the components that deal with queues
       | and databases specifically with testcontainers. And on top of
       | that just a few e2e flows written using testcontainers as well.
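The "wrapper around a hash map" idea reads roughly like this in Python (the interface and names are hypothetical):

```python
from typing import Dict, Optional, Protocol

class UserRepo(Protocol):
    def save(self, user_id: int, name: str) -> None: ...
    def get(self, user_id: int) -> Optional[str]: ...

class InMemoryUserRepo:
    # Test double: a thin wrapper around a dict that implements the
    # same interface the real SQL-backed repository would.
    def __init__(self) -> None:
        self._rows: Dict[int, str] = {}

    def save(self, user_id: int, name: str) -> None:
        self._rows[user_id] = name

    def get(self, user_id: int) -> Optional[str]:
        return self._rows.get(user_id)

def greet(repo: UserRepo, user_id: int) -> str:
    # Business logic sees only the interface, never the storage tech,
    # and no "when called with X return Y" stubbing is required.
    name = repo.get(user_id)
    return f"Hello, {name}!" if name else "Hello, stranger!"

repo = InMemoryUserRepo()
repo.save(1, "Ada")
greeting = greet(repo, 1)
```

Because the fake behaves like real storage rather than replaying canned answers, tests stay coupled to the interface, not to call sequences.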
        
         | mrkeen wrote:
         | My take is that your business logic shouldn't know about your
         | storage tech. (Like, dependency inversion 101 right?)
         | 
         | > Still, using an in-memory db is way better than mocking, the
         | tests are not coupled to the implementation details/the
         | underlying model.
         | 
         | Isn't this backwards? The fact that you've backed your storage
         | with a HashMap (which is 100% what I shoot for too) means your
         | service-under-test cannot know if it's talking to an SQL
         | database.
        
           | fcmgr wrote:
           | I think we're talking about the same thing :D? The
           | service/business logic/whatever you might want to call it
            | interacts with the storage via an interface - in production
           | that interface is implemented by a component that can talk to
           | a real SQL database, in tests I can just create a wrapper
           | around a hash map and use that.
           | 
           | EDIT: What I meant when I wrote that tests are not coupled to
           | the underlying model/details is that with a mock you have to
           | explicitly specify "when called with this return that". With
           | an in memory database implementation you don't need to do
           | anything, the code will just use the interface methods like
           | "getX" or "saveY".
        
             | mrkeen wrote:
             | > with a mock you have to explicitly specify "when called
             | with this return that".
             | 
             | Riiiight, no I've always hated that Mockito way of doing
             | things.
             | 
             | I find wrapping a hashmap avoids the need for explicit
             | when-this-then-that bindings, because a hashmap already
             | does the expected behaviour natively.
             | 
             | You can even 'integrate' your tests as deeply as you like,
             | all standing on top of your poor hashmap. I.e. Http tests
             | with unit test speed.
             | 
              | var controller =
              |     new Controller(new Service(new Repo(new HashMap())));
              | controller.POST(...);
              | assert(controller.GET(...));
        
       | brettgriffin wrote:
       | I'd be really surprised if most applications that rely on
       | databases can't just use a containerized image of their
       | production database in any environment, including CI. In fact,
       | the only example I can really think of is when the production
       | database can't be Dockerized (e.g. a proprietary SaaS database
       | like BigQuery or Snowflake).
       | 
       | I'm working on a project that has ~22,000 tests that operate on
       | Docker container of the same version of Postgres used in
       | production. In CI it completes all the database tests in under a
       | minute.
        
         | thom wrote:
         | Are these parallelised? <3ms overhead for a clean database and
         | test code to interact with it is very good!
        
       | p0w3n3d wrote:
       | of course one can do the query testing separated, but I have run
       | once or twice into problems covered by hibernate, and showing up
       | on sql dialect on real database. Hence as always the answer is:
       | 'it depends'
        
       | hamandcheese wrote:
       | My #1 rule: tests should "just work". A new contributor should
       | not have to run any setup, just a single test command. I don't
       | care what kind of tests they are or what you prefer to call them
       | as long as they work (in CI AND locally).
       | 
        | The vast majority of projects fail at this, often because some
        | external dependency like a database needs some manual setup.
       | 
       | To me, mocking out the db is a relatively cheap way to achieve
       | rule #1. Yes, it has pitfalls, but I am unwilling to compromise
       | on rule #1 so it's purely a question of cost vs fully automating
       | db setup and teardown.
        
         | Threadbare wrote:
         | Tell me you've never heard of test containers without telling
         | me you've never heard of..
        
       | calmbonsai wrote:
        | Replace "database mocks" with ANY <X> distributed API service
        | calls in your execution flow and there's still ZERO excuse NOT
        | to be using SQLite or LocalStack mocks in your basic unit and
        | pre-integration testing.
       | 
       | Sure, there's no substitute for optimizing things for a
       | particular query engine and EVERY database engine has SUBSTANTIAL
       | quirks ( _cough_ , _cough_ Redshift), but you should be crawling
       | and walking before you try running.
        
       | elevatedastalt wrote:
       | This article seems to be implying that there's something special
       | about DBs that makes them only testable directly instead of
       | through mocks.
       | 
       | There's a continuum between extreme integration testing and
       | extreme unit testing, each with their strengths and weaknesses.
       | 
       | There's a reason we don't do all testing through integration
       | testing: it's slower, more flaky, and cripples developer velocity
       | to a near standstill.
       | 
       | There's a reason we don't do all testing through pure unit
       | testing: you have no idea whether the e2e plumbing works as
       | expected.
       | 
        | DBs aren't special in this regard. That being said, it is true
        | that investing in a lightweight, unit-testable setup for your DB
        | is a good idea.
        
       | nrr wrote:
       | I think it's probably worth mentioning that the principal concern
       | for tests should be proving out the application's logic, and
       | unless you're really leaning on your database to be, e.g., a
       | source of type and invariant enforcement for your data, any sort
       | of database-specific testing can be deferred to integration and
       | UAT.
       | 
       | I use both the mocked and real database approaches illustrated
       | here because they ultimately focus on different things: the
       | mocked approach validates that the model is internally consistent
       | with itself, and the real database approach validates that the
       | same model is externally consistent with the real world.
       | 
       | It may seem like a duplication of effort to do that, but tests
       | are where you really should Write Everything Twice in a world
       | where it's expected that you Don't Repeat Yourself.
        
         | jcgrillo wrote:
         | The database is often the thing that enforces the most critical
         | application invariants, and is the primary source of errors
         | when those invariants are violated. For example, "tenant IDs
         | are unique" or "updates to the foobars are strictly
         | serializable". The only thing enforcing these invariants in
         | production is the interplay between your database schema and
         | the queries you execute against it. So unless you exercise
         | these invariants and the error cases against the actual
         | database (or a lightweight containerized version thereof) in
         | your test suite, it's your users who are actually testing the
         | critical invariants.
         | 
         | I'm pretty sure "don't repeat yourself" thinking has led to the
         | vast majority of the bad ideas I've seen so far in my career.
         | It's a truly crippling brainworm, and I wish computer schools
         | wouldn't teach it.
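The point about schema-enforced invariants can be shown in a few lines against a real engine (SQLite here purely for portability; the table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tenants (id INTEGER PRIMARY KEY, slug TEXT NOT NULL UNIQUE)"
)
conn.execute("INSERT INTO tenants (slug) VALUES ('acme')")

# The invariant "tenant slugs are unique" lives in the schema; only a
# real database engine will actually reject the duplicate. A mock that
# never executes this DDL cannot fail here.
try:
    conn.execute("INSERT INTO tenants (slug) VALUES ('acme')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

A mocked repository would happily "save" the second tenant, and the error-handling path for the constraint violation would ship untested.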
        
         | globular-toast wrote:
         | Exactly, the proper db should be done with an integration/e2e
         | test. Doing every single unit test against an expensive db is
         | insanity and leads to a slow test suite.
         | 
         | The article makes it seem like it's an either/or. It is not.
         | You should do both fast unit tests and slower integration
         | tests. Has the test pyramid been forgotten? For web stuff I
         | generally test the happy path and one or two representative
         | unhappy paths for each endpoint with integration tests. But
         | each component gets _every_ path tested with unit tests. Also
         | there is a rule that you are not allowed to fix a bug without
         | also submitting the missing (failing, now passing) test.
        
       | jaza wrote:
       | You should do, and I usually do, both "pure" unit tests (with all
       | I/O - database calls, reading/writing local files, 3rd party API
       | calls, etc - being mocked), and integration tests (with ideally
       | all I/O really happening). More of the former, and less of the
       | latter, in line with the "testing pyramid" approach.
       | 
       | There is value in testing "assuming that the database returns 3
       | rows with Foo IDs and Foo Descriptions, my code should serve a
       | JSON response containing the Foo IDs and the Foo Descriptions
       | concatenated together". And there is also value in testing "when
       | my code executes SELECT id, description FROM foo LIMIT 3, the
       | real database should return 3 rows with Foo IDs and Foo
       | Descriptions". Granted, there's also a big advantage in that the
       | former can run much faster, with zero environment setup /
       | teardown complexity required. But my main point is, different
       | test suites should test different things.
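The "pure" half of that split might look like this in Python (the function and field names are invented for illustration):

```python
import json

def fetch_foos_stub():
    # Stand-in for the real "SELECT id, description FROM foo LIMIT 3";
    # the unit test pins these rows instead of touching a database.
    return [(1, "alpha"), (2, "beta"), (3, "gamma")]

def foo_response(fetch_rows=fetch_foos_stub):
    # The logic actually under unit test: rows in, JSON payload out.
    rows = fetch_rows()
    return json.dumps(
        {"foos": [{"id": i, "description": d} for i, d in rows]}
    )

payload = json.loads(foo_response())
```

The complementary integration test would pass a real `fetch_rows` backed by the database and assert that the query itself returns the expected rows.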
       | 
       | However, if you're unable or unwilling to write two different
       | test suites, and if you're able to easily and reliably have a
       | database available in dev / CI, then ok, I concede, just write
       | integration tests.
        
       | binary132 wrote:
       | I think the sweet spot is simple relational abstractions that do
       | not necessarily depend on a SQL implementation. Those are
       | extremely quick and easy to write unit tests against and to use,
       | and don't burden the developer. Then the library implementing
       | them can itself be well-tested and straightforward, and even
       | portable to other backing stores. I'm not saying "use an ORM".
        
       | viccis wrote:
       | Can anyone here who might know more than me explain the
       | difference between dependency injection and the author's
       | suggested pattern at the end? Is it just the same thing? Seems
       | like that's the most common way I see to isolate external
       | services and stub them for testing.
        
       ___________________________________________________________________
       (page generated 2024-12-30 23:01 UTC)