[HN Gopher] Database mocks are not worth it
___________________________________________________________________
Database mocks are not worth it
Author : shayonj
Score : 75 points
Date : 2024-12-30 20:12 UTC (2 hours ago)
(HTM) web link (www.shayon.dev)
(TXT) w3m dump (www.shayon.dev)
| javcasas wrote:
| For tests, first use a real database. If you can't because the
| database just doesn't want to (ahem mongo ahem) then use a fake.
| If you can't use a fake, stop lying and use a mock.
|
| Mocks for databases are extremely brittle and complicated.
| btilly wrote:
| Ahem MongoDB? I must admit, I've never understood what
| MongoDB's real use case is supposed to be. Whenever I've looked
| at it, there has always been a better alternative. When I've
| had to use it, I've had to debug performance problems that
| wouldn't have existed with alternative solutions.
|
| It sounds cool. But running software isn't about sounding cool.
|
| A decade ago, it was really clear. As
| https://aphyr.com/posts/284-call-me-maybe-mongodb explains,
| MongoDB didn't really work. But they've fixed that. So now it
| runs acceptably accurately. I just don't know why I'd ever want
| to.
| karmakurtisaani wrote:
| In my company they use it to store a large number of varying-
| sized JSON documents. You can create indexes to make search
| queries lightning fast. It's also been extremely stable.
|
| I wasn't involved in setting it up though, so I can't say
| anything about how difficult it is to work with on the
| technical side.
| dakiol wrote:
| I don't know. Like any other tool, it depends on how you use
| it. I worked for a unicorn with a presence in multiple
| countries that was using Mongo as the main db in multiple
| microservices, with around 1500 engineers. It worked fine. I'm
| not saying it was justified, but we never had perf issues.
| nerdponx wrote:
| We had some pretty bad performance with $lookup queries but
| it all magically went away after adding some indexes. I
| have a lot of grievances with teams that use MongoDB, but
| relatively few grievances with MongoDB itself along these
| lines.
| nerdponx wrote:
| I used MongoDB at a company where engineering policy was
| strictly that MongoDB was our only allowable database,
| including in cases where it was clearly not the best choice.
|
| Were there good aspects? Sure... kind of. It was _super
| super_ easy to just throw data into the database. Need to add
| a field? Who cares, just add it. Need to remove a field? Who
| cares, just remove it -- as long as the calling code is null-
| safe. And it was likewise super easy to store denormalized
| data for fast lookups when it made sense to do so, as well as
| deeply-nested things and arbitrary JSON blobs synced from our
| CMS. And queries could be stored and manipulated as actual
| Python dicts instead of opaque strings, whereas you normally
| need an ORM or query builder to do that in SQL. And you
| _could_ get the best-of-both-worlds-ish with the ODMantic
| framework, where you could spit out a warning (but not an
| error) and recover gracefully if there happened to be bad
| data in the database.
|
| Basically it allows you to forgo any foresight or serious
| design, and instead just throw shit together. Which is great
| for getting a prototype going really fast and giving the
| appearance of a highly productive team. That is, until you
| run out of new microservices to prototype and now you have to
| start adding features to existing code and fixing bugs. IMO
| it was completely "penny-wise and pound-foolish" with respect
| to developer time, but I can at least see the appeal if
| you're a particular type of engineer operating under a
| particular set of incentives.
|
| As for using MongoDB for things it was _actually_ meant for
| (writing a ton of schemaless JSON blobs really fast and
| figuring out reads later), I have no idea because we never
| used it for that and had nothing really resembling that use
| case.
| lelandbatey wrote:
| I worked full time on a pretty seriously trafficked product
| based on MongoDB for 3 years and I still don't know of
| anywhere I'd want to use MongoDB. I'd basically always want
| _either_ a DB with a schema _or_ a super fast opaque-store
| style cache.
|
| Also, their hosted offerings (MongoDB Atlas) were not well
| operated and they took down our company for 2 days. Our
| MongoDB instance stopped accepting new connections and
| restarting our instance via their control panel didn't bring
| it back up and then their support literally said "we don't
| know why this happened or how to fix it" for like a day and a
| half, while we were on their highest tier ultra platinum
| support contract. Ultimately, we had to do 24 hours of
| research and then tell their own support how to fix the
| problem using their special shell access. If I recall
| correctly, it was some issue with WiredTiger (an internal
| component of the Mongo version we were using).
|
| After that experience, I'd never use anything produced by
| MongoDB for anything; we moved everything that we possibly
| could out of Mongo and into more traditional RDBMS
| (PostgreSQL) and never had to deal with issues like that
| again.
| computerfan494 wrote:
| MongoDB ships with horizontal sharding out-of-the-box, has
| idiomatic and well-maintained drivers for pretty much every
| language you could want (no C library re-use), is reasonably
| vendor-neutral and can be run locally, and the data modeling
| it encourages is preferred by some people and also pushes
| users away from patterns that don't scale very well with
| other models. Whether these things are important to you
| is a different question, but there is a lot to like that
| alternatives may not have answers for. If you currently spend,
| or plan to spend, > $10K per month on your database, I think
| MongoDB is one of the strongest choices out there.
|
| Also want to add that you can definitely use MongoDB (or any
| other database) in a way that doesn't scale well. I have
| personally run MongoDB at petabyte scale and had a relatively
| great experience.
| javcasas wrote:
| Sometimes you cannot choose. Sometimes you are handed down
| some decisions already made, and reverting them may have a
| cost the business doesn't want to pay.
| btilly wrote:
| I've found that replacing the database with in memory SQLite for
| tests is a sweet spot. Almost as fast as a mock, catches a lot of
| database issues. And it's really easy to do if you're using
| something like Django that makes automatically generating
| database migrations easy.
|
| It won't help you with database specific differences. But there
| should be very few of those if you're using a framework that
| abstracts away the database. Like Django.
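For readers unfamiliar with the Django setup, the swap can be as small as a settings override. A minimal sketch (the `sys.argv` check and project layout are assumptions; recent Django already defaults the sqlite3 backend to an in-memory database under test):

```python
# settings.py (fragment) -- point the test run at an in-memory SQLite DB.
import sys

if "test" in sys.argv:  # crude detection of `manage.py test`
    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.sqlite3",
            "NAME": ":memory:",  # created fresh per run, discarded afterwards
        }
    }
```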
| rikthevik wrote:
| > But there should be very few of those if you're using a
| framework that abstracts away the database.
|
| But I really want that database-specific behaviour. :)
| PostgreSQL does so many amazing things (recursive CTEs, jsonb,
| etc) that actively make our system better. If there was a fork
| of Django that optimized for leveraging advanced postgres
| features, I'd use it.
| ariofrio wrote:
| Does something like PGlite work for your use case?
| https://pglite.dev/
| devoutsalsa wrote:
| I want to test with the same API as my database, which means
| using the same database. Setting up an instance of
| Postgres/MySQL/whatever isn't hard. Maybe harder if you're
| using an online database that doesn't fit into a container,
| but that's a different problem.
| smackeyacky wrote:
| You can just run postgres, why bother with pglite?
|
| postgres installs easily on WSL2 or whatever Linux
| distribution you're using.
| samwillis wrote:
| Unit tests work well with PGlite, at least in the TS/JS
| world where it natively sits at the moment. You can have a
| unique Postgres instance for each unit test that's up and
| running in just a few ms.
|
| It's possible to use PGlite from any language using native
| Postgres clients and pg-gateway, but you lose some of the nice
| test DX you get when it's embedded directly in the test code.
|
| I'm hopeful that we can bring PGlite to other platforms,
| it's being actively worked on.
|
| The other thing I hope we can look at at some point is
| instant forks of in-memory databases; that would make it
| possible to set up a test db once and then reset it to a
| known state for each test.
|
| (I work on PGlite)
| zeroonetwothree wrote:
| Sqlite supports recursive CTE
| rikthevik wrote:
| I think you're missing the point.
|
| I want to develop on postgres and test on postgres because
| I run postgres in production and I want to take advantage
| of all of its features. I don't understand why a person
| would 1) develop/test on a different database than
| production or 2) restrict one's self to the lowest common
| denominator of database features.
|
| Test what you fly, fly what you test.
| maxbond wrote:
| And JSONB, but you can't insert or update in a CTE like you
| can with Postgres.
| pdimitar wrote:
| I don't know what kind of magic fairy dust Django is but I've
| found the differences between SQLite and PostgreSQL too big to
| be worth it, in at least 3 other programming languages that are
| not Python.
|
| Sounded good at first but we were quickly overwhelmed with
| false positives and just opted for Postgres in a VM (this was
| before Docker was a thing).
| herpdyderp wrote:
| I've discovered the same thing: can't use SQLite to mock
| PostgreSQL. PGLite looks super promising for this, but I
| haven't tried it yet at scale.
| phendrenad2 wrote:
| This is a good thing to be aware of when choosing a database,
| but I think most of the time people just reach for Postgres
| because it's the "standard".
| pdimitar wrote:
| PostgreSQL stops many data bugs at the door due to being so
| strict -- something many programmers start rediscovering is
| a good thing by choosing stricter programming languages
| with time.
|
| I love SQLite to bits but the test harness I have to put
| around my apps with it is a separate project in itself.
| phendrenad2 wrote:
| Somehow this is my first time hearing about that benefit
| of Postgres! (despite having googled for explanations of
| why people were switching to it). Being on HN proves more
| valuable than google once again...
| pdimitar wrote:
| Glad to help!
|
| On a philosophical / meta level it's all quite simple: do
| your damnedest for the computer to do as much of your
| work for you as possible, really. Nothing much to it.
|
| Strict technologies slap you hard when you inevitably
| make a mistake so they do in fact do more of your work
| for you.
| chrisandchris wrote:
| I don't know...
|
| What if you rely, e.g., on CASCADE and foreign keys, which are
| not on by default in SQLite? I think then things start getting
| complicated and testing that layer gets difficult.
| 3pt14159 wrote:
| I'm surprised you say so btilly. I've found the differences
| between SQLite and Postgres to be large enough to bring up
| issues that are hard to find. Timestamp handling, non-trivial
| indexes, etc.
| DanielHB wrote:
| The problem with this approach is that the SQLite dialect is
| not the same as most production setups. Even if you use an ORM
| you often have manual queries, or you are using a feature that
| is only supported in some databases (like geospatial queries).
| isbvhodnvemrwvn wrote:
| And sqlite is _extremely_ lax by default (and even non-
| defaults are not great).
| jmmv wrote:
| Came here to say this: I've found that using SQLite for tests
| is a good sweet spot, and maintaining dual DB implementations
| ensures my business logic remains generic. But there is no
| replacement for running the tests against the real database
| periodically (maybe only in CI for example).
|
| I wrote about this and some more in
| https://jmmv.dev/2023/07/unit-testing-a-web-service.html
| MetaWhirledPeas wrote:
| > I've found that replacing the database with in memory SQLite
| for tests is a sweet spot
|
| Does that cover the first two examples brought up by the
| article? Constraint violations and default value handling.
| senex wrote:
| Does anyone have experience making tests against real databases
| fast?
|
| I resonate with the sentiment of this article, but have struggled
| to find an alternative that's fast enough as the test suite
| grows, isn't flaky in CI, and is able to share the production
| schema definition for relevant relations.
|
| I'd love to hear more from anyone that's solved for some of these
| constraints!
| lbriner wrote:
| A real database should not be slow. Even with our tests running
| against a hosted SQL Server on a separate server, the database
| is never the slow part. For other tests, we run with the same
| database in a local Docker container with Docker Compose and it
| is fast and isolated/resettable.
|
| Most tests should be unit tests, which are super fast.
| Integration and UI tests that might use the database should be
| fewer, and if the database is slow, it might be related to your
| specific application or unoptimized database queries; our
| database calls are usually < 10ms.
| pdimitar wrote:
| On my machine -- which is quite mid-range at this point, not
| even high end -- I get by just fine up to 3000 tests, with
| Elixir at least. When I was at that contract the ~3200 tests
| ran in something like 40 seconds.
|
| What kinds of troubles do you have with using a real DB for
| testing?
| shayonj wrote:
| Doesn't directly answer your question, but at least in Postgres
| I am curious about UNLOGGED mode and whether it results in
| faster specs. The trade-off is that crash recovery doesn't
| work, but that's fine in CI.
|
| There is also something to be said about keeping database
| transactions atomic (no 3rd-party network calls, etc.) to keep
| flaky specs to none. I have some ad hoc thoughts on this; will
| try to frame it properly in a post.
| TobiasBales wrote:
| I think one helpful thing is what rails calls transactional
| tests
| (https://guides.rubyonrails.org/testing.html#transactions). It
| basically does the database setup (migrations and seeds) and
| then executes the tests in a transaction that is rolled back at
| the end (and thus never committed). This helps with speed and
| also ensuring that tests don't have as much accidental
| codependence as they might otherwise.
|
| If you use read replicas in production code this can become
| tricky, though: since the transactions never commit, the
| writes never become visible to the reader, or even to
| different connections to the same database.
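The transaction-per-test pattern translates to almost any stack. A minimal sketch in Python, using the stdlib's sqlite3 as a stand-in for the real database (the helper names are mine, not from any framework):

```python
import sqlite3

def make_conn() -> sqlite3.Connection:
    # isolation_level=None puts us in autocommit mode, so we control
    # transaction boundaries explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    return conn

def run_in_transaction(conn: sqlite3.Connection, test_fn) -> None:
    # Open a transaction, run the test inside it, and always roll back,
    # so every test starts from the same pristine schema/seed state.
    conn.execute("BEGIN")
    try:
        test_fn(conn)
    finally:
        conn.execute("ROLLBACK")

conn = make_conn()
run_in_transaction(conn, lambda c: c.execute(
    "INSERT INTO users (name) VALUES ('alice')"))
# The insert was rolled back: the table is empty again.
assert conn.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 0
```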
| avg_dev wrote:
| Don't know what language or database you use, but check this
| out: https://github.com/peterldowns/pgtestdb
|
| If you happen to use Postgres, the approach is ultimately
| portable: it uses Pg database templates (also, regarding perf,
| the author recommends using a ramdisk and turning off fsync on
| your test DBs; you'll see this in the project readme). But
| you'll have to write the code yourself.
| thom wrote:
| This is what I do; it has an overhead of about 10-20ms per
| test and I've had zero flakiness. An absolute no-brainer from
| my point of view.
| peibye wrote:
| Depends on how fast you need them to be. I've had success using
| dockerized postgres, set up the same as production, where each
| test creates a connection to its own test-specific database,
| or is written to run inside a transaction that can clean up
| after itself using a shared database. In my experience, slower
| tests that use a real db almost always save net positive dev
| time.
| jayd16 wrote:
| Setup tooling to start postgres in a local docker container and
| point at that. Works very well and there are some libraries
| that abstract it for you.
|
| I think it was Testcontainers.
| maxmcd wrote:
| Tests are usually embarrassingly parallel. Instead of creating
| one test db (app_test) create many (app_test_0, app_test_1,
| ...). Run tests in many threads/processes, db per thread.
|
| This works in a lot of cases. In some cases this might not
| address your bottleneck.
|
| Also someone should write a real, performant, in-memory
| postgres storage driver. Then we can all be happy (with pg at
| least).
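A sketch of the naming/setup half of that per-worker idea (the `app_template` template database name is an assumption; the generated statements would be run against a real Postgres server, cloning each worker DB from an already-migrated template rather than re-running migrations):

```python
def worker_db_name(base: str, worker_id: int) -> str:
    # app_test_0, app_test_1, ... one database per test worker
    return f"{base}_{worker_id}"

def setup_statements(base: str, workers: int,
                     template: str = "app_template") -> list:
    # Clone each worker database from a pre-migrated template,
    # which is much faster than running migrations per worker.
    return [
        f"CREATE DATABASE {worker_db_name(base, i)} TEMPLATE {template}"
        for i in range(workers)
    ]

print(setup_statements("app_test", 2))
```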
| theogravity wrote:
| We use Testcontainers (https://testcontainers.com/) in our
| node.js / vitest / kysely stack. Really fast to spin up a
| temporary postgres instance, and we use kysely migrations to
| init and seed the db.
| nerdponx wrote:
| What I've done is make a clone of the real database, with a
| sample of data that has enough variety/size to test whatever it
| is you need to test, but no bigger. It definitely takes some
| thinking, planning, and writing of code, but it's worth doing.
|
| Unfortunately I maintain an app where the database (read-only)
| is Snowflake, and being more of a "warehouse" than "database"
| there's always a lot of overhead in running any query at all.
| Even just `select 1` can take a few seconds. So there's only so
| much you can do with that, but setting up your data so that
| tests can be parallelized helps as well.
|
| However your tests against a proper OLTP database should be
| plenty fast, unless your app itself is slow or your test
| fixtures require some really complicated setup.
| fatbird wrote:
| Not sure if this is still a valid approach, but on a large
| Django site with a lot of unit tests, the cumulative
| setup/teardown/reset cycles were killing us. We found that
| wrapping each test in a transaction that was aborted on
| teardown caused the per-test cleanup cost to drop
| radically. We also kept a canned database for testing so that
| running the test suite didn't have a large startup penalty to
| populate the database with test fixtures. Keeping that ready db
| between runs also sped things up a lot for devs.
| 3pt14159 wrote:
| Tests are embarrassingly parallel. If you can split up the load
| for CI then at the limit the slowest response is your slowest
| running test. I wish more tooling would expose this out of the
| box, but with some effort it is possible.
| dinosaurdynasty wrote:
| https://eradman.com/ephemeralpg/ (for Postgres) plus making
| sure your test suite is parallel works wonders.
| pdimitar wrote:
| I thought this was common knowledge and that it became even
| easier after Docker became a thing?
|
| Mocks are wishful thinking incarnate most of the time, though
| here and there they are absolutely needed (like 3rd party APIs
| without sandbox environments, or quite expensive API, or most of
| the time: both).
|
| Just pick a task runner -- I use just[0] -- and make a task that
| brings up both Docker and your containers, then run your test
| task, done. Sure it's a bit fiddly the first time around but I've
| seen juniors conquer that in a day maximum and then your tests
| actually work with the real world 99% of the time.
|
| Mocks in general are rarely worth it, the DB ones: 10x so.
|
| [0] https://github.com/casey/just
| shayonj wrote:
| +1 for this being common knowledge, but there is still a
| decent bit of mocking that happens IME.
| withinboredom wrote:
| Mocks are great for testing specific code paths, but if you
| need to do that, there is usually a better way of doing it.
| Mocks hide contractual incongruities. For example, a function
| that returns either "fizz" or "buzz" that gets mocked out. If
| it gets changed to also return "bar", and you forget to
| update the mock, you've got a time-bomb in your code.
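That time-bomb is easy to demonstrate. A minimal sketch in Python (all names invented for illustration): the stale mock keeps the test green while the real call path now fails.

```python
from unittest import mock

def fizz_or_buzz(n: int) -> str:
    # Originally returned only "fizz" or "buzz"...
    if n % 3 == 0:
        return "fizz"
    if n % 5 == 0:
        return "buzz"
    return "bar"  # ...this branch was added later

def handle(n: int) -> str:
    # Caller still assumes the old two-value contract.
    return {"fizz": "F", "buzz": "B"}[fizz_or_buzz(n)]

# The mock still reflects the old contract, so the test passes...
with mock.patch(f"{__name__}.fizz_or_buzz", return_value="fizz"):
    assert handle(7) == "F"

# ...while the unmocked call path raises KeyError on the new value.
try:
    handle(7)
    raised = False
except KeyError:
    raised = True
assert raised
```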
| pdimitar wrote:
| Completely agreed, that's why I have advocated for
| periodically (once a month) revisiting the test suites with
| them by doing re-captures of real data. There are
| frameworks that help with this.
|
| Is it a chore? Absolutely. But peace of mind is important.
| fcmgr wrote:
| I've had good experience with testcontainers
| (https://testcontainers.com/) to do that sort of thing.
| kkarimi wrote:
| Yes, it's a shame that this is not leading the discussion; it
| works very well and is surprisingly efficient in our
| experience compared to manually setting up docker containers
| or docker compose.
| donjoe wrote:
| testcontainers is great, but I struggled a bit with the
| one-container-per-test pattern, which just felt too slow for
| writing gray-/black-box tests. The startup time for postgres
| was > 10 seconds. After a bit of experimenting, I am now quite
| happy with my configuration, which allows me to have a snappy,
| almost instant testing experience.
|
| My current setup:
|
| - generate a new psql testcontainer _or_ reuse an existing
|   one by using a fixed name for the container
| - connect to the psql container with no database selected
| - create a new database using a random database name
| - connect to the randomly generated database
| - initialize the project's tables
| - run a test
| - drop the database
| - keep the testcontainer up and running and reuse it with the
|   next test
|
| With this setup, most tests run sub-second.
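The per-test lifecycle described above can be factored into a small helper. A sketch with the database driver abstracted behind two hypothetical hooks (`admin_execute`, `connect` are not a real library API), so the create/drop ordering is testable without a live server:

```python
import uuid
from contextlib import contextmanager

@contextmanager
def per_test_database(admin_execute, connect):
    """admin_execute runs SQL on the maintenance connection (no DB
    selected); connect opens a connection to the named database."""
    name = f"test_{uuid.uuid4().hex[:12]}"        # random database name
    admin_execute(f'CREATE DATABASE "{name}"')    # fresh DB for this test
    try:
        yield connect(name)                       # hand the test its own DB
    finally:
        admin_execute(f'DROP DATABASE "{name}"')  # drop DB, keep container

# Dry run with recording stubs instead of a live server:
log = []
with per_test_database(log.append, lambda n: f"conn:{n}") as conn:
    assert conn.startswith("conn:test_")
assert log[0].startswith('CREATE DATABASE "test_')
assert log[1].startswith('DROP DATABASE "test_')
```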
| nerdponx wrote:
| I've had a nice experience using Docker Compose for this,
| although you might still want a task runner wrapper around it.
| Podman Compose should work fine as well.
| pdimitar wrote:
| Exactly what I do. Having a local `docker-compose.yml` that
| helps bring up dev and test environments is one of my first
| action items when hired. It has paid huge dividends, many
| times.
| Deukhoofd wrote:
| Yeah, I ran into very much the same issues. Setting up a
| database for unit tests can be even more of a bother though
| (especially for CI pipelines; I don't want to have to run a
| full database there), so I took a middle road and use my ORM
| to spin up a temporary in-memory SQLite database that is
| mostly similar to our actual database. Each unit test
| scaffolds its own database, and deletes it when it's done.
|
| That allows me to somewhat mock the database within unit tests,
| while still being able to test database-specific errors that
| might occur, like foreign key errors.
| pdimitar wrote:
| Why a separate DB for each test? Just have only one DB and each
| test opens a transaction and then rolls it back when it's done.
| That way you also achieve isolation of tests so they don't
| interfere with each other.
| Deukhoofd wrote:
| The code being tested also uses transactions internally at
| times, so it'd mean additional complexity in the code being
| tested to allow for unit testing, which is not great. In my
| experience, spinning up a database including all tables in an
| in-memory SQLite db is extremely fast, so it's not really a
| major concern.
| pdimitar wrote:
| OK, but is your production DB also SQLite? If not, I would
| not. I found the differences between it and PostgreSQL too
| big and was getting too many false positives.
|
| Also code complexity on a few levels of recursion of
| transactions is an easy thing to abstract away with almost
| zero performance penalty -- depending on your programming
| language of choice.
| Deukhoofd wrote:
| The production DB is MSSQL, and we scaffold it in SQLite
| through EF Core. The resulting SQLite DB is close enough
| to our production DB that we are able to catch invalid
| defaults, missing foreign keys, etc. in unit tests
| instead of later on in our testing pipeline, which helps
| massively in accelerating our development. It could be
| even better if SQLite would actually tell you which
| foreign key constraint failed instead of its somewhat
| unhelpful 'foreign key constraint failed' error, but as
| it is we at least know _something_ is wrong in our code.
|
| And sure, we could probably refactor it to use
| transactions to shave a few seconds off running our test
| suite, but it'd add some additional mental complexity to
| our codebase to do so. In general, I prefer to keep the
| actual running code as simple as possible, and have any
| complexity that is required for unit tests be handled by
| the unit tests. By just recreating the database the unit
| tests currently handle all that complexity.
|
| The idea is definitely not a bad one though, and if your
| scaffolds are big enough to actually cause performance
| issues with your unit tests it might definitely be a
| consideration to not recreate the database every time.
| pdimitar wrote:
| Well, if you found a productive workflow then who am I to
| judge, right?
|
| However, if I was hired into your team tomorrow you'll
| have to fiercely fight with me over this:
|
| > _And sure, we could probably refactor it to use
| transactions to shave a few seconds off running our test
| suite, but it'd add some additional mental complexity to
| our codebase to do so._
|
| Various languages and frameworks demonstrate that
| abstracting this away and using convenient wrappers is
| more or less trivial, and definitely a solved problem.
| And the resulting code is simple; you are (mostly) none
| the wiser that you are actually using a temporary
| transaction that would be ultimately rolled back.
|
| ...Though that is extremely sensitive to which
| programming language and DB library you are using, of
| course.
|
| Your approach works, so keep using it but IMO it's a
| temporary one and it could stop being fit-for-purpose
| Soon(tm). You yourself are already aware of its
| limitations so just keep an eye out for this moment and
| be decisive when the time comes to migrate away from it.
|
| My $0.02.
| worik wrote:
| This risks unleashing a bit of a "religious" war.
|
| I agree that tests should occur in the environment they are to be
| run in, when ever possible. Otherwise not...
|
| There is no science in testing. There are no standardised
| definitions. We agree it is good; we often ignore the
| downsides (they exist: there are costs to tests).
|
| I have struck examples where the testing frameworks of a language
| (Dart in this case) ruled out using a live HTTPS client to test.
| I kept getting 404 errors running it against my (test) site. I
| made a bug report. The good folks at Google (who I am sure are
| very smart - if a little stupid) had decided that the HTTPS
| client would always return 404 when run as part of the testing
| framework.
|
| I argued, but apparently I am not so smart (I am less stupid,
| IMO). The HTTPS client testing had to either occur outside the
| test framework, or I had to rewrite the HTTPS client or write
| a mock-up of some sort: all things I would have to pay for
| with time and effort, for no gain.
|
| I get the point about mocking. But I am in business. I am not
| interested in always testing in isolation, (sometimes, sure), I
| want my tests to depend on the other parts of my business.
|
| If the database goes wrong and my GUI tool test discovers it,
| that is a good thing, not a bad thing.
| nerdponx wrote:
| This is one of many reasons why "isolating" I/O within your
| application is so important, whether that means pushing it to the
| edges or otherwise carefully corralling it in known-critical
| areas where it can't be removed.
|
| A static type system that assists with that design style is nice
| as well. It's really useful to be able to look at a function and
| know right away what I/O it performs, if any. This is probably my
| #1 missing feature in Python, now that static type hints work
| well enough for most purposes.
|
| That said, setting up and maintaining a database test harness in
| the developer environment can be really annoying and time
| consuming. Even moreso when your database is a proprietary cloud
| thing (e.g. Snowflake). But I have never ever regretted spending
| the time to set it up, whereas I have definitely regretted _not_
| having it.
|
| Sometimes it's unavoidable that your "unit" tests need to access
| the database to be able to properly work out the "unit" under
| test. Watching those tests go from failing to passing is such a
| sweet feeling, it makes the pain of maintaining the test harness
| feel worthwhile. And then of course when you hit the actual
| integration tests in CI, everything almost always passes, because
| your unit tests are that good.
| andy_ppp wrote:
| As per usual Elixir does this correctly and even has a setup that
| allows all tests to run against the database in a pristine way in
| parallel:
| https://hexdocs.pm/ecto_sql/Ecto.Adapters.SQL.Sandbox.html
| napsterbr wrote:
| A hobby project of mine (in Elixir) uses SQLite as primary
| database. Each test runs in its own fully isolated SQLite
| database. No mocking (or transaction rolling back) needed. Most
| of these tests take less than 1ms to run (and when they take
| longer, it's because of something else).
|
| This kind of setup makes the usual Ecto Sandbox approach feel
| slow, but I do agree that the way Elixir approaches this is
| great!
| brightball wrote:
| Yea, I've been working on a side project for a while and I keep
| building it with Rails because there's so much about Rails that
| speeds up this particular type of project...
|
| But the testing setup in Elixir is just exemplary compared to
| everything else I've worked with. I fight myself daily on just
| rebuilding the entire project in Elixir.
| mrkeen wrote:
| Two points:
|
| 1) The subtle issues with constraint violations, default values
| and indexes are part of the RDBMS, not the testing strategy. The
| article suggests stepping on these rakes in test so that you
| hopefully don't step on them in prod. Another way to avoid them
| in prod (and to also regain the speed/reliability/simplicity of
| your test suite) is to not use an RDBMS in prod.
|
| 2) If you want to persist something, ask yourself if what you
| really have is tabular data. Do you have a lot of nulls and/or
| default values? Catch-all json fields? You probably have objects,
| not relations. Postgres may have a lot of wonderful features that
| relate to relations, and an ORM might have a lot of wonderful
| features (which appear to save you time). But if you stop
| requiring your objects to be relations at rest, you don't need
| the features, which is even faster.
| Volundr wrote:
| > constraint violations
|
| You call these rakes and suggest removing them from prod, I
| call these vital safeguards. Defining constraints at the
| database layer gives me confidence that these will truly be
| invariants, and I won't one day discover that I've got millions
| of rows in my database missing critical data, with no way of
| knowing what should be there. Been there, done that, no desire
| to revisit it.
| mrkeen wrote:
| My coworkers call it "data integrity". At best it's a local
| maximum for consistency. Right now it's how we reject true
| facts about the business. Our partner bank tells us a
| customer deactivated? Our database says "no", someone gets a
| 500, and we get a support ticket when the customer can't sign
| up again (because they're still in our system).
| seeknotfind wrote:
| This is not only about databases. Mocks are almost always a
| compromise.
| Lvl999Noob wrote:
| I faced problems with flaky unit tests that used a common unit
| test database for the whole project. Since all the tests ran at
| once, the tests would sometimes fail because of concurrency
| issues. I never got the time to look too deeply into it.
|
| I wonder if there's a database that works sorta like git, giving
| one "commit" point at the start of every test and then "branching
| off" so each test can do its own thing, make transactions, commit
| them, whatever, and finally be able to roll the whole thing back.
|
| If that can be done then each unit test can use a full database
| clone with actual valid data and not just test data that was fast
| enough to set up before the test ran.
| shayonj wrote:
| Starting each test in its own transaction and rolling it back
| at the end, and ensuring there are no 3rd-party network calls
| in application logic, has worked with a lot of success in my
| experience with Rails and Postgres apps.
| whatnotests2 wrote:
| Transaction per test is how that works in Ruby on Rails. Also,
| the rspec parallel gem will create N test databases for running
| tests - one per process.
| masklinn wrote:
| That doesn't work if you need to commit for one reason or
| another. And while in some cases you might be able to
| substitute subtransactions via some sort of test mode, it's
| not always an option depending what you're doing / testing
| exactly.
| masklinn wrote:
| Dupe the database. That's what I do in one of my test suites
| (though that's in part because the tests happen across 4
| different processes so it's just not possible to keep it within
| one transaction).
|
| Creating a db from a template is not free, but if there's not
| too much stuff in the db it's not horrendous either. And if
| you're using sqlite, you might be able to backup an on-disk
| template to an in-memory test db.
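The template-clone approach for SQLite is directly supported by its online backup API; a minimal sketch (the `users` template and its seed row are hypothetical):

```python
import os
import sqlite3
import tempfile

# Build an on-disk "template" database once (schema + seed data).
template_path = os.path.join(tempfile.mkdtemp(), "template.db")
template = sqlite3.connect(template_path)
template.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
template.execute("INSERT INTO users (name) VALUES ('seed-user')")
template.commit()

def fresh_test_db():
    """Clone the template into a private in-memory database for one test."""
    conn = sqlite3.connect(":memory:")
    template.backup(conn)  # sqlite3's online backup API copies the whole db
    return conn

db = fresh_test_db()
db.execute("INSERT INTO users (name) VALUES ('test-only')")
# The template is untouched; every test starts from the same snapshot.
```

The Postgres equivalent is `CREATE DATABASE test_n TEMPLATE mytemplate`, which is what makes per-test (or per-process) clones affordable when the template is small.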
| Volundr wrote:
| Datomic more or less works like this. I wasn't terribly
| impressed in general, but it did make tests a breeze.
|
| Unless you use Clojure though it's probably not a real option.
| kerblang wrote:
| Was dealing with mocking voodoo garbage this morning as a result
| of fixing a bug. What a horrible mess just to claim a few lines
| of "test coverage" while ignoring critical db functionality. I
| wouldn't mind if the mocking frameworks weren't absurdly complex
| in their own right.
|
| Unfortunately our "integration" tests are even worse, corrupting
| data all over and flaky as hell.
| isotropy wrote:
| I've seen a db mock work when 1) there was a small subteam in
| charge of the OR-mapping, schema structure, and included a DBA;
| and 2) also a design policy from the architect that all objects
| had to come out of factories. Under those specific circumstances,
| having the mock - used solely as a per-developer object cache
| imitating the factory interface - was critical for unblocking the
| people working on business logic and front-end.
|
| I wouldn't structure a team that way now.
| astine wrote:
| Testing against a real database is an example of integration
| testing. Using mocks is for unit testing. Ideally, you want to do
| both. Unit testing entails isolating particular components for
| testing, which is important because it lets you be more precise
| in what exactly you're testing. Using mocks also makes it easier
| to run automated tests because it means you don't need to have a
| database or credentials handy during the build process.
|
| Integration testing is also important because the closer you get
| to running things in a production environment, the more likely
| you are to detect issues. Some things just won't be apparent
| until you start running code in production or near production.
| The thing to understand though, is that you want to do both if
| you can, not either-or.
| nwatson wrote:
| Then just use a database for unit testing as well.
| theogravity wrote:
| Just use Testcontainers (https://testcontainers.com/). We use it
| for quickly spinning up a temporary postgres instance to run our
| db tests against.
| diddid wrote:
| The whole point of the mocking the database is to not test the
| database! If you need to test the database then test the
| database! Just like mocking an API can hide hidden issues with
| the API... which is again the exact point of mocking the API.
| This article should really be named "mocking your database isn't
| testing your database" which seems like it should be obvious.
| liontwist wrote:
| Which is why any testing involving mocking should have low
| purity.
|
| You get 90% of the benefit from two kinds of tests:
| - verifying components work (unit tests)
| - verifying the system works from its interface (automation)
| bccdee wrote:
| The point of mocking the database is to avoid the hassle of
| spinning up a database to test against. If we ignore that, why
| _not_ test our use of the database? Are we uninterested in the
| schema and query errors we might catch?
|
| Unit testing is useful because (a) it's easier to test all the
| code paths in a unit when that unit is isolated, and (b) unit
| tests don't need to be modified in response to changes in
| unrelated modules. But (a) is irrelevant since we can easily
| put dummy data in the DB, and (b) isn't a concern since the
| relevant DB schemas and queries are tightly coupled to the
| module in question anyway.
| wesselbindt wrote:
| > why not test our use of the database
|
| Primarily slowness and brittleness. Tests with side effects
| are much more likely to be flaky, and flaky tests are the
| worst. Especially if they're slow and hence hard to debug.
|
| Of course, you do test your use of the database, but in a
| much smaller test suite. For the business logic tests you
| just spin up a copy of your app with fake adapters, and
| you're done. Quick, deterministic tests.
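A minimal sketch of the fake-adapter approach described above (the `UserStore` port and `notify_target` function are hypothetical names for illustration):

```python
from typing import Optional, Protocol

class UserStore(Protocol):
    """Port the business logic depends on; the real adapter talks to the db."""
    def get_email(self, user_id: int) -> Optional[str]: ...

class FakeUserStore:
    """Deterministic in-process fake for fast business-logic tests."""
    def __init__(self, users: dict):
        self._users = dict(users)

    def get_email(self, user_id: int) -> Optional[str]:
        return self._users.get(user_id)

def notify_target(store: UserStore, user_id: int) -> str:
    """Business logic under test: pick an address, with a fallback."""
    email = store.get_email(user_id)
    return email if email is not None else "no-reply@example.com"

fake = FakeUserStore({1: "a@example.com"})
assert notify_target(fake, 1) == "a@example.com"
assert notify_target(fake, 2) == "no-reply@example.com"
```

The fake implements the same interface as the real adapter, so the business-logic tests stay deterministic while a separate, smaller suite exercises the real database.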
| kapperchino wrote:
| Just mock the repository which calls the db, and to test the
| queries themselves, use the real db as a docker container, or
| integration test with an existing db
| perrygeo wrote:
| Mocking the database can be done, but only if you use the
| database as a generic storage layer. It's not trivial to keep the
| mocks updated in most applications.
|
| Some ORMs and frameworks provide magic to do this, but as soon as
| you deviate from the framework it gets complex. You can build
| geospatial applications, time series, document, and full text
| search - iow leverage advanced features of Postgres extensions
| that most ORMs can't touch. Their behaviors can be complex and
| are part of the application, not simply a storage layer.
| selimnairb wrote:
| At least in the Python world, it's trivial to bring up an
| isolated DB instance using a docker compose file and a simple
| pytest plugin. Hardly seems worth the effort to mock the DB in
| cases like this.
| fcmgr wrote:
| I've had some good experience with a mix of those approaches,
| maybe not using mocks per se, but an "in-memory database
| implementation" (just a wrapper around the hash map that
| implements the same behaviors as a repository that deals with a
| real database) on one hand, and testcontainers on the other.
| (Still, using an in-memory db is way better than mocking; the
| tests are not coupled to the implementation details/the
| underlying model.)
|
| For simple use cases where I mostly just read/write from the
| database and don't expect any of those issues mentioned in the
| article (constraint violations or concurrency issues, because the
| application is quite simple tbh, plus I already have some
| testcontainers based integration tests for the components that
| deal with a real db and I reuse them) writing tests with an in-
| memory db implementation is quite nice - the setup is simple and
| running the whole test suite is instantaneous (literally
| something like 1-2s for a couple thousand tests, I don't need any
| framework for those kind of tests).
|
| And on the other hand if I'm relying on something like an
| optimistic/pessimistic lock or a framework feature, I will write
| a test with the real thing, using test containers. Also have a
| pretty good coverage with the components that deal with queues
| and databases specifically with testcontainers. And on top of
| that just a few e2e flows written using testcontainers as well.
| mrkeen wrote:
| My take is that your business logic shouldn't know about your
| storage tech. (Like, dependency inversion 101 right?)
|
| > Still, using an in-memory db is way better than mocking, the
| tests are not coupled to the implementation details/the
| underlying model.
|
| Isn't this backwards? The fact that you've backed your storage
| with a HashMap (which is 100% what I shoot for too) means your
| service-under-test cannot know if it's talking to an SQL
| database.
| fcmgr wrote:
| I think we're talking about the same thing :D? The
| service/business logic/whatever you might want to call it
| interacts with the storage via an interface - in production
| that interface is implemented by a component that can talk to
| a real SQL database, in tests I can just create a wrapper
| around a hash map and use that.
|
| EDIT: What I meant when I wrote that tests are not coupled to
| the underlying model/details is that with a mock you have to
| explicitly specify "when called with this return that". With
| an in memory database implementation you don't need to do
| anything, the code will just use the interface methods like
| "getX" or "saveY".
| mrkeen wrote:
| > with a mock you have to explicitly specify "when called
| with this return that".
|
| Riiiight, no I've always hated that Mockito way of doing
| things.
|
| I find wrapping a hashmap avoids the need for explicit
| when-this-then-that bindings, because a hashmap already
| does the expected behaviour natively.
|
| You can even 'integrate' your tests as deeply as you like,
| all standing on top of your poor hashmap. I.e. Http tests
| with unit test speed.
|
| var controller = new Controller(new Service(new Repo(new HashMap())));
| controller.POST(...);
| assert(controller.GET(...));
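A runnable sketch of the same stack in Python, with a dict standing in for the HashMap (all class and method names here are illustrative, not from the thread's actual codebase):

```python
class Repo:
    """Storage layer backed by a plain dict instead of a real database."""
    def __init__(self, store: dict):
        self._store = store

    def save(self, key, value):
        self._store[key] = value

    def load(self, key):
        return self._store.get(key)

class Service:
    """Business logic; knows nothing about the storage technology."""
    def __init__(self, repo: Repo):
        self._repo = repo

    def create_user(self, user_id, name):
        self._repo.save(user_id, {"name": name})

    def get_user(self, user_id):
        return self._repo.load(user_id)

class Controller:
    """'HTTP' layer, testable at unit-test speed on top of the dict."""
    def __init__(self, service: Service):
        self._service = service

    def post(self, user_id, name):
        self._service.create_user(user_id, name)
        return 201

    def get(self, user_id):
        user = self._service.get_user(user_id)
        return (200, user) if user else (404, None)

controller = Controller(Service(Repo({})))
assert controller.post(1, "alice") == 201
assert controller.get(1) == (200, {"name": "alice"})
assert controller.get(2) == (404, None)
```

Because the dict natively behaves like a key-value store, no when-this-then-that stubbing is needed, which is the point being made above.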
| brettgriffin wrote:
| I'd be really surprised if most applications that rely on
| databases can't just use a containerized image of their
| production database in any environment, including CI. In fact,
| the only example I can really think of is when the production
| database can't be Dockerized (e.g. a proprietary SaaS database
| like BigQuery or Snowflake).
|
| I'm working on a project that has ~22,000 tests that operate on
| Docker container of the same version of Postgres used in
| production. In CI it completes all the database tests in under a
| minute.
| thom wrote:
| Are these parallelised? <3ms overhead for a clean database and
| test code to interact with it is very good!
| p0w3n3d wrote:
| of course one can do the query testing separated, but I have run
| once or twice into problems covered by hibernate, and showing up
| on sql dialect on real database. Hence as always the answer is:
| 'it depends'
| hamandcheese wrote:
| My #1 rule: tests should "just work". A new contributor should
| not have to run any setup, just a single test command. I don't
| care what kind of tests they are or what you prefer to call them
| as long as they work (in CI AND locally).
|
| The vast majority of projects fail at this, often because some
| external dependency, like a database, needs manual setup.
|
| To me, mocking out the db is a relatively cheap way to achieve
| rule #1. Yes, it has pitfalls, but I am unwilling to compromise
| on rule #1 so it's purely a question of cost vs fully automating
| db setup and teardown.
| Threadbare wrote:
| Tell me you've never heard of test containers without telling
| me you've never heard of..
| calmbonsai wrote:
| Replace "database mocks" with ANY <X> distributed API service
| calls in your execution flow and there's still ZERO excuse not
| to be using SQLite or LocalStack mocks in your basic unit and
| pre-integration testing.
|
| Sure, there's no substitute for optimizing things for a
| particular query engine and EVERY database engine has SUBSTANTIAL
| quirks ( _cough_ , _cough_ Redshift), but you should be crawling
| and walking before you try running.
| elevatedastalt wrote:
| This article seems to be implying that there's something special
| about DBs that makes them only testable directly instead of
| through mocks.
|
| There's a continuum between extreme integration testing and
| extreme unit testing, each with their strengths and weaknesses.
|
| There's a reason we don't do all testing through integration
| testing: it's slower, flakier, and slows developer velocity
| to a near standstill.
|
| There's a reason we don't do all testing through pure unit
| testing: you have no idea whether the e2e plumbing works as
| expected.
|
| DBs aren't special in this regard. That being said, it is true
| that investing in a lightweight, unit-testable setup for your DB
| is a good idea.
| nrr wrote:
| I think it's probably worth mentioning that the principal concern
| for tests should be proving out the application's logic, and
| unless you're really leaning on your database to be, e.g., a
| source of type and invariant enforcement for your data, any sort
| of database-specific testing can be deferred to integration and
| UAT.
|
| I use both the mocked and real database approaches illustrated
| here because they ultimately focus on different things: the
| mocked approach validates that the model is internally consistent
| with itself, and the real database approach validates that the
| same model is externally consistent with the real world.
|
| It may seem like a duplication of effort to do that, but tests
| are where you really should Write Everything Twice in a world
| where it's expected that you Don't Repeat Yourself.
| jcgrillo wrote:
| The database is often the thing that enforces the most critical
| application invariants, and is the primary source of errors
| when those invariants are violated. For example, "tenant IDs
| are unique" or "updates to the foobars are strictly
| serializable". The only thing enforcing these invariants in
| production is the interplay between your database schema and
| the queries you execute against it. So unless you exercise
| these invariants and the error cases against the actual
| database (or a lightweight containerized version thereof) in
| your test suite, it's your users who are actually testing the
| critical invariants.
|
| I'm pretty sure "don't repeat yourself" thinking has led to the
| vast majority of the bad ideas I've seen so far in my career.
| It's a truly crippling brainworm, and I wish computer schools
| wouldn't teach it.
| globular-toast wrote:
| Exactly, the proper db should be done with an integration/e2e
| test. Doing every single unit test against an expensive db is
| insanity and leads to a slow test suite.
|
| The article makes it seem like it's an either/or. It is not.
| You should do both fast unit tests and slower integration
| tests. Has the test pyramid been forgotten? For web stuff I
| generally test the happy path and one or two representative
| unhappy paths for each endpoint with integration tests. But
| each component gets _every_ path tested with unit tests. Also
| there is a rule that you are not allowed to fix a bug without
| also submitting the missing (failing, now passing) test.
| jaza wrote:
| You should do, and I usually do, both "pure" unit tests (with all
| I/O - database calls, reading/writing local files, 3rd party API
| calls, etc - being mocked), and integration tests (with ideally
| all I/O really happening). More of the former, and less of the
| latter, in line with the "testing pyramid" approach.
|
| There is value in testing "assuming that the database returns 3
| rows with Foo IDs and Foo Descriptions, my code should serve a
| JSON response containing the Foo IDs and the Foo Descriptions
| concatenated together". And there is also value in testing "when
| my code executes SELECT id, description FROM foo LIMIT 3, the
| real database should return 3 rows with Foo IDs and Foo
| Descriptions". Granted, there's also a big advantage in that the
| former can run much faster, with zero environment setup /
| teardown complexity required. But my main point is, different
| test suites should test different things.
|
| However, if you're unable or unwilling to write two different
| test suites, and if you're able to easily and reliably have a
| database available in dev / CI, then ok, I concede, just write
| integration tests.
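The "three rows with Foo IDs" unit test described above can be sketched with `unittest.mock` (the `fetch_foos` method name and the row shape are hypothetical):

```python
import json
from unittest.mock import Mock

def foo_summary(db) -> str:
    """Code under test: concatenate each Foo's ID and description,
    and serve the result as JSON."""
    rows = db.fetch_foos(limit=3)
    return json.dumps([f"{r['id']}:{r['description']}" for r in rows])

# Pure unit test: the database is replaced by a mock returning 3 canned rows.
db = Mock()
db.fetch_foos.return_value = [
    {"id": 1, "description": "first"},
    {"id": 2, "description": "second"},
    {"id": 3, "description": "third"},
]
assert json.loads(foo_summary(db)) == ["1:first", "2:second", "3:third"]
db.fetch_foos.assert_called_once_with(limit=3)
```

The complementary integration test would run the real `SELECT ... LIMIT 3` against an actual database and assert on the returned rows, which is exactly the "different suites test different things" split.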
| binary132 wrote:
| I think the sweet spot is simple relational abstractions that do
| not necessarily depend on a SQL implementation. Those are
| extremely quick and easy to write unit tests against and to use,
| and don't burden the developer. Then the library implementing
| them can itself be well-tested and straightforward, and even
| portable to other backing stores. I'm not saying "use an ORM".
| viccis wrote:
| Can anyone here who might know more than me explain the
| difference between dependency injection and the author's
| suggested pattern at the end? Is it just the same thing? Seems
| like that's the most common way I see to isolate external
| services and stub them for testing.
___________________________________________________________________
(page generated 2024-12-30 23:01 UTC)