[HN Gopher] Reducing complexity by integrating through the database
___________________________________________________________________
Reducing complexity by integrating through the database
Author : evanweaver
Score : 59 points
Date : 2021-11-03 12:54 UTC (1 day ago)
(HTM) web link (fauna.com)
(TXT) w3m dump (fauna.com)
| tlarkworthy wrote:
| At Firebase it was sometimes called the client-database-server
| architecture. The pattern was documented in 2013 [1]
|
| If you use Firebase as an ephemeral message bus it's a great
| pattern. It has problems if you use it like a traditional
| database because migrations are very tricky. DBs that support
| views (or GraphQL) can make migrations much easier.
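|
| Roughly, the message-bus flow looks like this (a minimal sketch,
| assuming the firebase-admin Realtime Database API; the "requests"
| and "responses" paths and the handle() helper are made up):
|
|     import * as admin from "firebase-admin";
|
|     admin.initializeApp(); // assumes default credentials
|     const db = admin.database();
|
|     // Clients push work onto "requests"; a privileged worker
|     // listens, handles each item, and writes a reply back.
|     db.ref("requests").on("child_added", async (snap) => {
|       const { userId, payload } = snap.val();
|       const result = await handle(payload); // app logic
|       await db
|         .ref(`responses/${userId}/${snap.key}`)
|         .set(result);
|       await snap.ref.remove(); // the request is ephemeral
|     });
|
|     async function handle(payload: unknown) {
|       return { ok: true, echoed: payload };
|     }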
|
| [1] https://firebase.googleblog.com/2013/03/where-does-
| firebase-...
| DeathArrow wrote:
| I wouldn't integrate through a database but I wouldn't refuse to
| integrate through a distributed cache.
|
| Anyway, using REST for inter-service communication decreases
| performance and increases latency, yet 99% of projects still do
| it.
| evanweaver wrote:
| Curious what makes a cache better to you.
| DeathArrow wrote:
| It would be faster and I still get to keep microservices from
| accessing the data they don't need or shouldn't access.
|
| But I'd rather use RPC for communication.
| mdoms wrote:
| I think anyone with experience on this type of system will recoil
| in horror at the thought of building it intentionally today.
| cogman10 wrote:
| Tell me about it. This is the current system I maintain. While
| new products have (mostly) started using non-integrated DBs,
| all the legacy systems we have use a giant integrated db.
|
| It SUCKS.
|
| Let me count the ways.
|
| - You can't update a table schema without updating and
| deploying a bunch of applications in sync (even if that's part
| of a stored proc), which for something like this means that
| getting out of this situation is WAY harder than getting into
| it.
|
| - You end up putting WAY too much logic into the DB, which makes
| it hard to ultimately figure out WHAT is supposed to happen.
|
| - DBs have TERRIBLE development stories. That's ultimately
| because the code and data all live in the same place and you
| can update code without any sort of revision control to help
| you understand or see a change that's been made to the db schema
| (forcing a bunch of painful process around updating DB
| capabilities).
|
| - DBs are resource bottlenecks that SUCK to figure out how to
| scale out. Putting a bunch of apps into one DB complicates that
| process. Scaling a single app in a single DB is simply WAY
| easier.
|
| - At least my db (and I assume a bunch of other DBs) has
| really crappy performance diagnostic tools. Further, the more
| complicated the queries against it, the more likely you are to
| go from "Hey, stats are making things fast" to "OMG, why is
| this thing taking 10 seconds to run now!". It's really bad when
| the only solution that seems to fix things is dumping stats.
|
| I could MAYBE see something like this for a macro service
| dedicated to a domain, but I'd never build a complex system
| like this from scratch. Colocating apps in the same DB would
| have to be for some crazy performance reason where bypassing a
| microservice makes sense. An exception, not the regular course
| of action (and I'd still hate it :) ).
| evanweaver wrote:
| I mean, yeah, this is why people stopped using this pattern.
| But these problems are getting solved, especially in Fauna:
|
| 1. Schemaless/document/schema-on-need databases like Fauna
| don't mandate the application breakage on every change that
| SQL does
|
| 2. It's hard to reason about if it's not transparent, but it
| can be transparent now, see below
|
| 3. Fauna is a temporal database, which acts like version
| control on your stored procedures, so you can easily check
| and revert any change
|
| 4. Fauna is serverless and horizontally scalable without
| consistency/latency impact
|
| 5. This was definitely a problem when you were occupying
| precious CPU cores on a vertically scaled RDBMS with business
| logic, but compute in Fauna or in serverless lambdas scales
| horizontally indefinitely
| gunnarmorling wrote:
| No matter how you integrate different applications, be it via
| APIs, messaging, or a database, it's vital to separate your
| application's internal data model from models which it exposes.
| If you don't do that, you're in for a never-ending story of
| upstream services unknowingly breaking downstream services, or
| upstream services not being able to evolve in any meaningful way.
|
| So if they mean directly exposing a service's data model from the
| database to other services, I'm very skeptical. If they mean
| providing that access by means of some abstraction, e.g. database
| views, it can be an option in some cases.
|
| You'll still lose lots of flexibility you'd gain by putting some
| middleware in between, e.g. ability to scale out compute
| independently from storage, ability to merge data from multiple
| sources into one API response, ability to implement arbitrarily
| complex business logic e.g. in Java, etc.
| dennisy wrote:
| I agree, but would you mind providing a concrete example of:
|
| > No matter how you integrate different applications, be it via
| APIs, messaging, or a database, it's vital to separate your
| application's internal data model from models which it exposes.
|
| Especially in respect to this product.
| treeman79 wrote:
| A field you rely on changed from integer to string. Or goes
| away. Or was unique but a duplicate shows up. Or the database
| disappears for a few minutes every hour.
| dennisy wrote:
| Interesting read, but seems like a tangential topic?
| joshuanapoli wrote:
| Over time, the internal data model might need to change to
| more accurately model the world. For consistency and
| efficiency, we usually do not want to maintain the
| combination of the original low-fidelity model and new high-
| fidelity model within the internal database.
|
| The internal data model might need to be reorganized to
| improve the efficiency of part of the internal application.
|
| A client might not need a higher fidelity model. The internal
| reorganization of the data model might not be pertinent to
| the client's interface. So we normally have some data-mapping
| in the application to help provide a stable interface for
| clients.
|
| It's possible to provide these compatibility mappings within
| the database through views, but this is usually considered
| harder to control, test, and scale.
|
| Maybe in a big application, the application-layer mapping and
| caching eventually get complicated enough to be something
| like a custom-made database. And so we might end up with an
| "integration database" but call it something different.
| dennisy wrote:
| Great thanks.
|
| So we have a DB and a service in front of it. The DB gets a
| new schema, but the service maps to the previous
| representation in such a way that clients do not need to
| know about the new schema. Is that the gist?
|
| Then when it comes to using Fauna, what is it that does not
| allow such a data flow?
| ttymck wrote:
| Are you asking for an example of separating the data models
| or an example of what happens when you don't?
|
| This is a good example of what separation _enables you to do_:
|
| https://www.troyhunt.com/your-api-versioning-is-wrong-
| which-...
|
| (it is presented from the perspective of _how_ to version an
| API, but the examples of _what_ it looks like are there)
| evanweaver wrote:
| Stored procedures and the integration database have come back for
| our users in a big way. It would be great to hear examples of how
| others are applying this pattern with other databases and APIs.
| chrisjc wrote:
| Since you're the one making the claim that they're making a
| comeback, I'd love to hear your own personal story or some of
| the stories you've heard.
|
| Personally I haven't noticed anything resembling a comeback,
| but I'm certainly using them more than ever... and I'm loving
| it.
| robsutter wrote:
| My favorite thing about this pattern is that it allows
| developers to build globally distributed applications while
| reasoning about them like a single, centralized monolith.
|
| A lot of complicated practices that we've adopted to overcome
| the challenges of distributed systems just disappear when you
| have fast, strongly consistent transactional writes on a single
| data store. Either a thing happened or it didn't. No need to
| wait around and see.
|
| This matters even more as applications move from single
| failover to multi-region to edge computing with hundreds of
| locations of compute. How do you get consistent state out to
| each PoP?
|
| You don't, you integrate through the database.
| dennisy wrote:
| The article is provocative!
|
| But I do not see how the product itself is different from
| serverless DBs such as SupaBase and PlanetScale.
| rc_hackernews wrote:
| My first two programming jobs out of college back in the early
| 2010s took the approach in this article, albeit with older
| technology.
|
| This is bringing back old (bad) memories of the times when I was
| debugging stored procedures that called triggers that called the
| same stored procedures.
|
| A lot of that was due to poor design and bad choices. Some of
| that was due to developers trying to fit processes and patterns
| into a language, i.e. SQL, that lacked the expressiveness for it.
|
| I'm not a fan of this approach to say the least.
|
| I'd be interested in hearing others' experiences with this
| though.
|
| And I'm going to check out Fauna since it looks like a cool
| database and to see if anything has changed with this approach
| since I encountered it almost a decade ago.
| robsutter wrote:
| I remember those days too - my sins were T-SQL and I remember
| those headaches quite clearly.
|
| One huge difference today is that you can actually unit-test
| user-defined functions and stored procedures using the same
| tooling you use for your application code. That lends itself
| naturally to integration testing, and so on.
|
| I wouldn't necessarily recommend taking it to the property-
| based testing extreme like I did just to see whether it worked
| (it does).
| chrisjc wrote:
| > you can actually unit-test user-defined functions and
| stored procedures using the same tooling you use for your
| application code
|
| Can you give an example of how this is possible, or what
| tools/services it's possible with? I can understand how it
| would work on UDFs, but not entirely sure how it would be
| possible for stored procedures, especially when they're often
| doing something more complex than simple read operations.
| robsutter wrote:
| Sure - here's an example using Jest to test the logic in
| UDFs: https://github.com/fauna-labs/fql-
| utilities/blob/main/tests/...
|
| You can use Jest's before[All] and after[All] to do any
| required setup and teardown per usual.
|
| For Fauna, UDFs and stored procedures are synonymous.
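|
| In miniature, such a test looks something like this (a sketch
| using Jest and the faunadb JS driver; the "players" collection,
| the add_points UDF, and FAUNA_TEST_SECRET are all made up):
|
|     import { Client, query as q } from "faunadb";
|
|     // Test-scoped client, ideally pointed at a throwaway
|     // child database created in beforeAll.
|     const client = new Client({
|       secret: process.env.FAUNA_TEST_SECRET!,
|     });
|
|     describe("add_points UDF", () => {
|       it("adds points to a player document", async () => {
|         const created: any = await client.query(
|           q.Create(q.Collection("players"), {
|             data: { points: 0 },
|           })
|         );
|         const updated: any = await client.query(
|           q.Call(q.Function("add_points"), [created.ref, 10])
|         );
|         expect(updated.data.points).toBe(10);
|       });
|     });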
| GordonS wrote:
| I remember doing the same almost 30 years back, copying data
| from an Oracle database to an MSSQL database.
|
| It was a fairly limiting approach, performance wasn't great,
| and diagnosing problems was a total PITA.
|
| Afterwards I did an integration using an antiquated version of
| BizTalk, which was all based on COM and to this day remains my
| most loathed software _ever_. It was way too complex, the UI
| was crap, it was flakey as hell, and ridiculously slow. Later
| versions got better, but it was a very low bar.
|
| Later I used a Java-based Apache project to do an integration -
| I forget the name. It was actually really good, easy to use but
| also allowing full control.
|
| Later still, I did some integrations using custom C# code based
| around WWF (Windows Workflow Foundation) - the UI in Visual
| Studio was horrendously slow and flakey, it was too complex to
| build your own pipeline segments, and difficult to diagnose
| issues in production. But it did work fairly well for simple
| stuff.
|
| And then later still I used PowerCenter - I was the architect,
| so wasn't that hands on, but the PowerCenter guys liked it, and
| we ended up adopting it as our standard integrations tool, with
| hundreds of integrations across the company.
| meepmorp wrote:
| Was it Apache Camel? I used that a while back and liked how
| low hassle it was.
| GordonS wrote:
| Hmm, name doesn't sound familiar, it was a very long time
| ago tho.
| taylodl wrote:
| If I'm reading this correctly, the crux of the argument appears
| to be that if your serverless database engine provides
| cross-region data replication and consistency _and_ provides a
| GraphQL interface, then applications can use GraphQL to go
| directly to the database, thus solving a lot of the problems we
| endured in the past when using SQL to go to the database. It's an
| interesting idea that at first glance appears worth looking into
| further, though I still have an uneasy feeling because I remember
| all the pain in the past!
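|
| Concretely, the client call collapses to a direct GraphQL request
| against the database endpoint (a sketch; the findOrderByID query
| and FAUNA_CLIENT_SECRET are assumptions about the schema):
|
|     // No application server in between: the client posts GraphQL
|     // straight to Fauna's hosted endpoint.
|     const URL = "https://graphql.fauna.com/graphql";
|
|     async function findOrder(id: string) {
|       const res = await fetch(URL, {
|         method: "POST",
|         headers: {
|           Authorization:
|             `Bearer ${process.env.FAUNA_CLIENT_SECRET}`,
|           "Content-Type": "application/json",
|         },
|         body: JSON.stringify({
|           query:
|             "query($id: ID!) { findOrderByID(id: $id) { status } }",
|           variables: { id },
|         }),
|       });
|       const { data, errors } = await res.json();
|       if (errors) throw new Error(errors[0].message);
|       return data.findOrderByID;
|     }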
| [deleted]
| contingencies wrote:
| Use the right tool for the job. What is the most deployed
| database? SQLite3. In what pattern is it most often deployed? One
| language from one environment accessing one database with full
| read/write. Low call volume, high data complexity, and embedded
| (tightly application-coupled) use. This is observably _the_
| normal use case for a database and the simplest mode of
| implementation. Problems occur when software people start solving
| problems that don't exist: typically performance, future
| scalability, dubious theories of security and future
| language/database migrations. KISS.
| rossdavidh wrote:
| Having worked on old systems that used this approach, I recall
| that it did have a lot of pluses. Now, for the minuses:
|
| 1) changing the language you use in a server (say, from PHP to
| Python) is big, but changing the language you use in your
| database (from SQL to anything else) is even more intimidating.
| If you are integrating through the database, this limitation
| matters more.
|
| 2) you need to have DBAs who are not only highly competent, but
| also have good people skills, since they will often be saying
| "no", or at least "not that way", and if they don't know how to
| do that in a constructive manner then it becomes a net
| productivity drain. Fortunately, where I worked that used this
| pattern, the DBAs had exceptional people skills as well as
| technical skills. This is, I am led to believe, not always the
| case.
| [deleted]
| nescioquid wrote:
| While they quickly acknowledge that integrating by means of a
| database is widely regarded as an anti-pattern, the rest of the
| article doesn't really address why this wouldn't be an anti-
| pattern, other than to pretend that the major reasons for
| avoiding this pattern are deployment complexity and furnishing
| multi-region services.
|
| > A customer pattern we see solves this problem, and it's the
| integration database pattern. This pattern went out of style in
| the 90s along with Oracle servers in the closet and dedicated
| DBAs, but with technological advances like scalable multi-region
| transactions and serverless computing resources, its utility has
| returned.
|
| Considering they are flogging a product, this feels especially
| dishonest to me.
| mumblemumble wrote:
| Also not the author, but I have seen some success here when the
| database is used as another way to build a service with an API,
| and not just a simple data store. What that means in practice
| is that applications interact exclusively through stored
| procedures, which serve as a sort of RPC API.
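|
| From the application side, the call pattern is mundane (a sketch
| with node-postgres; place_order is a made-up stored function
| standing in for one RPC-style entry point):
|
|     import { Pool } from "pg";
|
|     // Connection settings come from the usual PG* env vars.
|     // The app never touches tables; it only calls procedures.
|     const pool = new Pool();
|
|     export async function placeOrder(
|       customerId: number,
|       sku: string
|     ) {
|       const { rows } = await pool.query(
|         "SELECT * FROM place_order($1, $2)",
|         [customerId, sku]
|       );
|       return rows[0]; // whatever shape the procedure exposes
|     }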
|
| The big downside of this approach is that the human factors
| are tricky nowadays. Developers expect to be allowed to poke at
| the tables directly, and don't necessarily take kindly to
| alternative ways of framing things. Especially if they've been
| raised on a diet of code-first object relational mapping. And
| there's not really a great way to enforce this boundary,
| because, unlike HTTP, an ODBC session will generally let you do
| pretty much anything you want to the remote service's
| implementation details.
| thruflo22 wrote:
| Not related to the original article but I had a go at
| articulating the rationale here:
| https://hexdocs.pm/integratedb/intro.html
|
| (This project isn't ready for prime time but it was a sketch of
| an approach to mitigating the downsides of integrating through
| the database, in order to unlock the upsides).
| orev wrote:
| The contempt towards the "old ways" in this piece is really
| irksome, especially since those "old ways" are perfectly
| acceptable for 95% of applications (you only need this hyper
| scale stuff if you're running a MAANG level app). The rest of us
| still doing things the "old way" are perfectly happy to have
| simple systems that run reliably, as we sit and watch the
| complete train wreck of complexity everyone else is building
| using these "new ways" that are supposedly better.
| chrisjc wrote:
| While I see your point, it really felt like they were trying to
| empathize with "new" developers that might feel that way about
| doing things the "old way"... and that they may want to
| reconsider those feelings now that there are new technologies
| and services to overcome the limitations that existed in the
| "olden times".
| robsutter wrote:
| I actually think this piece aligns with your thinking. Using a
| single, strongly consistent data store to eliminate the "train
| wreck of complexity" allows for a better developer experience
| while still delivering a responsive user experience. That
| matters at all kinds of scale, not just hyper.
| iwintermute wrote:
| I know it's off-topic but maybe "MAANG level app" >> "MANGA" if
| we want to go there?
| 6gvONxR4sf7o wrote:
| If you're going to rename F to M, why not rename G to A?
| pred_ wrote:
| But if F->M, then why not G->A? MAAAN.
| jayd16 wrote:
| Referring to the tech powers that be as "The MAAAN" is
| pretty fitting.
| BoxOfRain wrote:
| I like this, I suppose under this new acronym cutting
| Google and Facebook out of your life would be "sticking
| it to the MAAAN".
___________________________________________________________________
(page generated 2021-11-04 23:01 UTC)