[HN Gopher] Reducing complexity by integrating through the database
       ___________________________________________________________________
        
       Reducing complexity by integrating through the database
        
       Author : evanweaver
       Score  : 59 points
       Date   : 2021-11-03 12:54 UTC (1 days ago)
        
 (HTM) web link (fauna.com)
 (TXT) w3m dump (fauna.com)
        
       | tlarkworthy wrote:
       | At Firebase it was sometimes called the client-database-server
       | architecture. The pattern was documented in 2013 [1]
       | 
       | If you use Firebase as an ephemeral message bus it's a great
       | pattern. It has problems if you use it like a traditional
       | database because migrations are very tricky. DBs that support
       | views (or GraphQL) can make migrations much easier
       | 
       | [1] https://firebase.googleblog.com/2013/03/where-does-
       | firebase-...
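[Editor's note] The "ephemeral message bus" use of a realtime database can be sketched without the real Firebase SDK. Below is a minimal in-memory stand-in (all names are invented for illustration; the actual Firebase API differs): clients push requests under one path, a backend worker subscribes and writes responses back, and nothing is treated as long-lived data, so schema migrations never bite.

```python
# Illustrative sketch of the client-database-server pattern using an
# in-memory dict as a stand-in for a realtime database like Firebase.
# (Hypothetical names; the real Firebase SDK differs.)
from collections import defaultdict

class FakeRealtimeDB:
    """Minimal pub/sub store: a write notifies subscribers of that path."""
    def __init__(self):
        self.data = defaultdict(list)
        self.listeners = defaultdict(list)

    def push(self, path, value):
        self.data[path].append(value)
        for cb in self.listeners[path]:
            cb(value)

    def on_child_added(self, path, callback):
        self.listeners[path].append(callback)

db = FakeRealtimeDB()
results = []

# "Server" subscribes to a request queue and writes responses back.
def handle_request(req):
    db.push("/responses", {"id": req["id"], "answer": req["n"] * 2})

db.on_child_added("/requests", handle_request)
db.on_child_added("/responses", results.append)

# "Client" writes a request; it never talks to the server directly.
db.push("/requests", {"id": 1, "n": 21})
print(results[0]["answer"])  # → 42
```

The database is the only integration point, but because the messages are transient, neither side depends on a durable schema.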
        
       | DeathArrow wrote:
       | I wouldn't integrate through a database but I wouldn't refuse to
       | integrate through a distributed cache.
       | 
        | Anyway, using REST for inter-service communication decreases
        | performance and increases latency, yet 99% of projects still do
        | it.
        
         | evanweaver wrote:
         | Curious what makes a cache better to you.
        
           | DeathArrow wrote:
           | It would be faster and I still get to keep microservices from
           | accessing the data they don't need or shouldn't access.
           | 
            | But I'd rather use RPC for communication.
        
       | mdoms wrote:
       | I think anyone with experience on this type of system will recoil
       | in horror at the thought of building it intentionally today.
        
         | cogman10 wrote:
         | Tell me about it. This is the current system I maintain. While
         | new products have (mostly) started using non-integrated DBs,
         | all the legacy systems we have use a giant integrated db.
         | 
         | It SUCKS.
         | 
         | Let me count the ways.
         | 
         | - You can't update a table schema without updating and
         | deploying a bunch of applications in sync (even if that's part
         | of a stored proc). Which for something like this, means that
         | getting out of this situation is WAY harder than getting into
         | it.
         | 
         | - You end up putting WAY too much logic into the DB which makes
         | it hard to ultimately figure out WHAT is supposed to happen
         | 
          | - DBs have TERRIBLE development stories. That's ultimately
          | because the code and data all live in the same place: you can
          | update code without any sort of revision control to help you
          | understand or see a change that's been made to the db schema
          | (forcing a bunch of painful process around updating DB
          | capabilities).
         | 
         | - DBs are resource bottlenecks that SUCK to figure out how to
         | scale out. Putting a bunch of apps into one DB complicates that
         | process. Scaling a single app in a single DB is simply WAY
         | easier.
         | 
          | - At least my db (and I assume a bunch of other DBs) has
          | really crappy performance diagnostic tools. Further, the more
         | complicated the queries against it, the more likely you are to
         | go from "Hey, stats are making things fast" to "OMG, why is
         | this thing taking 10 seconds to run now!". It's really bad when
         | the only solution that seems to fix things is dumping stats.
         | 
         | I could MAYBE see something like this for a macro service
         | dedicated to a domain, but I'd never build a complex system
          | like this from scratch. Colocating apps in the same DB would
          | have to be justified by some crazy performance reason where
          | bypassing a microservice makes sense. An exception, not the
          | regular course of action (and I'd still hate it :) )
        
           | evanweaver wrote:
           | I mean, yeah, this is why people stopped using this pattern.
           | But these problems are getting solved, especially in Fauna:
           | 
            | 1. Schemaless/document/schema-on-need databases like Fauna
            | don't mandate application breakage on every schema change
            | the way SQL does
           | 
            | 2. It's hard to reason about if it's not transparent, but it
            | can be transparent now, see below
           | 
           | 3. Fauna is a temporal database, which acts like version
           | control on your stored procedures, so you can easily check
           | and revert any change
           | 
           | 4. Fauna is serverless and horizontally scalable without
           | consistency/latency impact
           | 
           | 5. This was definitely a problem when you were occupying
           | precious CPU cores on a vertically scaled RDBMS with business
           | logic, but compute in Fauna or in serverless lambdas scales
           | horizontally indefinitely
        
       | gunnarmorling wrote:
       | No matter how you integrate different applications, be it via
       | APIs, messaging, or a database, it's vital to separate your
       | application's internal data model from models which it exposes.
       | If you don't do that, you're in for a never-ending story of
       | upstream services unknowingly breaking downstream services, or
       | upstream services not being able to evolve in any meaningful way.
       | 
       | So if they mean directly exposing a service's data model from the
       | database to other services, I'm very skeptical. If they mean
       | providing that access by means of some abstraction, e.g. database
       | views, it can be an option in some cases.
       | 
        | You'll still lose lots of the flexibility you'd gain by putting
        | some middleware in between, e.g. the ability to scale out
        | compute independently from storage, to merge data from multiple
        | sources into one API response, to implement arbitrarily complex
        | business logic e.g. in Java, etc.
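[Editor's note] The "abstraction via database views" option mentioned above can be shown concretely with SQLite (table and column names here are invented for illustration): the internal table is free to change shape, while downstream consumers read only a view that preserves the exposed contract.

```python
# Sketch: exposing a stable "contract" view instead of internal tables.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Internal model: free to change between releases.
    CREATE TABLE customers_v2 (id INTEGER, given_name TEXT, family_name TEXT);
    INSERT INTO customers_v2 VALUES (1, 'Ada', 'Lovelace');

    -- Exposed model: downstream services read only this view. When the
    -- internal table changes again, the view is redefined to keep the
    -- old shape, so consumers don't break.
    CREATE VIEW customers AS
    SELECT id, given_name || ' ' || family_name AS full_name
    FROM customers_v2;
""")

row = conn.execute("SELECT full_name FROM customers WHERE id = 1").fetchone()
print(row[0])  # → Ada Lovelace
```

The view plays the role middleware usually plays: it decouples the internal model from what other services see, though without the compute-side flexibility the comment notes.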
        
         | dennisy wrote:
         | I agree, but would you mind providing a concrete example of:
         | 
         | > No matter how you integrate different applications, be it via
         | APIs, messaging, or a database, it's vital to separate your
         | application's internal data model from models which it exposes.
         | 
          | Especially with respect to this product.
        
           | treeman79 wrote:
            | A field you rely on changed from integer to string. Or goes
            | away. Or was unique but a duplicate shows up. Or the
            | database disappears for a few minutes every hour.
        
           | dennisy wrote:
           | Interesting read, but seems like a tangential topic?
        
           | joshuanapoli wrote:
           | Over time, the internal data model might need to change to
           | more accurately model the world. For consistency and
           | efficiency, we usually do not want to maintain the
           | combination of the original low-fidelity model and new high-
           | fidelity model within the internal database.
           | 
           | The internal data model might need to be reorganized to
           | improve the efficiency of part of the internal application.
           | 
           | A client might not need a higher fidelity model. The internal
           | reorganization of the data model might not be pertinent to
           | the client's interface. So we normally have some data-mapping
           | in the application to help provide a stable interface for
           | clients.
           | 
            | It's possible to provide these compatibility mappings
            | within the database through views, but this is usually
            | considered harder to control, test, and scale.
           | 
           | Maybe in a big application, the application-layer mapping and
           | caching eventually get complicated enough to be something
           | like a custom-made database. And so we might end up with an
           | "integration database" but call it something different.
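[Editor's note] The application-layer data-mapping described above can be sketched in a few lines (the models and field names are invented for illustration): the internal model gains fidelity, and a mapping function keeps serving clients the old flat shape.

```python
# Sketch of application-layer mapping: the internal model evolved to
# higher fidelity, but clients still receive the old v1 contract.
from dataclasses import dataclass

@dataclass
class AddressV2:
    """New internal model: a structured address."""
    street: str
    city: str
    country: str

def to_client_v1(addr: AddressV2) -> dict:
    """Map the richer internal model back to the flat v1 contract."""
    return {"address": f"{addr.street}, {addr.city}, {addr.country}"}

internal = AddressV2("10 Downing St", "London", "UK")
print(to_client_v1(internal))  # → {'address': '10 Downing St, London, UK'}
```

The client-facing shape stays stable no matter how `AddressV2` is reorganized internally; only the mapping function changes.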
        
             | dennisy wrote:
              | Great, thanks.
             | 
             | So we have a DB and a service in front of it. The DB gets a
             | new schema, but the service maps to the previous
             | representation in such a way that clients do not need to
             | know about the new schema. Is that the gist?
             | 
             | Then when it comes to using Fauna, what is it that does not
             | allow such a data flow?
        
           | ttymck wrote:
            | Are you asking for an example of separating the data models
            | or an example of what happens when you don't?
           | 
            | This is a good example of what separation _enables you to
            | do_:
           | 
           | https://www.troyhunt.com/your-api-versioning-is-wrong-
           | which-...
           | 
           | (it is presented from the perspective of _how_ to version an
           | API, but the examples of _what_ it looks like are there)
        
       | evanweaver wrote:
       | Stored procedures and the integration database have come back for
       | our users in a big way. It would be great to hear examples of how
       | others are applying this pattern with other databases and APIs.
        
         | chrisjc wrote:
          | Since you're the one making the claim that they're making a
          | comeback, I'd love to hear your own personal story or some of
          | the stories you've heard.
         | 
          | Personally I haven't noticed anything resembling a comeback,
          | but I'm certainly using them more than ever... and I'm loving
          | it.
        
         | robsutter wrote:
         | My favorite thing about this pattern is that it allows
         | developers to build globally distributed applications while
         | reasoning about them like a single, centralized monolith.
         | 
         | A lot of complicated practices that we've adopted to overcome
         | the challenges of distributed systems just disappear when you
         | have fast, strongly consistent transactional writes on a single
         | data store. Either a thing happened or it didn't. No need to
         | wait around and see.
         | 
         | This matters even more as applications move from single
         | failover to multi-region to edge computing with hundreds of
         | locations of compute. How do you get consistent state out to
         | each PoP?
         | 
         | You don't, you integrate through the database.
        
       | dennisy wrote:
       | The article is provocative!
       | 
        | But I do not see how the product itself is different from
        | serverless DBs such as Supabase and PlanetScale.
        
       | rc_hackernews wrote:
        | My first two programming jobs out of college back in the early
        | 2010s took the approach in this article. Albeit with older
        | technology.
       | 
       | This is bringing back old (bad) memories of the times where I was
       | debugging stored procedures that called triggers that called the
       | same stored procedures.
       | 
       | A lot of that was due to poor design and bad choices. Some of
       | that was due to developers trying to fit processes and patterns
       | into a language, i.e. SQL, that lacked the expressiveness for it.
       | 
       | I'm not a fan of this approach to say the least.
       | 
        | I'd be interested in hearing others' experiences with this
        | though.
       | 
       | And I'm going to check out Fauna since it looks like a cool
       | database and to see if anything has changed with this approach
       | since I encountered it almost a decade ago.
        
         | robsutter wrote:
         | I remember those days too - my sins were T-SQL and I remember
         | those headaches quite clearly.
         | 
         | One huge difference today is that you can actually unit-test
         | user-defined functions and stored procedures using the same
         | tooling you use for your application code. That lends itself
         | naturally to integration testing, and so on.
         | 
         | I wouldn't necessarily recommend taking it to the property-
         | based testing extreme like I did just to see whether it worked
         | (it does).
        
           | chrisjc wrote:
           | > you can actually unit-test user-defined functions and
           | stored procedures using the same tooling you use for your
           | application code
           | 
           | Can you give an example of how this is possible, or what
            | tools/services it's possible with? I can understand how it
            | would work on UDFs, but I'm not entirely sure how it would
            | be possible for stored procedures, especially when they're
            | often doing something more complex than simple read
            | operations.
        
             | robsutter wrote:
             | Sure - here's an example using Jest to test the logic in
             | UDFs: https://github.com/fauna-labs/fql-
             | utilities/blob/main/tests/...
             | 
             | You can use Jest's before[All] and after[All] to do any
             | required setup and teardown per usual.
             | 
             | For Fauna, UDFs and stored procedures are synonymous.
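[Editor's note] The linked example uses Jest against Fauna's FQL. The same idea can be shown with only Python's stdlib (the `slugify` function is invented for illustration): register a user-defined function in the database, then exercise it through the database with ordinary application-side test assertions.

```python
# Unit-testing a database UDF with application tooling, sketched with
# SQLite instead of Fauna + Jest.
import sqlite3

def slugify(title: str) -> str:
    """UDF under test: lowercase, spaces to hyphens."""
    return title.lower().replace(" ", "-")

conn = sqlite3.connect(":memory:")
# Register the function so SQL queries can call it by name.
conn.create_function("slugify", 1, slugify)

# "Unit test" exercising the function through the database itself.
result = conn.execute("SELECT slugify('Hello World')").fetchone()[0]
assert result == "hello-world"
print(result)  # → hello-world
```

As with the Jest example, setup and teardown (creating the connection, registering functions) happen in ordinary test fixtures, so DB-side logic gets the same CI treatment as application code.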
        
         | GordonS wrote:
         | I remember doing the same almost 30 years back, copying data
         | from an Oracle database to an MSSQL database.
         | 
         | It was a fairly limiting approach, performance wasn't great,
         | and diagnosing problems was a total PITA.
         | 
         | Afterwards I did an integration using an antiquated version of
         | BizTalk, which was all based on COM and to this day remains my
         | most loathed software _ever_. It was way too complex, the UI
         | was crap, it was flakey as hell, and ridiculously slow. Later
         | versions got better, but it was a very low bar.
         | 
         | Later I used a Java-based Apache project to do an integration -
         | I forget the name. It was actually really good, easy to use but
         | also allowing full control.
         | 
         | Later still, I did some integrations using custom C# code based
         | around WWF (Windows Workflow Foundation) - the UI in Visual
         | Studio was horrendously slow and flakey, it was too complex to
         | build your own pipeline segments, and difficult to diagnose
         | issues in production. But it did work fairly well for simple
         | stuff.
         | 
         | And then later still I used PowerCenter - I was the architect,
         | so wasn't that hands on, but the PowerCenter guys liked it, and
         | we ended up adopting it as our standard integrations tool, with
         | hundreds of integrations across the company.
        
           | meepmorp wrote:
           | Was it Apache Camel? I used that a while back and liked how
           | low hassle it was.
        
             | GordonS wrote:
             | Hmm, name doesn't sound familiar, it was a very long time
             | ago tho.
        
       | taylodl wrote:
        | If I'm reading this correctly, the crux of the argument appears
        | to be: if your serverless database engine provides cross-region
        | data replication and consistency _and_ provides a GraphQL
        | interface, then applications can use GraphQL to go directly to
        | the database, thus solving a lot of the problems we endured in
        | the past when using SQL to go to the database. It's an
        | interesting idea that at first glance appears to be worth
        | looking into further, though I still have an uneasy feeling
        | because I remember all the pain in the past!
        
       | [deleted]
        
       | contingencies wrote:
       | Use the right tool for the job. What is the most deployed
       | database? SQLite3. In what pattern is it most often deployed? One
       | language from one environment accessing one database with full
       | read/write. Low call volume, high data complexity, and embedded
       | (tightly application-coupled) use. This is observably _the_
       | normal use case for a database and the simplest mode of
       | implementation. Problems occur when software people start solving
        | problems that don't exist: typically performance, future
       | scalability, dubious theories of security and future
       | language/database migrations. KISS.
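[Editor's note] The deployment pattern the comment describes is worth seeing in its entirety, because it really is this small (the table and file names are invented for illustration): one process, one file, full read/write, no server.

```python
# The common SQLite pattern: an embedded, application-coupled database.
# Python's stdlib ships the driver; "deployment" is opening a file.
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO notes (body) VALUES (?)", ("keep it simple",))
conn.commit()

print(conn.execute("SELECT body FROM notes").fetchone()[0])  # → keep it simple
```

One language, one environment, one database: the simplest mode of implementation the comment argues is the observably normal case.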
        
       | rossdavidh wrote:
       | Having worked on old systems that used this approach, I recall
       | that it did have a lot of pluses. Now, for the minuses:
       | 
       | 1) changing the language you use in a server (say, from PHP to
       | Python) is big, but changing the language you use in your
       | database (from SQL to anything else) is even more intimidating.
       | If you are integrating through the database, this limitation
       | matters more.
       | 
       | 2) you need to have DBA's who are not only highly competent, but
       | also have good people skills, since they will often be saying
       | "no", or at least "not that way", and if they don't know how to
       | do that in a constructive manner then it becomes a net
       | productivity drain. Fortunately, where I worked that used this
       | pattern, the DBA's had exceptional people skills as well as
       | technical skills. This is, I am led to believe, not always the
       | case.
        
       | [deleted]
        
       | nescioquid wrote:
       | While they quickly acknowledge that integrating by means of a
       | database is widely regarded as an anti-pattern, the rest of the
       | article doesn't really address why this wouldn't be an anti-
       | pattern, other than to pretend that the major reasons for
       | avoiding this pattern are deployment complexity and furnishing
       | multi-region services.
       | 
       | > A customer pattern we see solves this problem, and it's the
       | integration database pattern. This pattern went out of style in
       | the 90s along with Oracle servers in the closet and dedicated
       | DBAs, but with technological advances like scalable multi-region
       | transactions and serverless computing resources, its utility has
       | returned.
       | 
       | Considering they are flogging a product, this feels especially
       | dishonest to me.
        
         | mumblemumble wrote:
         | Also not the author, but I have seen some success here when the
         | database is used as another way to build a service with an API,
         | and not just a simple data store. What that means in practice
         | is that applications interact exclusively through stored
         | procedures, which serve as a sort of RPC API.
         | 
          | The big downside of this approach is that the human factors
         | are tricky nowadays. Developers expect to be allowed to poke at
         | the tables directly, and don't necessarily take kindly to
         | alternative ways of framing things. Especially if they've been
         | raised on a diet of code-first object relational mapping. And
         | there's not really a great way to enforce this boundary,
          | because, unlike HTTP, an ODBC session will generally let you
          | do pretty much anything you want to the remote service's
          | implementation details.
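[Editor's note] The "stored procedures as an RPC API" idea can be sketched as follows. SQLite has no stored procedures, so a thin wrapper class plays that role here (all names are invented for illustration); in Postgres or Fauna, the functions would live server-side and the tables would be locked down with grants.

```python
# Sketch: applications call named procedures and never touch tables
# directly; the schema is an implementation detail behind the "API".
import sqlite3

class AccountsAPI:
    """Only these methods are the contract; the schema is private."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE accounts (name TEXT, balance INTEGER)")

    def open_account(self, name: str, initial: int) -> None:
        self._conn.execute(
            "INSERT INTO accounts VALUES (?, ?)", (name, initial))
        self._conn.commit()

    def balance(self, name: str) -> int:
        row = self._conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (name,)).fetchone()
        return row[0]

api = AccountsAPI()
api.open_account("alice", 100)
print(api.balance("alice"))  # → 100
```

The comment's caveat applies directly: in Python, nothing stops a caller from reaching into `api._conn` and poking the table, just as an ODBC session bypasses a procedure-only convention unless the database enforces it with permissions.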
        
         | thruflo22 wrote:
         | Not related to the original article but I had a go at
         | articulating the rationale here:
         | https://hexdocs.pm/integratedb/intro.html
         | 
         | (This project isn't ready for prime time but it was a sketch of
         | an approach to mitigating the downsides of integrating through
         | the database, in order to unlock the upsides).
        
       | orev wrote:
       | The contempt towards the "old ways" in this piece is really
       | irksome, especially since those "old ways" are perfectly
       | acceptable for 95% of applications (you only need this hyper
       | scale stuff if you're running a MAANG level app). The rest of us
       | still doing things the "old way" are perfectly happy to have
       | simple systems that run reliably, as we sit and watch the
       | complete train wreck of complexity everyone else is building
       | using these "new ways" that are supposedly better.
        
         | chrisjc wrote:
         | While I see your point, it really felt like they were trying to
         | empathize with "new" developers that might feel that way about
         | doing things the "old way"... and that they may want to
         | reconsider those feelings now that there are new technologies
         | and services to overcome the limitations that existed in the
         | "olden times".
        
         | robsutter wrote:
         | I actually think this piece aligns with your thinking. Using a
         | single, strongly consistent data store to eliminate the "train
         | wreck of complexity" allows for a better developer experience
         | while still delivering a responsive user experience. That
         | matters at all kinds of scale, not just hyper.
        
         | iwintermute wrote:
          | I know it's off-topic but maybe "MAANG level app" >> "MANGA"
          | if we want to go there?
        
           | 6gvONxR4sf7o wrote:
           | If you're going to rename F to M, why not rename G to A?
        
           | pred_ wrote:
           | But if F->M, then why not G->A? MAAAN.
        
             | jayd16 wrote:
             | Referring to the tech powers that be as "The MAAAN" is
             | pretty fitting.
        
               | BoxOfRain wrote:
               | I like this, I suppose under this new acronym cutting
               | Google and Facebook out of your life would be "sticking
               | it to the MAAAN".
        
       ___________________________________________________________________
       (page generated 2021-11-04 23:01 UTC)