[HN Gopher] Distributed transactions in Go: Read before you try
___________________________________________________________________
Distributed transactions in Go: Read before you try
Author : roblaszczak
Score : 91 points
Date : 2024-10-02 14:10 UTC (4 days ago)
(HTM) web link (threedots.tech)
(TXT) w3m dump (threedots.tech)
| p10jkle wrote:
| This is also a good use case for durable execution, see e.g.
| https://restate.dev
| liampulles wrote:
| This does a good job of encapsulating the considerations of an
| event driven system.
|
| I think the author is a little too easily dismissive of sagas
| though - for starters, an event driven system is also still going
| to need to deal with compensating actions, it's just going to
| result in a larger set of events that various systems potentially
| need to handle.
|
| The virtue of a saga or some command driven orchestration
| approach is that the story of what happens when a user does X is
| plainly visible. The ability to dictate that upfront and figure
| it out easily later when diagnosing issues cannot be overstated.
| relistan wrote:
| This is a good summary of building an evented system. Having
| built and run one that scaled up to 130 services and nearly 60
| engineers, I can say this solves a lot of problems. Our
| implementation was a bit different but in a similar vein. When
| the company didn't do well in the market and scaled down, 9
| engineers were (are) able to operate almost all of that same
| system. The decoupling and lack of synchronous dependencies means
| failures are fairly contained and easy to rectify. Replays can
| fix almost anything after the fact. Scheduled daily replays
| prevent drift across the system, helping guarantee that things
| are consistent... eventually.
| latentsea wrote:
| I would really hate to join a team of 9 engineers that owned
| 130 services.
| physicsguy wrote:
| > And if your product backlog is full and people who designed the
| microservices are still around, it's unlikely to happen.
|
| Oh man I feel this
| ekidd wrote:
| Let's assume you're not a FAANG, and you don't have a billion
| customers.
|
| If you're gluing microservices together using distributed
| transactions (or durable event queues plus eventual consistency,
| or whatever), the odds are good that you've gone far down the
| wrong path.
|
| For many applications, it's easiest to start with a modular
| monolith talking to a shared database, one that natively supports
| transactions. When this becomes too expensive to scale, the next
| step may be sharding your backend. (It depends on whether you
| have a system where users mostly live in their silos, or where
| everyone talks to everyone. If your users are siloed, you can
| shard at almost any scale.)
|
| Microservices make sense when they're "natural". A video encoder
| makes a great microservice. So does a map tile generator.
|
| Distributed systems are _expensive_ and complicated, and they
| kill your team's development velocity. I've built several of
| them. Sometimes, they turned out to be serious mistakes.
|
| As a rule of thumb:
|
| 1. Design for 10x your current scale, not 1000x. 10x your scale
| allows for 3 consecutive years of 100% growth before you need to
| rebuild. Designing for 1000x your scale usually means you're
| sacrificing development velocity to cosplay as a FAANG.
|
| 2. You will want transactions in places that you didn't expect.
|
| 3. If you need transactions between two microservices, strongly
| consider merging them and having them talk to the same database.
|
| Sometimes you'll have no better choice than to use distributed
| transactions or durable event queues. They are inherent in some
| problems. But they should be treated as a giant flashing "danger"
| sign.
| hggigg wrote:
| I would add that just because you add these things it does not
| mean you _can_ scale afterwards. All microservices
| implementations I've seen so far are bolted on top of some
| existing layer of mud and serve only to make function calls
| that were inside processes run over the network, with added
| latency and other overheads. The end result is that aggregate
| latency and cost only increase, with no functional scalability
| improvements.
|
| Various engineering leads, happy with what went on their
| resume, leave and tell everyone how they increased scalability.
| And they persuade another generation into the same failures.
| xyzzy_plugh wrote:
| I frequently see folks fail to understand that when the unicorn
| rocketship spends a month and ten of their hundreds of
| engineers keeping their sharded MySQL from being set ablaze
| daily by overwhelming load, it is actually pretty close to
| the correct time for that work. Sure it may have been
| stressful, and customers may have been impacted, but it's a
| good problem to have. Conversely not having that problem maybe
| doesn't really mean anything at all, but there's a good chance
| it means you were solving these scaling problems prematurely.
|
| It's a balancing act, but putting out the fires before they
| even begin is often the wrong approach. Often a little fire is
| good for growth.
| AmericanChopper wrote:
| You really have to be doing huge levels of throughput before
| you start to struggle with scaling MySQL or Postgres. There's
| really not many workloads that actually require strict ACID
| guarantees _and_ produce that level of throughput. 10-20
| years ago I was running hundreds to thousands of transactions
| per second on beefy Oracle and Postgres instances, and the
| workloads had to be especially big before we'd even consider
| any fancy scaling strategies to be necessary, and there
| wasn't some magic tipping point where we'd decide that some
| instance had to go distributed all of a sudden.
|
| Most of the distributed architectures I've seen have been led
| by engineers' needs (to do something popular or interesting)
| rather than an actual product need, and most of them have had
| issues relating to poor attempts to replicate ACID
| functionality. If you're really at the scale where you're
| going to benefit from a distributed architecture, the chances
| are eventual consistency will do just fine.
| MuffinFlavored wrote:
| > For many applications, it's easiest to start with a modular
| monolith talking to a shared database, one that natively
| supports transactions.
|
| I don't think this handles "what if my app is a wrapper on
| external APIs and my own database".
|
| You don't get automatic rollbacks with API calls the same way
| you do with database transactions. What to do then?
| natdempk wrote:
| You have a distributed system on your hands at that point so
| you need idempotent processes + reconciliation/eventual
| consistency. Basically thinking a lot about failure,
| resyncing data/state, patterns like transactional outboxes,
| durable queues, two-phase commit, etc etc. It just quickly
| gets into the specifics of your task/system/APIs, so it's hard to
| give general advice. Most apps do not solve these problems
| super well for a long time unless they are in critical places
| like billing, and even then it might just mean weird repair
| jobs, manual APIs to resync stuff, audits, etc. Usually an
| event-bus/queue or related DB table for the idempotent work +
| some async process validating that table can go a long way
| though.
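|
| Roughly, the outbox half of that looks like this in Go. The
| table names, the publish hook, and the poll interval are all
| invented for the sketch, and the consumers have to be
| idempotent because a crash between publish and the UPDATE will
| re-send the event:
|
|     package outbox
|
|     import (
|         "context"
|         "database/sql"
|         "time"
|     )
|
|     // SaveOrderWithEvent writes the business row and the event
|     // row in ONE local transaction, so they commit or fail
|     // together.
|     func SaveOrderWithEvent(ctx context.Context, db *sql.DB,
|         orderID string, event []byte) error {
|         tx, err := db.BeginTx(ctx, nil)
|         if err != nil {
|             return err
|         }
|         defer tx.Rollback() // no-op once Commit has succeeded
|
|         _, err = tx.ExecContext(ctx,
|             `INSERT INTO orders (id, status)
|              VALUES ($1, 'placed')`, orderID)
|         if err != nil {
|             return err
|         }
|         _, err = tx.ExecContext(ctx,
|             `INSERT INTO outbox (topic, payload)
|              VALUES ('order_placed', $1)`, event)
|         if err != nil {
|             return err
|         }
|         return tx.Commit()
|     }
|
|     // Forward is the async side: publish pending rows and mark
|     // them sent.
|     func Forward(ctx context.Context, db *sql.DB,
|         publish func(topic string, payload []byte) error) {
|         for ctx.Err() == nil {
|             forwardOnce(ctx, db, publish)
|             time.Sleep(time.Second)
|         }
|     }
|
|     func forwardOnce(ctx context.Context, db *sql.DB,
|         publish func(topic string, payload []byte) error) {
|         rows, err := db.QueryContext(ctx,
|             `SELECT id, topic, payload FROM outbox
|              WHERE sent_at IS NULL LIMIT 100`)
|         if err != nil {
|             return
|         }
|         defer rows.Close()
|         for rows.Next() {
|             var id int64
|             var topic string
|             var payload []byte
|             if rows.Scan(&id, &topic, &payload) != nil {
|                 continue
|             }
|             if publish(topic, payload) == nil {
|                 db.ExecContext(ctx,
|                     `UPDATE outbox SET sent_at = now()
|                      WHERE id = $1`, id)
|             }
|         }
|     }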
| gustavoamigo wrote:
| I agree with start with a monolith and a shared database. I've
| done that in the past quite successfully. I would just add that
| if scaling becomes an issue, I wouldn't consider sharding my
| first option, it's more of a last resort. I would prefer
| scaling vertically the shared database and optimizing it as
| much as possible. Also, another strategy I've adopted was
| avoiding doing `JOIN` or `ORDER BY`, as they stress your
| database's precious CPU and IO. `JOIN` also adds coupling between
| tables, which I find hard to refactor once done.
| vbezhenar wrote:
| I don't understand how you avoid JOIN and ORDER BY?
|
| Well, with ORDER BY, if your result set is not huge, sure,
| you can just sort it on the client side. Although sorting 100
| rows on database side isn't expensive. But if you need, say,
| latest 500 records out of million (very frequent use-case),
| you have to sort it on the database side. Also, with proper
| indices, the database can sometimes avoid an explicit sort.
|
| Do you just prefer to duplicate everything in every table
| instead of JOINing them? I did some denormalization to
| improve performance, but that was more like the last thing I
| would do, if there's no other recourse, because it makes it
| very possible that the database will contain logically
| inconsistent data, and that causes a lot of headaches. Fixing bugs
| in software is easier. Fixing bugs in data is hard and
| requires lots of analytic work and sometimes manual work.
| danenania wrote:
| I think a better maxim would be to never have an _un-
| indexed_ ORDER BY or JOIN.
|
| A big part of what many "nosql" databases that prioritize
| scale are doing is simply preventing you from ever running
| an ad-hoc un-indexed query.
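|
| A concrete version of the "latest 500 out of a million" case
| above, with names and schema invented: a composite index that
| matches the ORDER BY lets the database walk the index and stop
| after 500 rows instead of sorting the whole table.
|
|     package events
|
|     import (
|         "context"
|         "database/sql"
|     )
|
|     // Latest returns the newest 500 events for a user. With an
|     // index created as
|     //   CREATE INDEX events_user_created_idx
|     //       ON events (user_id, created_at DESC);
|     // the ORDER BY ... LIMIT is served straight from the index.
|     func Latest(ctx context.Context, db *sql.DB,
|         userID int64) (*sql.Rows, error) {
|         return db.QueryContext(ctx,
|             `SELECT id, created_at, body
|              FROM events
|              WHERE user_id = $1
|              ORDER BY created_at DESC
|              LIMIT 500`, userID)
|     }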
| vbezhenar wrote:
| I wonder if anyone has tried long-lasting transactions passed
| between services?
|
| Like imagine you have a Postgres pooler with an API, so you can
| share one Postgres connection between several applications. Now
| you can start a transaction in one application, pass its ID to
| another application and commit it there.
|
| Implement queues using postgres, use the same transaction for
| both business data and queue operations and some things will
| become easier.
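|
| The queue-in-Postgres half of that is pretty approachable. A
| rough sketch of the consumer side in Go (table name and handler
| invented): with FOR UPDATE SKIP LOCKED, handling a job and
| removing it commit in one transaction, so a failed handler just
| puts the job back.
|
|     package pgqueue
|
|     import (
|         "context"
|         "database/sql"
|     )
|
|     // HandleOne pops one job and handles it inside a single
|     // transaction: a rollback leaves the job on the queue, and
|     // SKIP LOCKED lets concurrent workers grab different rows.
|     func HandleOne(ctx context.Context, db *sql.DB,
|         handle func(payload []byte) error) error {
|         tx, err := db.BeginTx(ctx, nil)
|         if err != nil {
|             return err
|         }
|         defer tx.Rollback()
|
|         var id int64
|         var payload []byte
|         err = tx.QueryRowContext(ctx,
|             `SELECT id, payload FROM jobs
|              ORDER BY id
|              LIMIT 1
|              FOR UPDATE SKIP LOCKED`).Scan(&id, &payload)
|         if err == sql.ErrNoRows {
|             return nil // queue is empty
|         }
|         if err != nil {
|             return err
|         }
|         if err := handle(payload); err != nil {
|             return err // rollback: the job stays queued
|         }
|         _, err = tx.ExecContext(ctx,
|             `DELETE FROM jobs WHERE id = $1`, id)
|         if err != nil {
|             return err
|         }
|         return tx.Commit()
|     }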
| natdempk wrote:
| Pretty great advice!
|
| I think the one thing you can run into that is hard is once you
| want to support different datasets that fall outside the scope
| of a transaction (think events/search/derived-data, anything
| that needs to read/write to a system that is not your primary
| transactional DB) you probably do want some sort of event
| bus/queue type thing to get eventual consistency across all the
| things. Otherwise you just end up in impossible situations when
| you try to manage things like doing a DB write + ES document
| update. Something has to fail and then your state is desynced
| across datastores and you're in velocity/bug hell. The other
| side of this though is once you introduce the event bus and
| transactional-outbox or whatever, you then have a problem of
| writes/updates happening and not being reflected immediately. I
| think the best things that solve this problem are stuff like
| Meta's TAO that combines these concepts, but no idea what is
| available to the mere mortals/startups to best solve these
| types of problems. Would love to know if anyone has killer
| recommendations here.
| ljm wrote:
| I think the question is whether you need the entire system to be
| strongly consistent, or just the core of it?
|
| To use ElasticSearch as an example: do you need to add the
| complexity of keeping the index up to date in realtime, or
| can you live with periodic updates for search or a background
| job for it?
|
| As long as your primary DB is the source of truth, you can
| use that to bring other less critical stores up to date
| outside of the context of an API request.
| natdempk wrote:
| Well, the problem you run into is that you kind of want
| different datastores for different use-cases. For example
| search vs. specific page loads, and you want to try and
| make both of those consistent, but you don't have a single
| DB that can serve both use-cases (often times primary DB +
| ElasticSearch for example). If you don't keep them
| consistent, you have user-facing bugs where a user can
| update a record but not search for it immediately, or if
| you try to load everything from ES to provide consistent
| views to a user, then updates can disappear on refresh. Or
| if you try to write to both SQL + ES in an API request,
| they can desync on failure writing to one or the other. The
| problem is even less the complexity of keeping the index up
| to date in realtime, and more that the ES index isn't even
| consistent with the primary DB, and to a user they are just
| different parts of your app that kinda seem a little broken
| in subtle ways inconsistently. It would be great to be able
| to have everything present a consistent view to users, that
| updates together on-write.
| misiek08 wrote:
| The way I solved it once was trying to update ES
| synchronously, and if it failed or timed out, queueing an event
| to index the doc. The timeout wasn't an issue, because a double
| update wasn't harmful.
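|
| Roughly, in Go, with the indexer and queue interfaces as
| stand-ins: try the synchronous write with a short timeout and
| fall back to an async reindex event, which is only safe because
| indexing the same doc twice is harmless.
|
|     package search
|
|     import (
|         "context"
|         "time"
|     )
|
|     type Indexer interface {
|         IndexDoc(ctx context.Context, id string, doc []byte) error
|     }
|
|     type Queue interface {
|         Enqueue(topic string, payload []byte) error
|     }
|
|     // IndexNow tries the synchronous index write; on error or
|     // timeout it queues a "reindex" event instead. The timeout
|     // case (the write may or may not have landed) is fine only
|     // because a duplicate index write is harmless.
|     func IndexNow(ctx context.Context, es Indexer, q Queue,
|         id string, doc []byte) error {
|         ctx, cancel := context.WithTimeout(ctx,
|             500*time.Millisecond)
|         defer cancel()
|         if err := es.IndexDoc(ctx, id, doc); err != nil {
|             return q.Enqueue("reindex", []byte(id))
|         }
|         return nil
|     }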
| jrockway wrote:
| I agree with this.
|
| Having worked at FAANG, I'm always excited by the fanciness
| that is required. Spanner is the best database I've ever used.
|
| That said, my feeling is that in the real world, you just pick
| Postgres and forget about it. Let's Encrypt issues every TLS
| cert on the Internet with one beefy Postgres database.
| Computers are HUGE these days. By the time a 128 core machine
| isn't good enough for your app, you will have sold your shares
| and will be living on your own private island or whatever. If
| you want to wrap sqlite in raft over the weekend for some fun,
| sure, do that. But don't put it in prod.
| Andys wrote:
| Agreed - A place I worked needed 24/7 uptime for financial
| transactions, and we still managed to keep scaling a standard
| MySQL database, despite it getting hammered, over the course
| of 10 years up to something like 64 cores and 384GB of RAM on
| a large EC2 instance.
|
| We did have to move reporting off to a non-SQL solution
| because it was too slow to do in realtime, but that was a
| decision based on evidence of the need.
| devjab wrote:
| I disagree rather strongly with this advice. Mostly because
| I've spent almost a decade earning rather lucrative money on
| cleaning up after companies and organisations which did it.
| Part of what you say is really good advice, if you're not
| Facebook then don't build your infrastructure as though you
| were. I think it's always a good idea to remind yourself that
| StackOverflow ran on a few IIS servers for a long while doing
| exactly what you're recommending that people do. (Well almost
| anyway).
|
| Using a single database always ends up being a mess. Ok, I
| shouldn't say always because it's technically possible for it
| not to happen. I've just never seen the OOP people not utterly
| fuck up the complexity in their models. It gets even worse when
| they've decided to use stored procedures or some magical ORM
| which not everyone understood the underlying workings of. I
| think you should definitely separate your data as much as
| possible. Even small scale companies will quickly struggle
| scaling their DBs if they don't, and it'll quickly become
| absolutely horrible if you have to remove parts of your
| business. Maybe they are unneeded, maybe they get sold off,
| whatever it is. With that said, however, I think you're
| completely correct about not doing distributed transactions. I
| think that both you and the author are completely right that if
| you're doing this, then you're building complexity you shouldn't
| be building until you're Facebook (or maybe when you're almost
| Facebook).
|
| A good micro-service is one that can live in total isolation.
| It'll fulfill the need of a specific business domain, and it
| should contain all the data for this. If that leaves you with a
| monolith and a single shared database, then that is perfectly
| fine. If you can, split it up. Say you have solar plants which
| are owned by companies but as far as the business goes a solar
| plant and a company can operate completely independently, then
| you should absolutely build them as two services. If you don't,
| then you're going to start building your mess once you need to add
| wind plants or something different. Do note that I said that
| this depends on the business needs. If something like
| individual banking accounts of company owners is relevant to
| the greenfield workers and asset managers, then you probably
| can't split up solar plants and companies. Keeping things
| separate like this will also help you immensely as you add on
| business intelligence and analytics.
|
| If you keep everything in a single "model store", then you're
| eventually going to end up with "oh, only John knows what that
| data does" while needing to pay someone like me a ridiculous
| amount of money to help your IT department get to a point where
| they are no longer hindering your company growth. Again, I'm
| sure this doesn't have to be the case and I probably should
| just advise people to do exactly as you say. In my experience
| it's an imperfect world and unless you keep things as simple as
| possible with as few abstractions as possible then you're going
| to end up with a mess.
| mamcx wrote:
| All that you say is true, and people who do that are THE LEAST
| capable of becoming better at the MUCH harder challenges of
| microservices.
|
| I work in the ERP space and interact with dozens of them, and I
| see the _horrors_ that some only know as fairy tales.
|
| Without exception, staying in an RDBMS is the best option of
| all. I have seen the _cosmic horrors_ of what people who
| struggle with an RDBMS do when moved to NoSQL and such, and it
| is always much worse than before.
|
| And all that you say, that is true, hide the real (practical)
| solution: Learn how to use the RDBMS, use SQL, remove
| complexity, and maybe put the thing in a bigger box.
|
| All of those are symptoms that are barely related to the use of
| a single database.
| ekidd wrote:
| I wanted to respond to you, because you had some excellent
| points.
|
| > _Mostly because I've spent almost a decade earning rather
| lucrative money on cleaning up after companies and
| organisations which did it._
|
| For many companies, this is actually a pretty successful
| outcome. They built an app, they earned a pile of money, they
| kept adding customers, and now they have a mess. But they can
| afford to pay you to fix their mess!
|
| My rule of thumb of "design for 10x scale" is intended to be
| used iteratively. When something is slow and miserable, take
| the current demand, multiply it by 10, and design something
| that can handle it. Sometimes, yeah, this means you need to
| split stuff up or use a non-SQL database. But at least it's a
| real need at that point. And there's no substitute for
| engineering knowledge and good taste.
|
| But as other people have pointed out, people who can't use an
| RDBMS correctly are going to have a bad time implementing
| distributed transactions across microservices.
|
| So I'm going to stick with my advice to start with a single
| database, and to only pull things out when there's a clear
| need to scale something.
| foobiekr wrote:
| Great advice. Microservices also open the door to polyglot, so
| you lose the ability to even ensure that everyone uses / has
| access to / understands the things in a common libCompany that
| make it possible for anyone to at least make sense of the code.
|
| When I talk to people who did microservices, I ask them "why is
| this a service separate from this?"
|
| I have legitimately - and commonly - gotten the answer that the
| dev I'm talking to wanted their own service.
|
| It's malpractice.
| danenania wrote:
| > Microservices also open the door to polyglot
|
| While I see your point about the downsides of involving too
| many languages/technologies, I think the really key
| distinction is whether a service has its own separate
| database.
|
| It's really not such a big problem to have "microservices"
| that share the same database. This can bring many of the
| benefits of microservices without most of the downsides.
|
| Imo it would be good if we had some common terminology to
| distinguish these approaches. It seems like a lot of people
| are creating services with their own separate databases for
| no reason other than "that's how you're supposed to do
| microservices".
| jayd16 wrote:
| Sure sure sure
|
| ...but microservices are used as a people-organization
| technique.
|
| Once you're there you'll run into a situation where you'll have
| to do transactions. Might as well get good at it.
| mekoka wrote:
| _> 1. Design for 10x your current scale, not 1000x._
|
| I'd even say that the advice counts double for early stage
| startups. That is, at that scale, it should be _design for 5x_.
|
| You could spend years building a well architected, multi-
| tenanted, microserviced system, whose focus on sound
| engineering is actually distracting your team from building
| core solutions that address your clients' real problems _now_.
| Or, you could instead redirect that focus on first solving
| those immediate problems with simplistic, suboptimal, but valid
| engineering.
|
| An early stage solopreneur could literally just clone or
| copy/paste/configure their monolith in a new directory and
| spawn a new database every time they have a new client. They
| could literally do this for their first 20+ clients in their
| first year. When I say this, some people look at me in
| disbelief. Some of them, having yet to make their first sale.
| Instead, they're working on solutions to counter the
| _anticipated_ scalability issues they'll have in two years,
| when they finally start to sell and become a huge success.
|
| For another few, copy/paste/createdb seems like a stroke of
| genius. But I'm not a genius, I'm just oldish. Many companies
| did basically this 20 years ago and it worked fine. The reason
| it's not even considered anymore seems to be a cultural
| amnesia/insanity that's made certain practices arcane, if not
| taboo altogether. So we tend to spontaneously reach for the
| nuclear reactor, when a few pieces of coal would suffice to
| fuel our current momentum.
| pkhuong wrote:
| > spawn a new database every time they have a new client.
|
| I've seen this work great for multitenancy, with sqlite
| (again, a single beefy server goes a long way). At some point
| though, you hit niche scaling issues and that's how you end
| up with, e.g., arcane sqlite hacks. Hopefully these mostly
| come from people who have found early success, or are looked at
| by others who want reassurance that there is an escape hatch
| that doesn't involve rewriting everything for web scale.
| pjmlp wrote:
| As usual, don't try to use the network boundary to do what
| modules already offer in most languages.
|
| Distributed systems spaghetti is much worse to deal with.
| kunley wrote:
| this is smart, but also: the overall design was so
| overengineered in the first place...
| atombender wrote:
| I find the "forwarder" system here a rather awkward way to bridge
| the database and Pub/Sub system.
|
| A better way to do this, I think, is to ignore the term
| "transaction," which overloaded with too many concepts (such as
| transactional isolation), and instead to consider the desired
| behaviour, namely atomicity: You want two updates to happen
| together, and (1) if one or both fail you want to retry until
| they are both successful, and (2) if the two updates cannot both
| be successfully applied within a certain time limit, they should
| both be undone, or at least flagged for manual intervention.
|
| A solution to both (1) and (2) is to bundle _both_ updates into a
| single action that you retry. You can execute this with a queue-
| based system. You don't need an outbox for this, because you
| don't need to create a "bridge" between the database and the
| following update. Just use Pub/Sub or whatever to enqueue an
| "update user and apply discount" action. Using acks and nacks,
| the Pub/Sub worker system can ensure the action is repeatedly
| retried until both updates complete as a whole.
|
| You can build this from basic components like Redis yourself, or
| you can use a system meant for this type of execution, such as
| Temporal.
|
| To achieve (2), you extend the action's execution with knowledge
| about whether it should retry or undo its work. For such a simple
| action as described above, "undo" means taking away the discount
| and removing the user points, which are just the opposite of the
| normal action. A durable execution system such as Temporal can
| help you do that, too. You simply decide, on error, whether to
| return a "please retry" error, or roll back the previous steps
| and return a "permanent failure, don't retry" error.
|
| To tie this together with an HTTP API that pretends to be
| synchronous, have the API handler enqueue the task, then wait for
| its completion. The completion can be a separate queue keyed by a
| unique ID, so each API request filters on just that completion
| event. If you're using Redis, you could create a separate Pub/Sub
| per request. With Temporal, it's simpler: The API handler just
| starts a workflow and asks for its result, which is a poll
| operation.
|
| The outbox pattern is better in cases where you simply want to
| bridge between two data processing systems, but where the
| consumers aren't known. For example, you want all orders to
| create a Kafka message. The outbox ensures all database changes
| are eventually guaranteed to land in Kafka, but doesn't know
| anything about what happens next in Kafka land, which could be
| stuff that is managed by a different team within the same
| company, or stuff related to a completely different part of the
| app, like billing or ops telemetry. But if your app already
| _knows_ specifically what should happen (because it's a single
| app with a known data model), the outbox pattern is unnecessary,
| I think.
| wavemode wrote:
| I think attempting to automatically "undo" partially failed
| distributed operations is fraught with danger.
|
| 1. It's not really safe, since another system could observe the
| updated data B (and perhaps act on that information) before you
| manage to roll it back to A.
|
| 2. It's not really reliable, since the sort of failures that
| prevent you from completing the operation, could also prevent
| you from rolling back parts of it.
|
| The best way to think about it is that if a distributed
| operation fails, your system is now in an indeterminate state
| and all bets are off. So if you really must coordinate updates
| within two or more separate systems, it's best if either
|
| a) The operation is designed so that nothing really happens
| until the whole operation is done. One example of this pattern
| is git (and, by extension, the GitHub API). To commit to a
| branch, you have to create a new blob, associate that blob to a
| tree, create a new commit based on that tree, then move the
| branch tip to point to the new commit. As you can see, this
| series of operations is perfectly fine to do in an eventually-
| consistent manner, since a partial failure just leaves some
| orphan blobs lying around, and doesn't actually affect the
| branch (since updating the branch itself is the last step, and
| is atomic). You can imagine applying this same sort of pattern
| to problems like ordering or billing, where the last step is to
| update the order or update the invoice (see the sketch below).
|
| b) The alternative is, as you say, flag for manual
| intervention. Most systems in the world operate at a scale
| where this is perfectly feasible, and so sometimes it just
| makes the most sense (compared to trying to achieve perfect
| automated correctness).
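|
| A rough Go sketch of pattern (a) for an ordering flow, with the
| schema invented: the staged rows are harmless orphans until the
| single atomic statement at the end makes the order visible.
|
|     package orders
|
|     import (
|         "context"
|         "database/sql"
|     )
|
|     // Finalize writes the order lines first; they stay invisible
|     // until the final UPDATE flips the order out of 'draft'. A
|     // crash before that last single-statement step only leaves
|     // orphan rows, never a half-visible order. A unique
|     // (order_id, item) constraint is assumed, so retried inserts
|     // are no-ops.
|     func Finalize(ctx context.Context, db *sql.DB,
|         orderID string, items []string) error {
|         for _, item := range items {
|             _, err := db.ExecContext(ctx,
|                 `INSERT INTO order_lines (order_id, item)
|                  VALUES ($1, $2)
|                  ON CONFLICT DO NOTHING`, orderID, item)
|             if err != nil {
|                 return err // safe to retry the whole call
|             }
|         }
|         _, err := db.ExecContext(ctx,
|             `UPDATE orders SET status = 'placed'
|              WHERE id = $1 AND status = 'draft'`, orderID)
|         return err
|     }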
| atombender wrote:
| Undo may not always be possible or appropriate, but you _do_
| need to consider the edge case where an action cannot be
| applied fully, in which case a decision must be made about
| what to do. In OP's example, failure to roll back would
| grant the user points but no discount, which isn't a nice
| outcome.
|
| Trying to achieve a "commit point" where changes become
| visible only at the end of all the updates is worth
| considering, but it's potentially much more complex to
| achieve. Your entire data model (such as database tables,
| including indexes) has to be adapted to support the kind of
| "state swap" you need.
| kgeist wrote:
| >I think attempting to automatically "undo" partially failed
| distributed operations is fraught with danger.
|
| We once tried it, and decided against it because
|
| 1) in practice rollbacks were rarely properly tested and were
| full of bugs
|
| 2) we had a few incidents when a rollback overwrote
| everything with stale data
|
| Manual intervention is probably the safest way.
| cletus wrote:
| To paraphrase [1]:
|
| > Some people, when confronted with a problem, think "I know,
| I'll use micro-services." Now they have two problems.
|
| As soon as I read this example where there's users and orders
| microservices, you've already made an error (IMHO). What happens
| when the traffic becomes such an issue that you need to shard
| your microservices? Now you've got session and load-balancing
| issues. If you ignore them, you may break the read-your-write
| guarantee and that's going to create a huge cost to development.
|
| It goes like this: can you read your own uncommitted changes
| within your transaction or request? Generally the answer should
| be "yes". But imagine you need to speak to a sharded service:
| what happens when you hit an instance that didn't do the
| mutation and the change isn't committed yet?
|
| A sharded data backend will take you as far as you need to go. If
| it's good enough for Facebook, it's good enough for you.
|
| When I worked at FB, I had a project where someone had come in
| from Netflix and they fell into the trap many people do of trying
| to reinvent Netflix architecture at Facebook. Even if the Netflix
| microservices architecture is an objectively good idea (which I
| honestly have no opinion on, other than having personally never
| seen a good solution with microservices), that train has sailed.
| FB has embraced a different architecture so even if it's
| objectively good, you're going against established practice and
| changing what any FB SWE is going to expect when they come across
| your system.
|
| FB has a write through in-memory graph database (called TAO) that
| writes to sharded MySQL backends. You almost never speak to MySQL
| directly. You don't even really talk to TAO directly most of the
| time. There's a data modelling framework on top of it (that
| enforces privacy and a lot of other things; talk to TAO directly
| and you'll have a lot of explaining to do). Anyway, TAO makes the
| read-your-write promise and the proposed microservices broke
| that. This was pointed out from the very beginning, yet they
| barreled on through.
|
| I can understand putting video encoding into a "service" but I
| tend to view those as "workers" more than a "service".
|
| [1]: https://regex.info/blog/2006-09-15/247
| codethief wrote:
| > Even if the Netflix microservices architecture is an
| objectively good idea (which I honestly have no opinion on
|
| I have no opinion on that either, but at least this[0] story by
| ThePrimeagen didn't make it sound all too great. (Watch this
| classic[1] before for context, unless you already know Wingman,
| Galactus, etc.)
|
| [0]: https://youtu.be/s-vJcOfrvi0?t=319
|
| [1]: https://m.youtube.com/watch?v=y8OnoxKotPQ
| alphazard wrote:
| The best advice (organizationally) is to just do everything in a
| single transaction on top of Postgres or MySQL for as long as
| possible. This produces no cognitive overhead for the developers.
|
| Sometimes that doesn't deliver enough performance and you need to
| involve another datastore (or the same datastore across multiple
| transactions). At that point eventual consistency is a good
| strategy, much less complicated than distributed transactions.
| This adds a significant tax to all of your work though. Now
| everyone has to think through all the states, and additionally
| design a background process to drive the eventual consistency. Do
| you have a process in place to ensure all your developers are
| getting this right for every feature? Did you answer "code review"?
| Are you sure there's always enough time to re-do the
| implementation, and you'll never be forced to merge inconsistent
| "good enough" behavior?
|
| And the worst option (organizationally) is distributed
| transactions, which basically means a small group of talented
| engineers can't work on other things and need to be consulted for
| every new service and most new features and maintain the clients
| and server for the thumbs up/down system.
|
| If you make it hard to do stuff, then people will either 1. do
| less stuff, or 2. do the same amount of stuff, but badly.
| junto wrote:
| This is giving me bad memories of MSDTC and Microsoft SQL Server
| here.
| kgeist wrote:
| Main source of pain with eventual consistency is lots of customer
| calls/emails "we did X but nothing happened". I'd also add that
| you should make it clear to the user that the action may not be
| instantaneous.
| renegade-otter wrote:
| That's another thing about a single database, if it's well-
| tuned and your squeel is good - in many cases you don't even
| need any kind of cache, removing an entire class of bugs.
| Scubabear68 wrote:
| If I had a nickel for all the clients I've seen with
| microservices everywhere, where 90% of the code is replicating an
| RDBMS with hand-coded in-memory joins.
|
| What could have been a simple SQL query in a sane architecture
| becomes N REST calls (possibly nested with others downstream) and
| manually stitching together results.
|
| And that is just the read only case. As the author notes updates
| add another couple of levels of horror.
| renegade-otter wrote:
| In the good old days, if you did that, you would rightfully be
| labeled as an "amateur".
| revskill wrote:
| You can have your cake and eat it too by allowing replication.
| wwarner wrote:
| AGREE! The author's point is very well argued. Beginning a
| transaction is almost _never_ a good idea. Design your data model
| so that if two pieces of data must be consistent, they are in the
| same row, and allow associated rows to be missing, handling nulls
| in the application. Inserts and updates should operate on a
| single table, because in the case of failure, nothing changed,
| and you have a simple error to deal with. In short, as explained
| in the article, embrace eventual consistency. There was a great
| post from the Github team about why they didn't allow
| transactions in their rails app, from around 2013, but I can't
| find it for the life of me.
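|
| For what it's worth, the "same row" version of the points-plus-
| discount example is a single statement, no explicit transaction
| needed (columns invented):
|
|     package users
|
|     import (
|         "context"
|         "database/sql"
|     )
|
|     // Both values live on the same row, so one statement keeps
|     // them consistent; there is no partial failure to clean up
|     // and no explicit BEGIN/COMMIT needed.
|     func GrantPointsAndDiscount(ctx context.Context, db *sql.DB,
|         userID string) error {
|         _, err := db.ExecContext(ctx,
|             `UPDATE users
|              SET points = points + 100,
|                  discount_pct = 5
|              WHERE id = $1`, userID)
|         return err
|     }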
|
| I realize that you're staring at me in disbelief right now, but
| this is gospel!
| latchkey wrote:
| Back in the early 2000's, I was working for the largest hardcore
| porn company in the world, serving tons of traffic. We built a
| cluster of 3 Dell 2950 servers with JBoss4. We were using
| Hibernate and EJB2 entities, with a MySQL backend. This was all
| before "cloud" allowed porn on their own systems, so we had to do
| it ourselves.
|
| Once configured correctly and all the multicast networking was
| set up, distributed 2PC transactions via jgroups worked
| flawlessly for years. We actually only needed one server for all
| the traffic, but used 3 for redundancy and rolling updates.
|
| ¯\\_(ツ)_/¯, kids these days
| ebiester wrote:
| Different problems have different solutions.
|
| You likely mostly had very simple business logic in 90% of your
| system. If your system is automating systems for a cross-domain
| sector (think payroll), you're likely to have a large number of
| developers on a relatively small amount of data and speed is
| secondary to managing the complexity across teams.
|
| Microservices might not be a great solution, and distributed
| monoliths will always be an anti-pattern, but there are reasons
| for more complex setups to enable concurrent development.
| latchkey wrote:
| Due to the unwillingness of corporations to work with us, we
| had to develop our own cross-TLD login framework, payments
| system, affiliate tracker, micro-currency for live pay per
| minute content, secure image/video serving across multiple
| CDN's, and a whole ads serving network. It took years to
| build it all and was massively complicated.
|
| The point I was making is that the tooling for all of this
| has existed for ages. People keep reinventing it. Nothing
| wrong with that, but these sorts of blog posts are
| entertaining to watch history repeat itself and HN to argue
| over the best way to do things.
| misiek08 wrote:
| Looks like a stealth ad for the library, showing probably the
| worst, most over-engineered method for "solving" transactions in
| a more diverse environment. Eventual consistency is a big
| tradeoff that is not acceptable in many payment and stock
| related areas. Working in a company where all the described
| problems exist and were solved in the worst way possible, I see
| this article as very misleading. You don't want events instead
| of transactions - if something has to be committed together, you
| need to rearchitect the system and that's it. Of course the
| people who were building this monster for years will block
| anyone from doing this. Over-engineered AF, because most of the
| parts where transactions are required could be handled by a
| single database, even SQL, and currently are split between
| dozens of separate Mongo clusters.
|
| "Event based consistency" leaves us in state where you can't
| restore system to a stable, safe, consistent state. And of course
| you have a lot more fun in debugging and developing, because you
| (we, here) can't test locally anything. Hundreds of mini-clones
| of prod setup running, wasting resources and always out of sync
| are ready to see the change and tell you a little more than
| nothing. Great DevEx...
___________________________________________________________________
(page generated 2024-10-06 23:02 UTC)