[HN Gopher] LedgerStore Supports Trillions of Indexes at Uber
___________________________________________________________________
LedgerStore Supports Trillions of Indexes at Uber
Author : imnot404
Score : 89 points
Date : 2024-06-07 14:55 UTC (8 hours ago)
(HTM) web link (www.uber.com)
(TXT) w3m dump (www.uber.com)
| dmoy wrote:
| That was a pretty cool read, thanks for posting. It always sends
| prickly/danger signals to my brain when I see "reads also write
| stuff", but this was laid out pretty clearly. Dissolved that
| prickly feeling pretty quickly haha.
| candiddevmike wrote:
| > From a business impact perspective, operating LedgerStore is
| now very cost-effective due to reduced spend on DynamoDB. The
| estimated yearly savings are over $6 million per year.
|
| I'd like to see this broken out more. You replaced a managed
| service with a custom DB on top of MySQL. Surely there is a
| significant amount of engineering effort that was/is being spent
| to create/maintain this.
| stackskipton wrote:
| Yea, $6 million is about 15 engineers at Uber's average total
| comp.
| BiteCode_dev wrote:
| Managed services still require engineering efforts. I have
| never seen a transition to a cloud service that actually didn't
| end up requiring a new hire.
|
| It's the dirty little secret of our industry.
| echelon wrote:
| This is the biggest nasty secret about cloud (and SaaS for
| things you don't build in-house).
|
| When my last company transitioned to cloud, all of the on-
| prem folks had to additionally take on that responsibility.
| We then had to hire a bunch of new folks to ease the
| transition and manage the cloud pieces. And then there are
| new teams spun up for permissions and security. Your
| headcount requirements are not going to go down because of
| cloud.
|
| As for those that like to call DIY systems "weirdware":
| homegrown systems are expensive to engineer, but cheap to
| run. We 10x'd our cost going to 3rd party visibility and
| feature flagging versus the thinly staffed [1] homegrown
| systems. And those 3rd party systems also required a ton of
| engineering to support, the migrations weren't clean, and
| they missed a lot of key features that we had to upstream to
| the third parties!
|
| When SignalFx got bought out by Splunk, I had never seen an
| annual plan get rewired so hastily. Top priority was moving
| to a new vendor, and that impacted every single team in the
| company.
|
| The only real tangible benefit to cloud that I've seen is
| that a team can instantly provision hardware, database, etc.
| resources without much planning. That's it. And is that worth
| it? (I don't really think so.)
|
| [1] Once built, half an engineering headcount per quarter to
| accommodate new features. Oncall rotation for one team that
| owned the systems. Our stuff was high resiliency and barely
| paged at all. No SEV1 or worse outages ever.
| deanCommie wrote:
| What a short-sighted outlook
|
| 1) "homegrown systems are expensive to engineer, but cheap
| to run" misses the most important part - how much does it
| cost to maintain, and how much of a RISK is it?
|
| Homegrown probably means you're depending on tribal
| knowledge from a few core people who if they leave you're
| hosed. That's a risk. You also have to train EVERYONE you
| hire to learn the homegrown systems, instead of hiring
| people with externally transferrable skills that can come
| in and hit the ground running.
|
| 2) "all of the on-prem folks had to additionally take on
| that responsibility"
|
| Most transitions/migrations end up with a period
| where you have both systems running. It's not surprising
| that at first everyone has additional responsibilities. It
| sounds like the "on-prem folks" either didn't have the
| skills to work in the cloud or refused to learn, and needed
| external hires. Well, that's one way to put themselves out
| of a job, and that would be the next step once the
| homegrown system is deprecated.
|
| Obviously there's a reason why "not invented here" syndrome
| exists. And "build vs buy" is a complex discussion for
| reasons beyond just the purchase cost. People
| always prefer the home-grown thing, DIY. Engineers
| especially. On this site, particularly. And, also, plenty
| of successful businesses exist by cutting out layers and
| going on a shallower stack ("your margin is my
| opportunity").
|
| But at the end of the day, for the vast majority of
| businesses, companies, and software shops, managing their
| own in-house infrastructure is a worse decision than paying
| a cloud vendor.
| aledalgrande wrote:
| I see both points but I have to agree with this one,
| especially if we're talking small companies. A startup
| should outsource everything that is outside their core
| product (but not more than that), so they can focus on
| making their product top class.
| jcelerier wrote:
| > Homegrown probably means you're depending on tribal
| knowledge from a few core people who if they leave you're
| hosed.
|
| There's so much complexity and tribal knowledge in
| cloud deployments that if our cloud experts left
| tomorrow, we'd very definitely be hosed too, despite
| everything being documented thoroughly. I'm involved in a
| product that leverages a cloud-based metaverse system
| (Mozilla Hubs) that recently had organisational changes
| requiring us to change our hosting approach, and it's
| taking the better part of a year of work to understand
| its logic, for something that would have been a non-issue
| if homegrown & self-hosted.
| pphysch wrote:
| > This is the biggest nasty secret about cloud (and SaaS
| for things you don't build in-house).
|
| Yep. Huge amounts of marketing funds are spent tricking
| decision-makers into thinking that off-the-shelf, plug-and-
| play solutions exist for their idiosyncratic business
| problems and schemas.
|
| Sometimes such products exist, but they are so bloated with
| features for the other 99% of customers that you
| absolutely need at least one expert to unwind the
| complexity.
|
| It's known that "greenfield" software development is much
| easier than "brownfield"/migrations. It should also be
| widely known that brownfield SaaS integrations are
| troublesome, because they involve enormous software+data
| work to link the existing in-house interfaces to the third-
| party interfaces. As a rule of thumb, there is no such
| thing as "off-the-shelf". You WILL have to hire and build
| and deeply understand your business processes.
| dehrmann wrote:
| This is what people tell themselves, but 99% of the time,
| what your business does and the scale it operates at
| aren't special.
| adamisom wrote:
| Special in the sense of scale does not equate to special
| in the sense of adapting to your business'
| needs/culture/sensemaking
| lazide wrote:
| They don't have to be special to be bespoke.
|
| And almost all business integrations are highly bespoke.
| spamizbad wrote:
| I love the term "weirdware": I've seen systems like that
| built (and used!) first-hand multiple times in my career.
|
| They can sometimes be very odd force multipliers in
| unexpected ways.
|
| My favorite example is a home-rolled payroll application
| specifically for sales people from over a decade ago. When
| I first arrived at the org I thought they were absolutely
| insane to have created such a thing. But its killer feature
| was that it allowed our org to pay out commissions rapidly
| (same day) and also allowed negotiating on a per-account
| residual commission basis as well as giving management the
| option to buy-out residuals. Off-the-shelf solutions at
| that time kinda-sorta did this but required both massive
| upfront $$$ and a ton of integration work. A plucky startup
| couldn't afford it. This was built, tested, and rolled-out
| in 45 days by 2 engineers and basically allowed them to
| poach top sales people because they knew they could get
| paid faster and had more flexibility on residuals.
| davedx wrote:
| Ha I built an internal app just like that for field sales
| at a mid sized company. One of the best projects I ever
| worked on.
| throwup238 wrote:
| That's the tradeoff everyone makes when they say
| something isn't in their "core competency" - they give up
| any real competitive advantage that area of expertise
| could bring them. That often makes total sense like with
| accounting or HR, but most companies take it way too far
| in engineering and internal support software.
|
| Engineers make the same mistake all the time too: we love
| to say that "premature optimization is the root of all
| evil", ignoring that the full quote is "We should forget
| about small efficiencies, say about 97% of the time:
| premature optimization is the root of all evil. _Yet we
| should not pass up our opportunities in that critical 3%._"
| throwup238 wrote:
| > The only real tangible benefit to cloud that I've seen
| is that a team can instantly provision hardware, database,
| etc. resources without much planning. That's it. And is
| that worth it? (I don't really think so.)
|
| Yes! Maybe not for you or me, but it's definitely worth it
| to tons of organizations where everyone including
| engineering was beholden to old school IT departments that
| take months to spin up a VM after pages of bureaucratic
| back and forth. The cloud moves it from an IT issue to
| department budgeting.
|
| The other advantage of cloud was moving capex to opex, which
| the finance people liked because *hand wave* something
| about amortization. Those are the two reasons most
| established companies moved to the cloud. It had little to
| do with the actual cost but how it was spent and who was in
| charge of doing it.
| 0cf8612b2e1e wrote:
| > where everyone including engineering was beholden to old
| > school IT departments that take months to spin up a VM
| > after pages of bureaucratic back and forth. The cloud
| > moves it from an IT issue to department budgeting.
|
| My F500 company put a stop to that. There is new
| paperwork in place to requisition cloud resources. The
| bureaucracy will not be replaced.
| ndriscoll wrote:
| Even the startup I worked at in my last job didn't let
| people just spin up whatever resources they felt like in
| AWS. Changes had to go through architecture/security
| review, and I might have been the only SDE who was
| allowed to log into AWS at all. We were subject to SOC 2,
| but I imagine it's relatively common for SaaS companies
| to have some kind of compliance framework assuming they
| want F500s as customers?
| dopylitty wrote:
| Having seen the results of everyone being able to spin up
| a VM (or an EMR cluster, or an EKS cluster) I can say a
| little more bureaucracy is probably a good thing.
|
| You end up wasting huge amounts of money on resources
| that are just sitting around idle (no, your fancy
| automation to shut them down won't work because maybe
| that dev cluster is actually needed at 2am on Sunday.
| It's an org problem, not a technical problem).
|
| But more importantly you give the cloud providers an
| excuse to destroy another square mile of pristine forest
| and build a giant four-story grey box that wastes enormous
| amounts of already limited water and energy.
| sneak wrote:
| > _Your headcount requirements are not going to go down
| because of cloud._
|
| Depends on how big you are. I'm one person and I do a lot
| of things with the cloud that I could not do if I had to
| rack servers (or even order dedicateds).
|
| Same goes for small 5-10 person teams that are good with
| Terraform. I've seen some orgs punching waaaay above their
| weight given their size. Not possible without classic IaaS.
| roncesvalles wrote:
| Eh, sure but it's not the same person. There are far more
| people who can "do cloud" than those who genuinely understand
| distributed systems well enough to build and operate a
| homegrown data infrastructure.
| makmanalp wrote:
| True, but anyone who's a cheaper hire because they can "do
| cloud" but not homegrown also represents a sacrifice in
| terms of what you get. In terms of your operational
| capabilities, you'll likely be paying for that with a
| crisis first where everyone yells at your team because of a
| problem in your managed provider you can't fix, and you
| can't get someone from the provider on the line and helpful
| in a way that gets you out of it faster. So then you'll
| tack on a massive yearly support retainer for a
| response time SLA that's still bad for your business bottom
| line and support that's not incentivized to prioritize you.
| I have stories from name-brand PaaS/IaaS companies.
|
| It's like outsourcing (I don't mean offshore, I mean any
| kind). It can be a reasonable thing to do if you don't
| pretend you're getting the same thing for less money and
| really actually plan around that. But people often do
| pretend it's the same, and the results aren't immediately
| apparent. And when they're finally blatantly apparent,
| either the guilty party has cashed out and left, or people
| don't have enough perspective to point to the real root
| cause.
|
| I'm not saying building your own infra is generally a good
| idea of course. It's just that beyond a certain size, for
| more businesses than you'd think, reliability has to be a
| core competency you invest in. And homegrown is a lot
| easier than it used to be because the offerings in terms of
| tooling and platforms are way better such that you can
| strategically craft your level of managedness for optimum
| cost/benefit on multiple criteria.
| 999900000999 wrote:
| It depends on your scale.
|
| As a solo dev just being able to point my apps to firebase
| and not worry about it saves a ridiculous amount of time.
|
| At Uber's scale they probably have to tweak Dynamo and other
| services so much they might as well bring em in house.
|
| I'm a bit surprised we haven't seen a new AWS, something like
| Walmart Web Services.
| wongarsu wrote:
| On the other hand as a solo dev `sudo apt-get install
| postgresql` plus a short tweak of your pg_hba.conf and a
| quick script to run pg_dump every night is about as much
| work as setting up a cloud service.
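A minimal sketch of the "quick script to run pg_dump every night", assuming a database called `appdb` and a backup directory such as `/var/backups/postgres` (both hypothetical). `pg_dump -Fc` writes PostgreSQL's custom archive format, which `pg_restore` can read back; the scheduling itself is left to cron.

```python
import datetime
import pathlib
import subprocess


def build_dump_command(dbname: str, backup_dir: str, day: datetime.date) -> list[str]:
    """Return a pg_dump argv that writes one dated archive per night."""
    outfile = pathlib.Path(backup_dir) / f"{dbname}-{day.isoformat()}.dump"
    # -Fc: custom-format archive (restore with pg_restore); -f: output file path.
    return ["pg_dump", "-Fc", "-f", str(outfile), dbname]


def run_nightly_backup(dbname: str, backup_dir: str) -> None:
    """Dump `dbname` for today's date; raises CalledProcessError if pg_dump fails."""
    cmd = build_dump_command(dbname, backup_dir, datetime.date.today())
    subprocess.run(cmd, check=True)
```

Calling `run_nightly_backup("appdb", "/var/backups/postgres")` from a crontab entry like `0 3 * * *` would cover the "every night" part; pruning old dumps and copying them off the box are left out of this sketch.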
|
| Managed databases provide a lot of useful features,
| especially if shit hits the fan. I'm not saying you
| shouldn't use them. But the work you put into reasonable
| self-hosted services and reasonable managed services often
| scale at a surprisingly similar pace.
| 999900000999 wrote:
| Alright, what about Auth or other features Firebase has
| built in?
|
| I still have to host it somewhere.
| BiteCode_dev wrote:
| Auth is included in your stack, e.g. Django comes with
| it.
|
| Plus you own the accounts, so you are not locked in.
|
| Sync is the firebase kill feature, but most apps can live
| without it.
| 999900000999 wrote:
| For a side project it's not that serious.
|
| Firebase is ready to go without me thinking of the
| details.
|
| If I'm using Flutter or React, I already have a nice
| client side sdk to use.
|
| The 12th of never when I need to scale it I can switch to
| a different stack. Firebase functions allows me to do
| some server side data manipulation.
|
| That said, if I was at work and needed to propose
| something I'd probably spend more time thinking of this.
| But for all my side projects, Firebase is my go-to.
|
| Let's just say we're having a competition to see who can
| crank out a basic crud app first. You're not beating me
| with Flutter and Firebase.
| faddypaddy34 wrote:
| Then if you are successful one day you can cry on social
| media about your large surprise bills.
| rco8786 wrote:
| Bingo. You still need your whole Ops and tooling teams. They
| just become Cloud Ops and Cloud Tooling.
| devjab wrote:
| You wish! In my experience they just load all the cloud work
| onto developers. Like, I'm setting up virtual networks,
| private endpoints, firewalls and whatnot, because if I don't,
| we won't just have horrible security (I basically knew nothing
| about networking before I did it, and I still don't know if
| anything we do is correct in any manner); we'll have no
| security at all.
|
| It'll bite the company in the ass eventually, or maybe it
| won't, but it is what it is.
| diob wrote:
| I don't think it's a secret, I think the idea is it will
| allow you to be more resilient than a custom solution. Why?
| If the person(s) who made the custom solution leave, you
| can't as easily hire a replacement. What's more, hiring
| immediately effective folks becomes harder, even if those
| folks who made it don't leave.
|
| It's a tradeoff, but I don't think folks are keeping secrets
| around it.
| stefan_ wrote:
| When did this profession turn into a bunch of people that never
| want to build anything ever again?
|
| I guess I finally understand what they meant by "no one ever
| got fired buying IBM".
| gigatexal wrote:
| Relative to Uber's total spend, is $6M really a worthwhile
| chunk of change?
|
| I mean, I'm here for the tech, of course; I'm just curious
| about the cost-benefit part.
| bastawhiz wrote:
| Six million dollars is six million dollars. Now, if that's a
| team of eight delivering that value, yeah, it's a great deal.
| If it's an org of 100 people, 6M is much less exciting. I'd
| suspect it's the former.
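Rough arithmetic behind the team-size point, assuming the ~15-engineer estimate upthread (which implies about $400k of total comp per engineer per year) and ignoring the hardware cost of running LedgerStore itself. The team sizes of 8 and 100 are just the two hypotheticals from this comment.

```python
GROSS_SAVINGS = 6_000_000  # estimated yearly DynamoDB savings quoted from the article


def cost_per_engineer(savings: float, engineers: float) -> float:
    """Average yearly cost per engineer implied by 'savings == N engineers'."""
    return savings / engineers


def net_savings(gross: float, team_size: int, per_engineer: float) -> float:
    """Gross savings minus the yearly cost of the team that runs the system."""
    return gross - team_size * per_engineer


per_eng = cost_per_engineer(GROSS_SAVINGS, 15)        # 400000.0
small_team = net_savings(GROSS_SAVINGS, 8, per_eng)   # 2800000.0
large_org = net_savings(GROSS_SAVINGS, 100, per_eng)  # -34000000.0
```

Under these assumptions break-even is a 15-person team: a team of eight keeps about $2.8M a year, while an org of 100 would cost roughly $34M a year more than it saves.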
| jpollock wrote:
| I'm confused, why does the Trip Service talk to the issuer to
| place a hold? I would have expected the Payment Service to
| mediate all that. Particularly since "Place Hold" is another way
| of saying "Authorize Payment".
|
| Perhaps the lines are wrong?
| mschuster91 wrote:
| > Particularly since "Place Hold" is another way of saying
| "Authorize Payment".
|
| As the article explains, _it's not_, at least from the
| merchant's side (from the customer's side it is). A CC hold
| tells the bank "hey guys, make sure that there will be at least
| X dollars available when we call in the hold", and the bank can
| either respond with "yes, hold confirmed" or "nope, declined"
| (say, due to a lack of funds, wrong CVV, whatever). When the
| hold is confirmed, and the customer makes another transaction
| at another vendor that would cause the amount of open holds +
| executed but not paid-off transactions to go above your limit,
| that transaction gets blocked.
|
| And eventually, only once you as the merchant call in the hold,
| you will actually initiate the flow of money. Or you call in
| less than the held amount, and you'll only get that amount of
| money (usually seen in hotels and car rentals, where the hold
| can be significantly larger than the bill, to account for stuff
| like minibar expenses, room cleaning or damages to the
| vehicle). Or you release the hold, or it expires, and you get
| no money at all, and all you can do is try to create a new
| transaction and hope it gets approved.
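A toy model of the hold lifecycle described above, seen from the issuer's side. Amounts are integer cents, and the class and method names are hypothetical; real issuers apply far more rules (expiry timers, fraud checks, partial approvals). Capture is capped at the held amount here to match the hotel/car-rental example; some networks allow capturing more than was authorized in certain cases, which this sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass
class CardAccount:
    limit: int                                           # credit limit, in cents
    balance: int = 0                                     # executed, not yet paid off
    holds: dict[str, int] = field(default_factory=dict)  # hold_id -> held cents

    def available(self) -> int:
        # Open holds count against the limit just like executed transactions.
        return self.limit - self.balance - sum(self.holds.values())

    def place_hold(self, hold_id: str, amount: int) -> bool:
        """Confirm the hold only if it fits in the remaining available credit."""
        if amount <= 0 or amount > self.available():
            return False                                 # "nope, declined"
        self.holds[hold_id] = amount
        return True                                      # "yes, hold confirmed"

    def capture(self, hold_id: str, amount: int) -> int:
        """Merchant calls in the hold; money moves only now, capped at the held amount."""
        held = self.holds.pop(hold_id)
        charged = min(amount, held)
        self.balance += charged
        return charged

    def release(self, hold_id: str) -> None:
        """Release or expiry: the hold disappears and no money moves."""
        self.holds.pop(hold_id, None)


card = CardAccount(limit=10_000)
hotel_ok = card.place_hold("hotel", 8_000)  # True: fits in the 10,000 limit
shop_ok = card.place_hold("shop", 3_000)    # False: 8,000 is already held
charged = card.capture("hotel", 5_500)      # 5500: bill came in under the hold
left = card.available()                     # 4500
```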
| ComputerGuru wrote:
| But that's what auth + capture _does_, though.
| jpollock wrote:
| Authorize Payment results in the Hold. The diagram is
| representing a standard Auth + Capture credit card flow.
|
| Capture contains the final transaction amount and can either
| be higher or lower than the Authorized amount.
|
| The odd thing is the Trip Service shouldn't know anything
| about the Issuer. That's the Payment Service's reason to
| exist (so Trip Service doesn't need to know about different
| payment methods). That the Trip Service knows how to talk to
| Visa (and Amex, and MasterCard, and Discover, and Google, and
| Apple) about Auth'ing a card, but doesn't know how to Capture
| the payment is the strange part.
|
| My expectation is that the diagram is incorrect, and the
| TripService talks to the PaymentService to Auth the card.
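A sketch of the boundary jpollock expects: only the payment service knows how to reach issuers, and the trip service just asks it to authorize at trip start and capture at trip end. This is not Uber's actual design. Every class and method name here is hypothetical, and `FakeIssuer` stands in for a real card network.

```python
from typing import Protocol


class Issuer(Protocol):
    def authorize(self, card_token: str, amount: int) -> str: ...
    def capture(self, auth_id: str, amount: int) -> None: ...


class PaymentService:
    """The only component that talks to issuers (Visa, Amex, wallets, ...)."""

    def __init__(self, issuer: Issuer) -> None:
        self._issuer = issuer

    def authorize(self, card_token: str, amount: int) -> str:
        return self._issuer.authorize(card_token, amount)  # places the hold

    def capture(self, auth_id: str, amount: int) -> None:
        self._issuer.capture(auth_id, amount)              # final fare


class TripService:
    """Knows about trips and fares, but nothing about issuers or card networks."""

    def __init__(self, payments: PaymentService) -> None:
        self._payments = payments
        self._auth_ids: dict[str, str] = {}

    def start_trip(self, trip_id: str, card_token: str, estimated_fare: int) -> None:
        self._auth_ids[trip_id] = self._payments.authorize(card_token, estimated_fare)

    def end_trip(self, trip_id: str, final_fare: int) -> None:
        self._payments.capture(self._auth_ids.pop(trip_id), final_fare)


class FakeIssuer:
    """Records calls instead of contacting a real network."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str, int]] = []

    def authorize(self, card_token: str, amount: int) -> str:
        self.calls.append(("auth", card_token, amount))
        return f"auth-{len(self.calls)}"

    def capture(self, auth_id: str, amount: int) -> None:
        self.calls.append(("capture", auth_id, amount))


issuer = FakeIssuer()
trips = TripService(PaymentService(issuer))
trips.start_trip("trip-1", "tok_abc", 2_500)
trips.end_trip("trip-1", 2_310)
# issuer.calls == [("auth", "tok_abc", 2500), ("capture", "auth-1", 2310)]
```

With this split, adding a new payment method touches only `PaymentService`, which is the reason for it to exist in the first place.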
| johnrob wrote:
| I'm a little confused as to what "Index" refers to here. Is it a
| data structure providing key lookup, or is it a single entry
| within one of these structures?
| tantalor wrote:
| It has to be entries. Also confused.
| tiffanyh wrote:
| Is this at all related to the major migration from Postgres to
| MySQL?
|
| Previously discussed on HN (294 comments):
|
| https://news.ycombinator.com/item?id=12166585
| dmoy wrote:
| Yea, seems like a continuation of the same fundamental problem?
| I don't know.
|
| As a commenter there put it - """The actual summary of the
| article is "The design of Postgres means that updating existing
| rows is inefficient compared to MySQL"."""
|
| I think this is yet another approach to dealing with the cost
| of updating rows?
| ThinkBeat wrote:
| When you decide to just write your own database to store stuff
| in because whatever is out there just doesn't fit your use case,
| you have most likely failed.
|
| You can then either write a new database that fits the use case,
| or take a hard look at the use case and change it to fit existing
| database systems.
|
| Now as a programmer I LOVE the idea of writing a database or an
| operating system, it would be super leet.
|
| When writing a new database system, you will make a lot of
| mistakes and introduce a lot of bugs, which you will no doubt
| hit at inconvenient times and track down only after a lot of
| debugging and profiling.
|
| This is built on top of MySQL, which has been battle-tested (though
| it's still not where I would put any critical data). Technically
| you're not writing the engine, but with so much functionality in two
| layers above MySQL, you are in essence building an engine for the
| engine.
| bdcravens wrote:
| Perhaps but I think most of us have never worked on a problem
| of Uber's scale. The behaviors of things change spectacularly
| above a certain level; it's not just the same thing times N.
|
| The implicit assumption of "not invented here syndrome" is that
| it has been invented elsewhere already, which isn't always
| true.
| convolvatron wrote:
| What's missing from the reuse picture for things like
| operating systems, compilers (somewhat), and databases is
| systems that use a 'lego' model of components that can be put
| together into solutions, rather than deployable systems (which
| can clearly be built on top).
|
| This provides room for doing real customization without
| having to build everything from ground zero, and is
| likely to be more robust than trying to attach functionality
| fully on the outside like was done here.
|
| In the database world, it would be really nice to implement
| caching and sharding extensions inside the transactional
| envelope.
| jonathan_landy wrote:
| That all makes sense. On the other hand, next gen workhorses
| must often arise from people opting to do something new like
| this, rather than make do with the current gen workhorse.
| SilverBirch wrote:
| I think this depends on the context. Let's say Uber had a
| database problem, they hired a bunch of database people, and
| they came to the conclusion "we just _have_ to write our own"...
| fine. I think that's actually perfectly legit. What I see much
| more commonly, though, is people who don't really know much
| about databases having problems, and rather than go off and hire
| or consult with experts, they just decide to build their own.
| That's when you get into real trouble. The dirty truth is that
| while, yes, Uber faces challenges due to its scale, it also faces
| few real limits on building superfluous shit because of their
| success.
| makestuff wrote:
| Yeah this smells like promotion oriented architecture. I have
| a hard time believing that a logistics company that built
| DynamoDB and uses it extensively can make it work but Uber
| cannot.
|
| The 6m in savings does not properly account for things like
| ramping up new hires on some custom database, maintenance
| (what happens in 5 years when whatever language you wrote it
| in needs to be upgraded to a new version, or some
| dependency), and a host of other things.
|
| Yes the cloud is expensive, but the entire point of it is
| that you are offloading all of that underlying
| maintenance/feature work to a team that only does that all
| day every day and is very good at it.
| dangwu wrote:
| You're implying that you should never invent new tools. But
| every tool was a new invention upon its creation. Also, all
| software is an "engine" for an "engine".
| aeyes wrote:
| I understand that this LedgerStore thing is just some tables
| and materialized views in a Docstore database which they
| already use for everything else.
|
| This seems to be an in-house DynamoDB (sharded MySQL with Raft)
| which has been developed a long time ago, see:
| https://www.uber.com/en-US/blog/schemaless-sql-database/
|
| But maybe I'm totally wrong here because they also use or used
| Google Spanner: https://www.uber.com/en-US/blog/building-ubers-
| fulfillment-p...
| normand1 wrote:
| Sounds like they rebuilt QLDB?
| https://docs.aws.amazon.com/qldb/latest/developerguide/what-...
| nikolay wrote:
| Stop bragging and open-source it!
| stackskipton wrote:
| No, please don't. Some CTO at some company I work at is going
| to think they are Uber scale, deploy this and I'll be stuck
| supporting it.
| ddoolin wrote:
| TIL that "indexes" is also a valid plural of "index", as well as
| "indices."
| redwood wrote:
| Feels like this would have been a good fit for zoned sharding in
| MongoDB -- hot shards that would age out to cold shards, never
| having to deal with the lack of strongly consistent secondary
| indexes or transactions, and not having to deal with a completely
| different database for the hot and cold environments.
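A sketch of the age-based rule behind this suggestion, assuming the shard key starts with a creation timestamp: records newer than a cutoff belong to the hot zone, older ones to the cold zone, and moving the boundary forward each day is what makes data "age out". The 90-day window is a hypothetical choice. In MongoDB itself, zones are tied to shard-key ranges (set up with `sh.addShardToZone` and `sh.updateZoneKeyRange`) and the balancer migrates the chunks, so this only models the routing decision, not the migration.

```python
from datetime import datetime, timedelta

HOT_WINDOW = timedelta(days=90)  # hypothetical: how long a record stays "hot"


def zone_boundary(now: datetime) -> datetime:
    """Records created at or after this instant belong to the hot zone."""
    return now - HOT_WINDOW


def zone_ranges(now: datetime) -> dict[str, tuple[datetime, datetime]]:
    """Half-open [lower, upper) ranges of creation time for each zone."""
    boundary = zone_boundary(now)
    return {"cold": (datetime.min, boundary), "hot": (boundary, datetime.max)}


def zone_for(created_at: datetime, now: datetime) -> str:
    return "hot" if created_at >= zone_boundary(now) else "cold"


now = datetime(2024, 6, 7)
print(zone_ranges(now)["hot"][0])           # 2024-03-09 00:00:00
print(zone_for(datetime(2024, 6, 1), now))  # hot
print(zone_for(datetime(2024, 1, 1), now))  # cold
```

Recomputing `zone_ranges` on a schedule and pushing the new boundary to the cluster is what would shift records from hot to cold shards over time.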
| kookamamie wrote:
| Indexes? Indices?
___________________________________________________________________
(page generated 2024-06-07 23:00 UTC)