[HN Gopher] Migrating Uber's ledger data from DynamoDB to Ledger...
___________________________________________________________________
Migrating Uber's ledger data from DynamoDB to LedgerStore
Author : gronky_
Score : 267 points
Date : 2024-05-20 10:01 UTC (12 hours ago)
(HTM) web link (www.uber.com)
(TXT) w3m dump (www.uber.com)
| drexlspivey wrote:
| > Uber migrated all its payment transaction data from DynamoDB
| and blob storage into a new long-term solution
|
| No way they have 1 trillion transactions right?
| rco8786 wrote:
| 1T "records". Any given transaction can have N records. I'm
| assuming this includes Uber Eats as well.
| drexlspivey wrote:
| Still, they have 10B rides in 2023 including Eats, say
| 75-100B since inception. What would be a record such that
| each transaction needs 10-15 on average?
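The question above rests on simple division. A minimal sketch, using only the figures quoted in this thread (1T records, 75-100B transactions); the function name is made up for illustration:

```python
def records_per_transaction(total_records: float, total_transactions: float) -> float:
    """Average number of ledger records per customer transaction."""
    return total_records / total_transactions


TOTAL_RECORDS = 1e12  # "1T records", as quoted in the thread

# 75B and 100B are the commenter's rough guesses, not official numbers.
for transactions in (75e9, 100e9):
    avg = records_per_transaction(TOTAL_RECORDS, transactions)
    print(f"{transactions / 1e9:.0f}B transactions -> {avg:.1f} records each")
```

Under those guesses the average lands at roughly 10 to 13 records per transaction, which is the range the replies go on to discuss.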
| ndr wrote:
| Consider it might be quite de-normalized as typical at
| scale.
|
| Some records for the customer, some for the driver, some
| for the restaurant...
| eru wrote:
| You might even have a few more.
|
| Eg you might have a record for each stage of the meal.
| When it's ordered, when it's cooked, when it's delivered,
| etc.
| re-thc wrote:
| > What would be a record such that each transaction needs
| 10-15 on average?
|
| Does it have to be 1-dimensional? Depends exactly what
| payments is. There are refunds, discounts, paying e.g.
| drivers. There are also things like monthly subscriptions
| people can subscribe to for discounts / unlimited uses.
| Lots of things add up.
| csomar wrote:
| I can see that as transactions with credit cards go through
| lots of process (withholding, approval, charging, settling,
| etc.)
| shrubble wrote:
| They need to pay the driver and they need to handle taxes;
| that alone triples your estimated 100B.
| rco8786 wrote:
| > 75-100B
|
| This seems low, off the bat. 15 years of Uber, 9 years of
| Uber Eats.
|
| But even just looking at my most recent trip with Uber,
| there are 7 different records visible on the receipt. Not
| including backend recordkeeping that isn't exposed to the
| user (driver payments, driver loan repayments, revenue
| recognition, internal fees/records, etc).
|
| Total trip amount, Trip fare, Booking fee, Tip, State fee,
| Payment #1 (trip itself), and Payment #2 (driver tip)
|
| Now consider Uber Eats where there is (at least) one record
| for each item in an order...plus tax, tip, etc as always.
|
| Then consider things like wait time charges, subscriptions,
| split charges, pending charges, chargebacks, refunds,
| disputes, blah blah blah.
|
| An average of 10 records per customer transaction seems
| entirely reasonable.
| bjornsing wrote:
| The blog post says billions of transactions and trillions of
| indexes (or rather index entries I presume), if I remember
| correctly.
| sha_r_roh wrote:
| Congrats to anyone who worked on it! However, I'm guessing the
| cost of just running this team would be quite large and not
| significantly different from the savings (6M), and add on top of
| it the overhead of maintenance. Payments would not likely be a
| long-term bet as well, so kind of interesting why teams take up
| such projects? Is it some kind of sunk-cost with the engineering
| teams you already have?
| bjornsing wrote:
| > Payments would not likely be a long-term bet as well
|
| How so? It's a pretty ubiquitous problem...
| kondro wrote:
| The estimate sounds suspiciously similar to just the data
| storage component of DynamoDB. 1.7PB of data and indexes is
| about $5.1m/year in DynamoDB storage at list.
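The storage figure above can be reproduced with a few lines of arithmetic. A minimal sketch, assuming a DynamoDB standard-class list price of $0.25 per GB-month and decimal petabytes; both are assumptions, not figures from the thread, so check current AWS pricing:

```python
GB_PER_PB = 1_000_000        # decimal petabyte (assumption)
PRICE_PER_GB_MONTH = 0.25    # assumed list price in USD, not taken from the thread


def annual_storage_cost(petabytes: float,
                        price_per_gb_month: float = PRICE_PER_GB_MONTH) -> float:
    """Yearly storage bill in USD for a given volume of data plus indexes."""
    return petabytes * GB_PER_PB * price_per_gb_month * 12


# 1.7 PB is the data-plus-index volume quoted in the comment above.
print(f"${annual_storage_cost(1.7) / 1e6:.1f}M per year")
```

With those inputs the result is about $5.1M per year, matching the comment. Request charges are not included.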
| sakjur wrote:
| Supporting that, Uber's blog post linked from the article
| mentions cost savings as a benefit from going from three
| systems to one, and doesn't really mention any dollar figure
| afaict.
|
| https://www.uber.com/en-AU/blog/migrating-from-dynamodb-
| to-l...
| bachmeier wrote:
| > I'm guessing the cost of just running this team would be quite
| large and not significantly different from the savings (6M),
| and add on top of it the overhead of maintenance
|
| I'm guessing they know a lot about their costs, and you know
| very little. There's little value in insulting the team members
| like this.
| whamlastxmas wrote:
| It's not insulting to speculate in a conversational way
| around the errors we very very commonly see
| szundi wrote:
| That was not a nice reply for a non-insult. Do you have
| anything to add maybe?
| bachmeier wrote:
| > That was not a nice reply for a non-insult.
|
| It's an insult if you dismissively explain basic things to
| the folks working on the project.
| inoop wrote:
| > I'm guessing they know a lot about their costs, and you
| know very little.
|
| I'm curious what makes you believe the OP doesn't know about
| cost? They might be director-level at a large tech company
| with 20+ years experience for all you know...
|
| > There's little value in insulting the team members like
| this.
|
| I'd argue it's not insulting to question a claim (i.e. 'we
| saved $6MM') that is offered with little explanation.
| qaq wrote:
| Regardless of position at some other company it will tell
| you precisely 0 about this specific situation.
| smokel wrote:
| At one end of the spectrum, some people here claim to write
| this kind of software over a weekend. Some others claim they
| require a salary of $600,000, and still need nine additional
| colleagues to pull something like this off.
|
| There is a lot of room in between, where cost estimates are
| more realistic.
| szundi wrote:
| This answer pretty much sums a lot of my experience. Of
| course when the guy somehow pulls this off in 2 weeks it is
| seen as an easy side project with proof that it is, haha
| datadrivenangel wrote:
| This is why incentives favor the heavy bloated enterprise
| approach: if it looks expensive, people feel like they got
| something good for their money.
| renegade-otter wrote:
| Plenty of things can be prototyped over a weekend, but many
| will require months and even years to get production-ready,
| feature-complete, and useful, especially at scale.
| cdchn wrote:
| Developing and maintaining a totally bespoke DB system with
| that kind of volume even for $5m/yr, spitball you could get
| yourself 25 top-notch engineers without AI PhDs and have
| another mil left over for metal. Sounds plenty feasible to have
| a nice tailored suit for a core part of your business.
| inoop wrote:
| > you could get yourself 25 top-notch engineers without AI
| PhD
|
| Not in the US though. According to levels.fyi, an SDE2 makes
| ~275k/year at Uber. Hire 25 of those and you're already at
| $6.875MM. In reality you're going to have a mix of SDE1,
| SDE2, SDE3, and staff so total salaries will be higher.
|
| Then you gotta add taxes, office space, dental, medical, etc.
| You may as well double that number.
|
| And that's just the cost of labor, you haven't spun up a
| single machine or sent a single byte across a wire.
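The labor estimate in the comment above works out as follows. A minimal sketch using only the numbers quoted there (25 engineers, ~$275k each, overhead roughly doubling the total); the function name and the exact 2.0 multiplier are illustrative assumptions:

```python
def loaded_labor_cost(headcount: int, salary: float,
                      overhead_multiplier: float = 2.0) -> tuple[float, float]:
    """Return (base salaries, fully loaded cost) per year in USD."""
    base = headcount * salary
    return base, base * overhead_multiplier


# 25 engineers at the ~$275k SDE2 figure quoted from levels.fyi.
base, loaded = loaded_labor_cost(25, 275_000)
print(f"base salaries: ${base / 1e6:.3f}MM")
print(f"fully loaded:  ${loaded / 1e6:.2f}MM")
```

The base figure is the $6.875MM quoted in the comment; the loaded figure depends entirely on the overhead multiplier chosen.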
| silverquiet wrote:
| Work from home doesn't mean that home has to be in the US.
| cdchn wrote:
| "and have another mil left over for metal" was the part
| accounting for hardware, infrastructure, etc.
|
| And you can fudge the employee salary a mil or two either
| way, but the point is that spending that much on a team to
| build something isn't infeasible or even unreasonable.
| cthalupa wrote:
| > Then you gotta add taxes, office space, dental, medical,
| etc. You may as well double that number.
|
| Economies of scale help a bit with this for larger
| companies, so it's probably not quite double for Uber, but
| yeah, not too far off as a general rule of thumb. Probably
| a 75% increase on the employee facing total comp to get
| fairly close to the company's actual cost for the employee.
| aeyes wrote:
| It doesn't sound like they needed to implement a new DB
| system for this.
|
| This is using existing features of Docstore which is Uber's
| own DynamoDB (sharded MySQL) which they seem to be using for
| almost everything.
| davedx wrote:
| Is accounting really a core part of Uber's business? They're
| a transportation company not a bank. I kind of question the
| premise really
| cdchn wrote:
| Uber is a technology company that tracks 'rides' between
| drivers that are contractors and customers, and accounts
| for taking money from one and giving it to another. I
| wouldn't just call it a core part, I'd go so far as to say
| it is the intrinsic essence of their business. They're not
| a bank, but they're not running a branch with tellers
| taking cash and running ATMs, either.
| shermantanktop wrote:
| They are in the transportation market serving
| transportation needs for a transportation-seeking
| customer base. How they accomplish that is obviously
| interesting, but their attempts to move laterally haven't
| been amazing from what I can tell (though I don't follow
| them closely).
|
| They are structured and run like a tech company but imo
| they don't produce a tech product.
| mlrtime wrote:
| You're assuming that the team only works on this product. It is
| possible they are owners of a lot more than just 1 db.
| inoop wrote:
| I'd be curious as well to see a more complete cost-benefit
| analysis, and I'd be especially interested in labor cost.
|
| We don't know how much time and head count Uber committed to
| this project, but I would be impressed if they were able to
| pull this off with fewer than 6-8 people. We can use that to
| get a very rough lower-bound cost estimate.
|
| For example, AWS internally uses a rule of thumb where each
| developer should generate about $1MM ARR (annual recurring
| revenue). So, if you have 20 head count, your service should
| bring in about $20MM annually. If Uber pulled this off with a
| team of ~6 engineers, by AWS logic, they should about break
| even.
|
| Another rule of thumb I sometimes see applied is 2x developer
| salary. So for example, let's assume a 7-person team of 2xSDE1,
| 3xSDE2, 1xSDE3, and 1xSTAFF, then according to levels.fyi that
| would be a total annual salary of $2.3MM. Double that, and you
| get $4.6MM/year to justify that team's annual cost footprint,
| which is still less than $6MM.
|
| Of course, this is assuming a small increase in headcount to
| operate this new, custom data store, and does not factor in a
| potentially significant development and migration cost.
|
| So unless my math is completely off, it sounds to me like the
| cost of development, migration, and ownership is not that far
| off from the cost of the status quo (i.e. DynamoDb).
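The two rules of thumb in the comment above can be put side by side. A minimal sketch using only the totals quoted there ($6MM claimed savings, $1MM ARR per developer, $2.3MM total salary for a 7-person team); the per-level salaries are not given in the thread, so only the totals are used, and the function names are illustrative:

```python
CLAIMED_SAVINGS_MM = 6.0  # claimed annual savings, in $MM


def aws_rule_target(headcount: int, arr_per_dev_mm: float = 1.0) -> float:
    """Revenue (or savings) a team 'should' generate under the $1MM-per-dev rule."""
    return headcount * arr_per_dev_mm


def doubled_salary_cost(total_salary_mm: float) -> float:
    """Team cost footprint under the '2x developer salary' rule."""
    return 2 * total_salary_mm


for label, cost in (
    ("AWS rule, 6 engineers", aws_rule_target(6)),
    ("2x salary, 7-person team", doubled_salary_cost(2.3)),
):
    margin = CLAIMED_SAVINGS_MM - cost
    print(f"{label}: ${cost:.1f}MM needed, margin ${margin:.1f}MM")
```

Neither rule accounts for one-time development and migration cost, which is the caveat the comment itself raises.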
| Rastonbury wrote:
| Not an engineer, but something like this takes 6-8 people
| working on only this for a full year?
| inoop wrote:
| That has been my experience, yes. You need one full-time
| manager, one full-time on-call/pager duty (usually a
| rotation), and then 4-6 people doing feature development,
| bug fixes, and operational stuff (e.g. applying security
| patches, tuning alarms, upgrading instance types, tuning
| auto-scaling groups, etc. etc.).
|
| Maybe you can do it a bit cheaper, e.g. with 4-6 people,
| but my point is that there's an on-going cost of ownership
| that any custom-built solution tends to incur.
|
| Amortizing that cost over many customers is essentially the
| entire business model of AWS :)
| shrubble wrote:
| If the savings are 6 million per year, then in later years it
| should pay off since the development is a one time cost.
| inoop wrote:
| The cost doesn't suddenly drop to zero once development is
| done. Typically a system of this complexity and scale
| requires constant maintenance. You'll need someone to be
| on-call (pager duty) to respond to alarms, you'll need to
| fix bugs, improve efficiency, apply security patches, tune
| alarms and metrics, etc.
|
| In my experience you probably need a small team (6-8
| people) to maintain something like this. Maybe you can
| consolidate some things (e.g. if your system has low on-
| call pressure, you may be able to merge rotations with
| other teams, etc.) but it doesn't go down to zero.
| shrubble wrote:
| If you follow the various links on the Uber site, you see
| that they have multiple applications sitting on the same
| database. See
| https://www.uber.com/blog/schemaless-sql-database/. It's not
| just 1 design of a database, with 1 application on top...
| qaq wrote:
| If you read the article the system was a layer on top of
| DynamoDB they updated it to use internal product Docstore which
| required adding a feature to Docstore. So it's not as involved
| as people make it out to be. Also records are immutable which
| makes a lot of things way easier.
| lfmunoz4 wrote:
| Off-the-shelf software doesn't make sense for a company that is
| planning on lasting a long time. These solutions are all
| designed for multiple use cases. That means that there is
| complexity and inefficiencies that are not required for your
| particular problem. If you were to just focus on your problem
| wouldn't you just end up at an ASIC as the most optimal
| solution? The reason most software doesn't is 1) people like to
| reinvent the wheel 2) the lower level you go, the fewer
| qualified people you can find.
| bjornsing wrote:
| I'm working on a specialized data store[1] that would be perfect
| for this kind of use case (large "cold" storage with indexing).
| But I'm having trouble finding potential customers. I've tried
| Google search ads but got 99% spam and 1% potential investors,
| but 0% potential customers. If anybody has any ideas I'm all
| ears.
|
| 1. https://www.haystackdb.dev/
| sanswork wrote:
| You need to be doing enterprise sales not marketing. There is a
| lot of advice here and in general on that but you definitely
| need to be making calls with that type of business.
| padjo wrote:
| Yep nobody with the problem you're offering to solve is going
| to solve it by googling and picking some random company
| they've never heard of with no track record.
| bjornsing wrote:
| Not even click a search ad and fill in a contact form? When
| I'm on the other side of the table I do that. But perhaps
| I'm unique in that aspect?
|
| (I understand there won't be any significant business
| without enterprise sales. But that's not what I'm looking
| for at this stage.)
| narnarpapadaddy wrote:
| The companies that have these types of problems all have
| AWS reps (or whatever vendor) that get first crack at a
| solution, even if their senior engineers or CTOs do some
| googling. Frequently discounts can be negotiated on
| products that aren't a perfect fit, or companies will get
| early access to new products that solve their problem
| (AWS calls this "limited preview").
|
| A good chunk of B2B infrastructure products like this are
| developed using a "golden partner" model. The first
| customer (or few) gets a free or reduced cost license,
| the developer gets a real-world scenario with real data
| to use to figure out what the minimal functionality
| actually is to be a marketable product and to work out
| bugs. This arrangement frequently requires a preexisting
| relationship and trust between both parties.
| bjornsing wrote:
| Yep. Always a bit dangerous to go up against AWS and
| similar. My hope here is that this product is too niche
| for the major cloud vendors to invest in. But since Uber
| is building stuff themselves that assumption may be
| wrong.
|
| A "golden partner" model makes a lot of sense, thanks.
| narnarpapadaddy wrote:
| If you know a technology leader at a company that has
| this problem, reach out and ask if they've had any pain
| related to it. Ask to take them out to lunch and tell
| them about a solution you've been working out. Or even a
| short demo call. See if they'd be interested in an
| "innovation partnership." You give them a discounted/free
| license (can be time limited to a year or however long it
| takes to validate that your solution works and saves them
| money - then returns to full-price), they agree to
| feature prominently in your marketing material or provide
| a reference for your next lead.
| macspoofing wrote:
| >When I'm on the other side of the table I do that
|
| No, you don't. There are many established storage
| solutions out there. If you're in the market for one, you
| can easily fill days, weeks or months vetting those. So,
| why would you bother dealing with a sales rep from a
| random one you never heard of before, and isn't used by
| anyone. You don't even provide any details on what makes
| it different or better from anything else out there.
| bjornsing wrote:
| Well the reason I'm working on this in the first place is
| that when I was on the other side of the table I was
| looking for one. I filled in the contact forms of a
| couple of different startups that had products somewhat
| in line with what I was looking for, and talked to their
| sales reps. Admittedly they weren't as early stage as my
| project, but on the other hand they weren't 100% focused
| on my use-case either.
|
| I guess what I'm trying to say is that I was hoping that
| someone with a _write intensive workload_ would want to
| spend some time evaluating a product built specifically
| for that. But perhaps I'm wrong? Even if your workload
| was 99% writes you'd rather go to some established player
| (e.g. MongoDB) with a product optimized for 50/50
| read/write?
| macspoofing wrote:
| >I guess what I'm trying to say is that I was hoping that
| someone with a write intensive workload would want to
| spend some time evaluating a product built specifically
| for that.
|
| Again, it's not clear to me exactly what it is you're
| doing that's any different from the plethora of existing
| off-the-shelf solutions.
|
| You're saying that you started this project/company
| because you were looking for a solution to a specific use
| case (write-intensive workloads) and existing options
| didn't work - can you expand on that? Can you create a
| chart, for example, that lists out the specific things
| that Haystackdb does and alternatives don't? Presumably,
| if you optimize for write-intensive workloads, there are
| some drawbacks when it comes to reads - no? Or maybe
| storage? That's good to highlight.
|
| What you need are whitepapers/blog posts/youtube
| videos/talks at conferences/etc. that highlight the
| technical details of your solution, because you're trying
| to get technical people interested in your product to the
| point where they will invest time to learn more.
| bjornsing wrote:
| Well, it's pretty simple: HaystackDB is designed from the
| ground up for write intensive workloads, so it's much
| more economical than existing off-the-shelf solutions for
| that type of workload. Is that not clear from the landing
| page?
|
| From pricing: "$0.2 per million writes, $20 per million
| reads". The typical cost profile is $2 per million
| read/writes, or even more for writes.
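The pricing claim above implies a break-even write-to-read ratio. A minimal sketch, taking the quoted prices at face value and treating the "typical" $2 per million as a flat rate for both reads and writes (a simplifying assumption); storage and other charges are ignored, and all names are illustrative:

```python
HAYSTACK_WRITE = 0.2   # $ per million writes (quoted)
HAYSTACK_READ = 20.0   # $ per million reads (quoted)
FLAT_RATE = 2.0        # $ per million reads or writes ("typical" figure, quoted)


def haystack_cost(writes_m: float, reads_m: float) -> float:
    """Request cost in USD; volumes are in millions of operations."""
    return HAYSTACK_WRITE * writes_m + HAYSTACK_READ * reads_m


def flat_cost(writes_m: float, reads_m: float) -> float:
    """Request cost in USD under the flat per-operation rate."""
    return FLAT_RATE * (writes_m + reads_m)


def break_even_writes_per_read() -> float:
    # 0.2*W + 20*R = 2*W + 2*R  =>  W / R = (20 - 2) / (2 - 0.2)
    return (HAYSTACK_READ - FLAT_RATE) / (FLAT_RATE - HAYSTACK_WRITE)


print(f"break-even: {break_even_writes_per_read():.1f} writes per read")
print(f"at 100:1 -> ${haystack_cost(100, 1):.0f} vs ${flat_cost(100, 1):.0f}")
```

Under these assumptions the two price lists cost the same at about 10 writes per read; the quoted pricing only wins for workloads more write-heavy than that.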
| yau8edq12i wrote:
| Forgive me because what follows will sound harsh, but I
| think you need to hear it based on your response.
|
| > HaystackDB is designed from the ground up for write
| intensive workloads
|
| Okay.
|
| > so it's much more economical than existing off-the-shelf
| solutions for that type of workload.
|
| That's a leap in logic. Just because you designed it with
| this workload in mind, well, doesn't automatically mean
| that it's any good for this workload (or any workload).
| If solving a problem was as easy as declaring "I will
| design my solution from the ground up for this problem",
| then we'd all live in peace and harmony. So that's what
| people are asking you here: _how_ do you make your DB
| "much more economical" for that type of workload? What
| technology, what ideas have you had to make it possible?
| If you don't want to reveal that, then you need _proof_
| that it's better than the competition, not a
| _declaration_ that it's better than the competition.
|
| > Is that not clear from the landing page?
|
| It's clear that you want to _market_ your solution as
| something good for write-heavy workloads. Why should we
| believe you've done a good job _designing_ your
| solution?
|
| > From pricing: "$0.2 per million writes, $20 per million
| reads". The typical cost profile is $2 per million
| read/writes, or even more for writes.
|
| Who knows how you came up with pricing? Perhaps you're
| betting on your customers being stupid and not realizing
| that taking a 10x hit on the price of reads will lose
| them (and earn you) more money in the long run. After
| all, what good is writing to a DB if you never read from
| it...? Or perhaps it's some kind of promotional / loss
| leader pricing that will change soon in the future. In
| any case, it's, again, not proof that your solution is
| adapted to the customer's problem.
| bjornsing wrote:
| > Forgive me because what follows will sound harsh, but I
| think you need to hear it based on your response.
|
| No worries. I appreciate you taking the time.
|
| > you need proof that it's better than the competition,
| not a declaration, that it's better than the competition
|
| Fair point. I realize I'll need that before making any
| sales. But I was hoping to get a few leads from the
| contact form without it.
|
| > Perhaps you're betting on your customers being stupid
| and not realizing that taking a 10x hit on the price of
| reads will lose them (and earn you) more money in the
| long run. After all, what good is writing to a DB if you
| never read from it...?
|
| No it's not a malicious trick. There are use-cases where
| most records will never be read back. For example, if you
| go into the Uber app you can find a history of all your
| trips and you can click one and bring up a receipt for
| it. Most users will rarely if ever do that. So you end up
| writing many more receipts to your database than what
| you'll ever retrieve.
| macspoofing wrote:
| >Is that not clear from the landing page?
|
| The marketing byline you have on your landing page is
| clear enough, but nobody will take that seriously without
| a deeper technical description.
|
| When I read it, I assumed you wrote some code to move
| data in and out of lower-cost S3 or Glacier storage tiers
| because you don't control storage pricing and you run on
| top of existing public cloud infrastructure. Maybe I'm
| right, maybe I'm wrong - but if I'm looking for a
| solution, I need to assess whether I should invest time
| and effort to do a deeper dive, and that's the box I
| would put you in, without any more detail.
|
| Anyway, good luck. Hope it works out.
| PaywallBuster wrote:
| Maybe find more articles like the above
|
| try to connect with the respective people at said teams via LinkedIn
| and ask feedback
| bjornsing wrote:
| Perhaps. But I have a feeling it's too late once they've
| started building something in-house. Any ideas on how I could
| find the ones that will publish an article like this one year
| from now? That's the ones I'm after, I think.
| smokel wrote:
| With a disclaimer that I have no formal nor practical
| background in marketing, here are some ideas:
|
| 1. It is a bit unclear to me when I would use Haystack. The
| main advantage seems to be cost cutting. It would be nice to
| see some realized examples of this.
|
| 2. When competing for price, you may look like the cheap, and
| thereby untrusted alternative. There is a risky business
| paradox here, for which I am sure a fellow HN poster will
| supply the name: you charge less, therefore you make less, and
| you will not be able to sustain the service, making me not want
| to spend money.
|
| 3. Have you tried looking for companies that may actually need
| this solution? Have you tried contacting them directly?
| bjornsing wrote:
| 1. Good point, thanks.
|
| 2. True. One reason I haven't priced it ridiculously cheap is
| to avoid this judgement, and fate. With this pricing I won't
| necessarily have a smaller profit margin than competitors.
| The cost advantage comes from a smarter architecture. Any
| ideas on how I can communicate that would be greatly
| appreciated.
|
| 3. I used to work for one that needed it. I've also
| interviewed at one that had the same problem. A bit hesitant
| to reach out to potential customers though before I have a
| solid product I can deliver. But perhaps I shouldn't be?
| kaibee wrote:
| > 3. I used to work for one that needed it. I've also
| interviewed at one that had the same problem. A bit
| hesitant to reach out to potential customers though before
| I have a solid product I can deliver. But perhaps I
| shouldn't be?
|
| Companies generally have to be suffering pretty badly to
| take a risk on changing their tech stack to something
| unproven. And the risk for you at that point is that they
| choose to spend 10x on consultants to implement some
| existing system instead.
|
| The CTO needs to trade off the opportunity cost of
| developing new features/existing maintenance against
| integrating an unproven product. How can you de-risk this
| for them? (Even just showing that you recognize that this
| is the case can help)
|
| Maybe this is a time to "do things that don't scale". ie:
| offer to integrate it into their system for them (for at
| least some small part/pain point), and likely in parallel
| so that they can evaluate it without taking down the
| existing system.
|
| Just my two cents.
| superasn wrote:
| One observation I have regarding your homepage is that the
| message isn't very clear. The headline doesn't mention any
| benefits I get from using your software.
|
| I think you should invest some time into improving your landing
| page and maybe you may see some traction. A good resource for
| this which I've bookmarked is here(1). Hope that helps.
|
| (1) https://www.indiehackers.com/post/my-step-by-step-guide-
| to-l...
| bjornsing wrote:
| Thanks. So you think it would work better if it would just
| say "save money", rather than jump straight into the "what"?
|
| To me, when I read the below, that just screams "save money".
| But maybe I should do that conversion for the reader so to
| speak?
|
| From benefits box: "Sometimes you need to index a huge amount
| of data, to accelerate just a few search queries. But
| building indexes and keeping them in hot storage can be
| expensive. HaystackDB builds only the indexes needed for sub-
| second query latency, across billions of keys, while keeping
| all your data in low-cost object storage like S3."
| superasn wrote:
| I asked chatgpt for a headline based on your prompt and it
| gave me this:
|
| "HaystackDB: _Swift Searches, Massive Savings_ - Index
| Billions, Store Smartly, Query in a Flash!"
| bjornsing wrote:
| Thanks! That headline is pretty good. :)
| csomar wrote:
| Looking at pricing, it's crazy expensive (and that comparing to
| AWS, which is crazy expensive). How do you justify that?
| bjornsing wrote:
| The idea is that it should be about a tenth of the cost
| compared to S3 or DynamoDB. Is that not how you read the
| pricing? Or do you just think that's still too expensive?
|
| EDIT: Or maybe it's because reads are expensive? That's a
| consequence of the write optimization. The idea is that
| potential customers will be doing 90%+ writes.
| emerongi wrote:
| Besides all the other good feedback here, I will offer my
| extremely petty reason why I wouldn't spend much time
| evaluating this product. In the FAQ, under the "Are
| transactions fully ACID?" heading, there is a typo:
| "simultaineously". It gives me the impression that not enough
| care has gone into an important part of this product. I know
| it's not a fair judgement, but first impressions matter.
| bjornsing wrote:
| That's easy to fix, thanks.
| apwell23 wrote:
| typo was not the point of the comment though :)
|
| Also, is there a demo or some sort of technical whitepaper?
| bjornsing wrote:
| Not yet. Is that something you'd find compelling?
| Anything in particular you'd like to see?
| jgalt212 wrote:
| Some B2B and B2B2C products just don't work with walk-in leads.
| You need someone to create and chase down a set of leads.
| bjornsing wrote:
| Good point. Any ideas how I could experiment with this "on
| the cheap"? Any tools I could use to identify and contact
| leads?
| macspoofing wrote:
| The market is saturated with storage products, so it's tough to
| differentiate yourself. Your site does not help by the way.
| You're also not selling an end-user product to the public,
| rather you're selling a technical and infrastructure solution
| to very technical people - that's a different type of sale. To
| get those people interested, you must put together technical
| whitepapers/blog posts/webinars/youtube videos/etc. to explain
| your approach.
| shrubble wrote:
| Find just 1 customer.
| Sevii wrote:
| Most places I've worked we couldn't even consider using a
| product that wasn't supported by a major cloud provider like
| this. What problem does your product solve that customers
| absolutely 100% need it?
|
| Structurally, you are a small entity trying to compete on cost
| with hyperscale cloud providers and open source software.
| Most ISVs like you charge a ton of money for big problems very
| few enterprise customers have.
|
| I think you need to find a specific use case where your product
| is a clear winner. Like 'HaystackDB is the best option for
| healthcare exchanges to use when receiving claims'.
| bjornsing wrote:
| This is a good summary of why I'm hesitant to put more work
| into it.
|
| The counter argument I guess is that developing your own data
| store in-house should be even more of a no-no, and companies
| do that. (One example is obviously Uber, but my previous
| employer is another example.)
|
| Do you think the option to self-host the product would help
| tip the scale?
|
| > What problem does your product solve that customers
| absolutely 100% need it?
|
| To be blunt there is no such problem: you can always throw
| more money e.g. at DynamoDB. But if you have a very write
| intensive workload (such as the use-case described in the
| OP), then you can save 90% of that money.
| bastawhiz wrote:
| > $20 per million reads
|
| Quite frankly, this is not gonna work. I manage a system with a
| very write-heavy workload (lots of small writes) and even
| though our writes far outpace our reads, this pricing makes
| your system about ten times more expensive than an RDS cluster.
|
| There's no data about performance. There's no information on
| how or whether data is persisted to durable storage before a
| write is acknowledged. There's not even any information on how
| big keys or values can be. There's no public information on
| support.
|
| When choosing a system like yours, my priorities are:
|
| 1. Data safety
|
| 2. Performance
|
| 3. Cost
|
| ... In that order. You've done nothing to educate me on 1 and 2
| and your pricing isn't better than what you're seeking to
| displace.
|
| When your product is a tool for developers, show up with hard
| facts about your product. Zero people (as you've seen) are even
| remotely interested in building a product on top of a system
| without knowing whether the system will hold up to their use
| case. And other than a very anemic FAQ section, you have no
| documentation at all, whatsoever.
| bjornsing wrote:
| All valid points. I guess I'm hesitant to put time into
| documentation and similar, if I can't somehow find a steady
| stream of sales prospects.
|
| > even though our writes far outpace our reads, this pricing
| makes your system about ten times more expensive than an RDS
| cluster
|
| That indeed sounds off... Are you sure you're comparing the
| total cost to that of an RDS cluster? I am aware that reads
| will be more expensive (due to the write optimization), but I
| was hoping most customers would make it back on cheap writes.
| Also the storage itself ($0.23 per GB-month) should be much
| cheaper than RDS.
| bastawhiz wrote:
| My total database is maybe 400 gigs. Most of the writes
| overwrite existing data, so storage cost isn't a concern.
| With the cost of an upfront RI for the year on RDS (with
| basically as many iops as I can use), your solution gives
| me ~100 million reads. That's...like a month of usage at
| best.
|
|         At least in my case, the fundamental problem you're facing
| is that reads are just too expensive. Writes and reads tend
| to grow at the same pace in many products: there's a ratio
| that tends to stay the same as you scale. $20/million reads
| is just a _lot_. The ratio of writes to reads for your
| pricing needs to be 100:1 or more for it to make sense for
| me, but I'm more like 10-20:1.
|
| > I guess I'm hesitant to put time into documentation and
| similar, if I can't somehow find a steady stream of sales
| prospects.
|
| This is part of why a database company is hard to build.
| You will simply not find anyone willing to give you money,
| because the alternative is going to be a solution your
| customers already know and understand and which is likely
| extremely mature. You're competing with Postgres and Mongo.
| You can't ship a database product that doesn't work: you're
| asking people to build on you for their _storage
|         primitive_. If you fuck up, that's a business-ending event
| for your customer. You've either got to come to the table
| with an extremely compelling product ("I couldn't build my
| business without this") or you've got to show why someone
| should trust you over an established but somewhat more
| expensive alternative.
| bjornsing wrote:
| > The ratio of writes to reads for your pricing needs to
| be 100:1 or more for it to make sense for me
|
| Correct. I bet Uber's use case here is something like
| 1000:1. I've worked on systems that were over 1000000:1.
| That's where HaystackDB makes sense.
|
| > but I'm more like 10-20:1.
|
| Then RDS is hard to beat.
| erik_seaberg wrote:
| You might be surprised how often a deep graph of
| microservices ends up rereading the same prior
| transactions over the course of stateful payment
| processing and on-demand payouts. DynamoDB can give you
| 2.6 million short reads of base load (1 RCU/s
| provisioned) for $0.12 per month, which would make a $65
| alternative (2.6 * $5 + 2.6 * $20) a hard sell.
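A rough check of the arithmetic in the comment above, as a minimal Python sketch. It assumes the $5 in the quoted formula is a price per million writes and the $20 a price per million reads (the $20 figure is quoted earlier in the thread; the $5 reading is an assumption). The function names are hypothetical, and the $0.12 DynamoDB figure is taken from the comment as-is, not verified.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 seconds in a 30-day month


def monthly_requests(per_second: float) -> float:
    """Requests issued in a 30-day month at a steady per-second rate."""
    return per_second * SECONDS_PER_MONTH


def per_request_bill(reads_millions, writes_millions,
                     read_price=20.0, write_price=5.0):
    """Monthly bill in dollars; prices are per million requests (assumed)."""
    return reads_millions * read_price + writes_millions * write_price


# 1 request per second, sustained for a month, is roughly 2.6 million requests.
reads_millions = monthly_requests(1) / 1_000_000
bill = per_request_bill(reads_millions, reads_millions)
print(f"{reads_millions:.1f}M reads/month -> ${bill:.2f}")
```

With the rounded 2.6 million figure used in the comment, the same function gives the quoted $65.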
| victor106 wrote:
| I would seriously consider an open source business model with
|     an appropriate licensing model. I see a lot of companies are open
| to experimenting with open source db's.
| bjornsing wrote:
| Good point. I'm thinking about releasing an open source (or
| source available) "frontend" for it, and just charge for the
| "cold storage backend". How would you feel about that?
| oldprogrammer2 wrote:
| The homepage could benefit from more tangible examples, because
| right now I can't discern where it fits into my current stack.
| For most companies, it would be replacing something in a
| specific context.
|
| Like a side-by-side example. Doing "work" on BigTable (show
| code examples) versus doing the same "work" on Haystack. Then
| show the specific metrics on how Haystack is
| cheaper/faster/better.
| cess11 wrote:
| To consider your product an alternative I'd like to see
| benchmarks that seem trustworthy, something like a Jepsen
| analysis or case studies at existing customers, and be able to
| test it within the EU, i.e. not on US:ian services.
|
| Seems you're in the vicinity of Lund, should be a 'science
| park' or similar close to the uni where you can find companies
| that have problems you could solve. Talk to 'incubators',
| 'accelerators' and the like there.
| rguillebert wrote:
| So they saved $0.000006 per record, it's really about the little
| things...
| theanirudh wrote:
| I wonder if they considered https://tigerbeetle.com
| geodel wrote:
|   Would be interesting, considering TigerBeetle is written in
|   Zig. And Uber is probably the rare big company that has a
|   support contract with the Zig foundation.
| rmccue wrote:
| Original story looks to be https://www.uber.com/en-
| AU/blog/migrating-from-dynamodb-to-l...
| deadbabe wrote:
| So did the engineers who proposed this get some kind of bonus
| considering how much money they saved the company?
| Galanwe wrote:
| Employees are constantly saving cost or adding value, that's
| what they are paid for.
| zinglersen wrote:
| If a project fails, do you pay for the loss since you want a
| share of the profits as well?
| deadbabe wrote:
| Someone probably gets fired so I guess someone does pay the
| ultimate price.
| zinglersen wrote:
|       Losing your job because of the outcome of your efforts (or
| even external events) is not what I would call the ultimate
| price.
|
| "The metaverse division has now lost more than $45 billion
| since the end of 2020"
|
|       Your compensation for your work is your salary. So I would
| say that it's fair that the actual risk taker is benefiting
| from the potential rewards?
| HeatrayEnjoyer wrote:
|         The "risk takers" are not taking any risk at all.
| What's the chance they end up on the street, or even
| suffer personal financial stress about their life? That
| they will have to move, sell their car, home, etc. It's
| 0%.
| zinglersen wrote:
| What.. they are taking a lot of risk...
|
|           But I guess we first have to agree on "who" we are talking
| about - is it the company itself or the owner /
| shareholders ?
|
| Back to your question, yes that could happen in several
| different cases. But of course the risk/benefit is not
| split 50/50 (nor 0 risk, 100 upside, as you said), in
| reality the future outcome depends on both internal and
| external events.
|
| Even the richest(?) man in the world was relatively close
|           to losing it all;
|
| Musk, who had $200 million in cash at one point, invested
| "his last cent in his businesses" and said in a 2010
| divorce proceeding, "About four months ago, I ran out of
| cash." Musk told the New York Times
| https://www.cnbc.com/2017/04/27/the-crucial-decision-
| teslas-... https://archive.nytimes.com/dealbook.nytimes.c
| om/2010/06/22/...
| cynicalsecurity wrote:
| Is it an offer to become a shareholder without actually
| buying any shares? That would be absolutely great, but
| unfortunately, it doesn't work this way.
| zinglersen wrote:
|           That's the beauty of it, you can choose to spend your
| money how you want!
|
| You wouldn't want all your earnings to be in stocks, you
| want liquidity. For example investing your earned money
| into a public company, or buying food.
| chasd00 wrote:
| Getting to say you led the effort that saved $6M and resulted
| in some blog posts is probably the reward. At my firm,
| associating your name to dollars is the fastest way up the
| corporate ladder.
| HeatrayEnjoyer wrote:
| Exactly. Work should be owned by the workers.
| drpotato wrote:
| The original[1][2] articles are a better read IMO. The link is
| just a summary of the two with added spelling and grammatical
| errors that materially impact the meaning.
|
| 1. https://www.uber.com/blog/how-ledgerstore-supports-
| trillions...
|
| 2. https://www.uber.com/blog/migrating-from-dynamodb-to-
| ledgers...
| intunderflow wrote:
| Seems to happen with all our blog posts that appear on here (I
| work at Uber) - I don't get why the originals don't get upvoted
| but these rehashes do - are our titles just not as good?
| gronky_ wrote:
| Yes, that's definitely the main reason. It's called "burying
| the lede".
|
| Saving $6M is key information that makes this story
| interesting. It's buried all the way at the bottom of the
| first blog and is completely missing from the second blog
| which focuses specifically on the migration
| dboreham wrote:
| TaaS : title as a service
| k1t wrote:
| People have done this, eg https://www.reddit.com/r/Growth
| Hacking/comments/k20g42/ai_to...
|
| However that appears to be defunct now
| alexchantavy wrote:
| I'm usually guilty of this. The hands-on person involved in
| a highly technical project gets excited and bogged down in
| the details of the project that they end up not being the
| most compelling storyteller about it.
| ComodoHacker wrote:
| Don't blame yourself. Not everyone is here for the money,
| many of us are here for the tech.
| masklinn wrote:
| > I don't get why the originals don't get upvoted
|
| Because they were never submitted? I looked for the first
| one, it doesn't seem to be on HN.
| brushfoot wrote:
| Personally, yes, the rehash's title is stronger. It tells a
| story whose ending piques your curiosity to read more.
|
| "Uber Migrates" (beginning: company that I'm interested in
| does something) "1T records" (middle: that's a lot of
| records; I wonder what happened) "from DynamoDB to
| LedgerStore" (hmm, how do they compare?) "to Save $6M
| Annually" (end: that's a good chunk of change for me, but was
| it worth it to Uber? Why did it save that amount? Let me read
| more)
|
| It's a simple and engaging "there and back again" story that
| paves the way for a sequel.
|
| Versus:
|
| "How LedgerStore Supports Trillions of Indexes at Uber" (ah,
| okay, a technology supports trillions of indexes. Moving on
| to the next article in my feed)
|
| "Migrating a Trillion Entries of Uber's Ledger Data from
| DynamoDB to LedgerStore" (ah, a big migration. I'm not sure
| who did it or whether anything interesting came of it, or
| even whether it happened or is just theoretical because of
| the gerund, and moving one trillion of something is cool but
| not something I probably need to read about right now, so
| let's move on)
|
| YMMV. Some probably prefer the more abstract/less narrative
| titles, but the first one is more of an attention grabber for
| me.
| beanjuiceII wrote:
| i mean it could use a few "blazing fast" sprinkled about
| barfbagginus wrote:
| And you can't have blazing fast without rust, and a little
| kvetching about lifetimes
| vsnf wrote:
| While your broader point is well taken, isn't Uber a
| famous Go shop?
| barfbagginus wrote:
| Lol I was not being on topic or constructive - just
| repeating the meme that rust is synonymous with "blazing
| fast", because of endless statements to the effect of
| "rust is blazing fast," or "if you want blazing fast
| code, use rust," or the endless blazing fast rust
| libraries:
|
| https://duckduckgo.com/?q=blazing+fast+rust
|
| Now I'm not an expert in either rust or go. But I know my
| deductive meme logic:
|
| 1. Uber's solution is not blazing fast
|
| 2. They are a Go house
|
| Then the meme implies:
|
| 3. Their solution is slow because they did not use rust!
|
| Q.E.M. (Quod Erat Memonstrandum)
| IanCal wrote:
| Other than the comments about titles, the entire blogpost
| doesn't show for me with ublock. So I'll open it, see a
| picture of some birds, scroll around for a bit then give up.
| pests wrote:
| That's probably because you are running software that is
| meant to hide content on a page.
| IanCal wrote:
| What's the purpose of this comment?
|
| My point is that a random dev running a pretty plain
| adblock (aren't we all?) simply cannot view their post.
| This is down to uber, their practices, an external
| developer and how uber create their blog (they don't just
| have the content in the page). If I'm not a special case
| with extremely weird luck, a bunch of devs seeing links
| to their posts will open them and not see any actual
| content. They will then, I assume, be less likely to
| upvote them.
|
| _Given that they are seeing problems with posts being
| upvoted_ this seems somewhat relevant.
| pests wrote:
| I have no issues reading their blog with uBlock Origin.
|
| You are running software that is blocking content you
| want to read. That is my point.
|
| If I put on blinders and then complain I can't see your
| stuff, that's my fault not yours - regardless if your
| stuff is good or the worst annoying spam ever. If I want
| to see it for some reason, maybe I should take off the
| blinders
| IanCal wrote:
| > You are running software that is blocking content you
| want to read. That is my point.
|
| Yes. It's my point too. I am running very standard
| software for a dev and it is stopping their dev blog
| posts being visible.
|
| > If I put on blinders and then complain
|
| I'm not complaining. I'm explaining, given the evidence I
| have, why they may be seeing poor results on HN. If I'm
| not alone (and since I have no custom setup designed to
|             keep out their blog posts that would be a surprise) then
| there are other developers who cannot see their posts.
| simion314 wrote:
|             Ad blocking is recommended by a US government agency for
|             security reasons; not running an ad blocker is dangerous
|             and suggests a lack of information/education about IT
|             stuff.
| pests wrote:
| Agreed, but if legit content gets blocked you only have
| yourself to blame.
|
| Like turning off JS and saying webapps don't work
| anymore.
| IanCal wrote:
| And if someone with a js heavy blog asked why it wasn't
| getting traction on a lynx centered forum they'd probably
| be told that their content wasn't readable for a portion
| of the users.
| nvr219 wrote:
| Loads fine for me with ublock. Perhaps you have a custom
| rule blocking something?
| IanCal wrote:
| Nothing custom, so it must be on a list somewhere.
|
| edit - it doesn't have to really be blocking the actual
| post here even, if their loading code breaks when some
| other tracking code doesn't run, that could explain it.
| leadingthenet wrote:
| I have the exact same problem, except on Uber Eats.
| ckluis wrote:
| Just put all your articles into a customGPT with the examples
| from the rehashes for each one and then ask the GPT to
|     rewrite your title into a "rehash"-like title for the new
| posts ;)
| dang wrote:
| Ok, we've changed to the second link from
| https://www.infoq.com/news/2024/05/uber-dynamodb-
| ledgerstore....
|
| Submitters: " _Please submit the original source. If a post
| reports on something found on another site, submit the latter._
| " - https://news.ycombinator.com/newsguidelines.html
| igammarays wrote:
| I wonder if 1.7 petabytes of data (1T indexed records) could fit
| on a single (very) beefy baremetal server for under a couple
| thousand dollars a month, served by SQLite.
|
| Like this: https://use.expensify.com/blog/scaling-sqlite-
| to-4m-qps-on-a...
| kondro wrote:
|   30.7TB SSDs are about $5500 each, and you'd need 56 of them to
|   get to 1.7PB (with no redundancy). Not to mention that SQLite's
|   maximum DB size is 140TB.
|
| I don't think you'd be able to fit this much storage into a
| single machine, especially not for a few thousand a month and
| SQLite wouldn't be appropriate for this use-case.
| bayindirh wrote:
| If you install a RAID controller and a couple of disk boxes,
| it's possible with 1:1 replication, or with backups. 60 disk
|     3.5" units already exist, and so do 2.5" SSD racks. It won't be
| cheap, but will be resilient and fast. _Bloody fast_ if you
| have the budget.
| zaphirplane wrote:
|       > or with backups
|
|       take a while to restore a PB and a way to take a hot backup
|       without impacting the service that by itself is a task or
|       snapshots which is more disk
|
|       > 1:1 replication
|
|       Depending on the amount of writes could be a ton of extra
|       disk and a bucket for network cost
| bayindirh wrote:
| These systems support zero-downtime snapshots. You tell
| it to snapshot, it instantly snapshots, you can run a
| differential/incremental backup at great speeds. Your
| RAID controller is already caching the hot data, so the
| impact is minimal.
|
|         Except for network cost, there's no extra disk required. It's
|         just broadcast writes consumed on the other end.
|
| These boxes are not dumb JBODS. They support their own
| replication/backup subsystems, so everything is
| transparent.
| Closi wrote:
| Resilient and fast from a disk perspective, but in practice
| massively bottlenecked by the fact that Sqlite can only
| have 1 writer at a time.
| Neil44 wrote:
| At the moment they're just paying someone else to buy $5000
| SSD's and run a database on them at many X markup.
| omeid2 wrote:
|       There is no upper bound to economies of scale. Maybe there
|       is for the cents per GB of raw storage, but power usage,
|       security, rent, and everything else scale too, and few of
|       them have upper bounds on economies of scale.
| Retric wrote:
| Economies of scale generally have upper limits. Often
| when you approach the largest scale the existing market
| will supply you essentially need to become your own
| supplier which then runs into span of control issues. The
| organization needs to become competitive in that new
| market or their costs increase.
|
| Keep scaling and eventually vertical integration ends up
| looking like a Soviet style planned economy. Your remote
| mining town needs some way for people to get soap etc so
|         you open a store with its own supply chain etc etc.
| ndriscoll wrote:
| There are 61.44 TB NVMe drives (best price I've seen right
| now is ~6200. They were ~4800 earlier this year). You can
| have a 1U server with 32 E1.L slots so you should be able to
| fit ~1.9PB raw storage into 1U for a little over $200k. Don't
| know how business financing works, but at 8% interest with a
| 5 year amortization, that's a bit over $4k/month.
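The hardware estimates in this subthread can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: drive sizes and prices come from the comments above, the $200k principal is the rounded figure from the comment, and a standard fixed-rate loan formula is assumed for the amortization. Function names are illustrative.

```python
import math


def drives_needed(target_tb: float, drive_tb: float) -> int:
    """Whole drives required to reach a raw capacity, with no redundancy."""
    return math.ceil(target_tb / drive_tb)


def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Fixed monthly payment for a fully amortized loan."""
    r = annual_rate / 12   # monthly interest rate
    n = years * 12         # number of monthly payments
    return principal * r / (1 - (1 + r) ** -n)


# 30.7 TB drives needed for 1.7 PB (1700 TB), as in the earlier comment.
print(drives_needed(1700, 30.7), "drives of 30.7 TB")

# 32 slots of 61.44 TB drives at about $6200 each.
raw_tb = 32 * 61.44
drive_cost = 32 * 6200
print(f"{raw_tb:.2f} TB raw for ${drive_cost} in drives alone")

# $200k financed at 8% over 5 years.
payment = monthly_payment(200_000, 0.08, 5)
print(f"${payment:,.0f} per month")
```

The drive cost alone comes in just under $200k, so the "little over $200k" in the comment presumably includes the chassis.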
| mobilemidget wrote:
| Do you have any good recommendation for such 1U server with
| 32 slots? Thanks
| perryh2 wrote:
| Supermicro https://www.supermicro.com/en/products/nvme?pr
| o=formfactor%3...
| jakjak123 wrote:
| Our ops team actually wanted to do this, but we on the
| project have nightmares from putting 1PB of database on a
| single host ><
| choppaface wrote:
| StorageReview plays with 2PB flash machines all the time
| https://www.youtube.com/watch?v=UQMKtlIjeuk
|
| 1PB in a rack with spinning rust + flash buffer has been easy
| for years now.
| BlackLotus89 wrote:
| No it won't. sqlite "only" works with up to 281TB [0] [1]
|
| [0] https://www.sqlite.org/releaselog/3_33_0.html
|
| [1] https://www.sqlite.org/limits.html (#12)
| sgt wrote:
| You can split up into 10 SQLite DB's on this individual
| server.
| cdchn wrote:
| You've now implemented sharding on top of SQLite.
|
| Eventually all programs will be able to read email.
| mrbungie wrote:
| Any non-trivial complexity codebase eventually implements
| a mediocre SQL/Lisp/etc.
| Closi wrote:
| Exactly! Why take a system not designed for this sort of
| scale and force it to scale, rather than use systems
| which are designed and tested for this scale and volume?
| All you will do is hackily re-invent all the other things
| that the other databases had to do to scale to this
| extent.
|
| Plus size is only one limit, you would be limited to 1
| write every few milliseconds. My napkin maths estimate is
| that there are at least 1-2m writes per hour going into
| this thing, so probably 300-600 writes / second (Average)
| and maybe over 1k writes/second peak. We are going to
| fall over here!
|
|         Not sure why some people seem to have a view of "There is
|         no scaling problem that can't be solved with a sufficient
|         number of SQLite databases".
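The napkin maths above, written out as a small sketch. The hourly write volumes are the commenter's estimate; the per-write latency passed to `serial_write_ceiling` is a hypothetical value chosen for illustration, not a measured SQLite figure.

```python
def per_second(writes_per_hour: float) -> float:
    """Average writes per second for a given hourly volume."""
    return writes_per_hour / 3600


def serial_write_ceiling(ms_per_write: float) -> float:
    """Upper bound on writes/second if writes run strictly one at a time."""
    return 1000 / ms_per_write


low, high = per_second(1_000_000), per_second(2_000_000)
print(f"average load: {low:.0f}-{high:.0f} writes/s")
print(f"ceiling at 3 ms per write: {serial_write_ceiling(3):.0f} writes/s")
```

Under these assumptions a single serialized writer sits inside the estimated average range, which leaves no headroom for peaks.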
| zaphirplane wrote:
| > You can split up into 10 SQLite DB's on this individual
| server.
|
| 1 is a scalable, managed, highly available service, with
| economies of scale the other is a fixed size, capital
| expenditure with fixed performance, limited DR, requiring a
| couple of SRE/DevOps and colo
|
| There is also the will it always work question
| mlnj wrote:
| Just storing petabytes of data is not the issue. Managing
| and querying it reliably is.
| chasil wrote:
| Beware of WAL mode, as you sacrifice ACID in this
| configuration.
|
| https://sqlite.org/lang_attach.html
|
| 'Transactions involving multiple attached databases are
| atomic, assuming that the main database is not ":memory:"
| and the journal_mode is not WAL. If the main database is
| ":memory:" or if the journal_mode is WAL, then transactions
| continue to be atomic within each individual database file.
| But if the host computer crashes in the middle of a COMMIT
| where two or more database files are updated, some of those
| files might get the changes where others might not.'
| nemothekid wrote:
| Once you are splitting up 10 sqlite dbs you have a bespoke
| distributed system anyways, and you will find yourself
| doing all the headache of LedgerStore anyways.
|
| Most of the novel work in LedgerStore is probably around
| managing the headaches of distributed storage, not the
| persistence layer.
| tinyspacewizard wrote:
|   Also a bit scary to have a system without a scaling mechanism
| built-in in the path of customer traffic. At some point you may
| be racing to upgrade it.
| sgt wrote:
| How would you replicate that SQLite DB onto other hosts to
| achieve redundancy?
| thangngoc89 wrote:
| One could use Litestream [1]
|
| [1]: https://litestream.io
| szundi wrote:
| What if a continuous replication system has a bug one day,
| and you realize you are just a bit corrupted and have to
| rerun? Or is it the same with cloud tools?
| thangngoc89 wrote:
| That's why you always test your backup. I backup the full
| sqlite.db every day and test the litestream replication
|         every week. So far litestream has been solid.
| zaphirplane wrote:
| By the time the TB is restored, time to start the next
| test
|
| How do you detect restored but bit flipped data ?
| thangngoc89 wrote:
|             I do this in backup testing:
|
|                 sqlite3 /path/to/db
|                 sqlite> PRAGMA integrity_check;
|
| See SQLite3 documentation:
| https://www.sqlite.org/pragma.html#pragma_integrity_check
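The same check can be scripted for an automated restore test. A minimal sketch using Python's standard `sqlite3` module; the helper name `integrity_problems` and the path in the usage line are hypothetical. `PRAGMA integrity_check` returns a single row containing `ok` when it finds nothing wrong.

```python
import sqlite3


def integrity_problems(db_path):
    """Return the messages reported by PRAGMA integrity_check (empty if none)."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
    finally:
        conn.close()
    messages = [row[0] for row in rows]
    # A healthy database yields exactly one row with the text "ok".
    return [] if messages == ["ok"] else messages


if __name__ == "__main__":
    # Hypothetical usage: point this at the restored copy, not the live file.
    problems = integrity_problems(":memory:")
    print("restore looks sane" if not problems else problems)
```

This verifies the database's internal structure; it would not necessarily notice a flipped bit inside a stored value, so comparing checksums of known rows may still be worthwhile.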
| zaphirplane wrote:
|               Sounds like it will take a while for a TB and it checks db
| integrity not data integrity
| mickeyp wrote:
| Would you care to tell us what your backup and restore
| policy would be for 1.7 PB of data?
| thangngoc89 wrote:
| I'm replying to the question of how one would replicate
| SQLite 3 in production for redundancy. I myself consider
| 10GB would be the limit for using SQLite 3 in read/write
| in production and switch to PostgreSQL.
| sgt wrote:
| That's a huge discrepancy. One half of HN wants to put
| petabytes on SQLite, while your limit is only 10GB.
| Closi wrote:
| Why not use SQLite's own guidance on where SQLite
| probably isn't appropriate:
|
| - Client/Server applications (Check)
|
| - High-volumes (Check)
|
| - Large datasets (Check)
|
| - High concurrency, particularly for writes (Check)
|
| https://www.sqlite.org/whentouse.html
| jeltz wrote:
| Then the same happens as when there is a bug in Aurora's
| replication. You lose data. I know this from personal
| experience.
| anonzzzies wrote:
| Any open source doing something similar ?
| jiripospisil wrote:
| Litestream is open source.
|
| https://github.com/benbjohnson/litestream
| anonzzzies wrote:
| Ai. I remember another product and thought it was this.
| Sorry. Move on and keep up the good work.
| riku_iki wrote:
| it will take forever to create that index. Link describes 10B
| rows dataset.
| sanderjd wrote:
| Sometimes things just aren't nails, even when you have a really
| good hammer.
| khaki54 wrote:
|   The value proposition of commercial cloud isn't cost savings
| unless you manage to quantify all of the ancillary and
| extrinsic factors such as security risk, HVAC, datacenter
| personnel, and hardware lifecycle. Any well capitalized and
| organized company could build their own cloud much more
| cheaply, but really a significant portion of the calculation is
| outsourcing the risk components.
| igammarays wrote:
| The problem with outsourcing the risk components is that you
| don't know for sure whether they are properly taken care of.
| Major cloud providers have been caught "oopsing" your data,
| and bam it is gone. Furthermore, they have no incentive to be
| more efficient about it, they could easily be using 10x the
| amount of resources necessary, and you wouldn't even have a
| clue, you're just paying for evermore expensive crap that
| becomes less reliable over time.
| ddorian43 wrote:
|       But the cloud providers compete with each other! Look at
|       the efficient market on display in their bandwidth pricing!
| PretzelPirate wrote:
| For very large customers, the cloud providers do compete
| with each other on cost. They often pay different prices
| than are advertised.
| ownagefool wrote:
| > Any well capitalized and organized company could build
| their own cloud much more cheaply
|
| Lots of orgs fail to turn money into talent and then talent
| into products.
|
| It just takes one bad hire at senior level and suddenly your
|     cloud is a vmware install where all machines boot off
| network disk, and contention makes the entire thing fall
| over.
| pclmulqdq wrote:
| You wouldn't want to do 1T records on one server even if you
| could. At that scale, you would prefer to be somewhat
| distributed for availability and scalability. Also, SQLite has
| issues at large scale.
|
| A reasonable number for one server is about 32-128 TB, and 1.7
| petabytes with some redundancy fits nicely in ~30 servers with
| a decent distributed database.
| klysm wrote:
| Sure but then you get a whole new set of costs and folks you
| have to hire to maintain that hardware.
| siva7 wrote:
| Maybe it could and now you got 99 new Problems. That's why more
| experienced decision makers won't allow this to happen.
| Closi wrote:
| 1.7 petabytes on Sqlite?
|
| Sqlite's own advice:
|
| > If your data will grow to a size that you are uncomfortable
| or unable to fit into a single disk file, then you should
| select a solution other than SQLite. SQLite supports databases
| up to 281 terabytes in size, assuming you can find a disk drive
| and filesystem that will support 281-terabyte files.
|
| > Even so, when the size of the content looks like it might
| creep into the terabyte range, it would be good to consider a
| centralized client/server database [over SQLite].
| cheeze wrote:
| This is the worry IMO. It's fine to dump it on a server with
| SQLite, but once you start hitting scaling limits, you're in
| for a potentially rough migration.
| callalex wrote:
| One of the main reasons you put up with the annoyances of
| tuple-based storage like DynamoDB is because you want extremely
| high availability that simply cannot be provided by one
| computer in one physical location.
| benterix wrote:
| I read the article so I roughly know what LedgerStore is - but I
| have no idea where it is hosted.
| tiew9Vii wrote:
| From one of the original sources linked in this thread
|
| > LSG promised shorter indexing lag (i.e., time between when a
| record is written and its secondary index is created).
| Additionally, it would give us faster network latency because
| it was running on-premises within Uber's data centers.
|
| https://www.uber.com/en-AU/blog/migrating-from-dynamodb-to-l...
| xiwenc wrote:
| Is this another outlier when you reach certain scale, it's more
| beneficial to roll your own? Pretty amazing what Uber has to deal
| with.
|
| Also it's not very clear from the original articles, what is the
| new total "cost of ownership" of this new refactored service.
| Like now they need to manage their own databases and the storage
| backing them. Or did i miss it?
| crabbone wrote:
| I worked for a company which used Redis at the prototyping
|   phase, but then wrote its own database to improve performance and
| resilience. The company wasn't selling an end-user facing
| product, the product was a distributed filesystem.
|
|   My take on this is that most companies don't have the expertise
|   to build systems like databases, and even if the costs would
|   otherwise make such a development desirable, they would simply
|   be afraid of doing it.
| ForHackernews wrote:
| Does no one ever delete data? It's hard to believe there's much
| business value in keeping every individual payment record dating
| back to 2017.
| sanderjd wrote:
| I'm not sure if they have regulatory obligations to keep them,
| or what, but it still seems like you could back them up to cold
| storage after a reasonable period of time.
| robertlagrant wrote:
| It might just be an internal policy to cover all the crazy
| combinations of regs the world over. They might just say
| 10/20/100 years is their policy, now figure out how to store
| it.
| moooo99 wrote:
| Payment information is often subject to pretty strict
| regulatory requirements, including archival durations. Having
| to keep all the original information for 10 years is not
| entirely uncommon.
| crabbone wrote:
| In systems that deal with money, money-related data is
| virtually never deleted. The reason is the fear that deletion
| can be exploited somehow in the future, rather than the old
| data being actionable.
|
|   For example, a customer might register with the name of a
|   deleted customer, which would resurface some "unfinished"
|   transactions or rules associated with the older version of the
|   "same" customer that hadn't been properly deleted but appeared
|   to be deleted for a while.
|
| Also, in general, deletion is very difficult because money
| doesn't just disappear. You'd need some sort of compaction
| (think: Git squash) rather than deletion to be able to balance
| the system's books... but then you'd be filling the system with
| fake transactions...
|
| From my experience from working with these kinds of systems,
| the typical solution is to label entities with active/inactive
| labels to substitute deletion. But entities never go away.
| kobalsky wrote:
| I agree with you, but there is a plus for deleting old data.
|
| If you are not required to keep the information for more than
| X years, and you still keep it, then you have to provide it
| when it's requested.
|
| If you didn't keep it, then it can't be used against you.
|
| If you delete it after it was requested, then you are in
| trouble.
| nolongerthere wrote:
| At an individual level I appreciate when an app or service I
| use maintains all records from the start of our relationship,
| I've infrequently found myself going back and looking for
| something, and it's always a breath of fresh air to see that
| nothing was deleted.
| dang wrote:
| Sorry for the offtopicness, but please see
| https://news.ycombinator.com/item?id=40418627 regarding a
| flamewar that happened over a week ago. It's important that
| this not happen again.
| washywashy wrote:
| I pretty much never see engineering salaries factored into these
| types of savings projects. I assume because engineers are already
| viewed as a sunk cost or maybe it's just because it's way less
| tangible. Have seen many designs describe how X saves Y dollars
| but ignore the engineering effort to maintain and build it. Half
| the time I suspect it's just so people have something to work on,
| rather than it being some critical fix.
| gtirloni wrote:
| If anything, it reduces Uber's exposure to AWS' proprietary
| technology. I don't know how to measure how much that's worth
| but they probably do.
| vertis wrote:
| Companies this size almost certainly have different terms of
| use. I worked for a smaller, but still ASX200 company that
| had a custom contract, and assigned staff that would drop by
| 2-3 days a month. Of specific note was that if AWS wanted to
| stop doing business with us they had to give at least X
| notice (from memory that was 12 months for us).
|
| For our risk profile this was more than enough time to
| migrate off any AWS' proprietary technology.
|
| That makes it worth less to avoid exposure.
| scarface_74 wrote:
| This usually comes from people who have never done a mass
| migration at scale.
|
| You're always dependent on your infrastructure. Even if you
| have nothing but everything hosted on a bunch of VMs, it can
| take years and millions of dollars to migrate.
|
| No, just use Terraform and Kubernetes is not the answer.
|
|     The typical enterprise relies on between 80 and 120 SaaS
|     products (depending on the source), i.e. outside vendors.
| gtirloni wrote:
|       > Even if you have nothing but everything hosted on a
| bunch of VMs, it can take years and millions of dollars to
| migrate.
|
| I'd assume it takes fewer millions to migrate your own tech
| stack from AWS to somewhere else than it takes to migrate
| from AWS proprietary solutions. Is that reasonable?
| scarface_74 wrote:
| No because you still have to deal with permissions,
| integrations with AWS services like networking, training,
| security audits, regression testing, often physical
| network connections (Direct Connect), DNS...
|
| And you're dealing with your PMO department, project
| managers, finance, security, contract negotiations,
| retraining your ops department...
|
| And you know that Aurora MySQL instance that was supposed
| to prevent "lock in"? I bet you someone somewhere in your
| org thought about creating an ETL job and then said
| forget it and used "select into S3" to move data from
| MySQL into S3.
|
| As a project manager trying to ship code so you can show
| "impact" to put on your promo doc, are you going to
| choose for your team to spend weeks to write an ETL job
| to prevent "lock in" or are you going to tell the
| developer to write that one line of SQL?
|
| There are all sorts of choices you can make that will
| save time and money and ship features that actually
| deliver value instead of worrying about the bogeyman of
| "lock in".
|
| And I really hope that there was some better technical
| reason than just saving $6 million a year for a
| multibillion dollar company to go through the migration.
| gtirloni wrote:
| Thanks for the insights. So in the case that it's
| actually more expensive to migrate your own tech stack
| somewhere else than, say, migrate from AWS proprietary to
| GCP proprietary, it seems there might be other reasons.
| scarface_74 wrote:
| The difficulty would be worse of course if you depend on
| anything proprietary from the cloud vendor.
|
| But the main question is, once you do all of this work
| and spend time to be "cloud agnostic", does it add
| business value?
|
| In the case of Dropbox, it made sense to move from the
| cloud. In the case of Netflix, they decided to move to
| the cloud.
|
| But you can't stay completely "cloud agnostic".
|
| Let's take a simple case of using Kubernetes and building
| the underlying infrastructure using Terraform.
|
| The entire idea behind Kubernetes is to abstract your
| infrastructure - storage, load balancers, etc.
|
| But eventually, you still have to deal with what's
| underneath. I used AWS's own Docker orchestration service
| for years - ECS. But I just learned Kubernetes last
| month.
|
| I still had to know how to troubleshoot problems with IAM
| permissions, load balancers, view CloudTrail logs for
| permission issues, know how the underlying storage
| providers worked, make sure I had the right plugin
| installed for K8s to work with AWS's infrastructure etc.
|
| Once I got all of that figured out, then I could go
| through the tutorials and mind map the difference between
| ECS and AWS's Kubernetes implementation - EKS.
|
| But I had years of experience with AWS. I could never
| have easily troubleshot the same types of issues with
| Azure's or GCP's version of K8s. Now multiply that by an
| entire department.
|
| Once everything is configured correctly, a developer's
| experience would be the same across environments.
|
| Migrations at scale are always a pain from one system to
| another.
|
| Source: I worked at AWS in the Professional Services
| department for three years. I'm mostly a developer and I
| dealt with the "modernization" side of "lift and shift
| and then modernize".
| scop wrote:
| That was where my mind went to when I saw the headline. Granted
| that while I'm not on the Finance side of things and am in fact
| a developer, "six million" didn't seem like much at all
| considering engineer salaries. It's certainly an achievement,
| but at what short and long term salaried cost?
| Rastonbury wrote:
| $6m across 2.5 years is like $15m. How many engineer
| man-months to break even? I'm pretty sure it was worth the work.
| mbesto wrote:
| $6m in perpetuity is like infinite and engineers can be
| fired.
| robocat wrote:
| It really isn't. The first years dominate the value and
| later years are worth nothing due to inflation. Google a
| calculator and use a reasonable discount rate and I
| suspect you will find that today's value for an infinite
| perpetuity is a lot less than your intuition might guess.
| It always surprises me.
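| (A rough sketch of the discounting arithmetic described above. The
| 10% discount rate and the function names are assumptions chosen for
| illustration, not figures from the article or the thread.)
|
```python
def perpetuity_pv(annual_saving, discount_rate):
    """Present value of a constant yearly saving that never ends.

    Assumes the first saving arrives one year from now.
    """
    if discount_rate <= 0:
        raise ValueError("discount rate must be positive")
    return annual_saving / discount_rate


def annuity_pv(annual_saving, discount_rate, years):
    """Present value of the same yearly saving over a fixed number of years."""
    factor = (1 - (1 + discount_rate) ** -years) / discount_rate
    return annual_saving * factor


saving = 6_000_000   # the $6M/year figure discussed in the thread
rate = 0.10          # assumed discount rate, purely illustrative

total = perpetuity_pv(saving, rate)
first_ten = annuity_pv(saving, rate, 10)

print(f"perpetuity PV:        ${total:,.0f}")
print(f"first 10 years PV:    ${first_ten:,.0f}")
print(f"share from 10 years:  {first_ten / total:.0%}")
```
|
| Under these assumptions the "infinite" stream is worth a finite
| multiple of one year's saving, and most of that value comes from
| the first decade.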
| minkzilla wrote:
| But they will save more than $6M the second year because
| AWS will up their prices.
| mbesto wrote:
| What?
|
| It costs me $10M to run something every year, it now
| costs me $4M to run something every year. I have $6M in
| my pocket every year now in perpetuity. Compound that
| annually with the assumption that I maintain or increase
| top line revenues and that's pure extra profit.
|
| Note - I admit, all of this ignores two key things (a) we
| don't know the salaries of the engineers who built this and
| (b) we don't know the ongoing maintenance costs.
| rapht wrote:
| A primer on valuation: in many financial contexts, $1 of
| operating savings may be worth much more than $1 of
| investment.
|
| That is because an investment is a one-off, so it's actually
| worth $1, but the savings are recurring, so they are worth the
| same multiple of annual value at which company profits are valued.
| Depending on sector and investors' beliefs in the future of
| companies, this factor is typically in the 5-20x range. That
| means that $1 of savings is well worth at least $5 of
| investments.
|
| Factor in anything you want!
| shermantanktop wrote:
| In an ideal org, perhaps. In many places, forming that team
| starts a process where it continually finds reasons to
| still exist, so your $1m is yearly until a reorg.
|
| Sigh.
| admax88qqq wrote:
| If that $1 of investment doesn't yield any returns that's
| not an investment it's just an expense.
|
| So yes $1 of savings is worth more than $1 of spending.
| TheNewsIsHere wrote:
| You could potentially take it as an investment loss for
| tax purposes. Whether that's proper depends greatly on
| the circumstances surrounding how the money was accounted
| for and spent.
| vertis wrote:
| A better strategy as a company this size would be to write the
| PRD for moving and then call AWS and negotiate.
| rmbyrro wrote:
| I think it's likely that they tried this. But DynamoDB is
| expensive to consume probably because it's expensive to run
| and maintain. If you develop for a particular use case, a lot
| of optimizations can reduce these costs. For a large enough
| business, the fixed costs of in-house are easily amortized.
| It'd be hard for AWS to compete.
| dboreham wrote:
| $6M/y is something like 20 heads (depending on where they are,
| could be more). So probably it's a win. Hard to see that this
| could take more than about 5. Add cost of hay and water of
| course.
| tinyhouse wrote:
| It's less than 20 heads. The gross spend for each engineer is
| probably closer to $0.5 million annually. You can lay off 5%
| without any impact on the company and save so much more. A
| company like Uber ($130B market cap) isn't going to bother
| with building something internally to save $6M/year. The only
| reason to do it is improve efficiency that actually improves
| the user experience, which then we're talking about a big
| deal. Sometimes those things happen only because engineers
| don't have anything else to do and someone needs a
| promotion...
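| (A small sketch of the headcount arithmetic in the two comments
| above. The per-engineer costs are the commenters' own estimates,
| $300k implied by "20 heads" and $0.5M stated outright; the function
| name is hypothetical.)
|
```python
def heads_covered(annual_saving, cost_per_head):
    """How many fully loaded engineers a yearly saving would pay for."""
    if cost_per_head <= 0:
        raise ValueError("cost per head must be positive")
    return annual_saving / cost_per_head


ANNUAL_SAVING = 6_000_000  # $6M/year, the figure discussed in the thread

# Per-engineer costs are thread estimates, not published Uber numbers.
for cost in (300_000, 500_000):
    heads = heads_covered(ANNUAL_SAVING, cost)
    print(f"at ${cost:,} per head: {heads:.0f} engineers")
```
|
| The conclusion swings from 20 to 12 engineers purely on the assumed
| cost per head, which is why the two comments disagree.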
| hackernewds wrote:
| is it? if you consider the value 20 engineers could drive
| instead in that time
|
| if you assume they wouldn't have had anything else meaningful
| to work on during that time to save money, then you have a
| different problem in the company. $6M seems like the value 1
| engineer can drive in a company at the scale of Uber
| appplication wrote:
| You don't need to consider the cost they could drive during
| that time. You have a direct and tangible savings for
| engineering time invested. That possible value they could
| otherwise derive is moot and hypothetical, this is the real
| deal!
|
| But if we're being honest, there isn't actually any
| meaningful quantification of engineering time to understand
| return on investments at this level (not to say there's
| _none_, but it sure does get wishy-washy). Corporate and
| engineering strategy isn't so carefully weighed, and to
| believe otherwise is to fall victim to the pseudoscience
| that is software estimation. You just have to estimate
| directionally if a given proposal has you heading in a
| better direction in the long term, pursue that, and course
| correct along the way.
|
| Put another way, the end state justifies the means and
| resourcing. It's rarely possible to fully understand either
| the costs or benefits with much accuracy up front. You
| slowly put more resources into projects that show promise,
| and revoke them if the projects do not appear to be heading
| in a value add direction.
| cj wrote:
| >You don't need to consider the cost they could drive
| during that time.
|
| You don't _need_ to, but you 100% should. "Opportunity
| cost" (cost of not doing something) is real.
|
| This is the problem with all refactoring/migration
| projects. It's very easy to get a lot of people to agree
| a company should migrate from Node to Go or Monolith to
| Microservices (or to clean up a mountain of tech debt),
| but it's much harder to justify the time it takes away
| from building things your users care about.
| hibikir wrote:
| True, but often the project that was supposed to build
| something users care about turns to dust. On one side,
| you have rosy projections. On the other, a cap on gains,
| so sure everyone picks the first, but nobody measures if
| it worked.
|
| One can build a great career working only on key,
| promising initiatives that never amount to any value in
| the end. By the time it's clear the project lost money
| outright, you are on to something else.
| tinyhouse wrote:
| Spot on. $6M/annually is not much of a saving for a company
| like Uber ($130B market cap). It only makes sense if it's also
| more efficient and actually improves the app.
| tayo42 wrote:
| Yeah in one quarter from a quick google it looks like Uber
| profits $1.6 Billion. Basically why I never thought about
| cost savings projects after I learned to put things into
| perspective.
|
| People really struggle with large numbers in business I've
| noticed.
| smrtinsert wrote:
| A quick look at their careers shows they're hiring eng seemingly
| anywhere but the US so maybe they've already saved the dollars
| there.
| dangus wrote:
| Because they often are a sunk cost.
|
| E.g., you can't lay off an entire SRE team and have nobody on
| the on-call rotation. If some of their project work is cost
| control that is basically free cost savings.
| debarshri wrote:
| There was an era around 2015, when all the cool tech companies
| like netflix, spotify, soundcloud, uber and others were building
| a lot of infrastructure and database tools. Nowadays, engineers
| often talk in AWS/Cloud terminologies.
|
| It is a breath of fresh air to see that orgs are still building
| tools like that.
| augunrik wrote:
| Is there some information on why they need to store this much
| data for immediate retrieval? And why is it so much?
| Antony90807 wrote:
| Wow crazy amount of work went into this. Well done
| influx wrote:
| I would gladly pay 6 million/year to not be on call, and have to
| worry about things like bios and ssd firmware ever again.
| geodel wrote:
| That's a great situation to be in when one can spend 6 million
| even when there was some chance to save.
|
| I tried the same for ready-to-eat meals every day to save me from
| potential kitchen disasters but sadly numbers didn't work out.
| influx wrote:
| You're not saving money, controlling your own destiny sure.
| That's worth something, maybe even more than 6 million, but I
| was an SRE at Uber who had to be oncall for systems like this,
| believe it or not, people like me aren't free either :)
| tedd4u wrote:
| My "favorite" non-cloud issue was dying/dead RAID card
| batteries in DB hosts (to preserve unflushed cache in RAM on
| the card in case of power failure).
| boringg wrote:
| How much did the migration effort cost?
| geodel wrote:
| More power to them. At this point even technically decent
| teams/companies have given up on developing large, complex
| systems in favor of SaaS. After _carefully evaluating our
| strategic course of action_ answer always is AWS.
|
| It's only the teams that propose an alternative who have to
| rigorously justify why their conclusion differs.
| dboreham wrote:
| My Amazon stock thanks you.
| tonyhart7 wrote:
| "At this point even technically decent teams/companies have
| given up on developing large, complex systems in favor of SaaS"
|
| Yeah, until those bills come. Then they would consider alternatives.
| cess11 wrote:
| Not if you're in the EU, due to, among other things, Schrems
| II.
| graemep wrote:
| AFAIK Schrems II prevents transfers of data to the US.
|
| AWS has datacentres around the world, including multiple
| locations in the EU.
| cess11 wrote:
| Where did you learn that?
|
| Schrems II prohibits transfer of personal information to
| companies reachable by the CLOUD Act.
| ramesh31 wrote:
| Another victim of the "Great Normalization", i.e. that entire
| generation of garbage tech debt generated during the 2010s that
| was built on NoSQL stores that never should have been, is now
| coming due. You could probably make an entire consulting business
| out of migrating these things to MySQL.
| cgh wrote:
| This is exactly the comment I came here to make. The NoSql
| technical debt accretes like dead leaves blocking a sewer
| drain. Eventually someone has to wade in and normalize...the
| sewer grate, I guess. Okay, it's not a perfect analogy.
| riku_iki wrote:
| > You could probably make an entire consulting business out
| of migrating these things to MySQL.
|
| I think it will be a very complex task to run MySQL for 1PB and
| 1T transactions.
| jcims wrote:
| Every time I've ever used DynamoDB it cost way more than I would
| have ever expected.
| alexey-salmin wrote:
| Yes
| ledgerdev wrote:
| Say you wanted to build an app on a database like LedgerStore but
| at much smaller scale, what are the best open source options out
| there right now?
| superzamp wrote:
| We have a pretty minimal setup at formancehq/ledger[1] that
| uses pg as a storage backend, and comes with a programmability
| layer to model complex transactions (e.g. sourcing $100 from
| three accounts in cascade).
|
| [1] https://github.com/formancehq/ledger
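| (To make "sourcing $100 from three accounts in cascade" concrete,
| here is a minimal sketch. It is not Formance's API or its scripting
| language; the function, account names and balances are all
| hypothetical, and amounts are integer cents.)
|
```python
def source_in_cascade(balances, sources, amount):
    """Draw `amount` from `sources` in order, draining each before the next.

    Returns a list of (account, amount_taken) postings. Balances are only
    modified if the full amount can be covered, so the operation is
    all-or-nothing.
    """
    remaining = amount
    postings = []
    for account in sources:
        if remaining == 0:
            break
        take = min(balances.get(account, 0), remaining)
        if take > 0:
            postings.append((account, take))
            remaining -= take
    if remaining > 0:
        raise ValueError("insufficient funds across all sources")
    # Apply the postings only after we know the whole amount is covered.
    for account, take in postings:
        balances[account] -= take
    return postings


# Hypothetical accounts holding $30, $50 and $100 (in cents).
balances = {"promo": 3_000, "wallet": 5_000, "card_hold": 10_000}
postings = source_in_cascade(balances, ["promo", "wallet", "card_hold"], 10_000)
print(postings)
print(balances)
```
|
| The first two accounts are drained completely and only the
| shortfall is taken from the third.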
| qwertyuiop_ wrote:
| Assuming there are a minimum of two teams, a total of 20 engineers,
| maintaining this in-house software, I gave 250k as cost per
| engineer (salary plus health and other benefit costs to the
| company). That's $5 million right there, and I am estimating at
| the lowest range. That's why Amazon calls these efforts
| undifferentiated heavy lifting. Is there a slight premium to pay
| over rolling your own and maintaining it? Yes. It's worth it,
| given all the trouble and the security and management overhead
| that go into rolling your own.
| xyst wrote:
| Uber must have picked up some Google rejects. This type of
| homegrown project was seen at Google all the time.
|
| Usually to aim for a significant promotion.
|
| "Designed and built homegrown system to save $Xm! Give me promo,
| bro?"
|
| Just so happened to ignore that it took X+Y additional to build.
| Also it will probably be going to the G graveyard in a few years.
| cj wrote:
| $6m in annual cost savings is truly unremarkable, if we are to
| believe levels.fyi [0] [1]
|
| If you're truly paying engineers, project managers, etc $500k a
| head, it dramatically undermines the financial cost savings.
|
| It very well might be the case that "We spent $25m of
| engineering resources to save $6m annually".
|
| [0] https://www.levels.fyi/companies/uber/salaries/software-
| engi...
|
| [1] https://www.levels.fyi/companies/uber/salaries/software-
| engi...
| booi wrote:
| That's what I was thinking, and fully loaded cost is at least 35%
| more than their salary.
|
| Imagine trading 5 headcount full-time to manage the 1T+ fully
| custom database on an ongoing basis when they could have just
| used DynamoDB and have been done with it.
|
| Or better, having to engineer a new feature that already
| existed in DynamoDB and just losing money at that point.
| dangus wrote:
| Of course now we are assuming that the existing solution
| didn't also require engineering salaries to maintain.
| leoqa wrote:
| Bitter story time:
|
| I made a config change to our AWS instances and projected
| approximately $10MM/year in AWS costs savings (pre-savings).
|
| My boss asked me "Who told you to do this? We need to focus
| on $project instead". I found another team and transferred
| out. 3 months later there was a big fire drill about AWS
| costs and they took my 1-pager and executed it. Didn't get
| any credit in the shipped email nor did the manager reach out
| to apologize.
| cj wrote:
| The organization still got the end result, though (to play
| devil's advocate). That sounds like a win for the company.
| They got the cost savings, plus they redirected attention
| back to a project that was higher priority than saving $10m
| 3 months earlier than they could have.
| heavenlyblue wrote:
| Of course you didn't. You used your time to promote
| yourself instead of doing what you were asked to do
| instead. That could have cost a promotion for your manager
| who could have promoted you.
| noncoml wrote:
| I don't know why you are downvoted. Aligning the personal
| interest vector with the company's interest vector is a
| huge problem that is usually underrepresented in HN
| comments.
|
| Usually we only complain about the short-sightedness of the
| CEOs that prioritize short term stock gains over long
| term prosperity, but that also is just a specialized case
| of the success vector misalignment
| shawabawa3 wrote:
| > we spent $25m of engineering resources to save $6m annually
|
| This is a huge ROI. Borrowing $25m costs about $1.25m/yr so
| you're winning even with no upfront costs
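| (A sketch of the arithmetic behind this claim. The $25m and $6m
| figures are the hypothetical ones from the parent comment, and the
| 5% interest rate is the one implied by "$25m costs about
| $1.25m/yr"; none of these are published Uber numbers.)
|
```python
def payback_years(upfront_cost, annual_saving):
    """Years of savings needed to recover the upfront spend (undiscounted)."""
    return upfront_cost / annual_saving


def net_annual_gain(upfront_cost, annual_saving, interest_rate):
    """Yearly saving left over after paying interest on a loan for the
    full upfront cost (interest-only, principal never repaid)."""
    return annual_saving - upfront_cost * interest_rate


UPFRONT = 25_000_000   # hypothetical engineering spend from the thread
SAVING = 6_000_000     # yearly saving discussed in the thread
RATE = 0.05            # assumed borrowing rate

print(f"payback period:  {payback_years(UPFRONT, SAVING):.1f} years")
print(f"net yearly gain: ${net_annual_gain(UPFRONT, SAVING, RATE):,.0f}")
```
|
| This ignores ongoing maintenance salaries, which other comments in
| the thread argue could eat most of the gain.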
| valiant55 wrote:
| I mean, ideally you are still employing those people into
| the future. Plus, were there other opportunities to drive
| value where the effort would've been better spent?
| vineyardmike wrote:
| Ideally for the people, but not a requirement. I doubt
| they won't be conducting more layoffs either.
| choppaface wrote:
| Yes but since revenue is growing at over 70% due to squeezing
| out the drivers, there's more money to spend on fighting
| Amazon over the DynamoDB contract
| https://www.forbes.com/sites/lensherman/2023/01/16/ubers-
| new...
| amanda99 wrote:
| Right, but that's $25m in R&D investment. Much better than
| $6m in cost of goods/services delivered! Former is great
| innovation and will be ignored by investors because it's just
| a fixed cost on the way to becoming profitable. The latter is
| going to appear in the marginal cost of services calculation.
| password4321 wrote:
| https://news.ycombinator.com/item?id=38300425#38322311
|
| > _Uber is famous for NIH syndrome_
| amluto wrote:
| The thing I find odd about this is that the headline figure is
| about old immutable records. Almost all of that 1.7PB is
| ancient by what seems to me to be any practical standard. Uber
| is not likely to care about the credit card authorization flow
| for a ride two years ago, except maybe for analytics.
|
| If I were doing this, I would be looking at data warehousing
| systems. 1.7PB of, say, Parquet files in S3 is not terribly
| expensive. 1.7PB of Parquet files in on-prem or collocated
| object storage, even replicated a zillion times, is quite
| cheap. And quite a few companies and open-source projects are
| currently competing aggressively to provide awesome tools for
| querying that data.
|
| The hot data would fit on basically anything -- the choice
| should be about robustness and barely even consider cost per
| TiB. Datomic got written up recently and seems credible for
| this type of application. FoundationDB is bulletproof. Postgres
| could probably handle it without breaking a sweat, although
| active/active replication isn't free. Heck, writes straight to
| a warehouse with a cache in front to help with reads seems
| credible -- Uber rides rarely go for longer than a couple
| hours, and back-of-the-envelope math suggests that the total
| data rate is maybe 50GB/hour. An entire day of data for an
| entire country would fit on a single very ordinary commodity
| server, and the live data for _the entire world_ would fit on
| one mildly beefy server. The indexes involved sound
| straightforward.
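| (A quick check of the back-of-the-envelope numbers above. The
| 50GB/hour rate is the commenter's own guess and 1.7PB is the total
| cited in this comment; decimal units and a constant write rate are
| simplifying assumptions.)
|
```python
GB_PER_TB = 1_000
GB_PER_PB = 1_000_000

rate_gb_per_hour = 50   # commenter's estimate, not a measured figure
total_pb = 1.7          # total data size cited in the comment

daily_gb = rate_gb_per_hour * 24
hours_to_fill = total_pb * GB_PER_PB / rate_gb_per_hour
years_to_fill = hours_to_fill / 24 / 365

print(f"one day of data:      {daily_gb / GB_PER_TB:.1f} TB")
print(f"years to reach total: {years_to_fill:.1f}")
```
|
| At that rate a full day of writes is on the order of a terabyte,
| which supports the point that the hot set is small next to the
| historical archive.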
| crmd wrote:
| The primary use case is not analytics. This data store is the
| system of record in their credit card authorization and
| billing pipeline, and so it has extreme consistency
| requirements. The lion's share of its engineering is to
| provide consistency across a large spectrum of failure modes.
|
| Old data could probably live at lower cost in a data
| warehouse, but then developers would have multiple systems
| and namespaces to deal with in order to query on
| transactions.
| 8ytecoder wrote:
| Which is ...normal. Different types of data access have
| different SLA requirements. Any company that cares about
| cost will warehouse data after x time frame. It's done even
| in banking. Uber has a much more lenient need to make this
| data instantly available.
| jiggawatts wrote:
| Immutable data is always consistent. This is almost
| certainly an append-only ledger, a well-established
| solution for a simple problem.
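| (A toy illustration of what "append-only ledger" means here. This
| is not LedgerStore's design, which the thread does not describe;
| the class and account names are hypothetical.)
|
```python
class AppendOnlyLedger:
    """In-memory ledger that supports appends and reads, never updates.

    Mistakes are corrected by appending a compensating entry, so every
    past entry stays exactly as it was written.
    """

    def __init__(self):
        self._entries = []

    def append(self, account, amount_cents, memo=""):
        """Record a new entry and return its sequence number."""
        entry = (len(self._entries), account, amount_cents, memo)
        self._entries.append(entry)
        return entry[0]

    def entries(self):
        """Read-only snapshot of all entries, in write order."""
        return tuple(self._entries)

    def balance(self, account):
        """Balance is derived by summing entries, never stored and mutated."""
        return sum(amt for _, acct, amt, _ in self._entries if acct == account)


ledger = AppendOnlyLedger()
ledger.append("rider:42", -2_500, "trip fare")
ledger.append("rider:42", -300, "tip entered twice by mistake")
ledger.append("rider:42", 300, "reversal of duplicate tip")
print(ledger.balance("rider:42"))
```
|
| Because history is never rewritten, any two readers that have seen
| the same prefix of entries compute the same balance.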
| MarkMarine wrote:
| I heard from some ex-Uber people that you could call Uber a
| database company as much as you could call it a transportation
| company. Something like 80+ databases invented there in one
| form or another.
|
| Promotion-driven development. I suppose better than blog post
| driven development, but marginally so.
| vineyardmike wrote:
| > I suppose better than blog post driven development, but
| marginally so.
|
| I find DB development quite interesting, and I find Uber's
| core product quite not interesting (from an engineering
| perspective).
|
| So as an outsider without any financial stake in the company,
| please keep writing about databases!
| foota wrote:
| The article states that they already had an in house solution for
| cold data, so one of the benefits they claim is simplifying by
| moving to one system for both hot and cold data.
| citizenpaul wrote:
| I think there is some reckoning of cloud service providers
| coming (assuming logical actors...). I was doing some contract
| work for a small place that had a GCP Bigtable that was costing
| $11k+ per month for some reports that were based on data from a
| 375MB !!! mysql db into big-table for the reports to run.
|
| They hired some out of school data scientist to do reports and
| they were doing crazy ineffective things with the tiny dataset.
| Wanted me to fix it for pennies tomorrow and I declined.
| remus wrote:
| Not that I disagree with your overall point, but I don't think
| this
|
| > I was doing some contract work for a small place that had a
| GCP Bigtable that was costing $11k+ per month for some reports
| that were based on data from a 375MB !!! mysql db into big-
| table for the reports to run.
|
| Is a good example. It's just a badly architected system, and
| you'd have exactly the same problem if you were running the
| same thing on a massively over provisioned on premise db.
| PeterZaitsev wrote:
| I think this is a fantastic illustration of how expensive
| proprietary cloud based data stores can be... and that it is
| feasible to migrate from them to something else.
| benced wrote:
| $6M... isn't that much?
| SkyMarshal wrote:
| It seems LedgerStore is not open source [1], and finding any info
| on it requires following a trail of backlinked Uber blog posts.
| Here's one with the most info on LedgerStore that I can find,
| from 2021:
|
| https://www.uber.com/en-US/blog/dynamodb-to-docstore-migrati...
|
| [1]:https://github.com/uber
| PeterZaitsev wrote:
| Yeah. This looks like some internal solution. In general Uber
| seems to be high on "not invented here" scale - they like to
| conclude no existing Open Source solutions are good enough for
| them and they need to build their own... this is different from
| Facebook's approach, for example, which chose to make MySQL better
| by adding MyRocks/RocksDB to it and keeping them Open Source.
| vertis wrote:
| It's a weird world where Facebook/Meta is becoming a small
| bastion of hope. Llama 2/3 being an example of bucking the
| trend of going closed source for LLM models.
|
| Granted it's not quite in the same calibre as OpenAI/Claude,
| and the real test is when it is and they still release it.
| alexey-salmin wrote:
| I don't know about economics of this particular project but damn
| dynamodb is expensive. At some point I was thinking that everyone
| else was just using it wrong, doing scans and queries instead of
| point-wise lookups into pre-computed tables.
|
| It turns out however that even when you use it as a distributed
| hashtable you still pay a huge premium.
| pojzon wrote:
| You look at stuff like that and think about
|
| "how much talent is wasted on pointless things that help no one
| in the world while getting paid heaps for nothing"
|
| We could accomplish everything if ppl stopped wasting time on
| pointless tasks.
| otterley wrote:
| Does anyone know whether Uber considered Amazon QLDB for the
| implementation? Seems like it might have been a good fit, at
| first blush.
| yazaddaruvala wrote:
| Reading the article it's clear pretty quickly that Uber was using
| DynamoDB poorly.
|
| It seems they need strong consistency for certain CUJs and then a
| lot of data warehousing for historical transactions.
|
| It's strange to me that they didn't first convert their 2 table
| DynamoDB architecture into a DynamoDB and Redshift architecture or
| similar. This is a pretty common pattern.
| PartiallyTyped wrote:
| I don't understand why they needed 2 weeks of _immutable_
| transactions in Dynamo. Could anyone give any hints?
___________________________________________________________________
(page generated 2024-05-20 23:00 UTC)