[HN Gopher] Migrating Uber's ledger data from DynamoDB to Ledger...
       ___________________________________________________________________
        
       Migrating Uber's ledger data from DynamoDB to LedgerStore
        
       Author : gronky_
       Score  : 267 points
       Date   : 2024-05-20 10:01 UTC (12 hours ago)
        
 (HTM) web link (www.uber.com)
 (TXT) w3m dump (www.uber.com)
        
       | drexlspivey wrote:
       | > Uber migrated all its payment transaction data from DynamoDB
       | and blob storage into a new long-term solution
       | 
       | No way they have 1 trillion transactions right?
        
         | rco8786 wrote:
         | 1T "records". Any given transaction can have N records. I'm
         | assuming this includes Uber Eats as well.
        
           | drexlspivey wrote:
           | Still, they have 10B rides in 2023 including Eats, say
           | 75-100B since inception. What would be a record such that
           | each transaction needs 10-15 on average?
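
The ratio in question can be checked with a quick sketch. It uses only the figures quoted in this thread (1T records, 75-100B transactions since inception); the function name is hypothetical and the inputs are the commenters' estimates, not Uber's published numbers.

```python
# Back-of-the-envelope check using figures quoted in the thread.
TOTAL_RECORDS = 1_000_000_000_000      # "1T records"
TRANSACTIONS_LOW = 75_000_000_000      # commenter's low estimate since inception
TRANSACTIONS_HIGH = 100_000_000_000    # commenter's high estimate since inception


def records_per_transaction(total_records: int, transactions: int) -> float:
    """Average number of records each transaction would need."""
    return total_records / transactions


# More transactions means fewer records needed per transaction.
low = records_per_transaction(TOTAL_RECORDS, TRANSACTIONS_HIGH)
high = records_per_transaction(TOTAL_RECORDS, TRANSACTIONS_LOW)
print(f"{low:.1f} to {high:.1f} records per transaction")
```

Under these estimates the required average lands between roughly 10 and 13 records per transaction, in line with the 10-15 range asked about.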
        
             | ndr wrote:
             | Consider it might be quite de-normalized as typical at
             | scale.
             | 
             | Some records for the customer, some for the driver, some
             | for the restaurant...
        
               | eru wrote:
               | You might even have a few more.
               | 
               | Eg you might have a record for each stage of the meal.
               | When it's ordered, when it's cooked, when it's delivered,
               | etc.
        
             | re-thc wrote:
             | > What would be a record such that each transaction needs
             | 10-15 on average?
             | 
             | Does it have to be 1-dimensional? Depends exactly what
             | payments is. There are refunds, discounts, paying e.g.
             | drivers. There are also things like monthly subscriptions
             | people can subscribe to for discounts / unlimited uses.
             | Lots of things add up.
        
             | csomar wrote:
             | I can see that as transactions with credit cards go through
             | lots of process (withholding, approval, charging, settling,
             | etc..)
        
             | shrubble wrote:
             | They need to pay the driver and they need to handle taxes;
             | that alone triples your estimated 100B.
        
             | rco8786 wrote:
             | > 75-100B
             | 
             | This seems low, off the bat. 15 years of Uber, 9 years of
             | Uber Eats.
             | 
             | But even just looking at my most recent trip with Uber,
             | there are 7 different records visible on the receipt. Not
             | including backend recordkeeping that isn't exposed to the
             | user (driver payments, driver loan repayments, revenue
             | recognition, internal fees/records, etc).
             | 
             | Total trip amount, Trip fare, Booking fee, Tip, State fee,
             | Payment #1 (trip itself), and Payment #2 (driver tip)
             | 
             | Now consider Uber Eats where there is (at least) one record
             | for each item in an order...plus tax, tip, etc as always.
             | 
             | Then consider things like wait time charges, subscriptions,
             | split charges, pending charges, chargebacks, refunds,
             | disputes, blah blah blah.
             | 
             | An average of 10 records per customer transaction seems
             | entirely reasonable.
        
         | bjornsing wrote:
         | The blog post says billions of transactions and trillions of
         | indexes (or rather index entries I presume), if I remember
         | correctly.
        
       | sha_r_roh wrote:
       | Congrats to anyone who worked on it! However, I'm guessing the
       | cost of just running this team would be quite large and not
       | significantly different from the savings (6M), and add on top of
       | it the overhead of maintenance. Payments would not likely be a
       | long-term bet as well, so kind of interesting why teams take up
       | such projects? Is it some kind of sunk-cost with the engineering
       | teams you already have?
        
         | bjornsing wrote:
         | > Payments would not likely be a long-term bet as well
         | 
         | How so? It's a pretty ubiquitous problem...
        
         | kondro wrote:
         | The estimate sounds suspiciously similar to just the data
         | storage component of DynamoDB. 1.7PB of data and indexes is
         | about $5.1m/year in DynamoDB storage at list.
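
The $5.1m/year figure can be reproduced with a short sketch. The per-GB price is an assumption that the comment does not state: $0.25 per GB-month, the commonly listed rate for DynamoDB Standard table class storage. Decimal petabytes are also assumed, and the function name is hypothetical.

```python
# ASSUMPTION (not stated in the thread): list price for DynamoDB
# Standard table class storage, in USD per GB-month.
ASSUMED_PRICE_PER_GB_MONTH = 0.25
DATA_PB = 1.7            # "1.7PB of data and indexes" (from the comment)
GB_PER_PB = 1_000_000    # decimal units assumed


def annual_storage_cost(petabytes: float, price_per_gb_month: float) -> float:
    """Yearly storage bill in USD for a constant data size."""
    return petabytes * GB_PER_PB * price_per_gb_month * 12


cost = annual_storage_cost(DATA_PB, ASSUMED_PRICE_PER_GB_MONTH)
print(f"${cost / 1e6:.1f}M per year")
```

This covers storage only; read and write request charges would come on top.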
        
           | sakjur wrote:
           | Supporting that, Uber's blog post linked from the article
           | mentions cost savings as a benefit from going from three
           | systems to one, and doesn't really mention any dollar figure
           | afaict.
           | 
           | https://www.uber.com/en-AU/blog/migrating-from-dynamodb-
           | to-l...
        
         | bachmeier wrote:
         | > I'm guessing the cost of just running this team would be
         | quite large and not significantly different from the savings
         | (6M), and add on top of it the overhead of maintenance
         | 
         | I'm guessing they know a lot about their costs, and you know
         | very little. There's little value in insulting the team members
         | like this.
        
           | whamlastxmas wrote:
           | It's not insulting to speculate in a conversational way
           | around the errors we very very commonly see
        
           | szundi wrote:
           | That was not a nice reply for a non-insult. Do you have
           | anything to add maybe?
        
             | bachmeier wrote:
             | > That was not a nice reply for a non-insult.
             | 
             | It's an insult if you dismissively explain basic things to
             | the folks working on the project.
        
           | inoop wrote:
           | > I'm guessing they know a lot about their costs, and you
           | know very little.
           | 
           | I'm curious what makes you believe the OP doesn't know about
           | cost? They might be director-level at a large tech company
           | with 20+ years experience for all you know...
           | 
           | > There's little value in insulting the team members like
           | this.
           | 
           | I'd argue it's not insulting to question a claim (i.e. 'we
           | saved $6MM') that is offered with little explanation.
        
             | qaq wrote:
             | Regardless of position at some other company it will tell
             | you precisely 0 about this specific situation.
        
         | smokel wrote:
         | At one end of the spectrum, some people here claim to write
         | this kind of software over a weekend. Some others claim they
         | require a salary of $600,000, and still need nine additional
         | colleagues to pull something like this off.
         | 
         | There is a lot of room in between, where cost estimates are
         | more realistic.
        
           | szundi wrote:
           | This answer pretty much sums up a lot of my experience. Of
           | course when the guy somehow pulls this off in 2 weeks it is
           | seen as an easy side project with proof that it is, haha
        
             | datadrivenangel wrote:
             | This is why incentives favor the heavy bloated enterprise
             | approach: if it looks expensive, people feel like they got
             | something good for their money.
        
           | renegade-otter wrote:
           | Plenty of things can be prototyped over a weekend, but many
           | will require months and even years to get production-ready,
           | feature-complete, and useful, especially at scale.
        
         | cdchn wrote:
         | Developing and maintaining a totally bespoke DB system with
         | that kind of volume even for $5m/yr, spitball you could get
         | yourself 25 top-notch engineers without AI PhDs and have
         | another mil left over for metal. Sounds plenty feasible to have
         | a nice tailored suit for a core part of your business.
        
           | inoop wrote:
           | > you could get yourself 25 top-notch engineers without AI
           | PhD
           | 
           | Not in the US though. According to levels.fyi, an SDE2 makes
           | ~275k/year at Uber. Hire 25 of those and you're already at
           | $6.875MM. In reality you're going to have a mix of SDE1,
           | SDE2, SDE3, and staff so total salaries will be higher.
           | 
           | Then you gotta add taxes, office space, dental, medical, etc.
           | You may as well double that number.
           | 
           | And that's just the cost of labor, you haven't spun up a
           | single machine or sent a single byte across a wire.
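
The labor estimate above works out as follows. This is a sketch using only the numbers in the comment (25 engineers, ~275k/year each, then doubled for overhead); the constant and function names are hypothetical.

```python
SDE2_ANNUAL_COMP = 275_000    # per-engineer figure quoted from levels.fyi in the comment
HEADCOUNT = 25                # team size proposed upthread
OVERHEAD_MULTIPLIER = 2.0     # commenter's "may as well double that number"


def base_salary_cost(headcount: int, comp: int) -> int:
    """Total direct compensation for the team."""
    return headcount * comp


def loaded_cost(base: int, multiplier: float) -> float:
    """Compensation plus taxes, office space, benefits, etc."""
    return base * multiplier


base = base_salary_cost(HEADCOUNT, SDE2_ANNUAL_COMP)
loaded = loaded_cost(base, OVERHEAD_MULTIPLIER)
print(f"base: ${base:,}  loaded: ${loaded:,.0f}")
```

Swapping in a different multiplier (a reply below suggests about 1.75 for large companies) changes the loaded figure but not the conclusion that 25 US-based engineers cost well above $5m/year.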
        
             | silverquiet wrote:
             | Work from home doesn't mean that home has to be in the US.
        
             | cdchn wrote:
             | "and have another mil left over for metal" was the part
             | accounting for hardware, infrastructure, etc.
             | 
             | And you can fudge the employee salary a mil or two either
             | way, but the point is that spending that much on a team to
             | build something isn't infeasible or even unreasonable.
        
             | cthalupa wrote:
             | > Then you gotta add taxes, office space, dental, medical,
             | etc. You may as well double that number.
             | 
             | Economies of scale help a bit with this for larger
             | companies, so it's probably not quite double for Uber, but
             | yeah, not too far off as a general rule of thumb. Probably
             | a 75% increase on the employee facing total comp to get
             | fairly close to the company's actual cost for the employee.
        
           | aeyes wrote:
           | It doesn't sound like they needed to implement a new DB
           | system for this.
           | 
           | This is using existing features of Docstore which is Uber's
           | own DynamoDB (sharded MySQL) which they seem to be using for
           | almost everything.
        
           | davedx wrote:
           | Is accounting really a core part of Uber's business? They're
           | a transportation company not a bank. I kind of question the
           | premise really
        
             | cdchn wrote:
             | Uber is a technology company that tracks 'rides' between
             | drivers that are contractors and customers, and accounts
             | for taking money from one and giving it to another. I
             | wouldn't just call it a core part, I'd go so far as to say
             | it is the intrinsic essence of their business. They're not
             | a bank, but they're not running a branch with tellers
             | taking cash and running ATMs, either.
        
               | shermantanktop wrote:
               | They are in the transportation market serving
               | transportation needs for a transportation-seeking
               | customer base. How they accomplish that is obviously
               | interesting, but their attempts to move laterally haven't
               | been amazing from what I can tell (though I don't follow
               | them closely).
               | 
               | They are structured and run like a tech company but imo
               | they don't produce a tech product.
        
         | mlrtime wrote:
         | You're assuming that the team only works on this product. It is
         | possible they are owners of a lot more than just 1 db.
        
         | inoop wrote:
         | I'd be curious as well to see a more complete cost-benefit
         | analysis, and I'd be especially interested in labor cost.
         | 
         | We don't know how much time and head count Uber committed to
         | this project, but I would be impressed if they were able to
         | pull this off with fewer than 6-8 people. We can use that to
         | get a very rough lower-bound cost estimate.
         | 
         | For example, AWS internally uses a rule of thumb where each
         | developer should generate about $1MM ARR (annual recurring
         | revenue). So, if you have 20 head count, your service should
         | bring in about $20MM annually. If Uber pulled this off with a
         | team of ~6 engineers, by AWS logic, they should about break
         | even.
         | 
         | Another rule of thumb I sometimes see applied is 2x developer
         | salary. So for example, let's assume a 7-person team of 2xSDE1,
         | 3xSDE2, 1xSDE3, and 1xSTAFF, then according to levels.fyi that
         | would be a total annual salary of $2.3MM. Double that, and you
         | get $4.6MM/year to justify that team annual cost footprint,
         | which is still less than $6MM.
         | 
         | Of course, this is assuming a small increase in headcount to
         | operate this new, custom data store, and does not factor in a
         | potentially significant development and migration cost.
         | 
         | So unless my math is completely off, it sounds to me like the
         | cost of development, migration, and ownership is not that far
         | off from the cost of the status quo (i.e. DynamoDb).
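
The two rules of thumb in the comment above can be laid side by side in a small sketch. All inputs are the commenter's own figures; the function names are hypothetical, and the $6MM savings number is the one debated in this thread rather than a confirmed Uber figure.

```python
CLAIMED_ANNUAL_SAVINGS = 6_000_000   # savings figure discussed in the thread
ARR_PER_DEVELOPER = 1_000_000        # AWS rule of thumb as described by the commenter
TEAM_ANNUAL_SALARY = 2_300_000       # commenter's total for a 7-person team
SALARY_MULTIPLIER = 2                # "2x developer salary" rule of thumb


def arr_rule_target(headcount: int) -> int:
    """Annual value a team should generate under the $1MM-per-developer rule."""
    return headcount * ARR_PER_DEVELOPER


def loaded_team_cost(total_salary: int, multiplier: int = SALARY_MULTIPLIER) -> int:
    """Annual cost footprint of a team under the salary-multiplier rule."""
    return total_salary * multiplier


arr_target = arr_rule_target(6)
team_cost = loaded_team_cost(TEAM_ANNUAL_SALARY)
margin = CLAIMED_ANNUAL_SAVINGS - team_cost
print(f"ARR-rule target for 6 engineers: ${arr_target:,}")
print(f"2x-salary cost for the 7-person team: ${team_cost:,}")
print(f"Savings left after team cost: ${margin:,}")
```

Neither rule includes the one-time development and migration cost, which is the caveat the comment itself raises.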
        
           | Rastonbury wrote:
           | Not an engineer, but something like this takes 6-8 people
           | working on only this for a full year?
        
             | inoop wrote:
             | That has been my experience, yes. You need one full-time
             | manager, one full-time on-call/pager duty (usually a
             | rotation), and then 4-6 people doing feature development,
             | bug fixes, and operational stuff (e.g. applying security
             | patches, tuning alarms, upgrading instance types, tuning
             | auto-scaling groups, etc. etc.).
             | 
             | Maybe you can do it a bit cheaper, e.g. with 4-6 people,
             | but my point is that there's an on-going cost of ownership
             | that any custom-built solution tends to incur.
             | 
             | Amortizing that cost over many customers is essentially the
             | entire business model of AWS :)
        
           | shrubble wrote:
           | If the savings are 6 million per year, then in later years it
           | should pay off since the development is a one time cost.
        
             | inoop wrote:
             | The cost doesn't suddenly drop to zero once development is
             | done. Typically a system of this complexity and scale
             | requires constant maintenance. You'll need someone to be
             | on-call (pager duty) to respond to alarms, you'll need to
             | fix bugs, improve efficiency, apply security patches, tune
             | alarms and metrics, etc.
             | 
             | In my experience you probably need a small team (6-8
             | people) to maintain something like this. Maybe you can
             | consolidate some things (e.g. if your system has low on-
             | call pressure, you may be able to merge rotations with
             | other teams, etc.) but it doesn't go down to zero.
        
               | shrubble wrote:
               | If you follow the various links on the Uber site, you see
               | that they have multiple applications sitting on the same
               | database. See
               | https://www.uber.com/blog/schemaless-sql-database/ . It's
               | not just 1 design of a database, with 1 application on
               | top...
        
         | qaq wrote:
         | If you read the article, the system was a layer on top of
         | DynamoDB; they updated it to use the internal product Docstore,
         | which required adding a feature to Docstore. So it's not as
         | involved as people make it out to be. Also, records are
         | immutable, which makes a lot of things way easier.
        
         | lfmunoz4 wrote:
         | Off-the-shelf software doesn't make sense for a company that is
         | planning on lasting a long time. These solutions are all
         | designed for multiple use cases. That means that there is
         | complexity and inefficiencies that are not required for your
         | particular problem. If you were to just focus on your problem
         | wouldn't you just end up at an ASIC as the most optimal
         | solution? The reasons most software doesn't are 1) people like
         | to reinvent the wheel and 2) the lower level you go, the fewer
         | qualified people you can find.
        
       | bjornsing wrote:
       | I'm working on a specialized data store[1] that would be perfect
       | for this kind of use case (large "cold" storage with indexing).
       | But I'm having trouble finding potential customers. I've tried
       | Google search ads but got 99% spam and 1% potential investors,
       | but 0% potential customers. If anybody has any ideas I'm all
       | ears.
       | 
       | 1. https://www.haystackdb.dev/
        
         | sanswork wrote:
         | You need to be doing enterprise sales not marketing. There is a
         | lot of advice here and in general on that but you definitely
         | need to be making calls with that type of business.
        
           | padjo wrote:
           | Yep nobody with the problem you're offering to solve is going
           | to solve it by googling and picking some random company
           | they've never heard of with no track record.
        
             | bjornsing wrote:
             | Not even click a search ad and fill in a contact form? When
             | I'm on the other side of the table I do that. But perhaps
             | I'm unique in that aspect?
             | 
             | (I understand there won't be any significant business
             | without enterprise sales. But that's not what I'm looking
             | for at this stage.)
        
               | narnarpapadaddy wrote:
               | The companies that have these types of problems all have
               | AWS reps (or whatever vendor) that get first crack at a
               | solution, even if their senior engineers or CTOs do some
               | googling. Frequently discounts can be negotiated on
               | products that aren't a perfect fit, or companies will get
               | early access to new products that solve their problem
               | (AWS calls this "limited preview").
               | 
               | A good chunk of B2B infrastructure products like this are
               | developed using a "golden partner" model. The first
               | customer (or few) gets a free or reduced cost license,
               | the developer gets a real-world scenario with real data
               | to use to figure out what the minimal functionality
               | actually is to be a marketable product and to work out
               | bugs. This arrangement frequently requires a preexisting
               | relationship and trust between both parties.
        
               | bjornsing wrote:
               | Yep. Always a bit dangerous to go up against AWS and
               | similar. My hope here is that this product is too niche
               | for the major cloud vendors to invest in. But since Uber
               | is building stuff themselves that assumption may be
               | wrong.
               | 
               | A "golden partner" model makes a lot of sense, thanks.
        
               | narnarpapadaddy wrote:
               | If you know a technology leader at a company that has
               | this problem, reach out and ask if they've had any pain
               | related to it. Ask to take them out to lunch and tell
               | them about a solution you've been working out. Or even a
               | short demo call. See if they'd be interested in an
               | "innovation partnership." You give them a discounted/free
               | license (can be time limited to a year or however long it
               | takes to validate that your solution works and saves them
               | money - then returns to full-price), they agree to
               | feature prominently in your marketing material or provide
               | a reference for your next lead.
        
               | macspoofing wrote:
               | >When I'm on the other side of the table I do that
               | 
               | No, you don't. There are many established storage
               | solutions out there. If you're in the market for one, you
               | can easily fill days, weeks or months vetting those. So,
               | why would you bother dealing with a sales rep from a
               | random one you never heard of before, and isn't used by
               | anyone. You don't even provide any details on what makes
               | it different or better from anything else out there.
        
               | bjornsing wrote:
               | Well the reason I'm working on this in the first place is
               | that when I was on the other side of the table I was
               | looking for one. I filled in the contact forms of a
               | couple of different startups that had products somewhat
               | in line with what I was looking for, and talked to their
               | sales reps. Admittedly they weren't as early stage as my
               | project, but on the other hand they weren't 100% focused
               | on my use-case either.
               | 
               | I guess what I'm trying to say is that I was hoping that
               | someone with a _write intensive workload_ would want to
               | spend some time evaluating a product built specifically
               | for that. But perhaps I'm wrong? Even if your workload
               | was 99% writes you'd rather go to some established player
               | (e.g. MongoDB) with a product optimized for 50/50
               | read/write?
        
               | macspoofing wrote:
               | >I guess what I'm trying to say is that I was hoping that
               | someone with a write intensive workload would want to
               | spend some time evaluating a product built specifically
               | for that.
               | 
               | Again, it's not clear to me exactly what it is you're
               | doing that's any different from the plethora of existing
               | off-the-shelf solutions.
               | 
               | You're saying that you started this project/company
               | because you were looking for a solution to a specific use
               | case (write-intensive workloads) and existing options
               | didn't work - can you expand on that? Can you create a
               | chart, for example, that lists out the specific things
               | that Haystackdb does and alternatives don't? Presumably,
               | if you optimize for write-intensive workloads, there are
               | some drawbacks when it comes to reads - no? Or maybe
               | storage? That's good to highlight.
               | 
               | What you need are whitepapers/blog posts/youtube
               | videos/talks at conferences/etc. that highlight the
               | technical details of your solution, because you're trying
               | to get technical people interested in your product to the
               | point where they will invest time to learn more.
        
               | bjornsing wrote:
               | Well, it's pretty simple: HaystackDB is designed from the
               | ground up for write intensive workloads, so it's much
               | more economical than existing off-the-shelf solutions for
               | that type of workload. Is that not clear from the landing
               | page?
               | 
               | From pricing: "$0.2 per million writes, $20 per million
               | reads". The typical cost profile is $2 per million
               | read/writes, or even more for writes.
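
The pricing claim above implies a break-even read share, which a short sketch makes explicit. It uses only the per-operation prices quoted in the comment ($0.2 per million writes, $20 per million reads, versus a flat $2 per million operations) and ignores storage and any other charges; the function names are hypothetical.

```python
HAYSTACK_WRITE_PRICE = 0.2    # USD per million writes (quoted in the comment)
HAYSTACK_READ_PRICE = 20.0    # USD per million reads (quoted in the comment)
TYPICAL_FLAT_PRICE = 2.0      # USD per million reads or writes (commenter's "typical" figure)


def blended_cost(read_fraction: float, write_price: float, read_price: float) -> float:
    """Cost per million operations when read_fraction of them are reads."""
    return (1 - read_fraction) * write_price + read_fraction * read_price


def break_even_read_fraction(write_price: float, read_price: float, flat_price: float) -> float:
    """Read share at which the split pricing equals the flat price."""
    return (flat_price - write_price) / (read_price - write_price)


r = break_even_read_fraction(HAYSTACK_WRITE_PRICE, HAYSTACK_READ_PRICE, TYPICAL_FLAT_PRICE)
print(f"break-even read share: {r:.1%}")
```

On request pricing alone, the split model is cheaper than the flat $2 rate only while reads stay below roughly 9% of all operations, which is why the write-heavy framing matters.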
        
               | yau8edq12i wrote:
               | Forgive me because what follows will sound harsh, but I
               | think you need to hear it based on your response.
               | 
               | > HaystackDB is designed from the ground up for write
               | intensive workloads
               | 
               | Okay.
               | 
               | > so it's much more economical than existing off-the-shelf
               | solutions for that type of workload.
               | 
               | That's a leap in logic. Just because you designed it with
               | this workload in mind, well, doesn't automatically mean
               | that it's any good for this workload (or any workload).
               | If solving a problem was as easy as declaring "I will
               | design my solution from the ground up for this problem",
               | then we'd all live in peace and harmony. So that's what
               | people are asking you here: _how_ do you make your DB
               | "much more economical" for that type of workload? What
               | technology, what ideas have you had to make it possible?
               | If you don't want to reveal that, then you need _proof_
               | that it's better than the competition, not a
               | _declaration_, that it's better than the competition.
               | 
               | > Is that not clear from the landing page?
               | 
               | It's clear that you want to _market_ your solution as
               | something good for write-heavy workloads. Why should we
               | believe you've done a good job _designing_ your
               | solution?
               | 
               | > From pricing: "$0.2 per million writes, $20 per million
               | reads". The typical cost profile is $2 per million
               | read/writes, or even more for writes.
               | 
               | Who knows how you came up with pricing? Perhaps you're
               | betting on your customers being stupid and not realizing
               | that taking a 10x hit on the price of reads will lose
               | them (and earn you) more money in the long run. After
               | all, what good is writing to a DB if you never read from
               | it...? Or perhaps it's some kind of promotional / loss
               | leader pricing that will change soon in the future. In
               | any case, it's, again, not proof that your solution is
               | adapted to the customer's problem.
        
               | bjornsing wrote:
               | > Forgive me because what follows will sound harsh, but I
               | think you need to hear it based on your response.
               | 
               | No worries. I appreciate you taking the time.
               | 
               | > you need proof that it's better than the competition,
               | not a declaration, that it's better than the competition
               | 
               | Fair point. I realize I'll need that before making any
               | sales. But I was hoping to get a few leads from the
               | contact form without it.
               | 
               | > Perhaps you're betting on your customers being stupid
               | and not realizing that taking a 10x hit on the price of
               | reads will lose them (and earn you) more money in the
               | long run. After all, what good is writing to a DB if you
               | never read from it...?
               | 
               | No it's not a malicious trick. There are use-cases where
               | most records will never be read back. For example, if you
               | go into the Uber app you can find a history of all your
               | trips and you can click one and bring up a receipt for
               | it. Most users will rarely if ever do that. So you end up
               | writing many more receipts to your database than what
               | you'll ever retrieve.
        
               | macspoofing wrote:
               | >Is that not clear from the landing page?
               | 
               | The marketing byline you have on your landing page is
               | clear enough, but nobody will take that seriously without
               | a deeper technical description.
               | 
               | When I read it, I assumed you wrote some code to move
               | data in and out of lower-cost S3 or Glacier storage tiers
               | because you don't control storage pricing and you run on
               | top of existing public cloud infrastructure. Maybe I'm
               | right, maybe I'm wrong - but if I'm looking for a
               | solution, I need to assess whether I should invest time
               | and effort to do a deeper dive, and that's the box I
               | would put you in, without any more detail.
               | 
               | Anyway, good luck. Hope it works out.
        
         | PaywallBuster wrote:
         | Maybe find more articles like the above
         | 
         | try to connect with the respective people at said teams via
         | LinkedIn and ask for feedback
        
           | bjornsing wrote:
           | Perhaps. But I have a feeling it's too late once they've
           | started building something in-house. Any ideas on how I could
           | find the ones that will publish an article like this one year
           | from now? That's the ones I'm after, I think.
        
         | smokel wrote:
         | With a disclaimer that I have no formal nor practical
         | background in marketing, here are some ideas:
         | 
         | 1. It is a bit unclear to me when I would use Haystack. The
         | main advantage seems to be cost cutting. It would be nice to
         | see some realized examples of this.
         | 
         | 2. When competing for price, you may look like the cheap, and
         | thereby untrusted alternative. There is a risky business
         | paradox here, for which I am sure a fellow HN poster will
         | supply the name: you charge less, therefore you make less, and
         | you will not be able to sustain the service, making me not want
         | to spend money.
         | 
         | 3. Have you tried looking for companies that may actually need
         | this solution? Have you tried contacting them directly?
        
           | bjornsing wrote:
           | 1. Good point, thanks.
           | 
           | 2. True. One reason I haven't priced it ridiculously cheap is
           | to avoid this judgement, and fate. With this pricing I won't
           | necessarily have a smaller profit margin than competitors.
           | The cost advantage comes from a smarter architecture. Any
           | ideas on how I can communicate that would be greatly
           | appreciated.
           | 
           | 3. I used to work for one that needed it. I've also
           | interviewed at one that had the same problem. A bit hesitant
           | to reach out to potential customers though before I have a
           | solid product I can deliver. But perhaps I shouldn't be?
        
             | kaibee wrote:
             | > 3. I used to work for one that needed it. I've also
             | interviewed at one that had the same problem. A bit
             | hesitant to reach out to potential customers though before
             | I have a solid product I can deliver. But perhaps I
             | shouldn't be?
             | 
             | Companies generally have to be suffering pretty badly to
             | take a risk on changing their tech stack to something
             | unproven. And the risk for you at that point is that they
             | choose to spend 10x on consultants to implement some
             | existing system instead.
             | 
             | The CTO needs to trade off the opportunity cost of
             | developing new features/existing maintenance against
             | integrating an unproven product. How can you de-risk this
             | for them? (Even just showing that you recognize that this
             | is the case can help)
             | 
             | Maybe this is a time to "do things that don't scale". ie:
             | offer to integrate it into their system for them (for at
             | least some small part/pain point), and likely in parallel
             | so that they can evaluate it without taking down the
             | existing system.
             | 
             | Just my two cents.
        
         | superasn wrote:
         | One observation I have regarding your homepage is that the
         | message isn't very clear. The headline doesn't mention any
         | benefits I get from using your software.
         | 
         | I think you should invest some time into improving your landing
         | page and maybe you may see some traction. A good resource for
         | this which I've bookmarked is here(1). Hope that helps.
         | 
         | (1) https://www.indiehackers.com/post/my-step-by-step-guide-
         | to-l...
        
           | bjornsing wrote:
           | Thanks. So you think it would work better if it would just
           | say "save money", rather than jump straight into the "what"?
           | 
           | To me, when I read the below, that just screams "save money".
           | But maybe I should do that conversion for the reader so to
           | speak?
           | 
           | From benefits box: "Sometimes you need to index a huge amount
           | of data, to accelerate just a few search queries. But
           | building indexes and keeping them in hot storage can be
           | expensive. HaystackDB builds only the indexes needed for sub-
           | second query latency, across billions of keys, while keeping
           | all your data in low-cost object storage like S3."
        
             | superasn wrote:
             | I asked chatgpt for a headline based on your prompt and it
             | gave me this:
             | 
             | "HaystackDB: _Swift Searches, Massive Savings_ - Index
             | Billions, Store Smartly, Query in a Flash!"
        
               | bjornsing wrote:
               | Thanks! That headline is pretty good. :)
        
         | csomar wrote:
         | Looking at pricing, it's crazy expensive (and that's compared to
         | AWS, which is crazy expensive). How do you justify that?
        
           | bjornsing wrote:
           | The idea is that it should be about a tenth of the cost
           | compared to S3 or DynamoDB. Is that not how you read the
           | pricing? Or do you just think that's still too expensive?
           | 
           | EDIT: Or maybe it's because reads are expensive? That's a
           | consequence of the write optimization. The idea is that
           | potential customers will be doing 90%+ writes.
        
         | emerongi wrote:
         | Besides all the other good feedback here, I will offer my
         | extremely petty reason why I wouldn't spend much time
         | evaluating this product. In the FAQ, under the "Are
         | transactions fully ACID?" heading, there is a typo:
         | "simultaineously". It gives me the impression that not enough
         | care has gone into an important part of this product. I know
         | it's not a fair judgement, but first impressions matter.
        
           | bjornsing wrote:
           | That's easy to fix, thanks.
        
             | apwell23 wrote:
             | typo was not the point of the comment though :)
             | 
             | also is there a demo or some sort of technical whitepaper.
        
               | bjornsing wrote:
               | Not yet. Is that something you'd find compelling?
               | Anything in particular you'd like to see?
        
         | jgalt212 wrote:
         | Some B2B and B2B2C products just don't work with walk-in leads.
         | You need someone to create and chase down a set of leads.
        
           | bjornsing wrote:
           | Good point. Any ideas how I could experiment with this "on
           | the cheap"? Any tools I could use to identify and contact
           | leads?
        
         | macspoofing wrote:
         | The market is saturated with storage products, so it's tough to
         | differentiate yourself. Your site does not help by the way.
         | You're also not selling an end-user product to the public,
         | rather you're selling a technical and infrastructure solution
         | to very technical people - that's a different type of sale. To
         | get those people interested, you must put together technical
         | whitepapers/blog posts/webinars/youtube videos/etc. to explain
         | your approach.
        
         | shrubble wrote:
         | Find just 1 customer.
        
         | Sevii wrote:
         | Most places I've worked we couldn't even consider using a
         | product that wasn't supported by a major cloud provider like
         | this. What problem does your product solve that customers
         | absolutely 100% need it?
         | 
         | Structurally, you are a small entity trying to compete on cost
         | with hyperscale cloud providers and open source software.
         | Most ISVs like you charge a ton of money for big problems very
         | few enterprise customers have.
         | 
         | I think you need to find a specific use case where your product
         | is a clear winner. Like 'HaystackDB is the best option for
         | healthcare exchanges to use when receiving claims'.
        
           | bjornsing wrote:
           | This is a good summary of why I'm hesitant to put more work
           | into it.
           | 
           | The counter argument I guess is that developing your own data
           | store in-house should be even more of a no-no, and companies
           | do that. (One example is obviously Uber, but my previous
           | employer is another example.)
           | 
           | Do you think the option to self-host the product would help
           | tip the scale?
           | 
           | > What problem does your product solve that customers
           | absolutely 100% need it?
           | 
           | To be blunt there is no such problem: you can always throw
           | more money e.g. at DynamoDB. But if you have a very write
           | intensive workload (such as the use-case described in the
           | OP), then you can save 90% of that money.
        
         | bastawhiz wrote:
         | > $20 per million reads
         | 
         | Quite frankly, this is not gonna work. I manage a system with a
         | very write-heavy workload (lots of small writes) and even
         | though our writes far outpace our reads, this pricing makes
         | your system about ten times more expensive than an RDS cluster.
         | 
         | There's no data about performance. There's no information on
         | how or whether data is persisted to durable storage before a
         | write is acknowledged. There's not even any information on how
         | big keys or values can be. There's no public information on
         | support.
         | 
         | When choosing a system like yours, my priorities are:
         | 
         | 1. Data safety
         | 
         | 2. Performance
         | 
         | 3. Cost
         | 
         | ... In that order. You've done nothing to educate me on 1 and 2
         | and your pricing isn't better than what you're seeking to
         | displace.
         | 
         | When your product is a tool for developers, show up with hard
         | facts about your product. Zero people (as you've seen) are even
         | remotely interested in building a product on top of a system
         | without knowing whether the system will hold up to their use
         | case. And other than a very anemic FAQ section, you have no
         | documentation at all, whatsoever.
        
           | bjornsing wrote:
           | All valid points. I guess I'm hesitant to put time into
           | documentation and similar, if I can't somehow find a steady
           | stream of sales prospects.
           | 
           | > even though our writes far outpace our reads, this pricing
           | makes your system about ten times more expensive than an RDS
           | cluster
           | 
           | That indeed sounds off... Are you sure you're comparing the
           | total cost to that of an RDS cluster? I am aware that reads
           | will be more expensive (due to the write optimization), but I
           | was hoping most customers would make it back on cheap writes.
           | Also the storage itself ($0.23 per GB-month) should be much
           | cheaper than RDS.
        
             | bastawhiz wrote:
             | My total database is maybe 400 gigs. Most of the writes
             | overwrite existing data, so storage cost isn't a concern.
             | With the cost of an upfront RI for the year on RDS (with
             | basically as many iops as I can use), your solution gives
             | me ~100 million reads. That's...like a month of usage at
             | best.
             | 
             | At least in my case, the fundamental problem you're facing
             | is that reads are just too expensive. Writes and reads tend
             | to grow at the same pace in many products: there's a ratio
             | that tends to stay the same as you scale. $20/million reads
             | is just a _lot_. The ratio of writes to reads for your
             | pricing needs to be 100:1 or more for it to make sense for
             | me, but I'm more like 10-20:1.
             | 
             | > I guess I'm hesitant to put time into documentation and
             | similar, if I can't somehow find a steady stream of sales
             | prospects.
             | 
             | This is part of why a database company is hard to build.
             | You will simply not find anyone willing to give you money,
             | because the alternative is going to be a solution your
             | customers already know and understand and which is likely
             | extremely mature. You're competing with Postgres and Mongo.
             | You can't ship a database product that doesn't work: you're
             | asking people to build on you for their _storage
             | primitive_. If you fuck up, that's a business-ending event
             | for your customer. You've either got to come to the table
             | with an extremely compelling product ("I couldn't build my
             | business without this") or you've got to show why someone
             | should trust you over an established but somewhat more
             | expensive alternative.
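
The figures in the comment above can be cross-checked with a little arithmetic. This is only a sketch: it uses the "$20 per million reads" price and the "~100 million reads" and "a month of usage" figures quoted in the thread, ignores write and storage charges, and the variable names are made up for illustration.

```python
# All inputs are numbers quoted in the thread; nothing here is measured.
READ_PRICE_PER_MILLION = 20.0   # dollars, "$20 per million reads"
READS_PER_RI_BUDGET = 100e6     # "~100 million reads" for one yearly RI
MONTHS_OF_USAGE = 1             # "like a month of usage at best"

# Implied yearly RDS spend: what 100 million reads would cost.
implied_ri_cost = READS_PER_RI_BUDGET / 1e6 * READ_PRICE_PER_MILLION

# Read charges for a full year at roughly 100 million reads per month.
yearly_read_cost = implied_ri_cost * 12 / MONTHS_OF_USAGE

print(f"implied yearly RI cost: ${implied_ri_cost:,.0f}")
print(f"yearly read charges:    ${yearly_read_cost:,.0f}")
print(f"multiple of RI cost:    {yearly_read_cost / implied_ri_cost:.0f}x")
```

Under those assumptions the read charges alone come to about twelve times the implied RDS spend, which is in line with the "about ten times more expensive" estimate given earlier in the subthread.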
        
               | bjornsing wrote:
               | > The ratio of writes to reads for your pricing needs to
               | be 100:1 or more for it to make sense for me
               | 
               | Correct. I bet Uber's use case here is something like
               | 1000:1. I've worked on systems that were over 1000000:1.
               | That's where HaystackDB makes sense.
               | 
               | > but I'm more like 10-20:1.
               | 
               | Then RDS is hard to beat.
        
               | erik_seaberg wrote:
               | You might be surprised how often a deep graph of
               | microservices ends up rereading the same prior
               | transactions over the course of stateful payment
               | processing and on-demand payouts. DynamoDB can give you
               | 2.6 million short reads of base load (1 RCU/s
               | provisioned) for $0.12 per month, which would make a $65
               | alternative (2.6 * $5 + 2.6 * $20) a hard sell.
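
The arithmetic in the comment above can be reproduced as follows. This is a sketch that takes every price from the comment itself ($0.12 per month for DynamoDB, $5 and $20 per million operations for the alternative) and assumes a 30-day month; none of the prices are verified here.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60   # assumes a 30-day month

# One read per second, sustained for a month, in millions of reads.
reads_millions = 1 * SECONDS_PER_MONTH / 1e6   # about 2.6

DYNAMODB_MONTHLY = 0.12   # dollars, as quoted in the comment
WRITE_PRICE = 5.0         # dollars per million, from the comment's formula
READ_PRICE = 20.0         # dollars per million, from the comment's formula

# The comment charges the alternative for 2.6M writes plus 2.6M reads.
alternative_monthly = reads_millions * WRITE_PRICE + reads_millions * READ_PRICE

print(f"reads per month: {reads_millions:.2f} million")
print(f"alternative:     ${alternative_monthly:.2f} per month")
print(f"ratio:           {alternative_monthly / DYNAMODB_MONTHLY:.0f}x")
```

The unrounded total is slightly under the $65 quoted because 30 days of one read per second is 2.592 million reads rather than 2.6 million.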
        
         | victor106 wrote:
         | I would seriously consider an open source business model with
         | an appropriate licensing model. I see a lot of companies are open
         | to experimenting with open source db's.
        
           | bjornsing wrote:
           | Good point. I'm thinking about releasing an open source (or
           | source available) "frontend" for it, and just charge for the
           | "cold storage backend". How would you feel about that?
        
         | oldprogrammer2 wrote:
         | The homepage could benefit from more tangible examples, because
         | right now I can't discern where it fits into my current stack.
         | For most companies, it would be replacing something in a
         | specific context.
         | 
         | Like a side-by-side example. Doing "work" on BigTable (show
         | code examples) versus doing the same "work" on Haystack. Then
         | show the specific metrics on how Haystack is
         | cheaper/faster/better.
        
         | cess11 wrote:
         | To consider your product an alternative I'd like to see
         | benchmarks that seem trustworthy, something like a Jepsen
         | analysis or case studies at existing customers, and be able to
         | test it within the EU, i.e. not on US:ian services.
         | 
         | Seems you're in the vicinity of Lund, should be a 'science
         | park' or similar close to the uni where you can find companies
         | that have problems you could solve. Talk to 'incubators',
         | 'accelerators' and the like there.
        
       | rguillebert wrote:
       | So they saved $0.000006 per record, it's really about the little
       | things...
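
For reference, the per-record figure follows from dividing the $6M annual savings by the 1T records mentioned in the thread. A minimal sketch, treating both as round numbers:

```python
ANNUAL_SAVINGS = 6_000_000        # dollars per year, from the article title
RECORDS = 1_000_000_000_000       # "1T records"

savings_per_record = ANNUAL_SAVINGS / RECORDS

print(f"${savings_per_record:.6f} per record per year")
```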
        
       | theanirudh wrote:
       | I wonder if they considered https://tigerbeetle.com
        
         | geodel wrote:
         | Would be interesting, considering TigerBeetle is written in
         | Zig. And Uber is probably one of the rare big companies that
         | has a support contract with the Zig foundation.
        
       | rmccue wrote:
       | Original story looks to be https://www.uber.com/en-
       | AU/blog/migrating-from-dynamodb-to-l...
        
       | deadbabe wrote:
       | So did the engineers who proposed this get some kind of bonus
       | considering how much money they saved the company?
        
         | Galanwe wrote:
         | Employees are constantly saving cost or adding value, that's
         | what they are paid for.
        
         | zinglersen wrote:
         | If a project fails, do you pay for the loss since you want a
         | share of the profits as well?
        
           | deadbabe wrote:
           | Someone probably gets fired so I guess someone does pay the
           | ultimate price.
        
             | zinglersen wrote:
             | Losing your job because of the outcome of your efforts (or
             | even external events) is not what I would call the ultimate
             | price.
             | 
             | "The metaverse division has now lost more than $45 billion
             | since the end of 2020"
             | 
             | Your compensation for your work is your salary. So I would
             | say that it's fair that the actual risk taker is benefiting
             | from the potential rewards?
        
               | HeatrayEnjoyer wrote:
               | The "risk takers" are not taking any risk at all.
               | What's the chance they end up on the street, or even
               | suffer personal financial stress about their life? That
               | they will have to move, sell their car, home, etc. It's
               | 0%.
        
               | zinglersen wrote:
               | What.. they are taking a lot of risk...
               | 
               | But I guess we first have to agree on "who" we are talking
               | about - is it the company itself or the owner /
               | shareholders ?
               | 
               | Back to your question, yes that could happen in several
               | different cases. But of course the risk/benefit is not
               | split 50/50 (nor 0 risk, 100 upside, as you said), in
               | reality the future outcome depends on both internal and
               | external events.
               | 
               | Even the richest(?) man in the world was relatively close
               | to losing it all;
               | 
               | Musk, who had $200 million in cash at one point, invested
               | "his last cent in his businesses" and said in a 2010
               | divorce proceeding, "About four months ago, I ran out of
               | cash." Musk told the New York Times
               | https://www.cnbc.com/2017/04/27/the-crucial-decision-
               | teslas-... https://archive.nytimes.com/dealbook.nytimes.c
               | om/2010/06/22/...
        
               | cynicalsecurity wrote:
               | Is it an offer to become a shareholder without actually
               | buying any shares? That would be absolutely great, but
               | unfortunately, it doesn't work this way.
        
               | zinglersen wrote:
               | That's the beauty of it, you can choose to spend your
               | money how you want!
               | 
               | You wouldn't want all your earnings to be in stocks, you
               | want liquidity. For example investing your earned money
               | into a public company, or buying food.
        
         | chasd00 wrote:
         | Getting to say you led the effort that saved $6M and resulted
         | in some blog posts is probably the reward. At my firm,
         | associating your name to dollars is the fastest way up the
         | corporate ladder.
        
         | HeatrayEnjoyer wrote:
         | Exactly. Work should be owned by the workers.
        
       | drpotato wrote:
       | The original[1][2] articles are a better read IMO. The link is
       | just a summary of the two with added spelling and grammatical
       | errors that materially impact the meaning.
       | 
       | 1. https://www.uber.com/blog/how-ledgerstore-supports-
       | trillions...
       | 
       | 2. https://www.uber.com/blog/migrating-from-dynamodb-to-
       | ledgers...
        
         | intunderflow wrote:
         | Seems to happen with all our blog posts that appear on here (I
         | work at Uber) - I don't get why the originals don't get upvoted
         | but these rehashes do - are our titles just not as good?
        
           | gronky_ wrote:
           | Yes, that's definitely the main reason. It's called "burying
           | the lede".
           | 
           | Saving $6M is key information that makes this story
           | interesting. It's buried all the way at the bottom of the
           | first blog and is completely missing from the second blog
           | which focuses specifically on the migration
        
             | dboreham wrote:
             | TaaS : title as a service
        
               | k1t wrote:
               | People have done this, eg https://www.reddit.com/r/Growth
               | Hacking/comments/k20g42/ai_to...
               | 
               | However that appears to be defunct now
        
             | alexchantavy wrote:
             | I'm usually guilty of this. The hands-on person involved in
             | a highly technical project gets excited and bogged down in
             | the details of the project that they end up not being the
             | most compelling storyteller about it.
        
               | ComodoHacker wrote:
               | Don't blame yourself. Not everyone is here for the money,
               | many of us are here for the tech.
        
           | masklinn wrote:
           | > I don't get why the originals don't get upvoted
           | 
           | Because they were never submitted? I looked for the first
           | one, it doesn't seem to be on HN.
        
           | brushfoot wrote:
           | Personally, yes, the rehash's title is stronger. It tells a
           | story whose ending piques your curiosity to read more.
           | 
           | "Uber Migrates" (beginning: company that I'm interested in
           | does something) "1T records" (middle: that's a lot of
           | records; I wonder what happened) "from DynamoDB to
           | LedgerStore" (hmm, how do they compare?) "to Save $6M
           | Annually" (end: that's a good chunk of change for me, but was
           | it worth it to Uber? Why did it save that amount? Let me read
           | more)
           | 
           | It's a simple and engaging "there and back again" story that
           | paves the way for a sequel.
           | 
           | Versus:
           | 
           | "How LedgerStore Supports Trillions of Indexes at Uber" (ah,
           | okay, a technology supports trillions of indexes. Moving on
           | to the next article in my feed)
           | 
           | "Migrating a Trillion Entries of Uber's Ledger Data from
           | DynamoDB to LedgerStore" (ah, a big migration. I'm not sure
           | who did it or whether anything interesting came of it, or
           | even whether it happened or is just theoretical because of
           | the gerund, and moving one trillion of something is cool but
           | not something I probably need to read about right now, so
           | let's move on)
           | 
           | YMMV. Some probably prefer the more abstract/less narrative
           | titles, but the first one is more of an attention grabber for
           | me.
        
           | beanjuiceII wrote:
           | i mean it could use a few "blazing fast" sprinkled about
        
             | barfbagginus wrote:
             | And you can't have blazing fast without rust, and a little
             | kvetching about lifetimes
        
               | vsnf wrote:
               | While your broader point is well taken, isn't Uber a
               | famous Go shop?
        
               | barfbagginus wrote:
               | Lol I was not being on topic or constructive - just
               | repeating the meme that rust is synonymous with "blazing
               | fast", because of endless statements to the effect of
               | "rust is blazing fast," or "if you want blazing fast
               | code, use rust," or the endless blazing fast rust
               | libraries:
               | 
               | https://duckduckgo.com/?q=blazing+fast+rust
               | 
               | Now I'm not an expert in either rust or go. But I know my
               | deductive meme logic:
               | 
               | 1. Uber's solution is not blazing fast
               | 
               | 2. They are a Go house
               | 
               | Then the meme implies:
               | 
               | 3. Their solution is slow because they did not use rust!
               | 
               | Q.E.M. (Quod Erat Memonstrandum)
        
           | IanCal wrote:
           | Other than the comments about titles, the entire blogpost
           | doesn't show for me with ublock. So I'll open it, see a
           | picture of some birds, scroll around for a bit then give up.
        
             | pests wrote:
             | That's probably because you are running software that is
             | meant to hide content on a page.
        
               | IanCal wrote:
               | What's the purpose of this comment?
               | 
               | My point is that a random dev running a pretty plain
               | adblock (aren't we all?) simply cannot view their post.
               | This is down to uber, their practices, an external
               | developer and how uber create their blog (they don't just
               | have the content in the page). If I'm not a special case
               | with extremely weird luck, a bunch of devs seeing links
               | to their posts will open them and not see any actual
               | content. They will then, I assume, be less likely to
               | upvote them.
               | 
               |  _Given that they are seeing problems with posts being
               | upvoted_ this seems somewhat relevant.
        
               | pests wrote:
               | I have no issues reading their blog with uBlock Origin.
               | 
               | You are running software that is blocking content you
               | want to read. That is my point.
               | 
               | If I put on blinders and then complain I can't see your
               | stuff, that's my fault not yours - regardless if your
               | stuff is good or the worst annoying spam ever. If I want
               | to see it for some reason, maybe I should take off the
               | blinders
        
               | IanCal wrote:
               | > You are running software that is blocking content you
               | want to read. That is my point.
               | 
               | Yes. It's my point too. I am running very standard
               | software for a dev and it is stopping their dev blog
               | posts being visible.
               | 
               | > If I put on blinders and then complain
               | 
               | I'm not complaining. I'm explaining, given the evidence I
               | have, why they may be seeing poor results on HN. If I'm
               | not alone (and since I have no custom setup designed to
               | keep out their blog posts that would be a surprise) then
               | there are other developers who cannot see their posts.
        
               | simion314 wrote:
               | Ad blocking is recommended by a USA government agency for
               | security reasons; not running an ad blocker is dangerous
               | and suggests a lack of information/education about IT
               | stuff.
        
               | pests wrote:
               | Agreed, but if legit content gets blocked you only have
               | yourself to blame.
               | 
               | Like turning off JS and saying webapps don't work
               | anymore.
        
               | IanCal wrote:
               | And if someone with a js heavy blog asked why it wasn't
               | getting traction on a lynx centered forum they'd probably
               | be told that their content wasn't readable for a portion
               | of the users.
        
             | nvr219 wrote:
             | Loads fine for me with ublock. Perhaps you have a custom
             | rule blocking something?
        
               | IanCal wrote:
               | Nothing custom, so it must be on a list somewhere.
               | 
               | edit - it doesn't have to really be blocking the actual
               | post here even, if their loading code breaks when some
               | other tracking code doesn't run, that could explain it.
        
               | leadingthenet wrote:
               | I have the exact same problem, except on Uber Eats.
        
           | ckluis wrote:
           | Just put all your articles into a customGPT with the examples
           | from the rehashes for each one and then ask the GPT to
           | rewrite your title to a "rehash"-like title for the new
           | posts ;)
        
         | dang wrote:
         | Ok, we've changed to the second link from
         | https://www.infoq.com/news/2024/05/uber-dynamodb-
         | ledgerstore....
         | 
         | Submitters: " _Please submit the original source. If a post
         | reports on something found on another site, submit the latter._
         | " - https://news.ycombinator.com/newsguidelines.html
        
       | igammarays wrote:
       | I wonder if 1.7 petabytes of data (1T indexed records) could fit
       | on a single (very) beefy baremetal server for under a couple
       | thousand dollars a month, served by SQLite.
       | 
       | Like this: https://use.expensify.com/blog/scaling-sqlite-
       | to-4m-qps-on-a...
        
         | kondro wrote:
         | Given 30.7TB SSD's are about $5500 each and you'd need 56 to
         | get to 1.7PB (with no redundancy). Not to mention that SQLite's
         | maximum DB size is 140TB.
         | 
         | I don't think you'd be able to fit this much storage into a
         | single machine, especially not for a few thousand a month and
         | SQLite wouldn't be appropriate for this use-case.
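
The drive count above can be checked with a short calculation. This sketch uses the comment's own figures (30.7TB drives at about $5500 each, 1.7PB of data, no redundancy) and assumes 1 PB = 1000 TB.

```python
import math

TOTAL_TB = 1700.0      # 1.7 PB, assuming 1 PB = 1000 TB
DRIVE_TB = 30.7        # capacity per SSD, from the comment
DRIVE_PRICE = 5500     # dollars per SSD, from the comment

# Round up: a partial drive still has to be bought whole.
drives = math.ceil(TOTAL_TB / DRIVE_TB)
drive_cost = drives * DRIVE_PRICE

print(f"{drives} drives, ${drive_cost:,} in SSDs alone")
```

That is hardware cost for the drives only; chassis, controllers, and any replication would come on top.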
        
           | bayindirh wrote:
           | If you install a RAID controller and a couple of disk boxes,
           | it's possible with 1:1 replication, or with backups. 60 disk
           | 3.5" units already exist, so do 2.5" SSD racks. It won't be
           | cheap, but will be resilient and fast. _Bloody fast_ if you
           | have the budget.
        
             | zaphirplane wrote:
             | > or with backups
             | 
             | take a while to restore a PB and a way to take a hot backup
             | without impacting the service that by itself is a task or
             | snapshots which is more disk
             | 
             | > 1:1 replication
             | 
             | Depending on the amount of writes could be a ton of extra
             | disk and a bucket for network cost
        
               | bayindirh wrote:
               | These systems support zero-downtime snapshots. You tell
               | it to snapshot, it instantly snapshots, you can run a
               | differential/incremental backup at great speeds. Your
               | RAID controller is already caching the hot data, so the
               | impact is minimal.
               | 
               | Except network cost there's no extra disk required. It's
               | just broadcasted writes consumed on the other end.
               | 
               | These boxes are not dumb JBODS. They support their own
               | replication/backup subsystems, so everything is
               | transparent.
        
             | Closi wrote:
             | Resilient and fast from a disk perspective, but in practice
             | massively bottlenecked by the fact that Sqlite can only
             | have 1 writer at a time.
        
           | Neil44 wrote:
           | At the moment they're just paying someone else to buy $5000
           | SSD's and run a database on them at many X markup.
        
             | omeid2 wrote:
             | There are no upper bounds to economies of scale. Maybe there
             | are for the cents per GB of raw storage, but power usage,
             | security, rent, and everything else scale too, and few of
             | them have upper bounds on economies of scale.
        
               | Retric wrote:
               | Economies of scale generally have upper limits. Often
               | when you approach the largest scale the existing market
               | will supply you essentially need to become your own
               | supplier which then runs into span of control issues. The
               | organization needs to become competitive in that new
               | market or their costs increase.
               | 
               | Keep scaling and eventually vertical integration ends up
               | looking like a Soviet style planned economy. Your remote
               | mining town needs some way for people to get soap etc so
               | you open a store with its own supply chain etc etc.
        
           | ndriscoll wrote:
           | There are 61.44 TB NVMe drives (best price I've seen right
           | now is ~$6200; they were ~$4800 earlier this year). You can
           | have a 1U server with 32 E1.L slots so you should be able to
           | fit ~1.9PB raw storage into 1U for a little over $200k. Don't
           | know how business financing works, but at 8% interest with a
           | 5 year amortization, that's a bit over $4k/month.
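The arithmetic in the comment above can be checked with a small sketch. The drive size, slot count, price, interest rate, and term come from the comment; the use of the standard fixed-payment loan formula with monthly compounding is an assumption about what "5 year amortization" means here.

```python
# Figures taken from the comment above.
DRIVE_TB = 61.44          # capacity of one NVMe drive, in TB
SLOTS = 32                # E1.L slots in the 1U server
PRICE_PER_DRIVE = 6200.0  # dollars, "best price" quoted in the comment
PRINCIPAL = 200_000.0     # "a little over $200k", rounded for the loan
ANNUAL_RATE = 0.08
YEARS = 5


def monthly_payment(principal: float, annual_rate: float, years: int) -> float:
    """Fixed monthly payment for a fully amortizing loan.

    Assumes monthly compounding, which the comment does not specify.
    """
    r = annual_rate / 12          # periodic (monthly) rate
    n = years * 12                # number of payments
    return principal * r / (1 - (1 + r) ** -n)


raw_tb = DRIVE_TB * SLOTS                 # raw capacity in TB
drive_cost = PRICE_PER_DRIVE * SLOTS      # cost of the drives alone
payment = monthly_payment(PRINCIPAL, ANNUAL_RATE, YEARS)

print(f"raw capacity: {raw_tb / 1000:.2f} PB")
print(f"drives only:  ${drive_cost:,.0f}")
print(f"payment:      ${payment:,.0f}/month")
```

Under these assumptions the drives alone come to just under $200k and the payment lands slightly above $4k/month, which matches the comment's estimate.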
        
             | mobilemidget wrote:
             | Do you have any good recommendation for such 1U server with
             | 32 slots? Thanks
        
               | perryh2 wrote:
               | Supermicro https://www.supermicro.com/en/products/nvme?pr
               | o=formfactor%3...
        
             | jakjak123 wrote:
             | Our ops team actually wanted to do this, but we on the
             | project have nightmares from putting 1PB of database on a
             | single host ><
        
           | choppaface wrote:
           | StorageReview plays with 2PB flash machines all the time
           | https://www.youtube.com/watch?v=UQMKtlIjeuk
           | 
           | 1PB in a rack with spinning rust + flash buffer has been easy
           | for years now.
        
         | BlackLotus89 wrote:
         | No it won't. sqlite "only" works with up to 281TB [0] [1]
         | 
         | [0] https://www.sqlite.org/releaselog/3_33_0.html
         | 
         | [1] https://www.sqlite.org/limits.html (#12)
        
           | sgt wrote:
           | You can split up into 10 SQLite DB's on this individual
           | server.
        
             | cdchn wrote:
             | You've now implemented sharding on top of SQLite.
             | 
             | Eventually all programs will be able to read email.
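To make "sharding on top of SQLite" concrete, here is a minimal sketch of hash-based routing across several SQLite files. The `ShardedStore` class, the `records` table, and the CRC32 routing rule are all hypothetical and chosen only for illustration; nothing here reflects how LedgerStore or any commenter's system works.

```python
import os
import sqlite3
import tempfile
import zlib


class ShardedStore:
    """Hypothetical key/value store that spreads keys over N SQLite files."""

    def __init__(self, directory: str, shard_count: int = 10) -> None:
        self.conns = []
        for i in range(shard_count):
            conn = sqlite3.connect(os.path.join(directory, f"shard_{i}.db"))
            conn.execute(
                "CREATE TABLE IF NOT EXISTS records "
                "(key TEXT PRIMARY KEY, value TEXT)"
            )
            self.conns.append(conn)

    def _shard(self, key: str) -> sqlite3.Connection:
        # CRC32 is stable across runs, unlike Python's built-in hash().
        return self.conns[zlib.crc32(key.encode("utf-8")) % len(self.conns)]

    def put(self, key: str, value: str) -> None:
        conn = self._shard(key)
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT OR REPLACE INTO records (key, value) VALUES (?, ?)",
                (key, value),
            )

    def get(self, key: str):
        row = self._shard(key).execute(
            "SELECT value FROM records WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def count(self) -> int:
        # A cross-shard query has to fan out to every file.
        return sum(
            c.execute("SELECT COUNT(*) FROM records").fetchone()[0]
            for c in self.conns
        )

    def close(self) -> None:
        for conn in self.conns:
            conn.close()


with tempfile.TemporaryDirectory() as tmp:
    store = ShardedStore(tmp, shard_count=10)
    for i in range(100):
        store.put(f"txn-{i}", f"value-{i}")
    fetched = store.get("txn-42")
    total = store.count()
    store.close()

print(fetched, total)
```

Even this toy version already needs a routing rule and fan-out queries. Resharding, cross-shard transactions, and replication are left out entirely, which is roughly the point being made in the thread.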
        
               | mrbungie wrote:
               | Any non-trivial complexity codebase eventually implements
               | a mediocre SQL/Lisp/etc.
        
               | Closi wrote:
               | Exactly! Why take a system not designed for this sort of
               | scale and force it to scale, rather than use systems
               | which are designed and tested for this scale and volume?
               | All you will do is hackily re-invent all the other things
               | that the other databases had to do to scale to this
               | extent.
               | 
               | Plus size is only one limit, you would be limited to 1
               | write every few milliseconds. My napkin maths estimate is
               | that there are at least 1-2m writes per hour going into
               | this thing, so probably 300-600 writes / second (Average)
               | and maybe over 1k writes/second peak. We are going to
               | fall over here!
               | 
               | Not sure why some people seem to have a view of "There is
               | no scaling problem that can't be solved with a sufficient
               | number of SQLite databases".
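The napkin maths above can be reproduced in a few lines. The writes-per-hour range comes from the comment; the 3 ms per write figure is an assumption standing in for "1 write every few milliseconds" and is not a measured SQLite number.

```python
SECONDS_PER_HOUR = 3600


def writes_per_second(writes_per_hour: float) -> float:
    """Average write rate implied by an hourly volume."""
    return writes_per_hour / SECONDS_PER_HOUR


# Range quoted in the comment: 1-2 million writes per hour.
low = writes_per_second(1_000_000)
high = writes_per_second(2_000_000)

# Assumed cost of one serialized write, in milliseconds (illustrative only).
ASSUMED_MS_PER_WRITE = 3.0
single_writer_ceiling = 1000 / ASSUMED_MS_PER_WRITE

print(f"average load: {low:.0f} to {high:.0f} writes/second")
print(f"single-writer ceiling at {ASSUMED_MS_PER_WRITE} ms/write: "
      f"{single_writer_ceiling:.0f} writes/second")
```

With these inputs the average load already sits at or above the assumed single-writer ceiling, before any peak traffic is considered.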
        
             | zaphirplane wrote:
             | > You can split up into 10 SQLite DB's on this individual
             | server.
             | 
             | One is a scalable, managed, highly available service with
             | economies of scale; the other is a fixed-size capital
             | expenditure with fixed performance, limited DR, requiring a
             | couple of SRE/DevOps and colo.
             | 
             | There is also the "will it always work" question.
        
             | mlnj wrote:
             | Just storing petabytes of data is not the issue. Managing
             | and querying it reliably is.
        
             | chasil wrote:
             | Beware of WAL mode, as you sacrifice atomicity across the
             | attached database files in this configuration.
             | 
             | https://sqlite.org/lang_attach.html
             | 
             | 'Transactions involving multiple attached databases are
             | atomic, assuming that the main database is not ":memory:"
             | and the journal_mode is not WAL. If the main database is
             | ":memory:" or if the journal_mode is WAL, then transactions
             | continue to be atomic within each individual database file.
             | But if the host computer crashes in the middle of a COMMIT
             | where two or more database files are updated, some of those
             | files might get the changes where others might not.'
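As a rough illustration of the quoted documentation, the sketch below attaches a second database file and updates both inside one transaction, using Python's standard `sqlite3` module. The file names and the `debits`/`credits` tables are hypothetical. The script only shows the mechanics and reports the journal mode; it does not simulate a crash, so it cannot demonstrate the WAL caveat itself.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    main_path = os.path.join(tmp, "main.db")
    other_path = os.path.join(tmp, "other.db")

    # isolation_level=None puts the connection in autocommit mode so that
    # BEGIN/COMMIT/ROLLBACK can be issued explicitly.
    conn = sqlite3.connect(main_path, isolation_level=None)
    conn.execute("ATTACH DATABASE ? AS other", (other_path,))

    # With a rollback journal (not WAL), the SQLite docs quoted above say a
    # multi-file transaction is atomic.
    main_mode = conn.execute("PRAGMA main.journal_mode").fetchone()[0]

    conn.execute(
        "CREATE TABLE main.debits (id INTEGER PRIMARY KEY, amount INTEGER)"
    )
    conn.execute(
        "CREATE TABLE other.credits (id INTEGER PRIMARY KEY, amount INTEGER)"
    )

    # One transaction that touches both database files.
    conn.execute("BEGIN")
    conn.execute("INSERT INTO main.debits (amount) VALUES (?)", (-100,))
    conn.execute("INSERT INTO other.credits (amount) VALUES (?)", (100,))
    conn.execute("COMMIT")

    # A rolled-back transaction leaves both files unchanged.
    conn.execute("BEGIN")
    conn.execute("INSERT INTO main.debits (amount) VALUES (?)", (-5,))
    conn.execute("INSERT INTO other.credits (amount) VALUES (?)", (5,))
    conn.execute("ROLLBACK")

    debit_count = conn.execute("SELECT COUNT(*) FROM main.debits").fetchone()[0]
    credit_count = conn.execute(
        "SELECT COUNT(*) FROM other.credits"
    ).fetchone()[0]
    conn.close()

print(main_mode, debit_count, credit_count)
```

Switching the main database to WAL would keep each file individually consistent, but per the quoted text a crash mid-COMMIT could leave the two files disagreeing.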
        
             | nemothekid wrote:
             | Once you are splitting data across 10 SQLite DBs you have a
             | bespoke distributed system anyway, and you will find
             | yourself dealing with all the headaches of LedgerStore.
             | 
             | Most of the novel work in LedgerStore is probably around
             | managing the headaches of distributed storage, not the
             | persistence layer.
        
         | tinyspacewizard wrote:
         | Also a bit scary to have a system without a built-in scaling
         | mechanism in the path of customer traffic. At some point you
         | may be racing to upgrade it.
        
         | sgt wrote:
         | How would you replicate that SQLite DB onto other hosts to
         | achieve redundancy?
        
           | thangngoc89 wrote:
           | One could use Litestream [1]
           | 
           | [1]: https://litestream.io
        
             | szundi wrote:
             | What if a continuous replication system has a bug one day,
             | and you realize you are just a bit corrupted and have to
             | rerun? Or is it the same with cloud tools?
        
               | thangngoc89 wrote:
               | That's why you always test your backup. I back up the full
               | sqlite.db every day and test the litestream replication
               | every week. So far litestream has been solid.
        
               | zaphirplane wrote:
               | By the time the TB is restored, time to start the next
               | test
               | 
               | How do you detect restored but bit flipped data ?
        
               | thangngoc89 wrote:
               | I do this in backup testing:
               | 
               |     sqlite3 /path/to/db
               |     sqlite> PRAGMA integrity_check;
               | 
               | See SQLite3 documentation:
               | https://www.sqlite.org/pragma.html#pragma_integrity_check
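The same check can be scripted with Python's standard `sqlite3` module. In the sketch below, `integrity_ok` wraps the pragma mentioned above. `table_digest` is a hypothetical extra step, not something the commenter describes, that hashes row contents so a restored copy can be compared against the source. The in-memory databases and the `payments` table are stand-ins for a real backup and restore.

```python
import hashlib
import sqlite3


def integrity_ok(conn: sqlite3.Connection) -> bool:
    """True when PRAGMA integrity_check reports a single 'ok' row."""
    rows = conn.execute("PRAGMA integrity_check").fetchall()
    return rows == [("ok",)]


def table_digest(conn: sqlite3.Connection, table: str) -> str:
    """SHA-256 over all rows of a table, in rowid order.

    The table name is interpolated directly, so it must come from trusted
    code and never from user input.
    """
    h = hashlib.sha256()
    for row in conn.execute(f'SELECT * FROM "{table}" ORDER BY rowid'):
        h.update(repr(row).encode("utf-8"))
    return h.hexdigest()


# Stand-in for the production database.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE payments (id INTEGER PRIMARY KEY, amount_cents INTEGER)"
)
source.executemany(
    "INSERT INTO payments (amount_cents) VALUES (?)", [(100,), (250,), (999,)]
)
source.commit()

# Stand-in for a restored backup, produced with the online backup API.
restored = sqlite3.connect(":memory:")
source.backup(restored)

check_passed = integrity_ok(restored)
digests_match = table_digest(source, "payments") == table_digest(
    restored, "payments"
)
print(check_passed, digests_match)
```

The pragma validates the database structure, while the digest comparison is one possible way to confirm that the row contents themselves survived the round trip.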
        
               | zaphirplane wrote:
               | Sounds like it will take a while for a TB, and it checks
               | DB integrity, not data integrity.
        
               | mickeyp wrote:
               | Would you care to tell us what your backup and restore
               | policy would be for 1.7 PB of data?
        
               | thangngoc89 wrote:
               | I'm replying to the question of how one would replicate
               | SQLite 3 in production for redundancy. I myself consider
               | 10GB would be the limit for using SQLite 3 in read/write
               | in production and switch to PostgreSQL.
        
               | sgt wrote:
               | That's a huge discrepancy. One half of HN wants to put
               | petabytes on SQLite, while your limit is only 10GB.
        
               | Closi wrote:
               | Why not use SQLite's own guidance on where SQLite
               | probably isn't appropriate:
               | 
               | - Client/Server applications (Check)
               | 
               | - High-volumes (Check)
               | 
               | - Large datasets (Check)
               | 
               | - High concurrency, particularly for writes (Check)
               | 
               | https://www.sqlite.org/whentouse.html
        
               | jeltz wrote:
               | Then the same happens as when there is a bug in Aurora's
               | replication. You lose data. I know this from personal
               | experience.
        
             | anonzzzies wrote:
             | Is there any open source project doing something similar?
        
               | jiripospisil wrote:
               | Litestream is open source.
               | 
               | https://github.com/benbjohnson/litestream
        
               | anonzzzies wrote:
               | Ai. I remember another product and thought it was this.
               | Sorry. Move on and keep up the good work.
        
         | riku_iki wrote:
         | It will take forever to create that index. The link describes
         | a 10B-row dataset.
        
         | sanderjd wrote:
         | Sometimes things just aren't nails, even when you have a really
         | good hammer.
        
         | khaki54 wrote:
         | The value proposition of commercial cloud isn't cost savings
         | unless you manage to quantify all of the ancillary and
         | extrinsic factors such as security risk, HVAC, datacenter
         | personnel, and hardware lifecycle. Any well capitalized and
         | organized company could build their own cloud much more
         | cheaply, but really a significant portion of the calculation is
         | outsourcing the risk components.
        
           | igammarays wrote:
           | The problem with outsourcing the risk components is that you
           | don't know for sure whether they are properly taken care of.
           | Major cloud providers have been caught "oopsing" your data,
           | and bam it is gone. Furthermore, they have no incentive to be
           | more efficient about it, they could easily be using 10x the
           | amount of resources necessary, and you wouldn't even have a
           | clue, you're just paying for evermore expensive crap that
           | becomes less reliable over time.
        
             | ddorian43 wrote:
             | But the cloud providers compete with each other! Look at
             | the efficient market on display in their bandwidth pricing!
        
               | PretzelPirate wrote:
               | For very large customers, the cloud providers do compete
               | with each other on cost. They often pay different prices
               | than are advertised.
        
           | ownagefool wrote:
           | > Any well capitalized and organized company could build
           | their own cloud much more cheaply
           | 
           | Lots of orgs fail to turn money into talent and then talent
           | into products.
           | 
           | It just takes one bad hire at senior level and suddenly your
           | cloud is a vmware install where all machines boot off
           | network disk, and contention makes the entire thing fall
           | over.
        
         | pclmulqdq wrote:
         | You wouldn't want to do 1T records on one server even if you
         | could. At that scale, you would prefer to be somewhat
         | distributed for availability and scalability. Also, SQLite has
         | issues at large scale.
         | 
         | A reasonable number for one server is about 32-128 TB, and 1.7
         | petabytes with some redundancy fits nicely in ~30 servers with
         | a decent distributed database.
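A quick sizing sketch for the estimate above. The 1.7 PB total and the 32-128 TB per-server range come from the comment; the replication factors are assumptions, since the comment only says "some redundancy".

```python
import math


def servers_needed(data_tb: float, replication: int, per_server_tb: float) -> int:
    """Servers required to hold `replication` full copies of the data."""
    return math.ceil(data_tb * replication / per_server_tb)


DATA_TB = 1700  # 1.7 PB expressed in TB

# Replication factors of 2 and 3 are assumed here for illustration.
for replication in (2, 3):
    for per_server_tb in (32, 128):
        n = servers_needed(DATA_TB, replication, per_server_tb)
        print(f"RF={replication}, {per_server_tb} TB/server -> {n} servers")
```

With two copies and 128 TB per server this comes to 27 servers, in line with the "~30 servers" figure. Smaller servers or a third copy push the count up considerably.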
        
         | klysm wrote:
         | Sure but then you get a whole new set of costs and folks you
         | have to hire to maintain that hardware.
        
         | siva7 wrote:
         | Maybe it could, and now you've got 99 new problems. That's why
         | more experienced decision makers won't allow this to happen.
        
         | Closi wrote:
         | 1.7 petabytes on SQLite?
         | 
         | SQLite's own advice:
         | 
         | > If your data will grow to a size that you are uncomfortable
         | or unable to fit into a single disk file, then you should
         | select a solution other than SQLite. SQLite supports databases
         | up to 281 terabytes in size, assuming you can find a disk drive
         | and filesystem that will support 281-terabyte files.
         | 
         | > Even so, when the size of the content looks like it might
         | creep into the terabyte range, it would be good to consider a
         | centralized client/server database [over SQLite].
        
           | cheeze wrote:
           | This is the worry IMO. It's fine to dump it on a server with
           | SQLite, but once you start hitting scaling limits, you're in
           | for a potentially rough migration.
        
         | callalex wrote:
         | One of the main reasons you put up with the annoyances of
         | tuple-based storage like DynamoDB is because you want extremely
         | high availability that simply cannot be provided by one
         | computer in one physical location.
        
       | benterix wrote:
       | I read the article so I roughly know what LedgerStore is - but I
       | have no idea where it is hosted.
        
         | tiew9Vii wrote:
         | From one of the original sources linked in this thread
         | 
         | > LSG promised shorter indexing lag (i.e., time between when a
         | record is written and its secondary index is created).
         | Additionally, it would give us faster network latency because
         | it was running on-premises within Uber's data centers.
         | 
         | https://www.uber.com/en-AU/blog/migrating-from-dynamodb-to-l...
        
       | xiwenc wrote:
       | Is this another case where, once you reach a certain scale, it's
       | more beneficial to roll your own? Pretty amazing what Uber has to
       | deal with.
       | 
       | Also it's not very clear from the original articles what the
       | new total "cost of ownership" of this refactored service is.
       | Like now they need to manage their own databases and the storage
       | backing them. Or did I miss it?
        
         | crabbone wrote:
         | I worked for a company which used Redis at the prototyping
         | phase, but then wrote its own database to improve performance
         | and resilience. The company wasn't selling an end-user facing
         | product, the product was a distributed filesystem.
         | 
         | My take on this is that most companies don't have the expertise
         | to build systems like databases, and even when the costs would
         | otherwise make such a development desirable, they would simply
         | be afraid of doing it.
        
       | ForHackernews wrote:
       | Does no one ever delete data? It's hard to believe there's much
       | business value in keeping every individual payment record dating
       | back to 2017.
        
         | sanderjd wrote:
         | I'm not sure if they have regulatory obligations to keep them,
         | or what, but it still seems like you could back them up to cold
         | storage after a reasonable period of time.
        
         | robertlagrant wrote:
         | It might just be an internal policy to cover all the crazy
         | combinations of regs the world over. They might just say
         | 10/20/100 years is their policy, now figure out how to store
         | it.
        
         | moooo99 wrote:
         | Payment information is often subject to pretty strict
         | regulatory requirements, including archival durations. Having
         | to keep all the original information for 10 years is not
         | entirely uncommon.
        
         | crabbone wrote:
         | In systems that deal with money, money-related data is
         | virtually never deleted. The reason is the fear that deletion
         | can be exploited somehow in the future, rather than the old
         | data being actionable.
         | 
         | For example, a customer might register with the name of a
         | deleted customer, which would resurface some "unfinished"
         | transactions or rules associated with the older version of the
         | "same" customer that hadn't been properly deleted but appeared
         | to be deleted for a while.
         | 
         | Also, in general, deletion is very difficult because money
         | doesn't just disappear. You'd need some sort of compaction
         | (think: Git squash) rather than deletion to be able to balance
         | the system's books... but then you'd be filling the system with
         | fake transactions...
         | 
         | In my experience working with these kinds of systems, the
         | typical solution is to label entities with active/inactive
         | labels to substitute for deletion. But entities never go away.
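The active/inactive labelling described above might look roughly like the following sketch, using Python's standard `sqlite3` module. The `customers` and `transactions` tables and the `deactivate_customer` helper are hypothetical and only illustrate the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " active INTEGER NOT NULL DEFAULT 1)"
)
conn.execute(
    "CREATE TABLE transactions ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " amount_cents INTEGER NOT NULL)"
)


def deactivate_customer(conn: sqlite3.Connection, customer_id: int) -> None:
    """'Delete' a customer by flipping the flag; the row itself stays."""
    with conn:
        conn.execute(
            "UPDATE customers SET active = 0 WHERE id = ?", (customer_id,)
        )


# An original customer with one recorded transaction.
first_id = conn.execute(
    "INSERT INTO customers (name) VALUES (?)", ("alice",)
).lastrowid
conn.execute(
    "INSERT INTO transactions (customer_id, amount_cents) VALUES (?, ?)",
    (first_id, 1250),
)
conn.commit()

deactivate_customer(conn, first_id)

# A new customer registering under the same name gets a fresh id, so the old
# transactions cannot resurface under the new account.
second_id = conn.execute(
    "INSERT INTO customers (name) VALUES (?)", ("alice",)
).lastrowid
conn.commit()

active_ids = [
    row[0] for row in conn.execute("SELECT id FROM customers WHERE active = 1")
]
old_txn_count = conn.execute(
    "SELECT COUNT(*) FROM transactions WHERE customer_id = ?", (first_id,)
).fetchone()[0]
new_txn_count = conn.execute(
    "SELECT COUNT(*) FROM transactions WHERE customer_id = ?", (second_id,)
).fetchone()[0]

print(active_ids, old_txn_count, new_txn_count)
```

The history of the inactive customer remains queryable for the books, while application queries filter on `active = 1`.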
        
           | kobalsky wrote:
           | I agree with you, but there is a plus for deleting old data.
           | 
           | If you are not required to keep the information for more than
           | X years, and you still keep it, then you have to provide it
           | when it's requested.
           | 
           | If you didn't keep it, then it can't be used against you.
           | 
           | If you delete it after it was requested, then you are in
           | trouble.
        
         | nolongerthere wrote:
         | At an individual level I appreciate when an app or service I
         | use maintains all records from the start of our relationship,
         | I've infrequently found myself going back and looking for
         | something, and it's always a breath of fresh air to see that
         | nothing was deleted.
        
           | dang wrote:
           | Sorry for the offtopicness, but please see
           | https://news.ycombinator.com/item?id=40418627 regarding a
           | flamewar that happened over a week ago. It's important that
           | this not happen again.
        
       | washywashy wrote:
       | I pretty much never see engineering salaries factored into these
       | types of savings projects. I assume because engineers are already
       | viewed as a sunk cost or maybe it's just because it's way less
       | tangible. Have seen many designs describe how X saves Y dollars
       | but ignores the engineering effort to maintain and build it. Half
       | the time I suspect it's just so people have something to work on,
       | rather than it being some critical fix.
        
         | gtirloni wrote:
         | If anything, it reduces Uber's exposure to AWS' proprietary
         | technology. I don't know how to measure how much that's worth
         | but they probably do.
        
           | vertis wrote:
           | Companies this size almost certainly have different terms of
           | use. I worked for a smaller, but still ASX200 company that
           | had a custom contract, and assigned staff that would drop by
           | 2-3 days a month. Of specific note was that if AWS wanted to
           | stop doing business with us they had to give at least X
           | notice (from memory that was 12 months for us).
           | 
           | For our risk profile this was more than enough time to
           | migrate off any AWS' proprietary technology.
           | 
           | That makes it worth less to avoid exposure.
        
           | scarface_74 wrote:
           | This usually comes from people who have never done a mass
           | migration at scale.
           | 
           | You're always dependent on your infrastructure. Even if you
           | have nothing but everything hosted on a bunch of VMs, it can
           | take years and millions of dollars to migrate.
           | 
           | No, just use Terraform and Kubernetes is not the answer.
           | 
           | The typical enterprise is dependent on, depending on the
           | source, between 80 and 120 SaaS products, i.e. outside vendors.
        
             | gtirloni wrote:
             | > Even if you have nothing but everything hosted on a
             | bunch of VMs, it can take years and millions of dollars to
             | migrate.
             | 
             | I'd assume it takes fewer millions to migrate your own tech
             | stack from AWS to somewhere else than it takes to migrate
             | from AWS proprietary solutions. Is that reasonable?
        
               | scarface_74 wrote:
               | No because you still have to deal with permissions,
               | integrations with AWS services like networking, training,
               | security audits, regression testing, often physical
               | network connections (Direct Connect), DNS...
               | 
               | And you're dealing with your PMO department, project
               | managers, finance, security, contract negotiations,
               | retraining your ops department...
               | 
               | And you know that Aurora MySQL instance that was supposed
               | to prevent "lock in"? I bet you someone somewhere in your
               | org thought about creating an ETL job and then said
               | forget it and used "select into S3" to move data from
               | MySQL into S3.
               | 
               | As a project manager trying to ship code so you can show
               | "impact" to put on your promo doc, are you going to
               | choose for your team to spend weeks to write an ETL job
               | to prevent "lock in" or are you going to tell the
               | developer to write that one line of SQL?
               | 
               | There are all sorts of choices you can make that will
               | save time and money and ship features that actually
               | deliver value instead of worrying about the boogie man of
               | "lock in".
               | 
               | And I really hope that there was some better technical
               | reason than just saving $6 million a year for a
               | multibillion dollar company to go through the migration.
        
               | gtirloni wrote:
               | Thanks for the insights. So in the case that it's
               | actually more expensive to migrate your own tech stack
               | somewhere else than, say, migrate from AWS proprietary to
               | GCP proprietary, it seems there might be other reasons.
        
               | scarface_74 wrote:
               | The difficulty would be worse of course if you depend on
               | anything proprietary from the cloud vendor.
               | 
               | But the main question is, once you do all of this work
               | and spend time to be "cloud agnostic", does it add
               | business value?
               | 
               | In the case of Dropbox, it made sense to move from the
               | cloud. In the case of Netflix, they decided to move to
               | the cloud.
               | 
               | But you can't stay completely "cloud agnostic".
               | 
               | Let's take a simple case of using Kubernetes and building
               | the underlying infrastructure using Terraform.
               | 
               | The entire idea behind Kubernetes is to abstract your
               | infrastructure - storage, load balancers, etc.
               | 
               | But eventually, you still have to deal with what's
               | underneath. I used AWS's own Docker orchestration service
               | for years - ECS. But I just learned Kubernetes last
               | month.
               | 
               | I still had to know how to troubleshoot problems with IAM
               | permissions, load balancers, view CloudTrail logs for
               | permission issues, know how the underlying storage
               | providers worked, make sure I had the right plugin
               | installed for K8s to work with AWS's infrastructure etc.
               | 
               | Once I got all of that figured out, then I could go
               | through the tutorials and mind map the difference between
               | ECS and AWS's Kubernetes implementation - EKS.
               | 
               | But I had years of experience with AWS. I could never have
               | easily troubleshot the same types of issues with
               | Azure's or GCP's version of K8s. Now multiply that by an
               | entire department.
               | 
               | Once everything is configured correctly, a developer's
               | experience would be the same across environments.
               | 
               | Migrations at scale are always a pain from one system to
               | another.
               | 
               | Source: I worked at AWS in the Professional Services
               | department for three years. I'm mostly a developer and I
               | dealt with the "modernization" side of "lift and shift
               | and then modernize".
        
         | scop wrote:
         | That was where my mind went to when I saw the headline. Granted
         | that while I'm not on the Finance side of things and am in fact
         | a developer, "six million" didn't seem like much at all
         | considering engineer salaries. It's certainly an achievement,
         | but at what short and long term salaried cost?
        
           | Rastonbury wrote:
           | $6m a year across 2.5 years is like $15m. However many
           | engineer man-months it takes to break even, I'm pretty sure it
           | was worth the work.
        
             | mbesto wrote:
             | $6m in perpetuity is like infinite and engineers can be
             | fired.
        
               | robocat wrote:
               | It really isn't. The first years dominate the value and
               | later years are worth nothing due to inflation. Google a
               | calculator and use a reasonable discount rate and I
               | suspect you will find that today's value for an infinite
               | perpetuity is a lot less than your intuition might guess.
               | It always surprises me.
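The point about discounting can be made concrete with the standard present-value formulas. The $6M annual saving comes from the thread; the 10% discount rate is an assumption picked only for illustration.

```python
def perpetuity_pv(annual_cash: float, discount_rate: float) -> float:
    """Present value of a level payment received at each year end, forever."""
    return annual_cash / discount_rate


def annuity_pv(annual_cash: float, discount_rate: float, years: int) -> float:
    """Present value of the same payment for a fixed number of years."""
    return annual_cash * (1 - (1 + discount_rate) ** -years) / discount_rate


ANNUAL_SAVINGS = 6_000_000     # dollars per year, from the thread
ASSUMED_DISCOUNT_RATE = 0.10   # illustrative assumption

total = perpetuity_pv(ANNUAL_SAVINGS, ASSUMED_DISCOUNT_RATE)
first_ten = annuity_pv(ANNUAL_SAVINGS, ASSUMED_DISCOUNT_RATE, 10)
share = first_ten / total

print(f"perpetuity value:       ${total:,.0f}")
print(f"first ten years' value: ${first_ten:,.0f}")
print(f"share from first ten:   {share:.1%}")
```

At this rate the "infinite" stream is worth about ten years of undiscounted savings, and the first decade accounts for more than 60% of that value.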
        
               | minkzilla wrote:
               | But they will save more than $6M the second year because
               | AWS will up their prices.
        
               | mbesto wrote:
               | What?
               | 
               | It costs me $10M to run something every year, it now
               | costs me $4M to run something every year. I have $6M in
               | my pocket every year now in perpetuity. Compound that
               | annually with the assumption that I maintain or increase
               | top line revenues and that's pure extra profit.
               | 
               | Note - I admit, all of this ignores two key things: (a) we
               | don't know the salaries of the engineers who built this and
               | (b) we don't know the ongoing maintenance costs.
        
           | rapht wrote:
           | A primer on valuation: in many financial contexts, $1 of
           | operating savings may be worth much more than $1 of
           | investment.
           | 
           | That is because an investment is a one-off, so it's actually
           | worth $1, but the savings are recurring, so they are worth
           | the same number of years that a company's profits are valued.
           | Depending on sector and investors' beliefs in the future of
           | companies, this factor is typically in the 5-20x range. That
           | means that $1 of savings is well worth at least $5 of
           | investments.
           | 
           | Factor in anything you want!
        
             | shermantanktop wrote:
             | In an ideal org, perhaps. In many places, forming that team
             | starts a process where it continually finds reasons to
             | still exist, so your $1m is yearly until a reorg.
             | 
             | Sigh.
        
             | admax88qqq wrote:
             | If that $1 of investment doesn't yield any returns that's
             | not an investment it's just an expense.
             | 
             | So yes $1 of savings is worth more than $1 of spending.
        
               | TheNewsIsHere wrote:
               | You could potentially take it as an investment loss for
               | tax purposes. Whether that's proper depends greatly on
               | the circumstances surrounding how the money was accounted
               | for and spent.
        
         | vertis wrote:
         | A better strategy as a company this size would be to write the
         | PRD for moving and then call AWS and negotiate.
        
           | rmbyrro wrote:
           | I think it's likely that they tried this. But DynamoDB is
           | expensive to consume probably because it's expensive to run
           | and maintain. If you develop for a particular use case, a lot
           | of optimizations can reduce these costs. For a large enough
           | business, the fixed costs of in-house are easily amortized.
           | It'd be hard for AWS to compete.
        
         | dboreham wrote:
         | $6M/y is something like 20 heads (depending on where they are,
         | could be more). So probably it's a win. Hard to see that this
         | could take more than about 5. Add cost of hay and water of
         | course.
        
           | tinyhouse wrote:
           | It's less than 20 heads. The gross spend for each engineer is
           | probably closer to $0.5 million annually. You can lay off 5%
           | without any impact on the company and save so much more. A
           | company like Uber ($130B market cap) isn't going to bother
           | with building something internally to save $6M/year. The only
           | reason to do it is to improve efficiency in a way that actually
           | improves the user experience, and then we're talking about a big
           | deal. Sometimes those things happen only because engineers
           | don't have anything else to do and someone needs a
           | promotion...
        
           | hackernewds wrote:
           | is it? if you consider the value 20 engineers could drive
           | instead in that time
           | 
           | if you assume they wouldn't have had anything else meaningful
           | to work on during that time to save money, then you have a
           | different problem in the company. $6M seems like the value 1
           | engineer can drive in a company at the scale of Uber
        
             | appplication wrote:
             | You don't need to consider the cost they could drive during
             | that time. You have a direct and tangible savings for
             | engineering time invested. That possible value they could
             | otherwise derive is moot and hypothetical, this is the real
             | deal!
             | 
             | But if we're being honest, there isn't actually any
             | meaningful quantification of engineering time to understand
             | return on investments at this level (not to say there's
             | _none_, but it sure does get wishy-washy). Corporate and
             | engineering strategy isn't so carefully weighed, and to
             | believe otherwise is to fall victim to the pseudoscience
             | that is software estimation. You just have to estimate
             | directionally if a given proposal has you heading in a
             | better direction in the long term, pursue that, and course
             | correct along the way.
             | 
             | Put another way, the end state justifies the means and
             | resourcing. It's rarely possible to fully understand either
             | the costs or benefits with much accuracy up front. You
             | slowly put more resources into projects that show promise,
             | and revoke them if the projects do not appear to be heading
             | in a value add direction.
        
               | cj wrote:
               | >You don't need to consider the cost they could drive
               | during that time.
               | 
               | You don't _need_ to, but you 100% should.  "Opportunity
               | cost" (cost of not doing something) is real.
               | 
               | This is the problem with all refactoring/migration
               | projects. It's very easy to get a lot of people to agree
               | a company should migrate from Node to Go or Monolith to
               | Microservices (or to clean up a mountain of tech debt),
               | but it's much harder to justify the time it takes away
               | from building things your users care about.
        
               | hibikir wrote:
               | True, but often the project that was supposed to build
               | something users care about turns to dust. On one side,
               | you have rosy projections. On the other, a cap on gains,
               | so sure everyone picks the first, but nobody measures if
               | it worked.
               | 
               | One can build a great career working only on key,
               | promising initiatives that never amount to any value in
               | the end. By the time it's clear the project lost money
               | outright, you are on to something else.
        
         | tinyhouse wrote:
         | Spot on. $6M/annually is not much of a saving for a company
         | like Uber ($130B market cap). It only makes sense if it's also
         | more efficient and actually improves the app.
        
           | tayo42 wrote:
           | Yeah in one quarter from a quick google it looks like Uber
           | profits $1.6 Billion. Basically why I never thought about
           | cost savings projects after I learned to put things into
           | perspective.
           | 
           | People really struggle with large numbers in business I've
           | noticed.
        
         | smrtinsert wrote:
         | A quick look at their careers shows they're hiring eng seemingly
         | anywhere but the US so maybe they've already saved the dollars
         | there.
        
         | dangus wrote:
         | Because they often are a sunk cost.
         | 
         | E.g., you can't lay off an entire SRE team and have nobody on
         | the on-call rotation. If some of their project work is cost
         | control that is basically free cost savings.
        
       | debarshri wrote:
       | There was an era around 2015, when all the cool tech companies
       | like netflix, spotify, soundcloud, uber and others were building
       | a lot of infrastructure and database tools. Nowadays, engineers
       | often talk in AWS/Cloud terminologies.
       | 
       | It is a breath of fresh air to see that orgs are still building
       | tools like that.
        
       | augunrik wrote:
       | Is there some information on why they need to store this much
       | data for immediate retrieval? And why is it so much?
        
       | Antony90807 wrote:
       | Wow crazy amount of work went into this. Well done
        
       | influx wrote:
       | I would gladly pay 6 million/year to not be on call, and have to
       | worry about things like bios and ssd firmware ever again.
        
         | geodel wrote:
         | That's a great situation to be in, when one can spend 6 million
         | even when there was some chance to save.
         | 
         | I tried the same with ready-to-eat meals every day, to save me
         | from potential kitchen disasters, but sadly the numbers didn't
         | work out.
        
           | influx wrote:
           | You're not saving money, controlling your own destiny sure.
           | That's worth something, maybe even more than 6 million, but I
           | was an SRE at Uber who had to be oncall for systems like this,
           | believe it or not, people like me aren't free either :)
        
         | tedd4u wrote:
         | My "favorite" non-cloud issue was dying/dead RAID card
         | batteries in DB hosts (to preserve unflushed cache in RAM on
         | the card in case of power failure).
        
       | boringg wrote:
       | How much did the migration effort cost?
        
       | geodel wrote:
       | More power to them. At this point even technically decent
       | teams/companies have given up on developing large, complex
       | systems in favor of SaaS. After _carefully evaluating our
       | strategic course of action_ the answer is always AWS.
       | 
       | It's only the teams that propose an alternative who have to
       | rigorously justify why they reached a different conclusion.
        
         | dboreham wrote:
         | My Amazon stock thanks you.
        
         | tonyhart7 wrote:
         | "At this point even technically decent teams/companies have
         | given up on developing large, complex systems in favor of SaaS"
         | 
         | Yeah, until those bills come. Then they would consider alternatives.
        
         | cess11 wrote:
         | Not if you're in the EU, due to, among other things, Schrems
         | II.
        
           | graemep wrote:
           | AFAIK Schrems II prevents transfers of data to the US.
           | 
           | AWS has datacentres around the world, including multiple
           | locations in the EU.
        
             | cess11 wrote:
             | Where did you learn that?
             | 
             | Schrems II prohibits transfer of personal information to
             | companies reachable by the CLOUD act.
        
       | ramesh31 wrote:
       | Another victim of the "Great Normalization", i.e. that entire
       | generation of garbage tech debt generated during the 2010s that
       | was built on NoSQL stores that never should have been, is now
       | coming due. You could probably make an entire consulting business
       | out of migrating these things to MySQL.
        
         | cgh wrote:
         | This is exactly the comment I came here to make. The NoSql
         | technical debt accretes like dead leaves blocking a sewer
         | drain. Eventually someone has to wade in and normalize...the
         | sewer grate, I guess. Okay, it's not a perfect analogy.
        
         | riku_iki wrote:
         | > You could probably make an entire consulting business out
         | of migrating these things to MySQL.
         | 
         | I think it will be a very complex task to run MySQL for 1PB
         | and 1T transactions..
        
       | jcims wrote:
       | Every time I've ever used DynamoDB it cost way more than I would
       | have ever expected.
        
         | alexey-salmin wrote:
         | Yes
        
       | ledgerdev wrote:
       | Say you wanted to build an app on a database like LedgerStore but
       | at much smaller scale, what are the best open source options out
       | there right now?
        
         | superzamp wrote:
         | We have a pretty minimal setup at formancehq/ledger[1] that
         | uses pg as a storage backend, and comes with a programmability
         | layer to model complex transactions (e.g. sourcing $100 from
         | three accounts in cascade).
         | 
         | [1] https://github.com/formancehq/ledger
        
       | qwertyuiop_ wrote:
       | Assuming there are a minimum of two teams, a total of 20
       | engineers, maintaining this in-house software, and taking 250k
       | as the cost per engineer (salary plus health and other benefit
       | costs to the company), that's $5 million right there, and I am
       | estimating at the low end of the range. That's why Amazon calls
       | these efforts undifferentiated heavy lifting. Is there a slight
       | premium to pay compared with rolling your own and maintaining
       | it? Yes. But it's worth it given all the trouble, security, and
       | management overhead that goes into rolling your own.
        
       | xyst wrote:
       | Uber must have picked up some Google rejects. This type of
       | homegrown project was seen at Google all the time.
       | 
       | Usually to aim for a significant promotion.
       | 
       | "Designed and built homegrown system to save $Xm! Give me promo,
       | bro?"
       | 
       | Just so happened to ignore that it took X+Y additional to build.
       | Also it will probably be going to the G graveyard in a few years.
        
         | cj wrote:
         | $6m in annual cost savings is truly unremarkable, if we are to
         | believe levels.fyi [0] [1]
         | 
         | If you're truly paying engineers, project managers, etc $500k a
         | head, it dramatically undermines the financial cost savings.
         | 
         | It very well might be the case that "We spent $25m of
         | engineering resources to save $6m annually".
         | 
         | [0] https://www.levels.fyi/companies/uber/salaries/software-
         | engi...
         | 
         | [1] https://www.levels.fyi/companies/uber/salaries/software-
         | engi...
        
           | booi wrote:
           | That's what I was thinking, and fully loaded cost is at least
           | 35% more than their salary.
           | 
           | Imagine trading 5 headcount full-time to manage the 1T+ fully
           | custom database on an ongoing basis when they could have just
           | used DynamoDB and have been done with it.
           | 
           | Or better, having to engineer a new feature that already
           | existed in DynamoDB and just losing money at that point.
        
           | dangus wrote:
           | Of course now we are assuming that the existing solution
           | didn't also require engineering salaries to maintain.
        
           | leoqa wrote:
           | Bitter story time:
           | 
           | I made a config change to our AWS instances and projected
           | approximately $10MM/year in AWS costs savings (pre-savings).
           | 
           | My boss asked me "Who told you to do this? We need to focus
           | on $project instead". I found another team and transferred
           | out. 3 months later there was a big fire drill about AWS
           | costs and they took my 1-pager and executed it. Didn't get
           | any credit in the shipped email nor did the manager reach out
           | to apologize.
        
             | cj wrote:
             | The organization still got the end result, though (to play
             | devil's advocate). That sounds like a win for the company.
             | They got the cost savings, plus they redirected attention
             | back to a project that was higher priority than saving $10m
             | 3 months earlier than they could have.
        
             | heavenlyblue wrote:
             | Of course you didn't. You used your time to promote
             | yourself instead of doing what you were asked to do
             | instead. That could have cost a promotion for your manager
             | who could have promoted you.
        
               | noncoml wrote:
               | I don't know why you are downvoted. Aligning the personal
               | interest vector with the company's interest vector is a
               | huge problem that is usually underrepresented in HN
               | comments.
               | 
               | Usually we only complain about the short-sightedness of
               | the CEOs that prioritize short term stock gains over long
               | term prosperity, but that also is just a specialized case
               | of the success vector misalignment
        
           | shawabawa3 wrote:
           | > we spent $25m of engineering resources to save $6m annually
           | 
           | This is a huge ROI. Borrowing $25m costs about $1.25m/yr so
           | you're winning even with no upfront costs
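           | 
           | A rough sketch of that arithmetic, using the figures quoted
           | above. The 5% borrowing rate is an assumption (it is what
           | $1.25m/yr on $25m implies), and ongoing maintenance cost of
           | the in-house system is ignored:

```python
# Figures quoted in the thread; the rate is an assumption.
BUILD_COST = 25_000_000      # one-off engineering spend, USD
ANNUAL_SAVINGS = 6_000_000   # claimed yearly savings, USD
RATE_BPS = 500               # assumed 5% borrowing rate, in basis points

# Yearly interest if the whole build cost were borrowed.
annual_interest = BUILD_COST * RATE_BPS // 10_000

# Savings left over each year after paying that interest.
net_annual_gain = ANNUAL_SAVINGS - annual_interest

# Years of savings needed to cover the build cost, ignoring interest.
simple_payback_years = BUILD_COST / ANNUAL_SAVINGS

print(f"interest/yr: ${annual_interest:,}")
print(f"net gain/yr: ${net_annual_gain:,}")
print(f"simple payback: {simple_payback_years:.1f} years")
```

           | Under those assumptions the savings exceed the interest each
           | year, but the simple payback period is still a little over
           | four years.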
        
             | valiant55 wrote:
             | I mean, ideally you are still employing those people into
             | the future. Plus, were there other opportunities to drive
             | value where that time would've been better spent?
        
               | vineyardmike wrote:
               | Ideally for the people, but not a requirement. I doubt
               | they won't be conducting more layoffs either.
        
           | choppaface wrote:
           | Yes but since revenue is growing at over 70% due to squeezing
           | out the drivers, there's more money to spend on fighting
           | Amazon over the DynamoDB contract
           | https://www.forbes.com/sites/lensherman/2023/01/16/ubers-
           | new...
        
           | amanda99 wrote:
           | Right, but that's $25m in R&D investment. Much better than
           | $6m in cost of goods/services delivered! Former is great
           | innovation and will be ignored by investors because it's just
           | a fixed cost on the way to becoming profitable. The latter is
           | going to appear in the marginal cost of services calculation.
        
         | password4321 wrote:
         | https://news.ycombinator.com/item?id=38300425#38322311
         | 
         | > _Uber is famous for NIH syndrome_
        
         | amluto wrote:
         | The thing I find odd about this is that the headline figure is
         | about old immutable records. Almost all of that 1.7PB is
         | ancient by what seems to be any practical standard. Uber
         | is not likely to care about the credit card authorization flow
         | for a ride two years ago, except maybe for analytics.
         | 
         | If I were doing this, I would be looking at data warehousing
         | systems. 1.7PB of, say, Parquet files in S3 is not terribly
         | expensive. 1.7PB of Parquet files in on-prem or collocated
         | object storage, even replicated a zillion times, is quite
         | cheap. And quite a few companies and open-source projects are
         | currently competing aggressively to provide awesome tools for
         | querying that data.
         | 
         | The hot data would fit on basically anything -- the choice
         | should be about robustness and barely even consider cost per
         | TiB. Datomic got written up recently and seems credible for
         | this type of application. FoundationDB is bulletproof. Postgres
         | could probably handle it without breaking a sweat, although
         | active/active replication isn't free. Heck, writes straight to
         | a warehouse with a cache in front to help with reads seems
         | credible -- Uber rides rarely go for longer than a couple
         | hours, and back-of-the-envelope math suggests that the total
         | data rate is maybe 50GB/hour. An entire day of data for an
         | entire country would fit on a single very ordinary commodity
         | server, and the live data for _the entire world_ would fit on
         | one mildly beefy server. The indexes involved sound
         | straightforward.
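         | 
         | A quick sanity check on those figures, as a sketch only: it
         | assumes a constant 50GB/hour and decimal units, both of which
         | are simplifications of the back-of-the-envelope estimate:

```python
# Figures from the comment above; both are rough estimates, not measurements.
RATE_GB_PER_HOUR = 50        # back-of-the-envelope ingest rate
TOTAL_GB = 1_700_000         # 1.7PB expressed in decimal GB

# Data written per day at a constant rate.
gb_per_day = RATE_GB_PER_HOUR * 24

# Time needed to accumulate the full 1.7PB at that rate.
days_to_accumulate = TOTAL_GB / gb_per_day
years_to_accumulate = days_to_accumulate / 365

print(f"{gb_per_day} GB/day")
print(f"{days_to_accumulate:.0f} days (~{years_to_accumulate:.1f} years)")
```

         | At that rate a day of data is on the order of a terabyte, and
         | the 1.7PB total corresponds to several years of history.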
        
           | crmd wrote:
           | The primary use case is not analytics. This data store is the
           | system of record in their credit card authorization and
           | billing pipeline, and so it has extreme consistency
           | requirements. The lion's share of its engineering is to
           | provide consistency across a large spectrum of failure modes.
           | 
           | Old data could probably live at lower cost in a data
           | warehouse, but then developers would have multiple systems
           | and namespaces to deal with in order to query on
           | transactions.
        
             | 8ytecoder wrote:
             | Which is ...normal. Different types of data access have
             | different SLA requirements. Any company that cares about
             | cost will warehouse data after x time frame. It's done even
             | in banking. Uber has a much more lenient need to make this
             | data instantly available.
        
             | jiggawatts wrote:
             | Immutable data is always consistent. This is almost
             | certainly an append-only ledger, a well-established
             | solution for a simple problem.
        
         | MarkMarine wrote:
         | I heard from some ex-Uber people that you could call Uber a
         | database company as much as you could call it a transportation
         | company. Something like 80+ databases invented there in one
         | form or another.
         | 
         | Promotion-driven development. I suppose better than blog post
         | driven development, but marginally so.
        
           | vineyardmike wrote:
           | > I suppose better than blog post driven development, but
           | marginally so.
           | 
           | I find DB development quite interesting, and I find Uber's
           | core product quite not interesting (from an engineering
           | perspective).
           | 
           | So as an outsider without any financial stake in the company,
           | please keep writing about databases!
        
       | foota wrote:
       | The article states that they already had an in house solution for
       | cold data, so one of the benefits they claim is simplifying by
       | moving to one system for both hot and cold data.
        
       | citizenpaul wrote:
       | I think there is some reckoning of cloud service providers
       | coming (assuming logical actors...). I was doing some contract
       | work for a small place that had a GCP Bigtable that was costing
       | $11k+ per month for some reports that were based on data from a
       | 375MB !!! mysql db into big-table for the reports to run.
       | 
       | They hired some out of school data scientist to do reports and
       | they were doing crazy ineffective things with the tiny dataset.
       | Wanted me to fix it for pennies tomorrow and I declined.
        
         | remus wrote:
         | Not that I disagree with your overall point, but I don't think
         | this
         | 
         | > I was doing some contract work for a small place that had a
         | GCP Bigtable that was costing $11k+ per month for some reports
         | that were based on data from a 375MB !!! mysql db into big-
         | table for the reports to run.
         | 
         | Is a good example. It's just a badly architected system, and
         | you'd have exactly the same problem if you were running the
         | same thing on a massively over provisioned on premise db.
        
       | PeterZaitsev wrote:
       | I think this is a fantastic illustration of how expensive
       | proprietary cloud based data stores can be... and that it is
       | feasible to migrate from them to something else.
        
       | benced wrote:
       | $6M... isn't that much?
        
       | SkyMarshal wrote:
       | It seems LedgerStore is not open source [1], and finding any info
       | on it requires following a trail of backlinked Uber blog posts.
       | Here's one with the most info on LedgerStore that I can find,
       | from 2021:
       | 
       | https://www.uber.com/en-US/blog/dynamodb-to-docstore-migrati...
       | 
       | [1]:https://github.com/uber
        
         | PeterZaitsev wrote:
         | Yeah. This looks like some internal solution. In general Uber
         | seems to be high on "not invented here" scale - they like to
         | conclude no existing Open Source solutions are good enough for
         | them and they need to build their own... this is different from
         | Facebook's approach, for example, which chose to make MySQL
         | better by adding MyRocks/RocksDB to it and keep them Open Source.
        
           | vertis wrote:
           | It's a weird world where Facebook/Meta is becoming a small
           | bastion of hope. Llama 2/3 being an example of bucking the
           | trend of going closed source for LLM models.
           | 
           | Granted it's not quite in the same calibre as OpenAI/Claude,
           | and the real test is when it is and they still release it.
        
       | alexey-salmin wrote:
       | I don't know about economics of this particular project but damn
       | dynamodb is expensive. At some point I was thinking that everyone
       | else was just using it wrong, doing scans and queries instead of
       | point-wise lookups into pre-computed tables.
       | 
       | It turns out however that even when you use it as a distributed
       | hashtable you still pay a huge premium.
        
       | pojzon wrote:
       | You look at stuff like that and think about
       | 
       | "how much talent is wasted on pointless things that help no one
       | in the world while getting paid heaps for nothing"
       | 
       | We could accomplish everything if ppl stopped wasting time on
       | pointless tasks.
        
       | otterley wrote:
       | Does anyone know whether Uber considered Amazon QLDB for the
       | implementation? Seems like it might have been a good fit, at
       | first blush.
        
       | yazaddaruvala wrote:
       | Reading the article it's clear pretty quickly that Uber was using
       | DynamoDB poorly.
       | 
       | It seems they need strong consistency for certain CUJs and then a
       | lot of data warehousing for historical transactions.
       | 
       | It's strange to me that they didn't first convert their 2 table
       | DynamoDB architecture into a DynamoDB and Redshift architecture
       | or similar. This is a pretty common pattern.
        
         | PartiallyTyped wrote:
         | I don't understand why they needed 2 weeks of _immutable_
         | transactions in Dynamo. Could anyone give any hints?
        
       ___________________________________________________________________
       (page generated 2024-05-20 23:00 UTC)