[HN Gopher] Saving cloud costs by writing our own database
       ___________________________________________________________________
        
       Saving cloud costs by writing our own database
        
       Author : wolframhempel
       Score  : 146 points
       Date   : 2024-04-04 11:54 UTC (2 days ago)
        
 (HTM) web link (hivekit.io)
 (TXT) w3m dump (hivekit.io)
        
       | icsa wrote:
        | How is it possible to save more than 100%?
        
         | wolframhempel wrote:
         | Fair, should be 98%. Can't change the title anymore though
        
         | jbverschoor wrote:
          | AWS credits
        
         | jayd16 wrote:
         | Move from the cloud to on-prem and then sell extra
         | availability.
        
         | olddustytrail wrote:
         | Isn't it obvious?!
         | 
         | 1. Write your own database
         | 
         | 2. ???
         | 
         | 3. Profit!
        
         | aclatuts wrote:
         | Receive a license fee from someone else for using your
         | software!
        
       | mdaniel wrote:
       | Anytime I hear "we need to blast in per-second measurements of
       | ..." my mind jumps to "well, have you looked at the bazillions of
        | timeseries databases out there?" Because the fact those payloads
        | happen to be (time, lat, long, device_id) tuples seems
        | immaterial to the timeseries database; the data can then be
        | rolled up into whatever level of aggregation one wishes for
        | long-term storage.
       | 
       | It also seems that just about every open source "datadog / new
       | relic replacement" is built on top of ClickHouse, and even they
       | themselves allege multi-petabyte capabilities
       | <https://news.ycombinator.com/item?id=39905443>
       | 
       | OT1H, I saw the "we did research" part of the post, and I for
       | sure have no horse in your race of NIH, but "we write to EBS,
       | what's the worst that can happen" strikes me as ... be sure
       | you're comfortable with the tradeoffs you've made in order to get
       | a catchy blog post title
        
         | robertlagrant wrote:
         | > but "we write to EBS, what's the worst that can happen"
         | strikes me as ... be sure you're comfortable with the tradeoffs
         | you've made in order to get a catchy blog post title
         | 
         | In what way?
        
           | freeone3000 wrote:
           | EBS latency is all over the place. The jitter is up to the
           | 100ms scale, even on subsequent IOPS. We've also had
           | intermittent failures for fsync(), which is a case that
           | should be handled but is exceptionally rare for
           | traditionally-attached drives.
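            | 
            | Handling that failure mode is only a few lines, for what
            | it's worth. A minimal sketch in Go (file name and record
            | framing invented for illustration):
            | 
            |     package main
            |     
            |     import (
            |         "fmt"
            |         "log"
            |         "os"
            |     )
            |     
            |     func appendRecord(f *os.File, rec []byte) error {
            |         if _, err := f.Write(rec); err != nil {
            |             return err
            |         }
            |         // fsync can fail on network-backed volumes like
            |         // EBS; if it does, the only safe assumption is
            |         // that the batch never made it to disk.
            |         if err := f.Sync(); err != nil {
            |             return fmt.Errorf("fsync failed: %w", err)
            |         }
            |         return nil
            |     }
            |     
            |     func main() {
            |         f, err := os.OpenFile("updates.log",
            |             os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
            |         if err != nil {
            |             log.Fatal(err)
            |         }
            |         defer f.Close()
            |         rec := []byte("42,52.52,13.405\n")
            |         if err := appendRecord(f, rec); err != nil {
            |             log.Fatal(err) // surface, don't swallow
            |         }
            |     }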
        
             | RHSeeger wrote:
              | The author does note in the writeup that they are
              | comfortable with some (relatively rare) data loss, such
              | as from a server failure. Given their use case, it seems
              | like the jitter/loss characteristics of EBS wouldn't be
              | too impactful for them.
        
               | solatic wrote:
                | There are different kinds of data loss. There's data loss
               | because you lose the whole drive; because you lost a
               | whole write; because a write was only partially written.
               | Half the problem with NIH solutions is, what happens when
               | you try to read from your bespoke binary format, and the
               | result is corrupted in some way? So much of the value of
               | battle-tested, multi-decade-old databases is that those
               | are _solved problems_ that you, the engineer building on
               | top of the database, do not need to worry about.
               | 
               | Of course data loss is alright when you're talking about
               | a few records within a billion. It is categorically
               | _unacceptable_ when AWS loses your drive, you try to
               | restore from backup, the application crashes when trying
               | to use the restored backup because of  "corruption", the
               | executives are pissed because downtime is reaching into
               | the hours/days while you frantically try to FedEx a
               | laptop to the one engineer who knows your bespoke binary
               | format and can maybe heal the backup by hand except he's
               | on vacation and didn't bring his laptop with him.
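                | 
                | One common mitigation, for what it's worth, is a
                | length + CRC frame around every record, so a torn or
                | partial write is detected on read instead of crashing
                | the reader. A rough Go sketch (the framing is
                | illustrative, not Hivekit's actual format):
                | 
                |     package main
                |     
                |     import (
                |         "encoding/binary"
                |         "errors"
                |         "fmt"
                |         "hash/crc32"
                |     )
                |     
                |     var ErrCorrupt = errors.New("corrupt record")
                |     
                |     // frame prepends a 4-byte length and a 4-byte
                |     // CRC32 to a payload before it is appended.
                |     func frame(p []byte) []byte {
                |         buf := make([]byte, 8+len(p))
                |         binary.LittleEndian.PutUint32(buf[0:4],
                |             uint32(len(p)))
                |         binary.LittleEndian.PutUint32(buf[4:8],
                |             crc32.ChecksumIEEE(p))
                |         copy(buf[8:], p)
                |         return buf
                |     }
                |     
                |     // unframe validates the header, catching both
                |     // truncated and bit-rotted records.
                |     func unframe(buf []byte) ([]byte, error) {
                |         if len(buf) < 8 {
                |             return nil, ErrCorrupt
                |         }
                |         n := binary.LittleEndian.Uint32(buf[0:4])
                |         if uint32(len(buf)-8) < n {
                |             return nil, ErrCorrupt // cut short
                |         }
                |         p := buf[8 : 8+n]
                |         want := binary.LittleEndian.Uint32(buf[4:8])
                |         if crc32.ChecksumIEEE(p) != want {
                |             return nil, ErrCorrupt // bits changed
                |         }
                |         return p, nil
                |     }
                |     
                |     func main() {
                |         rec := frame([]byte("42,52.52,13.405"))
                |         // simulate a torn write: lose the last bytes
                |         _, err := unframe(rec[:len(rec)-3])
                |         fmt.Println(err) // corrupt record
                |     }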
        
         | Spivak wrote:
         | I mean if you spun up Postgres on EC2 you would be directly
         | writing to EBS so that's not really the part I'm worried about.
          | I'm more worried about the lack of replication, seemingly no
          | way to scale reads or writes beyond a single server, and no
          | way to fail over uninterrupted.
         | 
          | I'm guessing it doesn't matter for their use-case, which is a
          | good thing. When you realize you only need this teeny subset
          | of db features and none of the hard parts, writing your own
          | starts to look feasible.
        
           | VirusNewbie wrote:
            | Right, the cassandra/scylla model is _really_ good for time
            | series use cases; I've yet to see good arguments against
            | them.
        
         | speedgoose wrote:
         | ClickHouse is one of the few databases that can handle most of
         | the time-series use cases.
         | 
          | InfluxDB, the most popular time-series database, is optimised
          | for a very specific kind of workload: many sensors publishing
          | frequently to a single node, and frequent queries that don't
          | go far back in time. It's great for that. But it doesn't
          | support slightly advanced queries such as an average over two
          | sensors. It also doesn't scale, and is pretty slow to query
          | far back in time due to its architecture.
         | 
          | TimescaleDB is a bit more advanced, because it's built on top
          | of PostgreSQL, but it's not very fast. It's better than
          | vanilla PostgreSQL for time-series.
         | 
          | The TSM Bench paper has interesting figures, but in short,
          | ClickHouse wins and manages well in almost all benchmarks.
         | 
         | https://dl.acm.org/doi/abs/10.14778/3611479.3611532
         | 
         | https://imgur.com/a/QmWlxz9
         | 
         | Unfortunately, the paper didn't benchmark DuckDB, Apache IoTDB,
         | and VictoriaMetrics. They also didn't benchmark proprietary
         | databases such as Vertica or BigQuery.
         | 
         | If you deal with time-series data, ClickHouse is likely going
         | to perform very well.
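          | 
          | To make that concrete, here's a sketch that creates a
          | MergeTree table for location updates over ClickHouse's HTTP
          | interface (default port 8123; the schema and names are mine,
          | not from the article):
          | 
          |     package main
          |     
          |     import (
          |         "log"
          |         "net/http"
          |         "strings"
          |     )
          |     
          |     func main() {
          |         // Sorting by (device_id, ts) keeps each device's
          |         // history contiguous, which suits replay queries.
          |         ddl := `CREATE TABLE IF NOT EXISTS location_updates (
          |             device_id UInt32,
          |             ts        DateTime64(3),
          |             lat       Float64,
          |             lng       Float64
          |         ) ENGINE = MergeTree ORDER BY (device_id, ts)`
          |     
          |         // ClickHouse accepts SQL as a plain HTTP POST body.
          |         resp, err := http.Post("http://localhost:8123/",
          |             "text/plain", strings.NewReader(ddl))
          |         if err != nil {
          |             log.Fatal(err)
          |         }
          |         defer resp.Body.Close()
          |         log.Println("status:", resp.Status)
          |     }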
        
           | lispisok wrote:
           | I work on a project that ingests sensor measurements from the
           | field and in our testing found timescaledb was by far the
            | best choice. The performance, plus all their
            | timeseries-specific features like continuous aggregates
            | and `time_bucket`, plus access to the postgres ecosystem,
            | was killer for us. We also get about a 90% reduction in
            | storage with compression, without much of a performance
            | hit.
        
             | omeze wrote:
             | Did you try clickhouse? What were its weak points?
        
         | Too wrote:
          | Apache Parquet as a data format on disk seems to be popular
          | these days for similar DIY log/time-series applications. It
          | can be appended to locally and flushed to S3 for persistence.
        
       | nikonyrh wrote:
        | Very interesting. It must feel great to get to apply CS
        | knowledge at work, rather than writing basic CRUD APIs /
        | websites.
        
         | hasmanean wrote:
          | Stick the GPS data in a binary file. Store the filename in
          | the database record.
        
       | MuffinFlavored wrote:
       | > We want to be able to handle up to 30k location updates per
       | second per node. They can be buffered before writing, leading to
       | a much lower number of IOPS.
       | 
       | > This storage engine is part of our server binary, so the cost
       | for running it hasn't changed. What has changed though, is that
       | we've replaced our $10k/month Aurora instances with a $200/month
       | Elastic Block Storage (EBS) volume. We are using Provisioned IOPS
       | SSD (io2) with 3000 IOPS and are batching updates to one write
       | per second per node and realm.
       | 
       | I would be curious to hear what that "1 write per second" looks
       | like in terms of throughput/size?
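        | 
        | For reference, the batching they describe could be as little
        | as this sketch (record layout and sizes are guesses; the post
        | only says ~40 bytes per update):
        | 
        |     package main
        |     
        |     import (
        |         "bufio"
        |         "encoding/binary"
        |         "log"
        |         "os"
        |         "time"
        |     )
        |     
        |     type update struct {
        |         deviceID uint32
        |         unixMs   int64
        |         lat, lng float64
        |     }
        |     
        |     func main() {
        |         f, err := os.OpenFile("realm.log",
        |             os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0o644)
        |         if err != nil {
        |             log.Fatal(err)
        |         }
        |         defer f.Close()
        |     
        |         in := make(chan update, 30000) // ~30k updates/sec
        |         go func() { // stand-in for the ingest path
        |             for i := 0; i < 5; i++ {
        |                 in <- update{uint32(i),
        |                     time.Now().UnixMilli(), 52.52, 13.405}
        |             }
        |             close(in)
        |         }()
        |     
        |         w := bufio.NewWriterSize(f, 4<<20)
        |         tick := time.NewTicker(time.Second)
        |         defer tick.Stop()
        |         for {
        |             select {
        |             case u, ok := <-in:
        |                 if !ok {
        |                     w.Flush()
        |                     return
        |                 }
        |                 // 28 bytes here; buffered, not yet on disk
        |                 binary.Write(w, binary.LittleEndian, u.deviceID)
        |                 binary.Write(w, binary.LittleEndian, u.unixMs)
        |                 binary.Write(w, binary.LittleEndian, u.lat)
        |                 binary.Write(w, binary.LittleEndian, u.lng)
        |             case <-tick.C: // the one write per second
        |                 if err := w.Flush(); err != nil {
        |                     log.Fatal(err)
        |                 }
        |             }
        |         }
        |     }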
        
         | zaroth wrote:
          | Well, they said ~40 bytes per update, so 30k * 40 =
          | 1.2MB/sec... quite trivial.
          | 
          | They also said 30GB per month, which works out to only about
          | 12KB/sec if load is perfectly constant.
        
           | MuffinFlavored wrote:
           | > we've replaced our $10k/month Aurora
           | 
            | How does ~1MB/sec end up costing $10k/mo in a hosted
            | database?
           | 
           | Can you not achieve 1MB/sec of "queued writes" or something
           | clever against SQLite?
        
             | speedgoose wrote:
             | SQLite in WAL mode would manage for sure.
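              | 
              | Something like this sketch, assuming the
              | mattn/go-sqlite3 driver; one transaction per batch means
              | roughly one fsync per batch:
              | 
              |     package main
              |     
              |     import (
              |         "database/sql"
              |         "log"
              |     
              |         _ "github.com/mattn/go-sqlite3" // assumed driver
              |     )
              |     
              |     func main() {
              |         db, err := sql.Open("sqlite3", "locations.db")
              |         if err != nil {
              |             log.Fatal(err)
              |         }
              |         defer db.Close()
              |     
              |         // WAL appends to a log instead of rewriting
              |         // pages in place, which suits this write load.
              |         for _, q := range []string{
              |             `PRAGMA journal_mode=WAL`,
              |             `PRAGMA synchronous=NORMAL`,
              |             `CREATE TABLE IF NOT EXISTS loc (device_id
              |                 INTEGER, ts INTEGER, lat REAL, lng REAL)`,
              |         } {
              |             if _, err := db.Exec(q); err != nil {
              |                 log.Fatal(err)
              |             }
              |         }
              |     
              |         // a second's worth of updates, one transaction
              |         tx, err := db.Begin()
              |         if err != nil {
              |             log.Fatal(err)
              |         }
              |         stmt, err := tx.Prepare(
              |             `INSERT INTO loc VALUES (?, ?, ?, ?)`)
              |         if err != nil {
              |             log.Fatal(err)
              |         }
              |         for i := 0; i < 30000; i++ {
              |             if _, err := stmt.Exec(i, 1712231640,
              |                 52.52, 13.405); err != nil {
              |                 log.Fatal(err)
              |             }
              |         }
              |         stmt.Close()
              |         if err := tx.Commit(); err != nil {
              |             log.Fatal(err)
              |         }
              |     }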
        
       | awinter-py wrote:
       | we have invented write concern = 0
        
       | RHSeeger wrote:
       | > we've replaced our $10k/month Aurora instances with a
       | $200/month Elastic Block Storage (EBS) volume.
       | 
       | Without any intent to insult what you've done (because the
       | information is interesting and the writeup is well done)... how
       | do the numbers work out when you account for actually
       | implementing and maintaining the database?
       | 
       | - Developer(s) time to initially implement it
       | 
       | - PjM/PM time to organize initial build
       | 
        | - Developer(s) time for maintenance (fixing bugs and handling
        | enhancement requests)
       | 
       | - PjM/PM time to organize maintenance
       | 
       | The cost of someone to maintain the actual "service" (independent
       | of the development of it) is, I assume, either similar or lower,
        | so there's probably a win there. I'm assuming you have someone
        | on board who was in charge of making sure Aurora was configured
        | / being used correctly; and it would be just as easy, if not
        | easier, to do the same for your custom database.
       | 
        | The cost of $120,000/year for Aurora seems like it would be
        | less than the cost of development/organization time for the
        | custom database.
       | 
       | Note: It's clear you have other reasons for needing your custom
       | database. I get that. I was just curious about the costs.
        
         | donohoe wrote:
         | I came here to ask the same question.
         | 
         | If this db requires 1 full-time developer then the cost would
         | immediately be not worth it (assuming salary + benefits >
         | $120k/yr)
         | 
         | As you say, without details it's hard to know if this was a
         | good idea.
        
           | bilsbie wrote:
           | Shouldn't we up our standard developer cost for inflation?
           | 
           | That barely qualifies for the median mortgage in the US.
        
             | twbarber wrote:
             | I believe the 120k number was in reference to the OP's
             | Aurora spend.
        
             | Filligree wrote:
             | What makes you think a standard developer can afford a
             | mortgage?
        
             | nightski wrote:
             | I find that highly unlikely. Maybe in specific markets but
             | not US wide.
        
             | hibikir wrote:
             | The median home price is under 400K, so a 120k salary is
             | not really stretched.
             | 
             | Now, median in the Seattle metro, or in San Francisco,
             | sure. But 120k in, say, St Louis is still going to get you
             | an intermediate dev, no problem, and they can afford a
             | house by themselves too. There are 4 bedroom houses in my
             | neighborhood for 300K.
        
           | solatic wrote:
           | I actually disagree with you here. There are costs above and
           | beyond the engineer's effect on the balance sheet. There's
           | the partial salary of management to manage them, plus asking
           | them to document their work and train others so that the
           | database won't have a bus factor of 1. So in well-run
           | engineering departments, there's no such thing as paying for
           | a "single" engineering salary. You have teams; a team
           | maintains the system and it has a pre-existing workload.
           | 
           | A large part of the value of popular platforms is precisely
            | that they are _not_ bespoke. You can hire engineers with
            | MySQL/Postgres experience. You cannot hire engineers who
            | already have experience with your bespoke systems.
        
         | Spivak wrote:
         | I think for this kind of thing their needs are so simple and
         | well-suited to a bespoke implementation that it probably paid
         | for itself in less than 4 months. This doesn't seem like a db
         | implementation that's going to need dedicated maintenance.
         | 
         | They're operationally using a funny spelling of SQLite and I
         | don't imagine anyone arguing that such a thing needs constant
         | attention.
        
         | ReflectedImage wrote:
         | Well presumably they need only 1/3 of a developer to do this
         | and they intend to scale up 10x in the next 5 years.
         | 
          | $60,000 per year in-house vs $1,200,000 per year Aurora. No
         | brainer really.
        
           | andai wrote:
           | Also worth mentioning that it's 150x faster.
        
           | cortesoft wrote:
            | It's $120,000 a year for Aurora, not $1,200,000.
        
             | bbarnett wrote:
              | _What has changed though, is that we've replaced our
              | $10k/month Aurora instances with a $200/month Elastic
              | Block Storage (EBS) volume._
             | 
              | Note 'instances', i.e. plural, versus a singular EBS
              | volume. There is some ambiguity here; I'm not sure where
              | the 10x came from, but it seems plausible.
        
               | swasheck wrote:
               | if they were able to replace aurora instances with a
               | glorified kvp store, then they bought the wrong tool in
               | the first place.
               | 
               | i saved hundreds of dollars per month by switching from
               | an audi a4 to riding a home-built bike for my 1.5 mile
               | commute to work
        
               | bbarnett wrote:
                | To be fair, one $5k hardware MySQL server outperforms
                | a dozen Aurora instances.
                | 
                | It really bugs me how everyone has drunk the Kool-Aid.
                | Cloud is stupidly expensive, but of course this is
                | cloud vs cloud.
        
         | rdtsc wrote:
         | > The cost of 120,000/year for Aurora seems like it would be
         | less than the cost of development/organization time for the
         | custom database
         | 
          | Only if they planned on hiring someone just to develop this
          | new database, and if they switched to Aurora they'd let them
          | go immediately. If said developer was already costing them
          | $250k to maintain and develop the application, and would be
          | working on top of Aurora either way, this seems like a good
          | way to save $100k/year.
        
           | organsnyder wrote:
           | There's the opportunity cost of whatever else they could have
           | been paying that developer to work on.
        
             | grogenaut wrote:
              | Agreed. A developer that can pull this off is pretty
              | good, if maybe distracted by shiny objects. What could
              | they do working on the actual product instead of this
              | technological terror?
        
             | rdtsc wrote:
             | True. Also, to your point, one could argue that if that
             | developer leaves, they'd have an easier time hiring anyone
             | with Aurora experience as opposed to someone to learn and
             | maintain the custom database.
             | 
              | But at the same time, Aurora costs could also scale with
              | usage. It may cost $120k one year, $180k the next year,
              | $500k the year after. If the database they have now is
              | well designed, then once it's built it may not need
              | active development every year beyond adding a feature
              | here and there. Also, switching back to Aurora could
              | itself carry an opportunity cost: "we should have
              | written our own thing and could have saved millions ...".
        
             | addicted wrote:
              | Well, considering the cost is lower than Aurora, isn't
              | the opportunity cost in favor of the home-built solution?
        
               | organsnyder wrote:
               | That's only true if Aurora is the most valuable thing
               | that developer could be working on.
        
         | preommr wrote:
          | I feel like the word "database" is throwing people off
          | because they're comparing it with something like
          | MySQL/Postgres, when this seems only slightly more complex
          | than a k/v store written to a file, with some other indexing,
          | where data integrity is a low priority. That shouldn't take
          | too much time, and should be fairly isolated on the tech
          | side, so little involvement is needed from product/project
          | managers.
        
           | hmottestad wrote:
           | A k/v store typically is really fast at looking up the value
           | based on the key. So there are usually some pretty advanced
           | indexes involved.
        
             | arandomusername wrote:
             | or a simple b-tree...
        
               | eatonphil wrote:
               | My simple btrees have had bugs in them. :)
               | 
               | (Though to be fair, if I actually wanted to put this in
               | prod it probably wouldn't take too long to fuzz it and
               | fix the kinks.)
               | 
               | https://github.com/eatonphil/btree-rs
        
             | paulddraper wrote:
             | 80/20
        
         | samatman wrote:
         | I would imagine, as someone with no special insight into
         | goings-on at Hivekit, that the answer is intended scale.
         | 
         | They mention 13.5k simultaneous connections. The US has 4.2
          | million tractors alone, just the US, just tractors. If they
          | get 10% of those tractors on the network, that's a 30x
          | increase in their data storage needs. So multiply that across
          | the entire planet, and all the use cases they hope to serve.
         | 
         | Investing time early on so that they can store 50x data-per-
         | dollar is almost certainly time well spent.
        
           | kdazzle wrote:
            | Presumably those tractors wouldn't be connecting directly
            | to the db though. Not sure why they don't just go the
            | standard IoT events route: store the data in a data lake
            | and propagate it into an analytics db/warehouse from there.
            | Add a layer to make recent events available immediately.
            | 
            | S3 is relatively cheap.
        
         | exe34 wrote:
         | > PjM/PM
         | 
         | What do you need them for?
        
         | g9yuayon wrote:
         | > PjM/PM time to organize initial build
         | 
          | This sounds like what big companies or disorganized companies
          | would need. For an efficient enough company, a project like
          | this needs just one or two dedicated engineers.
         | 
          | In fact, I can't imagine why this project needs a PM at all.
          | The database is used by engineers and is built by engineers.
          | Engineers should be their own PMs. It's like saying we need a
          | PM for a programming language, but no, the compiler writer
          | must be the language designer and must use the language.
          | Those who do not use a product or do not have in-depth
          | knowledge of the domain should not be the PM of the product.
        
           | vannevar wrote:
           | >For an efficient enough company, a project like this needs
           | just one or two dedicated engineers.
           | 
           | Maybe for a research project or a hobby project, but not for
           | a real, high performance database to be used in a business-
           | critical application.
           | 
           | FTA:
           | 
           | "Databases are a nightmare to write, from Atomicity,
           | Consistency, Isolation, and Durability (ACID) requirements to
           | sharding to fault recovery to administration - everything is
           | hard beyond belief."
           | 
           | >Engineers should be their own PMs.
           | 
           | For small projects, sure (your "one or two dedicated
           | engineers"). But once you start tackling projects that
           | require larger teams, or even teams of teams, you need
           | someone to track and prioritize the work remaining and the
           | work in progress (as well as the corresponding budgets for
           | personnel, services, and other resources). Similar to the way
           | a sole proprietor can do their own accounting, but a multi-
           | million dollar business probably should have an accountant.
           | 
           | As an aside, I wonder if this might be a use case for a
           | bitmap db engine like Featurebase
           | (https://www.featurebase.com/).
        
             | delusional wrote:
             | > what we've built is just a cursor streaming a binary file
             | feed with a very limited set of functionality - but then
             | again, it's the exact functionality we need and we didn't
             | lose any features.
             | 
             | The trick is that they didn't need a database that provides
             | "Atomicity, Consistency, Isolation, and Durability (ACID)".
             | By only implementing what they need they were able to keep
             | the project small.
             | 
              | It's like people are scared of doing anything without
              | making it into some huge multi-hundred-developer effort.
             | They've written a super simple append only document store.
             | It's not rocket science. It's not a general purpose
             | arbitrary SQL database.
        
           | cortesoft wrote:
           | > a project like this needs just one or two dedicated
           | engineers.
           | 
           | So that is at least 20k a month, for fairly cheap engineers.
        
           | vineyardmike wrote:
           | > In fact, I can't imagine why this project needs a PM at
           | all. The database is used by engineers and is built by
           | engineers. Engineers should be their own PMs.
           | 
            | What about when two different projects have two different
            | requirements they need supported by the database? Which one
            | is implemented first? What about if there is only
            | engineering capacity to implement one?
           | 
           | I don't think a database is the place for "just send a PR for
           | adding your required feature and ping the team that owns it"
           | kind of development. It requires research, planning,
           | architecture review, testing, etc. It's not a hobby project,
           | it's a critical tool for the business.
        
             | delusional wrote:
             | > Which one is implemented first?
             | 
             | One of them. This is true whether you have a person named
             | "PM" or not. It's just a matter of who picks.
             | 
             | > What about if there is only engineering capacity to
             | implement one?
             | 
             | How does naming some guy "PM" solve the issue? The team
             | just picks one of the features.
        
         | deedasmi wrote:
          | Don't forget this is a largely one-time cost, vs Aurora,
          | which scales cost with usage.
         | 
         | Also they said their current volume is around 13k/second.
         | They've built the new platform for 30k/sec per node. This
         | should last them a long time with minimal maintenance.
        
       | loftsy wrote:
       | Apache Cassandra could be a good fit here. Highly parallel
       | frequent writes with some consistency loss allowed.
        
       | bawolff wrote:
       | Kind of misleading to not include the cost of developing it
       | yourself.
       | 
       | I think everything is cheaper than cloud if you do it yourself
       | when you don't count staffing cost.
        
         | benrutter wrote:
          | Yeah, and for most companies without a huge supply of
          | developers, there's the financial risk of having all your
          | stuff blitzed when your home-spun solution fails.
        
       | kaladin_1 wrote:
        | I love the attitude: we didn't see a good fit, so we rolled our
        | own.
       | 
        | Sure, it won't cover the bazillion cases the DBs out there do,
        | but that's not what you need. The source code is small enough
        | for any team member to jump in and debug while pushing
        | performance in any direction you want.
       | 
        | Kudos!
        
       | INTPenis wrote:
       | That is such an insane headline.
       | 
       | You might as well say "we saved 100% of cloud costs by writing
       | our own cloud".
        
       | yunohn wrote:
        | This is more a bespoke file format than a full-blown database.
        | It's optimized for one table schema and a few specific queries.
       | 
       | Not a negative though, not everything needs a general purpose
       | database. Clearly this satisfies their requirements, which is the
       | most important thing.
        
         | Kalanos wrote:
          | Exactly. There are a hundred questions that come to mind,
          | like how it handles concurrent writes, sharding, views.
         | 
         | https://en.wikipedia.org/wiki/Database#Database_management_s...
         | 
         | I'm sure they learned a lot, but probably a waste in the long
         | run
        
       | yau8edq12i wrote:
       | Wasn't this already discussed here yesterday? The main criticism
       | of the article is that they didn't write a database, they wrote
       | an append-only log system with limited query capabilities. Which
       | is fine. But it's not a "database" in the sense that someone
       | would understand when reading the title.
        
         | throwaway63467 wrote:
         | Why isn't that a database? In my understanding a DB needs to be
         | able to store structured data and retrieve it, so not sure
         | what's missing here? Many modern DBs are effectively append
         | only logs with compaction and some indexing on top as well as a
         | query engine, so personally I don't think it's weird to call
         | this a DB.
        
           | hmottestad wrote:
           | I don't know what point you are really trying to make. At uni
           | the DBMS that everyone learns in their database course is an
           | SQL database. The database part is technically just a binary
           | file, but it's not what people usually mean when they say
           | they need a database for their project. Just like a search
           | engine doesn't have to be anything more than indexOf and a
           | big text file. It's just not very useful to think of it like
           | that.
        
             | Symbiote wrote:
             | You're describing a relational database management system,
             | which is a specific type of software implementing a
             | specific type of database.
        
           | didgetmaster wrote:
           | I agree. Is there an industry accepted definition of what a
           | system must do before it can be called a database?
           | 
           | I also wrote a KV system to keep track of metadata (tags) for
           | an object store I invented. I discovered that it could also
           | be used to create relational tables and perform fast queries
           | against them without needing separate indexes.
           | 
           | I started calling it a database and many people complained
           | that I was misusing the term because it can't yet do
           | everything that Postgres, MySQL, or SQLite can do.
        
             | josephg wrote:
             | Sounds like a database to me.
             | 
             | Databases have a long history that reaches back much
             | further than the modern, full featured SQL databases we
             | have today. What you built sounds like it would fit in well
             | amongst the non-sql databases of the world, like
             | Berkeleydb, indexeddb, mongo, redis, and so on.
        
           | yau8edq12i wrote:
           | Don't be absurd. By your standard, cat, grep and a file form
           | a database. Sure, if you interpret literally what a database
           | is, that fits. But once again, it's not what people have in
           | mind when they read "we cut cloud costs by writing our own
           | database".
        
             | swiftcoder wrote:
              | cat + grep absolutely constitute a database (and it's
              | probably in use in production _somewhere_). No need to
              | gatekeep the concept of a database.
        
             | com2kid wrote:
             | File systems are databases. Different file systems choose
             | different trade offs, different indexing strategies, etc.
             | 
              | Git is also a database. I got into this argument with
              | someone when I proposed using GitHub as a database to
              | store configuration entries. Our requirements included
              | needing the ability to review changes before they went
              | live, and the ability to easily undo changes to the
              | config. If your requirements for a DB include those two
              | things, GitHub is a damn good database platform! (Azure
              | even has built-in support for storing configurations in
              | GitHub!)
        
         | superq wrote:
         | It's difficult to be pedantic about an ambiguous term like
         | database without additional qualification or specificity.
         | 
         | There are more types of databases than those that end in "SQL".
         | 
         | A CSV file alone is a database. The rows are, well, rows. So is
         | a DBM file, which is what MySQL was originally built on (might
         | still be). Or an SQLite file.
         | 
         | The client or server API doesn't have to be part of the
         | database itself.
        
         | xyst wrote:
          | Sounds like Kafka to me, except you'd have to rewrite
          | components like ksqlDB.
        
         | Retr0id wrote:
         | If you described those needs to the average engineer, they'd
         | correctly say "use a database".
        
         | eatonphil wrote:
         | > they wrote an append-only log system with limited query
         | capabilities.
         | 
         | This sounds like a database to me.
        
         | forrestthewoods wrote:
         | Writing custom code that does exactly what you need and nothing
         | else is underrated. More people should do that! This is a great
         | example.
        
         | hmottestad wrote:
         | Yeah. They basically defined a binary format. I wouldn't call
         | it a database either.
        
         | mamcx wrote:
         | > But it's not a "database" in the sense that someone would
         | understand when reading the title.
         | 
          | Sure, because it is common for people to mix up a "database"
          | (aka data in some kind of structure) with a paradigm
          | (relational, SQL, document, kv) and with a "database
          | _system_", aka an app that manages the database.
        
       | Simon_ORourke wrote:
       | I've no doubt this is true, however, anyone I've ever met who
       | exclaimed "let's create our own database" would be viewed as
       | dangerous, unprofessional or downright uneducated in any business
        | meeting. There's just too much that can go badly wrong, for all
        | the sunk cost of getting anything up and running.
        
         | democracy wrote:
         | Depends on their meaning of a "database"
        
         | mavili wrote:
         | That is such a problem in today's world. Of course you don't
         | want to re-invent the wheel and all that, but we must be open
         | to the idea of having to do it. Innovation stagnates if people
         | suggesting redoing something are immediately seen as
         | "dangerous, unprofessional or downright uneducated"
        
           | MaKey wrote:
           | I think the issue is that you rarely get to see a neat new
            | solution to a given problem. Usually you'll see some kind
            | of half-baked attempt at a solution that's worse than the
            | already existing alternatives.
        
             | mavili wrote:
             | Yes, but what I'm describing is the problem of not even
             | listening to the idea of a new attempt.
        
         | akira2501 wrote:
         | > would be viewed as dangerous, unprofessional or downright
         | uneducated in any business meeting
         | 
         | Sounds like a great place to work.
         | 
          | > There's just too much that can go badly wrong, for all the
          | sunk cost of getting anything up and running.
         | 
         | Engineering is the art of compromise. In many cases the
         | compromises would not be worth it, but that doesn't mean there
         | are zero places where it would be, and eschewing the discussion
          | out of fear of how it would be perceived is the opposite of
         | Engineering.
        
       | endisneigh wrote:
        | It would be interesting to see a database built from the ground
        | up to be trivial to maintain.
       | 
        | I use managed databases, but is there really that much to do to
        | maintain a database? The host requires some level of
        | maintenance - changing disks, updating the host operating
        | system, failover during downtime for machine repair, etc. If
        | you use a database built for failover, I imagine much of this
        | doesn't actually affect operations that much, assuming you
        | slightly over-provision.
       | 
        | For a database alone, I think the work needed to maintain it
        | is greatly exaggerated. That being said, I still think it's
        | more than using a managed database, which is why my company
        | still does so.
       | 
       | In this case though, an append log seems pretty simple imo.
       | Better to self host.
        
       | the_duke wrote:
       | I don't know what geospatial features are needed, but otherwise
       | time series databases are great for this use case.
       | 
        | I especially like ClickHouse: it's generic but also a
        | powerhouse that handles most things you throw at it, handles
        | huge write volumes (with sufficient batching), supports
        | horizontal scaling, and can offload long-term storage to S3 for
        | much smaller disk requirements. The geo features in ClickHouse
        | are pretty basic, but it does have some builtin geo datatypes
        | and functions, e.g. for calculating distances.
        
       | fifilura wrote:
       | Would building a data lakehouse be an option?
       | 
        | Stream the events to S3 stored as Parquet or Avro files, maybe
        | in Iceberg format.
       | 
       | And then use Trino/Athena to do the long term heavy lifting. Or
       | for on-demand use cases.
       | 
        | Then only push what you actually need live to Aurora.
        
         | bsaul wrote:
          | I had a similar idea (except using Kafka): have all the nodes
          | write to a Kafka cluster, used for buffering, and let some
          | consumer write that data in batches into whatever database
          | engine(s) you need for querying, with intermediate pre-
          | processing steps wherever needed. This lets you trade latency
          | for write buffering, while not losing data, thanks to Kafka's
          | durability guarantees.
          | 
          | What would you use for streaming directly to S3 in high
          | volumes?
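          | 
          | The producer side of that can stay tiny. A sketch assuming
          | the segmentio/kafka-go client (topic and payload invented),
          | where BatchSize/BatchTimeout make exactly the
          | latency-for-buffering trade described above:
          | 
          |     package main
          |     
          |     import (
          |         "context"
          |         "log"
          |         "time"
          |     
          |         "github.com/segmentio/kafka-go" // assumed client
          |     )
          |     
          |     func main() {
          |         w := &kafka.Writer{
          |             Addr:         kafka.TCP("localhost:9092"),
          |             Topic:        "location-updates",
          |             BatchSize:    10000,       // flush at 10k msgs
          |             BatchTimeout: time.Second, // ...or after 1s
          |             RequiredAcks: kafka.RequireAll, // durability
          |         }
          |         defer w.Close()
          |     
          |         err := w.WriteMessages(context.Background(),
          |             kafka.Message{
          |                 // key keeps one device on one partition
          |                 Key:   []byte("device-42"),
          |                 Value: []byte(`{"lat":52.52,"lng":13.405}`),
          |             })
          |         if err != nil {
          |             log.Fatal(err)
          |         }
          |     }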
        
           | fifilura wrote:
            | Yeah, Kafka would handle it, except in my experience I
            | would like to avoid Kafka if possible, since it adds
            | complexity. (Fair enough, it depends on how precious your
            | data is, and whether it is acceptable to lose some of it
            | if a node crashes.)
            | 
            | But somehow they are ingesting the data over the network.
            | Would writing files to S3 be slower than that? Otherwise
            | you don't need much more than a RAM buffer?
            | 
            | Edit: to be clear, Kafka is probably the right choice here,
            | it is just that Kafka and me is not a love story.
            | 
            | But it should be cheaper to store long-term data in S3 than
            | storing it in Kafka, right?
        
       | diziet wrote:
        | As others have mentioned, hosting your own ClickHouse instance
        | could probably yield major savings while allowing much more
        | flexibility for querying data in the future. If your use case
        | can be served by what ClickHouse offers, gosh is it an
        | incredibly fast and reliable open source solution that you can
        | host yourself.
        
       | zinodaur wrote:
       | Very cool! When I started reading the article I thought it was
       | going to end up using an LSM tree/RocksDB but y'all went even
       | more custom than that
        
       | kumarm wrote:
        | I built a similar system in 2002 using JGroups (JavaGroups at
        | the time, before the open source project was acquired by JBoss)
        | while persisting asynchronously to a DB (Oracle at the time).
        | Our scale even in 2002 was much higher than 13,000 vehicles.
        | 
        | I believe the project still appears as a success story on the
        | JGroups website after 20+ years. I am surprised people are
        | writing their own databases for location storage in 2024 :).
        | There was no need to invent new technology in 2002, and
        | definitely not in 2024.
        
       | jrockway wrote:
       | Everyone seems fixated on the word database and the engineering
       | cost of writing one. This is a log file. You write data to the
       | end of it. You flush it to disk whenever you've filled up some
       | unit of storage that is efficient to write to disk. Every query
       | is a full table scan. If you have multiple writers, this works
       | out very nicely when you have one API server per disk; each
       | server writes its own files (with a simple mutex gating the write
       | out of a batch of records), and queries involve opening all the
       | files in parallel and aggregating the result. (Map, shuffle,
       | reduce.)
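        | 
        | Concretely, the whole "query engine" can be a sketch like
        | this (fixed 28-byte records; layout invented for
        | illustration):
        | 
        |     package main
        |     
        |     import (
        |         "encoding/binary"
        |         "fmt"
        |         "io"
        |         "log"
        |         "os"
        |     )
        |     
        |     // uint32 id + int64 ts + float64 lat + float64 lng
        |     const recSize = 28
        |     
        |     func main() {
        |         f, err := os.Open("node-1.log")
        |         if err != nil {
        |             log.Fatal(err)
        |         }
        |         defer f.Close()
        |     
        |         buf := make([]byte, recSize)
        |         for {
        |             _, err := io.ReadFull(f, buf)
        |             if err == io.EOF || err == io.ErrUnexpectedEOF {
        |                 break // truncated tail record is ignored
        |             }
        |             if err != nil {
        |                 log.Fatal(err)
        |             }
        |             id := binary.LittleEndian.Uint32(buf[0:4])
        |             if id != 42 { // "WHERE device_id = 42"
        |                 continue
        |             }
        |             ts := int64(binary.LittleEndian.Uint64(buf[4:12]))
        |             fmt.Println(id, ts)
        |         }
        |     }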
       | 
        | Atomic: not applicable, as there are no transactions.
        | Consistent: no, as there is no protection against losing the
        | tail end of writes (consider "no space left on device" halfway
        | through a record). Isolated: not applicable, as there are no
        | transactions. Durable: no, the data is buffered in memory
        | before being written to the network (EBS is the network, not a
        | disk).
       | 
       | So with all of this in mind, the engineering cost is not going to
       | be higher than $10,000 a month. It's a print statement.
       | 
       | If it sounds like I'm being negative, I'm not. Log files are one
       | of my favorite types of time series data storage. A for loop that
       | reads every record is one of my favorite query plans. But this is
       | not what things like Postgres or Aurora aim to do, they aim for
       | things like "we need to edit past data several times per second
       | and derive some of those edits from data that is also being
        | edited". Now you have some complexity, and a big old binary log
        | file and some for loops aren't really going to get you there.
        | But if you don't need those things, then you don't need those
        | things, and you don't need to pay for them.
       | 
       | The question you always have to ask, though, is have you reasoned
       | about the business impacts of losing data through unhandled
       | transactional conflicts? "read committed" or "non-durable writes"
       | are often big customer service problems. "You deducted this bill
       | payment twice, and now I can't pay the rent!" Does it matter to
       | your end users? If not, you can save a lot of time and money. If
       | it does, well, then the best-effort log file probably isn't going
       | to be good for business.
        
         | bradleyjg wrote:
         | If you only need those things there's also an off the shelf
         | solution for log files. Time you spend reinventing the wheel is
         | time you aren't spending finding product-market fit (if you've
         | already found it you wouldn't even consider it because you'd be
         | too busy servicing the flood of customers.)
         | 
          | Unless your company is so far past product-market fit that it
          | hires qualified applicants by the classful, _or_ this
          | whatever-it-is _is_ their product, they have no business
          | coding up custom infra bits. The opportunity cost alone is a
          | sufficient argument against, though far from the only one.
        
           | jrockway wrote:
           | I think that EBS is the difficult engineering problem that
           | they purchased instead of built from scratch here. Writing
           | binary records to a file and reading them all into memory is
           | not going to be a time sink that prevents you from finding
           | product/market fit. The $120,000/year burn rate on Aurora
           | they had seems alarming; an alarm that strongly implies "we
           | didn't use the right system for this problem".
           | 
           | My guess for "why didn't they use something off the shelf" is
           | that no existing software would be satisfied with the
           | tradeoffs they made here. Nobody else wants this.
        
         | happymellon wrote:
         | It's also nonsense.
         | 
         | If those were your requirements, why on earth are you using
         | Aurora?
         | 
          | Aurora is a multi-region, failover-protected, backup-managed
          | service.
         | 
          | This isn't. It would have been cheaper and quicker to install
          | an open-source logging DB, like Elastic, on EC2.
        
       | mavili wrote:
       | That's called engineering; you had a problem, you came up with a
       | solution THAT WORKS for your needs. Nicely done and thanks for
       | sharing.
        
       | xyst wrote:
       | This seems like they rewrote Kafka to me.
       | 
       | Even moderately sized Kafka clusters can handle the throughput
       | requirement. Can even optimize for performance over durability.
       | 
       | Some limited query capability with components such as ksqldb.
       | 
       | Maybe offload historical data to blob storage.
       | 
       | Then again, Kafka is kind of complicated to run at these scales.
       | Very easy to fuck up.
        
         | kdazzle wrote:
         | Plus managed kafka is pretty expensive
        
       | time0ut wrote:
       | Good article.
       | 
       | > EBS has automated backups and recovery built in and high uptime
       | guarantees, so we don't feel that we've missed out on any of the
       | reliability guarantees that Aurora offered.
       | 
       | It may not matter for their use case, but I don't believe this is
       | accurate in a general sense. EBS volumes are local to an
       | availability zone while Aurora's storage is replicated across a
       | quorum of AZs [0]. If a region loses an AZ, the database instance
       | can be failed over to a healthy one with little downtime. This
       | has only happened to me a couple times over the past three years,
       | but it was pretty seamless and things were back on track pretty
       | fast.
       | 
       | I didn't see anything in the article about addressing
       | availability if there is an AZ outage. It may simply not matter
       | or maybe they have solved for it. Could be a good topic for a
       | follow up article.
       | 
       | [0] https://aws.amazon.com/blogs/database/introducing-the-
       | aurora...
        
       | SmellTheGlove wrote:
       | I'm surprised to see the (mostly) critical posts. My reaction
       | before coming to the comments was:
       | 
       | - This is core to their platform, makes sense to fit it closely
       | to their use cases
       | 
       | - They didn't need most of what a full database offers - they're
       | "just" logging
       | 
       | - They know the tradeoffs and designed appropriately to accept
       | those to keep costs down
       | 
       | I'm a big believer in building on top of the solved problems in
       | the world, but it's also completely okay to build shit. That used
       | to be what this industry did, and now it seems to have shifted in
        | the direction of like 5-10% of large players inventing shit and
        | open sourcing it, and the other 90-95% just stitching together
       | things they didn't build in infrastructure that they don't own or
       | operate, to produce the latest CRUD app. And hell, that's not bad
       | either, it's pretty much my job. But it's also occasionally nice
       | to see someone build to their spec and save a few dollars. It's a
       | good reminder that costs matter, particularly when money isn't
       | free and incinerating endless piles of it chasing a (successful)
       | public exit is no longer the norm.
       | 
       | I get the arguments that developer time isn't free, but neither
       | is running AWS managed services, despite the name. And they
       | didn't really build a general purpose database, they built a much
       | simpler logger for their use case to replace a database. I'd be
       | surprised if they hired someone additional to build this, and if
       | they did, I'd guess (knowing absolutely nothing) that the added
       | dev spends 80% of their time doing other things. It's not like
       | they launched a datacenter. They just built the software and run
       | it on cheaper AWS services versus paying AWS extra for the more
       | complex product.
        
       | CapeTheory wrote:
       | It's amazing what can happen when software companies start doing
       | something approximating real engineering, rather than just
       | sitting a UI on top of some managed services.
        
       | zX41ZdbW wrote:
       | Sounds totally redundant to me. You can write all location
       | updates into ClickHouse, and the problem is solved.
       | 
       | As a demo, I've recently implemented a tool to browse 50 billion
       | airplane locations: https://adsb.exposed/
       | 
       | Disclaimer: I'm the author of ClickHouse.
        
       | kroolik wrote:
        | I could be missing something, but I can't really wrap my head
        | around "unlimited parallelism".
       | 
       | What they say is that the logic is embedded into their server
       | binary and they write to a local EBS. But what happens when they
       | have two servers? EBS can't be rw mounted in multiple places.
       | 
        | Won't adding a second (and more) servers cause trouble, like
        | migrating data when a new server joins the cluster or a server
        | leaves it?
       | 
       | I understand Aurora was too expensive for them. But I think it is
       | important to note their whole setup is not HA at all (which may
       | be fine, but the header could be misleading).
        
         | klohto wrote:
          | EBS has supported multi-attach for a long time now, with no
          | perf impact.
        
           | kroolik wrote:
            | Oh, thanks! I always thought that was what EFS was for.
            | They are still limited to the same AZ though, so no
            | multi-AZ redundancy.
        
             | klohto wrote:
              | Yea, multi-AZ failover would be an issue, but I assume
              | they don't care that much.
              | 
              | You could spin up a new EBS volume from the backup when
              | the first AZ fails, or keep a warm copy there, but that
              | seems like a lot of extra engineering work.
        
       | afro88 wrote:
       | These two sentences don't work together:
       | 
       | > [We need to cater for] Delivery companies that want to be able
       | to replay the exact seconds leading up to an accident.
       | 
       | > We are ok with losing some data. We buffer about 1 second worth
       | of updates before we write to disk
       | 
        | Impressive engineering effort on its own though!
        
       | bevekspldnw wrote:
       | "We are running a cloud platform that tracks tens of thousands of
       | people and vehicles simultaneously"
       | 
       | ...that's not something to brag about.
        
         | sneak wrote:
         | Why, because you think the surveillance implies that it's
         | nonconsensual and thus unethical, or the very small scale
         | (<100k clients) means this isn't actually a very difficult
         | engineering challenge?
        
       ___________________________________________________________________
       (page generated 2024-04-06 23:01 UTC)