[HN Gopher] DynamoDB 10 years later
       ___________________________________________________________________
        
       DynamoDB 10 years later
        
       Author : mariuz
       Score  : 212 points
       Date   : 2022-01-20 11:26 UTC (11 hours ago)
        
 (HTM) web link (www.amazon.science)
 (TXT) w3m dump (www.amazon.science)
        
       | graderjs wrote:
       | The in-the-trenches, technical, battle-tested/battle-scarred
       | comments on this thread are why I come to HN for the comments.
        
       | tmitchel2 wrote:
       | DynamoDB for me is the perfect database for my serverless /
        | graphql API. My only gripe is the 25-item limit on
        | transactions. I've had to resort to layering another
        | transaction mgmt system on top of it.
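The 25-item ceiling mentioned above applies per TransactWriteItems call. A minimal sketch (hypothetical table and key names) of why a second transaction layer becomes necessary: chunking a larger logical transaction keeps each call within the limit, but gives up atomicity between chunks.

```python
def chunk_transact_items(items, limit=25):
    """Split TransactWriteItems entries into groups that fit DynamoDB's
    per-transaction item limit. Each chunk is atomic on its own, but
    atomicity ACROSS chunks is lost -- which is exactly why a separate
    transaction-management layer becomes necessary."""
    return [items[i:i + limit] for i in range(0, len(items), limit)]

# Entries mirror the TransactWriteItems request shape (names illustrative):
writes = [{"Put": {"TableName": "orders",
                   "Item": {"pk": {"S": f"ORDER#{n}"}}}} for n in range(60)]

batches = chunk_transact_items(writes)
# 60 items -> transactions of 25, 25, and 10 items
```

Each batch would then be passed to a separate TransactWriteItems call, with the extra layer tracking which chunks have committed.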
        
       | salil999 wrote:
       | I can safely say that the team members working in DynamoDB are
       | very skilled and they care deeply about the product. They really
       | work hard and think of interesting solutions to a lot of problems
       | that their biggest customers face which is great from a product
       | standpoint. There are some pretty smart people working there.
       | 
       | Engineering, however, was a disaster story. Code is horribly
       | written and very few tests are maintained to make sure
       | deployments go without issues. There was too much emphasis on
       | deployment and getting fixes/features out over making sure it
       | won't break anything else. It was a common scenario to release a
       | new feature and put duct tape all around it to make sure it
       | "works". And way too many operational issues. There are a lot of
       | ways to break DynamoDB :)
       | 
       | Overall, though, the product is very solid and it's one of the
        | few databases that you can say "just works" when it comes to
        | scalability and reliability (as most AWS services are).
       | 
       | I worked at DynamoDB for over 2 years.
        
         | deanCommie wrote:
         | If the developers are happy about the code and testing quality
         | of a project, then you waited too long to ship.
         | 
         | If the customers don't have any feedback or missed feature asks
         | at launch, you waited too long to ship.
         | 
         | You know who has great internal code and test quality? Google.
         | Which is why Google doesn't ship. They're a wealth distribution
         | charity for talented engineers. And their competitive advantage
         | is that they lure talented people away from other companies
         | where they might actually ship something and compete with
         | Google, to instead park them, distract them with toys, beer
         | kegs, readability reviews, and monorepo upgrades.
        
         | geodel wrote:
         | Very interesting!
         | 
          | To me the takeaway is that large/interesting/challenging
          | engineering projects are generally pretty close to disasters.
          | Sometimes they actually do become disasters.
         | 
          | On the other hand, if a project looks straightforwardly
          | designed, is neatly put into JIRA stories, and developers
          | deliver code consistently week after week, then it may be a
          | successfully planned and delivered project. But it would mostly
          | be doing stuff that has already been done many times over, and
          | likely by the same people on the team.
         | 
         | At least this has been my experience while working on
         | standardized / templated projects vs something new.
        
           | notyourwork wrote:
           | Challenging the cutting edge of your product domain is what I
           | get from this. Easy things are easy and predictable. Hard
           | things and unpredictable evolving requirements are a tension
           | against the initial system design which is the foundation of
            | your code base. Over time, the larger projects get, the
            | further they perhaps deviate from the original design. If you
            | could predict it up front, in many cases it's not all that
            | interesting or challenging a problem. Duct tape is fine to
           | use as long as you understand when you've gone too far and
           | might want to re-design from scratch based on prior
           | learnings.
        
         | otterley wrote:
         | Customers care about the outcome, not the internal process.
         | Besides, I've never worked at any sizable company in my
         | 20+-year-long career where I didn't conclude, "it's a miracle
         | this garbage works at all."
         | 
         | Enjoy the sausage, but if you have a weak stomach, don't watch
         | how it's made.
         | 
         | (I work for AWS but not on the DynamoDB team and I have no
         | first-hand knowledge of the above claim. Opinions are my own
         | and not those of my employer.)
        
           | listenallyall wrote:
           | Just curious, why do you mention you work at AWS if you're
           | just disclaiming that fact in the next sentence? Besides,
           | nothing you stated is specific to AWS or any of its products.
        
             | sokoloff wrote:
             | I don't work at Amazon, but our company's social media
             | policy requires us to be transparent about a possible
             | conflict of interest when speaking about things "close to"
             | our company/our position in the industry and also need to
             | be clear about whether we're speaking in an official
             | capacity or in a personal capacity.
             | 
             | This is designed to reduce the chances of eager employees
             | going out and astro-turfing or otherwise acting in trust-
             | damaging ways while thinking they're "helping".
        
             | sealjam wrote:
             | Crudely speaking, the fact that they work at AWS means that
             | it's in their best interests for AWS to be perceived
             | positively.
             | 
             | When this is the case it's often nice to state this
             | conflict of interest, so others can take your appraisal in
             | the appropriate context.
             | 
             | I'm not implying anything about the post, just stating what
             | I assume to be the reason for the disclosure.
        
           | chrisfosterelli wrote:
           | > Customers care about the outcome, not the internal process.
           | 
           | This is true though there's only so much technical debt and
           | internal process chaos you can create before it affects the
           | outcome. It's a leading indicator, so by the time customers
           | are feeling that pain you've got a lot of work in front of
           | you before you can turn it around, if at all, and customers
           | are not going to be happy for that duration.
           | 
           | Technical debt is not something to completely defeat or
           | completely ignore, instead it's a tradeoff to manage.
        
             | jsdalton wrote:
             | This article from Martin Fowler explores your point in
             | greater depth. It's a good read:
              | https://martinfowler.com/articles/is-quality-worth-cost.html
             | 
             | One concrete problem with technical debt the article
              | highlights is that it negatively impacts the time to
             | deliver new features. Customers today usually expect not
             | only a great initial feature set from a product, but also a
             | steady stream of improvements and growth, along with
             | responsiveness to feedback and pain points.
        
           | salil999 wrote:
           | Exactly this. I was too young at the time to grasp this idea.
        
           | wnolens wrote:
           | > Customers care about the outcome, not the internal process
           | 
           | Additionally, the business cares about the outcome, not the
           | internal process.
           | 
           | Ostensibly, the business should care about process but it
           | actually doesn't matter as long as the product is just good
           | enough to obtain/retain customers, and the people spending
           | the money (managers) aren't incentivized to make costs any
           | lower than previously promised (status quo).
        
         | vasili111 wrote:
         | >Engineering, however, was a disaster story. Code is horribly
         | written and very few tests are maintained to make sure
         | deployments go without issues. There was too much emphasis on
         | deployment and getting fixes/features out over making sure it
         | won't break anything else. It was a common scenario to release
         | a new feature and put duct tape all around it to make sure it
         | "works". And way too many operational issues. There are a lot
         | of ways to break DynamoDB :)
         | 
         | >Overall, though, the product is very solid and it's one of the
          | few databases that you can say "just works" when it comes to
          | scalability and reliability (as most AWS services are).
         | 
          | How can those two coexist?
        
           | [deleted]
        
           | badhombres wrote:
           | I mean eventually enough duct tape can be solid like a tank
           | :)
        
             | jeffreygoesto wrote:
             | Or a bridge...
             | 
             | https://www.popularmechanics.com/science/a5732/mythbusters-
             | b...
        
               | bastardoperator wrote:
               | Or a boat...
               | 
               | https://flexsealproducts.com/products/flex-tape
        
           | meepmorp wrote:
           | You need the really good duct tape.
        
           | 0xbadcafebee wrote:
           | You throw bodies at it. A small bunch of people will be
           | overworked, stressed, constantly fighting fires and
           | struggling to fight technical debt, implement features, and
           | keep the thing afloat. Production is always a hair away from
           | falling over but luck and grit keeps it running. To the team
           | it's a nightmare, to the business everything is fine.
        
             | dustingetz wrote:
             | literally every co
             | 
             | if you want to know why capitalism causes this, start a
             | startup and prioritize quality, do not get to market, do
             | not raise money, do not pass go, watch dumpster fires with
             | millions of betrayed and angry users raise their series d
        
             | wnolens wrote:
             | this is the answer.
             | 
             | source: currently being burned out on an adjacent aws
             | team..
        
               | 0xbadcafebee wrote:
               | That sucks, man. If they won't move you to another team,
               | just get out of there. We don't benefit by suffering for
               | them, and they're not gonna change.
        
           | indogooner wrote:
            | It sounds incredible, but I have heard similar things about
            | Oracle. Maybe a large dev team can apply enough duct tape
            | that the product is solid.
        
             | gfd wrote:
             | You're probably thinking of this comment, oracle and 25
             | million lines of c code:
             | https://news.ycombinator.com/item?id=18442941
        
             | eternalban wrote:
             | They both likely have _solid_ 80% solutions (design) and
             | incrementally cover the 20% gap as need arises. This in
             | turn adds to operational complexity.
             | 
              | The alternative would be to attempt a near 'perfect' solution
             | for the product requirements and that may either hit an
             | impossibility wall or may require substantial long term
             | effort that would impede product development cycles. So
             | likely the former approach is the smarter choice.
        
         | AtlasBarfed wrote:
         | So they just rolled out global replication, and I can't for the
         | life of me figure out how they resolve write conflicts without
         | cell timestamps or any other obvious CRDT measures.
         | 
         | Questions were handwaved away, and the usual Amazon black box
         | non-answers which always smells like they are hiding problems.
         | 
         | Any ideas how this is working? It seems bolt-on and not well
         | thought out, and I doubt they'll ever pay for Aphyr to put it
         | through his torture tests.
        
           | greiskul wrote:
           | From: https://aws.amazon.com/dynamodb/global-tables/
           | 
           | Consistency and conflict resolution
           | 
           | Any changes made to any item in any replica table are
           | replicated to all the other replicas within the same global
           | table. In a global table, a newly written item is usually
           | propagated to all replica tables within a second. With a
           | global table, each replica table stores the same set of data
           | items. DynamoDB does not support partial replication of only
           | some of the items. If applications update the same item in
           | different Regions at about the same time, conflicts can
           | arise. To help ensure eventual consistency, DynamoDB global
           | tables use a last-writer-wins reconciliation between
           | concurrent updates, in which DynamoDB makes a best effort to
           | determine the last writer. With this conflict resolution
           | mechanism, all replicas agree on the latest update and
           | converge toward a state in which they all have identical
           | data.
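AWS only commits to a "best effort to determine the last writer", so the exact mechanism is undocumented. A toy sketch of how timestamp-based last-writer-wins converges, assuming each replicated version carries a comparable timestamp (an illustration, not AWS's actual implementation):

```python
def last_writer_wins(replica_versions):
    """Pick one winner among concurrent regional writes to the same item.
    Assumes each version carries a comparable timestamp; ties are broken
    by region name so every replica deterministically converges on the
    same version."""
    return max(replica_versions, key=lambda v: (v["ts"], v["region"]))

# Two regions accept conflicting writes at almost the same moment:
us = {"region": "us-east-1", "ts": 1642678000.12, "value": "blue"}
eu = {"region": "eu-west-1", "ts": 1642678000.25, "value": "green"}
winner = last_writer_wins([us, eu])  # the later write wins in every region
```

Because every replica applies the same deterministic rule, they all converge to identical data, at the cost of silently discarding the losing write.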
        
           | luhn wrote:
           | Honestly your expectations are too high. Conflict resolution
           | is row-level last-write-wins. It's not a globally distributed
           | database, it's just a pile of regional DynamoDB tables duct
           | taped together... They're not going to hire Aphyr for testing
           | because there's nothing for him to test.
        
         | vp8989 wrote:
         | "And way too many operational issues."
         | 
          | I've seen this kind of thing mentioned many times, which is
          | pretty baffling, TBH, given Dynamo's good reputation in the
          | industry. Are these mostly in the stateless components of the
          | product, or do customers see data loss?
        
           | salil999 wrote:
            | I can't say, at the risk of violating some NDA, but a lot of
            | it is internal stuff that customers will never even be aware
            | of, or that would require too much effort for them to break.
           | 
           | There are times when bad deployments happen and customers
           | were impacted.
        
         | digitalgangsta wrote:
          | I've worked at 5 different tech companies now - this is par
          | for the course. And at every single one, people wished they
          | could go back and do it again, but at that point the product
          | was too successful, so they ran with it.
        
         | 0xbadcafebee wrote:
         | I worked at a company who re-implemented the entire Dynamo
         | paper and API, and it was exactly the same story. Completely
         | eliminated all my illusions about the supposed superiority of
         | distributed systems. It was a mound of tires held together with
         | duct tape, with a tiki torch in each tire.
        
           | yftsui wrote:
           | Dynamo paper and DynamoDB are two very different things...
        
           | AtlasBarfed wrote:
           | Did they have a spare 100 million hanging around to burn?
           | That seems pretty ridiculous. Why did they not just run
           | cassandra?
        
             | 0xbadcafebee wrote:
             | They did have 100 million to burn, but my mostly-wild-guess
             | is it was closer to $1.5M/yr. But that gives you an in-
             | house SaaS DB used across a hundred other
             | teams/products/services, so it actually saved money (and
             | nothing else matched its performance/CAP/functionality).
             | 
             | Cassandra is too opinionated and its CAP behavior wasn't
             | great for a service like this, so they built on top of
             | Riak. (This also eliminated any thoughts I had about Erlang
             | being some uber-language for distributed systems, as there
             | were (are?) tons of bugs and missing edge cases in Riak)
        
               | staticassertion wrote:
               | Erlang gives you great primitives for building reliable
               | protocols, but they're just primitives, and there are
               | tons of footguns since building protocols is hard.
        
             | spookthesunset wrote:
             | > Why did they not just run cassandra?
             | 
             | Not Invented Here can run very deep in some branches of an
             | organization. Depending on how engineering performance
             | evaluations work, writing a homebrew database could totally
             | be something that aligns with the company incentives. It
             | might not make a single bit of sense from a business
             | standpoint but hey, if the company rewards such behavior
             | don't be surprised when engineers flush millions down the
             | tube "innovating" a brand new wheel.
        
         | sam0x17 wrote:
          | It's a shame they don't open source it. It's funny too: being
          | AWS, they really don't have to worry about AWS running a
          | cheaper service, so at that point why not open source it?
        
           | [deleted]
        
           | js4ever wrote:
           | There is a compatible open source alternative here,
           | https://www.scylladb.com/alternator/
        
           | nexuist wrote:
           | They probably view it as a competitive advantage that Azure
           | or GCP would try to copy if they figured out the "secret
           | sauce."
        
             | staticassertion wrote:
             | I kinda doubt it. It's probably just that open sourcing it
             | won't provide much utility (I bet lots of code is aws
             | specific) and just adds a new maintenance burden for them.
        
             | sam0x17 wrote:
             | Azure has Cosmos and Google has Datastore. They would
             | never.
        
             | blowski wrote:
             | Azure has Cosmos which is arguably better than DynamoDB for
             | a lot of use cases.
        
       | dabfiend19 wrote:
       | knocking off mongo... 10 years later they still haven't caught
       | up.
        
       | artembugara wrote:
       | We realized how great Dynamo was only after we migrated off AWS.
       | 
       | Dynamo was a key factor to us when we were releasing the MVP of
        | our News API [0]. We used Dynamo, ElasticSearch, and Lambda, and
        | could get it running in 60 days while being full-time employed.
       | 
        | Also, the best tech talk I've seen was given by Rick Houlihan at
        | re:Invent [1].
        | 
        | I highly recommend every engineer watch it: it's a great
        | overview of SQL vs NoSQL.
       | 
       | [0] https://newscatcherapi.com/blog/how-we-built-a-news-api-
       | beta...
       | 
       | [1] https://www.youtube.com/watch?v=HaEPXoXVf2k
        
         | pier25 wrote:
         | BTW Rick Houlihan left AWS recently to work for Mongo.
         | 
         | https://twitter.com/houlihan_rick/status/1472969503575265283
         | 
         | On that thread he criticizes AWS regarding DynamoDB openly.
         | 
         | > _I will always love DynamoDB, but the fact is it is losing
         | ground fast because AWS focuses most of their resources on the
         | half baked #builtfornopurpose database strategy. I always hated
         | that idea, I just bit my tongue instead of saying it._
         | 
         | > _The problem is the other half-baked database services that
         | all compete for the same business. DocumentDB, Keyspaces,
         | Timestream, Neptune, etc. Databases take decades to optimize,
         | the idea that you can pump them out like web apps is silly._
         | 
         | > _I was very tired of explaining over and over again that
         | DynamoDB is actually not the dumbed down Key-Value store that
         | the marketing message implied. When AWS created 6 different
         | NoSQL databases they had to make up reasons for each one and
         | the messaging makes no sense._
        
       | manishsharan wrote:
       | Quite a few of the teams that were early adopters of AWS DynamoDB
       | were not prepared for the pricing nuances that had to be taken
       | into consideration when building their solutions.
        
         | xtracto wrote:
          | I remember trying DynamoDB around 2015/2016: you had to specify
          | your expected read and write throughput and you would be billed
          | for that. At that time we had a pretty spiky traffic use case,
          | which made using DynamoDB efficiently impossible.
        
           | musingsole wrote:
           | I had a similar experience, but ultimately wrote a service to
           | monitor our workloads and request increased provisioning
           | during spikes. You could reduce your provisioning like 10
           | times a day, but after that you could only increase it and
           | would be stuck with the higher rate for a time.
           | 
           | And then on-demand provisioning was released and it was cheap
           | enough to be worth simplifying our workflows.
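The monitor described above boils down to a scaling decision like the following toy sketch. The headroom factor and decrease budget are illustrative placeholders, not AWS's actual limits:

```python
def plan_capacity(consumed_units, provisioned_units, decreases_left,
                  headroom=1.2):
    """Toy version of the scaling decision such a monitor makes: scale
    up freely when consumption nears the provisioned rate, but skip
    scale-downs once the day's decrease budget is spent -- leaving you
    stuck at the higher rate, as described above."""
    target = int(consumed_units * headroom) + 1
    if target > provisioned_units:
        return target                      # increases are always allowed
    if target < provisioned_units and decreases_left > 0:
        return target                      # decrease, spending budget
    return provisioned_units               # budget exhausted: stay put

assert plan_capacity(100, 50, 0) == 121    # spike: scale up immediately
assert plan_capacity(10, 121, 0) == 121    # stuck high, no decreases left
```

On-demand mode removed the need for this loop entirely by billing per request instead of per provisioned unit.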
        
         | jvanvleet wrote:
         | I was one of these. However I now understand that the pricing
         | nuances reflected a reality that I appreciate. We used DDB in a
         | way that was not the best fit and the cost was a reflection of
         | this.
        
       | balls187 wrote:
       | DynamoDB is one of my favorite AWS products.
        
       | victor106 wrote:
       | We tried to implement an application on DynamoDB about 2 years
       | ago.
       | 
        | We really struggled with implementing ad-hoc queries/search,
        | e.g. select * from employees where name = X and city = Y.
       | 
       | Any improvements in DynamoDB that make it easier to implement
       | such queries?
        
         | manigandham wrote:
         | DynamoDB (and other dynamo-like systems like Cassandra,
         | Bigtable) are just advanced key/value stores. They support
         | multiple levels of keys->values but fundamentally you need the
         | key to find the associated value.
         | 
         | If you want to search by parameters that aren't keys then you
         | need to store your data that way. Most of these systems have
         | secondary indexes now, and that's basically what they do for
         | you automatically in the backend, storing another copy of your
         | records using a different key.
         | 
         | If you need adhoc relational queries then you should use a
         | relational database.
        
           | ignoramous wrote:
           | > _If you want to search by parameters that aren 't keys then
           | you need to store your data that way._
           | 
           | Not that I recommend it, but by using space-filling curves,
            | one could index multiple dimensions onto DynamoDB's bi-
           | dimensional (hash-key, range-key) primary-index:
           | https://aws.amazon.com/blogs/database/z-order-indexing-
           | for-m... and https://web.archive.org/web/20220120151929/https
           | ://citeseerx...
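The z-order trick linked above can be sketched with a Morton encoding; the resulting integer would serve as the range key. This is an illustration of the general technique, not code from the AWS post:

```python
def z_order(x, y, bits=16):
    """Interleave the bits of two coordinates into one Morton code.
    Stored as a DynamoDB range key, it keeps points that are close in
    both dimensions close in key order, so a 2-D box query maps onto a
    small number of one-dimensional key ranges."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)        # x bits -> even positions
        code |= ((y >> i) & 1) << (2 * i + 1)    # y bits -> odd positions
    return code

# The four cells of a 2x2 grid land on consecutive codes:
assert [z_order(0, 0), z_order(1, 0), z_order(0, 1), z_order(1, 1)] == [0, 1, 2, 3]
```

The catch, and likely why the parent doesn't recommend it, is that a rectangular query region can still decompose into many disjoint code ranges, each needing its own Query call.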
        
         | Jupe wrote:
         | DynamoDB is not meant for ad-hoc query patterns; as others have
         | said, plan your indexes around your access patterns.
         | 
         | However, so long as you add a global secondary index (GSI) with
         | name, city as the key, you can certainly do such things. But be
         | aware for large-scale solutions:
         | 
          | 1. There's a limit of 20 GSIs per table. You can increase it
          | with a call to AWS support.
         | 
          | 2. GSIs are updated asynchronously; read-after-write is not
         | guaranteed, and there is no "consistent read" option on a GSI
         | like there is with tables.
         | 
         | 3. WCUs on GSIs should match (or surpass) the WCUs on the
         | original table, else throughput limit exceeded exceptions will
         | occur. So, 3 GSIs on a table means you pay 4x+ in WCU costs.
         | 
         | 4. The keys of the GSI should be evenly distributed, just like
         | the PK on a main table. If not, there is additional opportunity
         | for hot partitions on write.
         | 
         | Ref: https://aws.amazon.com/premiumsupport/knowledge-
         | center/dynam...
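The (name, city) GSI from the example corresponds to an UpdateTable call; a sketch of the request shape, as it might be passed to an SDK method such as boto3's update_table. Table, attribute, and index names, and the capacity numbers, are illustrative:

```python
# UpdateTable request adding a (name, city) GSI (names illustrative):
add_gsi_request = {
    "TableName": "employees",
    "AttributeDefinitions": [
        {"AttributeName": "name", "AttributeType": "S"},
        {"AttributeName": "city", "AttributeType": "S"},
    ],
    "GlobalSecondaryIndexUpdates": [{
        "Create": {
            "IndexName": "name-city-index",
            "KeySchema": [
                {"AttributeName": "name", "KeyType": "HASH"},
                {"AttributeName": "city", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
            # Point 3 above: keep GSI WCUs at or above the base table's
            "ProvisionedThroughput": {"ReadCapacityUnits": 5,
                                      "WriteCapacityUnits": 5},
        }
    }],
}
```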
        
         | [deleted]
        
         | itsmemattchung wrote:
         | With DynamoDB, you can now execute SQL queries using PartiQL:
         | 
         | https://docs.aws.amazon.com/amazondynamodb/latest/developerg...
        
           | augustl wrote:
           | Note that this is just a new syntax for the existing querying
            | capabilities. If you query something that's not in the
            | hash/sort key, you still need to filter on the client after
            | the 1 MB data-set size limit, etc.
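A sketch of what such a PartiQL call looks like at the API level, via the ExecuteStatement operation. Table and attribute names are illustrative; per the caveat above, if "name" is not a key this still runs as a scan under the hood:

```python
# ExecuteStatement request shape for a parameterized PartiQL query
# (table/attribute names illustrative). If "name" is the partition key
# this runs as an efficient Query; otherwise DynamoDB falls back to
# scanning, still subject to the 1 MB per-page limit noted above.
partiql_request = {
    "Statement": 'SELECT * FROM "employees" WHERE "name" = ? AND "city" = ?',
    "Parameters": [{"S": "X"}, {"S": "Y"}],
}
```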
        
           | _hao wrote:
           | I'm really happy that Cosmos DB has this -
           | https://docs.microsoft.com/en-us/azure/cosmos-db/sql/sql-
           | que...
           | 
            | I haven't used DynamoDB in a couple of years, so I'd be
            | curious to know how querying compares, if anyone who has
            | used both Cosmos and Dynamo recently can shed some light.
        
         | unfunco wrote:
         | I struggled at first but I watched Advanced Design Patterns for
         | DynamoDB[0] a few times and it clicked. As other responses have
         | suggested, generally you define your access patterns first and
         | then structure the data later to fit those access patterns.
         | 
         | [0]: https://www.youtube.com/watch?v=HaEPXoXVf2k
        
         | garydevenay wrote:
         | The most reliable way to build a system with DynamoDB is to
         | plan queries upfront. Trying to use it like a SQL database and
          | make ad-hoc queries won't work because it's not a SQL DB.
         | 
         | Data should be stored in the fashion you wish for it to be
         | read, and storing the same data in more than one configuration
         | is acceptable.
         | 
         | Good resource:
         | https://docs.aws.amazon.com/amazondynamodb/latest/developerg...
        
         | jugg1es wrote:
          | That's not what DynamoDB is for. If you need to run queries
          | like that, you should be using an RDBMS. DynamoDB should only
          | really be used for use cases where the queries are known up-
          | front. There are ways to design your data model in Dynamo so
          | that you could actually run queries like that, but you would
          | have had to do that work from day 1. You won't be able to
          | retroactively support queries like that.
        
         | jon-wood wrote:
         | This will sound flippant, but that's not what Dynamo is for. If
         | you want to do freeform relational queries like that then put
         | it in a relational database.
         | 
         | Dynamo is primarily designed for high volume storage/querying
         | on well understood data sets with a few query patterns. If you
         | want to be able to query information on employees based on
         | their name and city you'll need to build another index keyed on
         | name and city (in practice Dynamo makes that reasonably simple
         | by adding a secondary index).
        
           | owenmarshall wrote:
           | Alternatively, practice single table design: structure your
           | table keys in such a way that they can represent all (or at
           | least most) of the queries you need to run.
           | 
           | This is often easier said than done, but it can be far less
           | expensive and more performant than adding an index for each
           | search.
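The single-table approach can be sketched for the employee example from upthread. The pk/sk layout below is illustrative, one of many possible designs:

```python
def employee_items(employee_id, city, name):
    """Write one logical record under two composite keys so that both
    'by id' and 'by city + name' lookups hit the same table -- the
    single-table alternative to adding a GSI per access pattern."""
    base = {"employee_id": employee_id, "city": city, "name": name}
    return [
        dict(base, pk=f"EMP#{employee_id}", sk="PROFILE"),
        dict(base, pk=f"CITY#{city}", sk=f"NAME#{name}#EMP#{employee_id}"),
    ]

items = employee_items("42", "Lisbon", "Ana")
# Query: pk = "CITY#Lisbon" AND begins_with(sk, "NAME#Ana") finds Ana
# without any GSI -- but every new query pattern means another manual
# copy of the item, which is the pain the reply below pokes at.
```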
        
             | ellimilial wrote:
             | It's always great fun compounding new, manual 'indexes'
             | when you discover you need another query.
        
           | willcipriano wrote:
           | Amazon has a perfect use case for this. You click on a
           | product in the search results, that url contains a UUID, that
           | UUID is used to search Dynamo and returns an object that has
           | all the information on the product, from that you build the
           | page.
           | 
           | If what you are trying to do looks more like "Give me all the
           | customers that live in Cuba and have spent more than $10 and
           | have green eyes", Dynamo isn't for you. You can query that
           | way but after you put all the work in to get it up and
           | running, you'd probably be better off with Postgres.
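That product-page lookup is a single GetItem; a sketch of the request shape (table name, attribute name, and the UUID are illustrative):

```python
# GetItem request shape for the product-page lookup described above:
# one key, one item, no query planning -- the access pattern DynamoDB
# is built for.
get_product_request = {
    "TableName": "products",
    "Key": {"product_id": {"S": "1f0e6c7a-0b52-4f3e-9d1c-8a2b34c56d78"}},
}
```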
        
             | 8note wrote:
              | If that's one of 12 or fewer query patterns you need, I
              | can write you a simple Dynamo table for it. Dynamo's
              | limitation is that it can only support n different query
              | patterns, and you have to hand-craft an index for each one
              | (well, sometimes you can get multiple on one index).
        
       | gigatexal wrote:
        | If you follow Rick Houlihan (@houlihan_rick), then all the
        | accolades AWS gets for DynamoDB pale in comparison to its
        | current team and execution: the company seems to not be
        | investing in it, so much so that Rick left to join MongoDB.
        
         | vslira wrote:
         | Man I love Rick's talks as much as anyone but let's be real, he
         | likely left AWS not for his love of first class geographical
         | indexes but because Mongo offered a giant pile of money for him
          | to evangelize their tech. Though I have no doubt that he
          | actually had a lot of reservations about Dynamo's DX before;
          | he likely has some about MongoDB too, but those won't be the
          | bulk of his content.
        
           | gigatexal wrote:
           | At his rank at AWS I don't know if money was such an issue.
           | He strikes me as a person who cares deeply about the
           | underlying tech. But I have no idea one way or the other.
        
             | awsthro00945 wrote:
             | I think I've seen you post something similar on r/aws about
             | how Rick was "top DynamoDb person at AWS" (apologies if
             | that wasn't you). I think you are overestimating Rick's
             | "rank".
             | 
             | I just looked him up (I had not heard of him before seeing
             | his name mentioned on r/aws a few days ago) and he was an
             | L7 TPM/Practice Manager in AWS's sales organization. That's
             | not really a notably high position, and in the grand scheme
             | of Amazon pay scales, isn't that high up. An L7 TPM gets
             | paid about the same as, or sometimes less than, an L6
             | software dev (L6 is "senior", which is ~5-10 years of
             | experience).
             | 
             | Also, him being in the sales org means he had practically
             | nothing to do with the engineering of the service. AWS
             | Sales is a revolving door of people. I mean no offense
             | towards Rick (again, I didn't know him or even know of him
             | before I read his name in a comment a few days ago), but I
             | would not read anything at all into the fact that an L7
             | Sales TPM left for another company.
        
               | belter wrote:
                | You never heard of Rick Houlihan? He does 90% of
                | DynamoDB evangelism... Yet at the same time you are able
                | to do these internal lookups? Do you work on DynamoDB?
               | 
               | AWS re:Invent 2018: Amazon DynamoDB Deep Dive: Advanced
               | Design Patterns for DynamoDB (DAT401)
               | https://youtu.be/HaEPXoXVf2k
               | 
               | AWS re:Invent 2019: [REPEAT 1] Amazon DynamoDB deep dive:
               | Advanced design patterns (DAT403-R1)
               | https://youtu.be/6yqfmXiZTlM
               | 
               | AWS re:Invent 2020: Amazon DynamoDB advanced design
               | patterns - Part 1 https://youtu.be/MF9a1UNOAQo
               | 
               | AWS re:Invent 2020: Amazon DynamoDB advanced design
               | patterns - Part 2 https://youtu.be/_KNrRdWD25M
               | 
               | AWS re:Invent 2021 - DynamoDB deep dive: Advanced design
               | patterns https://youtu.be/xfxBhvGpoa0
               | 
               | Amazon DynamoDB | Office Hours with Rick Houlihan:
               | Breaking down the design process for NoSQL applications
               | https://www.twitch.tv/videos/761425806
        
               | amzn-throw wrote:
               | Do you expect the engineers on your team to know the top
               | sales person at your company?
               | 
               | This person might be responsible for the majority of
               | evangelism and revenue for the company. Do you expect the
               | SDEs to know about him?
               | 
                | Again, no shot against Rick - he is amazing,
               | smart, technical, competent, and a deep owner.
               | 
               | But the average SDE on the team won't know about these or
               | watch these talks. There are too many deep internal
               | engineering challenges to solve.
        
               | belter wrote:
               | Are you calling the person who did the core DynamoDB
               | Technical Deep Dive sessions at reInvent, for the last 4
               | years in a row, a sales person?
        
               | amzn-throw wrote:
               | What do you think Solutions Architects and Developer
               | Advocates (between the two groups who do most Re:invent
               | sessions) are?
               | 
               | Hell, what do you think re:Invent is? It's a sales
               | conference.
               | 
               | In any company you have two groups of people: Those that
               | build the product, and those that sell it. Ultimately,
               | solutions architects and developer advocates are there to
               | help sell the product.
               | 
               | Of course Amazon is customer obsessed. And genuinely
               | interested in ensuring customers have a good experience,
               | and their technical needs are met - through education,
               | support, and architectural guidance. But ultimately,
               | that's what it is.
        
               | gigatexal wrote:
                | Maybe that was the problem. He cited that there was
                | seemingly not enough effort going into making DynamoDB
                | better, as evidenced by the many closely overlapping
                | other DBs that AWS promotes. If Rick had his ear to the
                | ground listening to customers and sending back feedback,
                | but it was falling on deaf ears, that's enough reason
                | for someone as influential and productive as him to
                | leave. It also speaks to inner AWS turmoil, at least at
                | DynamoDB.
        
               | amzn-throw wrote:
               | Based on what I know, that's not the case.
               | 
               | DDB is a steady ship. The explanation on
               | https://news.ycombinator.com/item?id=30009611 is likely
               | the best explanation. L7 TPMs make the same money as L6
               | SDEs.
               | 
               | Getting promoted to L8 - director - is a monumental
               | effort and likely seemed much harder than pursuing a
                | comparable position at MongoDB.
               | 
               | Good for him for doing it, and for making Amazon take a
               | long hard look at every way they failed in not keeping
               | him.
        
               | gigatexal wrote:
               | was not me at r/aws
               | 
               | unless he posts here about it we can't really know -- we
               | can only speculate but I think he had a higher amount of
               | influence than his title/rank might suggest. I think
               | Rick's influence with respect to DynamoDB is akin to that
               | of Kelsey Hightower's influence over k8s at Google.
        
       | tybit wrote:
       | For anyone else expecting this to be a paper given the domain
       | name, it's not. It's a non-technical interview with a couple of
       | the original paper's authors. Not bad, just not as exciting as
       | the paper I imagine, detailing what they've learnt from a
       | distributed-systems perspective operating Dynamo and then
       | DynamoDB for so long now.
        
         | uvdn7 wrote:
         | https://brooker.co.za/blog/2022/01/19/predictability.html This
         | might be something you are looking for.
        
         | mjb wrote:
         | We don't have a paper on DynamoDB's internals (yet?), but
         | here's a talk you might find interesting from one of the folks
         | who built and ran DDB for a long time:
         | https://www.youtube.com/watch?v=yvBR71D0nAQ
         | 
         | And Doug Terry talking through the details of how DynamoDB's
         | transaction protocol works:
         | https://www.usenix.org/conference/fast19/presentation/terry
         | 
         | If we did publish more about the internals of DDB, what would
         | you be looking to learn? Architecture? Operational experience?
         | Developer experience? There's a lot of material we could share,
         | and it's useful to hear where people would like us to focus.
        
           | pow_pp_-1_v wrote:
           | All of it - architecture, operational experience, best
           | practices etc.
        
             | ldrndll wrote:
             | Just want to second this. All of the above sounds really
             | interesting to me!
        
       | cmollis wrote:
       | We use DynamoDB like a big hash table of S3 file locations: we
       | look up these locations via a key (at the time, it sounded like a
       | pretty good use case for it). I suppose we could have used some
       | other managed redis or memcached thing, but being an AWS shop, it
       | was, and is, pretty useful. I have to say, it's been pretty
       | effortless to configure: read/write units are really the only
       | thing we've had to tune (other than the base index). The rest
       | has been easy. It holds about 100 million entries that are
       | read and written pretty quickly.
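The lookup pattern described above (DynamoDB as a big hash table from a key to an S3 location) can be sketched as below. The table and attribute names are hypothetical, and the dict is the shape of the parameters you would hand to a DynamoDB client such as boto3's `client('dynamodb').get_item(**params)`:

```python
def build_lookup_request(table: str, file_key: str) -> dict:
    """GetItem parameters for a key -> S3-location lookup table.

    Attribute names (FileKey, S3Bucket, S3Path) are illustrative,
    not from the comment above.
    """
    return {
        "TableName": table,
        # Partition-key lookup: a single-digit-millisecond point read.
        "Key": {"FileKey": {"S": file_key}},
        # Fetch only the attributes we need; trims the response.
        "ProjectionExpression": "S3Bucket, S3Path",
    }

params = build_lookup_request("file-locations", "report-2022-01.json")
```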
        
         | Jach wrote:
         | I remember talking to someone who was playing with AWS stuff
         | for the first time and they had a similar architecture, using
         | Dynamo for a lookup store. It still seems a bit odd to me
         | though. It's been a long time since I've worked with the S3
         | API, so maybe it just doesn't support the same sort of thing,
         | but wouldn't it be nicer to just query S3 with some key and get
         | back either the path/URL to render a link, or the content
         | itself? Why the Dynamo intermediary? (And on the other side, if
         | you don't need to render a link to serve the content, why not
         | use Dynamo as the actual document store and skip S3? Storage
         | cost?)
        
       | 0xbadcafebee wrote:
       | I haven't run into anyone who uses Dynamo for anything other than
       | managing Terraform backend state locking. And I think that's
       | still the best use case for it: you just want to store a couple
       | random key-values somewhere and have more functionality than AWS
       | Parameter Store. Trying to build anything large-scale with it
       | will probably leave you wanting.
        
       | thekozmo wrote:
       | Indeed quite a journey. If you love DynamoDB and like open
       | source, give Scylla a try (disclosure me==founder):
       | https://www.scylladb.com/alternator/
        
       | nathanfig wrote:
       | I wonder if DynamoDB would be met with less criticism had it
       | simply been named Dynamo Document Store.
        
       | afandian wrote:
       | We're at early stages of planning an architecture where we
       | offload pre-rendered JSON views of PostgreSQL onto a key value
       | store optimised for read only high volume. Considering DynamoDB,
       | S3, Elastic, etc. (We'll probably start without the pre-render
       | bit, or store it in PostgreSQL until it becomes a problem).
       | 
       | When looking at DynamoDB I noticed that there was a surprising
       | amount of discussion around the requirement for provisioning,
       | considering node read/write ratios, data characteristics, etc.
       | Basically, worrying about all the stuff you'd have to worry about
       | with a traditional database.
       | 
       | To be honest, I'd hoped that it could be a bit more 'magic', like
       | S3, and that AWS would take care of provisioning, scaling,
       | sharding etc. But it seemed, disappointingly, that you'd have to
       | focus proactively on operations and provisioning.
       | 
       | Is that sense correct? Is the dream of a self-managing, fire-and-
       | forget key value database completely naive?
        
         | eknkc wrote:
         | I believe it used to be static provisioning, you'd set the read
         | and limit capacity beforehand. Then obviously there is
         | autoscaling of those but it is still steps of capacity being
         | provisioned.
         | 
         | They now have a dynamic provisioning scheme, you simply don't
         | care but it is more expensive so if you have predictible
         | requirements it is still better to use static capacity
         | provisioning. There is an option though.
         | 
         | DynamoDB also requires the developer to know about its data
         | storage model. While this is generally a good practice for any
         | data storage solution, I feel like Dynamo requires a lot more
         | careful planning.
         | 
         | I also think that most of the best practices, articles etc
         | apply to giant datasets with huge scale issues etc. If you are
         | running a moderately active app, you probably can get away with
         | a lot of stupid design decisions.
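The two capacity modes discussed above map to the `BillingMode` field of DynamoDB's CreateTable call. A minimal sketch of the request parameters (the key schema here is a made-up single-attribute example; you'd pass the dict to e.g. boto3's `create_table`):

```python
def create_table_params(name: str, on_demand: bool,
                        rcu: int = 5, wcu: int = 5) -> dict:
    """CreateTable parameters for the two billing modes."""
    params = {
        "TableName": name,
        "KeySchema": [{"AttributeName": "pk", "KeyType": "HASH"}],
        "AttributeDefinitions": [
            {"AttributeName": "pk", "AttributeType": "S"}
        ],
    }
    if on_demand:
        # Pay-per-request: no capacity planning, higher unit cost.
        params["BillingMode"] = "PAY_PER_REQUEST"
    else:
        # Provisioned: cheaper for predictable traffic, but you own
        # the capacity numbers (and the throttling if you guess low).
        params["BillingMode"] = "PROVISIONED"
        params["ProvisionedThroughput"] = {
            "ReadCapacityUnits": rcu,
            "WriteCapacityUnits": wcu,
        }
    return params
```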
        
           | paulgb wrote:
           | My experience with dynamic provisioning has been that it is
           | pretty inelastic, at least at the lower range of capacity.
           | E.g. if you have a few read units and then try to export the
           | data using AWS's cli client, you can pretty quickly hit the
           | capacity limit and have to start the export over again. Last
           | time, I ended up manually bumping the capacity way up,
           | waiting a few minutes for the new capacity to kick in, and
           | then exporting. Not what I had in mind when I wanted a
           | serverless database!
        
             | moduspol wrote:
             | I understand it's not really your point, but if you're
             | actually looking to export all the data from the table,
             | they've got an API call you can give to have DynamoDB write
             | the whole table to S3. This doesn't use any of your
             | available capacity.
             | 
             | https://docs.aws.amazon.com/amazondynamodb/latest/developer
             | g...
             | 
             | Beyond that, though, it's really not designed for that kind
             | of use case.
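The export moduspol mentions is the ExportTableToPointInTime API, which requires point-in-time recovery to be enabled on the table. A sketch of the request parameters (the ARN, bucket, and prefix below are placeholders; you'd pass the dict to e.g. boto3's `export_table_to_point_in_time`):

```python
def build_export_request(table_arn: str, bucket: str, prefix: str) -> dict:
    """Parameters for ExportTableToPointInTime: a server-side export
    to S3 that consumes none of the table's read capacity."""
    return {
        "TableArn": table_arn,
        "S3Bucket": bucket,
        "S3Prefix": prefix,
        "ExportFormat": "DYNAMODB_JSON",  # or "ION"
    }

req = build_export_request(
    "arn:aws:dynamodb:us-east-1:123456789012:table/my-table",
    "my-export-bucket",
    "exports/my-table/",
)
```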
        
               | paulgb wrote:
               | Ah, fair point. Somehow I didn't encounter that when I
               | was trying to export, even though it existed at the time.
               | But it would have solved my problem.
        
         | restlake wrote:
         | If you know and understand S3 pretty well, and you purely need
         | to generate, store, and read materialized static views, I
         | highly recommend S3 for this use case. I say this as someone
         | who really likes working with DDB daily and understands the
         | tradeoffs with Dynamo. You can always layer on Athena or
         | (simpler) S3 Select later if a SQL query model is a better fit
          | than KV object lookups. S3 is loosely the fire-and-forget KV
          | DB you're describing, IMO, depending on your use case.
        
         | k__ wrote:
          | After looking into solutions like Fauna, Upstash, and
          | Planetscale, I don't understand why anyone bothers with DDB
          | anymore.
          | 
          | I read "The DynamoDB Book" and almost had a stroke. So many
          | idiosyncrasies, for what?!
        
         | manigandham wrote:
         | Plenty of options already exist. DynamoDB has both autoscaling
         | and serverless modes. AWS also has managed Cassandra (runs on
         | top of DynamoDB) which doesn't need instance management.
         | 
         | Azure has CosmosDB, GCP has Cloud Datastore/Firestore, and
         | there are many DB vendors like Planetscale (mysql), CockroachDB
         | (postgres), FaunaDB (custom document/relational) that have
         | "serverless" options.
        
         | lkrubner wrote:
         | Exactly. This has been my experience with several AWS
         | technologies. Like with their ElasticSearch service, where I
         | had to constantly fine-tune various parameters, such as memory.
         | I was curious why they couldn't auto-scale the memory, why I
         | had to do that manually. There are several AWS services that
         | should be a bit more magical, but they are not.
        
         | Tehnix wrote:
          | A lot of things that used to be a concern (hot partitions,
          | etc.) have largely been solved these days and aren't a
          | concern anymore :)
         | 
         | Put it on on-demand pricing (it'll be better and cheaper for
         | you most likely), and it will handle any load you throw at it.
         | Can you get it to throttle? Sure, if you absolutely blast it
         | without ever having had that high of a need before (and it can
         | actually be avoided[0]).
         | 
         | You will need to understand how to model things for the NoSQL
         | paradigm that DynamoDB uses, but that's a question of
         | familiarity and not much else (you didn't magically know SQL
         | either).
         | 
          | My experience comes from scaling DynamoDB in production for
          | several years, handling both massive IoT data ingestion and
          | the user data. We were able to completely replace all the
          | things we _thought_ we would need a relational database for.
         | 
          | My comparison with a traditional RDS setup:
          | 
          | - DynamoDB issues? 0. Seriously. The only thing you need to
          | monitor is billing.
          | 
          | - RDS? Oh boy: provision for peak capacity, monitor replica
          | lag, monitor the replicas themselves, constantly monitor and
          | scale IOPS, watch queries suddenly get slow as data grows,
          | worry about indexes and data size, and much more...
         | 
         | [0]: https://theburningmonk.com/2019/03/understanding-the-
         | scaling...
        
         | jugg1es wrote:
         | I do not recommend starting off with a decision to use DynamoDB
         | before you have worked with it directly for some time to
         | understand it. You could spend months trying to shoehorn your
         | use case into it before realizing you made a mistake. That
         | said, DynamoDB can be incredibly powerful and inexpensive tool
         | if used right.
        
           | rmbyrro wrote:
           | I think this can be said about any technology, really...
        
             | jugg1es wrote:
              | Yea, probably, but it is _especially_ true for DynamoDB
              | because it can initially appear as though your use cases
              | are all supported, when really you just haven't
              | internalized how it works yet. By the time you realize you
              | made a mistake, you are way too far in the weeds and have
              | to start over from scratch. I would venture that more than
              | 50% of DynamoDB users have had this happen to them early
              | on. Anecdotally, just look at the comments on this post.
              | There are so many horror stories with DynamoDB, but they're
              | basically all from people who decided to use it before
              | they really understood it.
        
         | nesarkvechnep wrote:
         | If it's possible in your situation, instead of vendor lock-in,
         | invest in cacheability of your service and leverage HTTP cache
         | as much as possible.
        
         | Marazan wrote:
          | DynamoDB is pretty much the opposite of magic.
         | 
         | It is a resource that can often be the right tool for the job
         | but you really have to understand what the job is and carefully
         | measure Dynamo up for what you are doing.
         | 
         | It is _easy_ to misunderstand or miss something that would make
         | Dynamo hideously expensive for your use case.
        
           | uberdru wrote:
           | What use cases would likely make it hideously expensive, in
           | your view? Like, what are the red flags?
        
             | Marazan wrote:
              | Hot keys are the primary one. They destroy your "average"
             | calculations for your throughput.
             | 
             | Bulk loading data is the other gotcha I've run into. Had a
             | beautiful use case for steady read performance of a batch
             | dataset that was incredibly economical on Dynamo but the
             | cost/time for loading the dataset into Dynamo was totally
             | prohibitive.
             | 
              | Basically Dynamo is great for constant read/write of very
              | small, randomly distributed documents. Once you are out of
              | that zone, things can get dicey fast.
        
             | rmbyrro wrote:
              | Hot keys are much less of an issue nowadays. They were a
              | big one in old DDB architectures.
             | 
             | I'd say requiring scans or filters as opposed to queries is
             | one of the biggest issues that can bite your pocket.
             | 
             | Think carefully about how you'll access your data later.
             | You won't be able to change it drastically and cheaply
             | later.
        
         | aneil wrote:
          | Exactly my experience. I got sucked into using it more than
          | once, thinking it would be better next time, but there are
          | just so many sharp edges.
         | 
          | At one company, someone accidentally set the write rate very
          | high to transfer data into the db. This had the effect of
          | permanently increasing the shard count to a huge number,
          | basically making the DB useless.
        
         | gonzo41 wrote:
          | There's not really magic with S3 either; you still need to
          | name things with coherent prefixes to spread around the load.
         | 
         | DynamoDB is almost simple enough to learn in a day. And if
         | you're doing nothing with it, you're only really paying for
         | storage. Good luck with your decisions.
        
           | ralusek wrote:
           | S3 naming no longer matters for performance. Rejoice.
        
           | PaywallBuster wrote:
           | Prefixes are not needed 90% of use cases
        
             | brodouevencode wrote:
             | I'm not going to speculate on the accuracy of 90% value,
             | but I will say that appropriately prefixed objects
             | substantially help with performance when you have tons of
             | small-ish files. Maybe most orgs don't have that need but
             | in operational realms doing this with your logs make the
             | response faster.
        
         | snorkel wrote:
         | If you don't need data persistence then consider redis instead
         | (which can also do persistence if you enable AOF)
        
         | amzn-throw wrote:
         | The key benefit with DDB is predictability:
         | https://brooker.co.za/blog/2022/01/19/predictability.html
         | 
         | Yes, you have to learn about all these things upfront. But once
         | you figure it out, test it, and configure it - it will work as
         | you expect. No surprises.
         | 
         | Whereas Relational Databases work until they don't. A developer
         | makes a tiny (even a no-op) change to a query or stored
         | procedure, a different SQL plan gets chosen, and suddenly your
          | performance/latency dramatically degrades, and you have no easy
         | way to roll it back through source control/deployment
         | pipelines. You have to page a DBA who has to go pull up the
         | hood.
         | 
         | With services like DDB, you maintain control.
        
         | redwood wrote:
         | Your example really summarizes the challenge with the AWS
         | paradigm: namely that they want you to believe that the thing
          | to do is to spread the backend of your application across a
         | large number of distinct data systems. No one uses DynamoDB
         | alone: they bolt it onto Postgres after realizing they have
         | availability or scale needs beyond what a relational database
         | can do, then they bolt on Elasticsearch to enable querying, and
         | then they bolt on Redis to make the disjointed backend feel
         | fast. And I'm just talking operational use cases; ignoring
         | analytics here. Honestly it doesn't need to be these particular
         | technologies but this is the general phenomenon you see in so
         | many companies that adopt a relational database, key/value
         | store (could be Cassandra instead of DynamoDB eg like what
         | Netflix does), a search engine, and a caching layer because
         | they think that that's the only option
         | 
         | This inherently leads to a complexity debt explosion,
         | fragmentation in the experience, and an operationally brittle
         | posture that becomes very difficult to dig out of (this is
         | probably why AWS loves the paradigm).
        
           | ndm000 wrote:
           | > they bolt it onto Postgres
           | 
           | I am working with a company that is redesigning an enterprise
           | transactional system, currently backed by an Oracle database
           | with 3000 tables. It's B2B so loads are predictable and are
           | expected to grow no more than 10% per year.
           | 
            | They want to use DynamoDB as their primary data store, with
            | Postgres for edge cases. It seems to me the opposite would
            | be more beneficial.
           | 
           | At what point does DynamoDB become a better choice than
           | Postgres? I know that at certain scales Postgres breaks down,
           | but what are those thresholds?
        
             | picardo wrote:
             | You can make Postgres scale, but there is an operational
             | cost to it. DynamoDB does that for you out of the box. (So
             | does Aurora, to be honest, but there is also an overhead to
             | setting up an Aurora cluster to the needs of your
             | business.)
             | 
             | I've found also that in Postgres the query performance does
             | not keep up with bursts of traffic -- you need to
             | overprovision your db servers to cope with the highest
             | traffic days. DynamoDB, in contrast, scales instantly.
              | (It's a bit more complicated than that, but the effect of
             | it is nearly instantaneous.) And what's really great about
             | DynamoDB is after the traffic levels go down, it does not
             | scale down your table and maintains it at the same capacity
             | at no additional cost to you, so if you receive a burst of
             | traffic at the same throughput, you can handle it even
             | faster.
             | 
             | DynamoDB does a lot of magic under the hood, as well. My
             | favorite is auto-sharding, i.e. it automatically moves your
             | hot keys around so the demand is evenly distributed across
             | your table.
             | 
              | So DynamoDB is pretty great. But to get the best
             | experience from DynamoDB, you need to have a stable
             | codebase, and design your tables around your access
             | patterns. Because joining two tables isn't fun.
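"Design your tables around your access patterns" usually means composite keys in a single table. A hypothetical sketch (entity and attribute names are invented for illustration): one access pattern, "all orders for a customer", served by a single Query with no join:

```python
def customer_pk(customer_id: str) -> str:
    """Partition key: all of one customer's items share it."""
    return f"CUSTOMER#{customer_id}"

def order_sk(order_date: str, order_id: str) -> str:
    # Date-first sort key: a Query returns orders in date order.
    return f"ORDER#{order_date}#{order_id}"

def orders_query(table: str, customer_id: str) -> dict:
    """Query parameters fetching every order item for one customer."""
    return {
        "TableName": table,
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :sk)",
        "ExpressionAttributeValues": {
            ":pk": {"S": customer_pk(customer_id)},
            ":sk": {"S": "ORDER#"},
        },
    }
```

The trade-off the comment describes: any access pattern not baked into the keys like this needs a GSI, a scan, or an application-side workaround.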
        
               | rmbyrro wrote:
                | Using more than one DynamoDB table is a bad idea in the
                | first place.
        
               | eropple wrote:
               | _> So DynamoDB is pretty great. But to get the the best
               | experience from DynamoDB, you need to have a stable
               | codebase, and design your tables around your access
                | patterns. Because joining two tables isn't fun._
               | 
               | More than just joining--you're in the unenviable place of
               | reinventing (in most environments, anyway) a _lot_ of
               | what are just online problems in the SQL universe. Stuff
                | you'd do with a case statement in Postgres becomes some
               | on-the-worker shenanigans, stuff you'd do with a
               | materialized view in Postgres becomes a batch process
               | that itself has to be babysat and managed and introduces
               | new and exciting flavors of contention.
               | 
               | There are really good reasons to use DynamoDB out there,
               | but there are also an absolute ton of land mines. If your
                | data model isn't _trivial_, DynamoDB's best use case is
               | in making faster subsets of your data model that you can
               | _make_ trivial.
        
             | vosper wrote:
             | They should be looking at Aurora, not Dynamo. Using Dynamo
             | as the primary store for relational data (3000 tables!)
             | sounds like an awful idea to me. I'd rather stay on Oracle.
             | 
             | https://aws.amazon.com/rds/aurora/?aurora-whats-new.sort-
             | by=...
        
               | rmbyrro wrote:
               | It really depends much more on the access patterns than
               | data shape.
               | 
               | Certain access patterns can do pretty well with 3,000
               | relational tables denormalized to a single DynamoDB
               | table.
        
           | sebastialonso wrote:
           | > they bolt it onto Postgres after realizing they have
           | availability or scale needs beyond what a relational database
           | can do, then they bolt on Elasticsearch to enable querying,
           | and then they bolt on Redis to make the disjointed backend
           | feel fast.
           | 
            | This made my head explode. Why would you explicitly join
            | together two systems made to solve different issues? This
            | sounds rather like a lack of architectural vision.
            | Postgres's access design inherently clashes with
            | DynamoDB's; same goes for the Elasticsearch scenario:
            | DynamoDB was not made to query everything, it's made to
            | query specifically what you designed to be queried and
            | nothing else. Redis sort of makes sense to gain a bit of
            | speed for some particular access, but you still lack
            | collection-level querying with it.
           | 
           | In my experience, leave DynamoDB alone and it will work
           | great. Automatic scaling is cheaper eventually if you've done
           | your homework about knowing your traffic.
        
             | 300bps wrote:
             | _In my experience, leave DynamoDB alone and it will work
             | great._
             | 
             | My experience agrees with yours and I'm likewise puzzled by
             | the grandparent comment. But just a shout out to DAX
              | (DynamoDB Accelerator) which makes it scale through the
             | roof:
             | 
             | https://aws.amazon.com/dynamodb/dax/
        
               | jamesblonde wrote:
               | If you add DAX you are not guaranteed to read your
                | writes. Terrible consistency model.
                | https://docs.aws.amazon.com/amazondynamodb/latest/developerg...
        
               | 300bps wrote:
               | _Terrible consistency model._
               | 
               | Judging a consistency model as "terrible" implies that it
               | does not fit any use case and therefore is objectively
               | bad.
               | 
                | On the contrary, there are plenty of use cases where
                | "eventually consistent writes" are a perfect fit.
               | To judge this as true, you only have to look and see that
               | every major database server offers this as an option -
               | just one example:
               | 
               | https://www.compose.com/articles/postgresql-and-per-
               | connecti...
        
               | tmitchel2 wrote:
               | You choose your consistency on reads. However, Dax won't
               | help you much on a write heavy workload.
        
               | rmbyrro wrote:
                | I think the main advantage of DDB is being serverless.
                | Adding a server-based layer on top of it doesn't make
                | sense to me.
               | 
               | I have a theory it would be better to have multiple
               | table-replicas for read access. At application level, you
               | randomize access to those tables according to your read
               | scale needs.
               | 
               | Use main table streams and lambda to keep replicas in
               | sync.
               | 
               | Depending on your traffic, this might end up more
               | expensive than DAX, but you remain fully serverless,
               | using the exact same technology model, and keep control
               | over the consistency model.
               | 
               | Haven't had the chance to test this in practice, though.
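
A sketch of the read path for this idea, in Python. Everything here is hypothetical: the replica table names, and the assumption that a Streams-triggered Lambda keeps the replicas in sync with the main table.

```python
import random

# Replica tables assumed to be kept in sync from the main table's
# stream by a Lambda function; the names are illustrative only.
READ_REPLICAS = ["orders-replica-1", "orders-replica-2", "orders-replica-3"]

def pick_replica(replicas, rng=random):
    """Choose a replica table uniformly at random for this read."""
    return replicas[rng.randrange(len(replicas))]

def read_item(dynamodb, key):
    """Read `key` from a randomly chosen replica.

    `dynamodb` is assumed to be a boto3 DynamoDB service resource.
    """
    table = dynamodb.Table(pick_replica(READ_REPLICAS))
    return table.get_item(Key=key).get("Item")
```

Only the replica choice is exercised here; the boto3 resource is passed in by the caller, so the read-scale knob is just the length of `READ_REPLICAS`.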
        
               | SomeCallMeTim wrote:
               | In my experience, NoSQL is almost never the right answer.
               | 
               | And DynamoDB is worse than most.
               | 
               | My prediction is that the future is in scalable SQL;
               | CockroachDB or Yugabase or similar.
               | 
               | NoSQL actually causes more problems than it solves, in my
               | experience.
        
           | tokamak-teapot wrote:
           | We use DynamoDB alone. Microservices generally use one or two
           | tables each.
        
           | awsthro00945 wrote:
           | >No one uses DynamoDB alone
           | 
           | Almost every single team at Amazon that I can think of off
           | the top of my head uses DynamoDB (or DDB + S3) as its sole
           | data store. I know that there _are_ teams out there using
           | relational DBs as well (especially in analytics), but in my
           | day-to-day working with a constantly changing variety of
           | teams that run customer-facing apps, I haven't seen
           | RDS/Redis/etc being used in months.
        
             | goostavos wrote:
             | The thing about Amazon is that it is _massive_. In my neck
             | of the woods, I've got the complete opposite experience.
             | So many teams have the exact DDB induced infrastructure
             | sprawl as described by the GP (e.g. supplemental RDBMS,
             | Elastic, caching layers, etc..).
             | 
             | Which says nothing of DDB. It's a god-tier tool if what
             | you need matches what it's selling. However, I see too many
             | teams reach for it by default without doing any actual
             | analysis (including young me!), thus leading to the "oh
             | shit, how will we...?" soup of ad-hoc supporting infra. Big
             | machines look great on the promo-doc tho. So, I don't
             | expect it to stop.
        
               | [deleted]
        
           | jerf wrote:
           | It seems to me that what this is saying is that storage has
           | become so cheap that if one database provides even slight
           | advantages over another for some workload, it is likely to
           | be deployed and have all the data copied over to it.
           | 
           | HN entrepreneurs take note, this also suggests to me that
           | there _may_ be a market for a database (or a  "metadatabase")
           | that takes care of this for you. I'd love to be able to have
           | a "relational database" that is also some "NoSQL" databases
           | (since there's a few major useful paradigms there) that just
           | takes care of this for me. I imagine I'd have to declare my
           | schemas, but I'd love it if that's all I had to do and then
           | the DB handled keeping sync and such. Bonus points if you can
           | give me cross-paradigm transactionality, especially in terms
           | of coherent insert sets (so "today's load of data" appears in
           | one lump instantly from clients point of view and they don't
           | see the load in progress).
           | 
           | At least at first, this wouldn't have to be best-of-breed
           | necessarily at anything. I'd need good SQL joining support,
           | but I think I wouldn't need every last feature Postgres has
           | ever had out of the box.
           | 
           | If such a product exists, I'm all ears. Though I am thinking
           | of this as a unified database, not a collection of databases
           | and products that merely manages data migrations and such.
           | I'm looking to run "CREATE CASSANDRA-LIKE VIEW gotta_go_fast
           | ON SELECT a.x, a.y, b.z FROM ...", maybe it takes some time
           | of course but that's all I really have to do to keep things
           | in sync. (Barring resource overconsumption.)
        
             | jgraettinger1 wrote:
             | > I'd love to be able to have a "relational database" that
             | is also some "NoSQL" databases (since there's a few major
             | useful paradigms there) that just takes care of this for
             | me. I imagine I'd have to declare my schemas, but I'd love
             | it if that's all I had to do and then the DB handled
             | keeping sync and such.
             | 
             | You might be interested in what we're building [0]
             | 
             | It synchronizes your data systems so that, for example, you
             | can CDC tables from your Postgres DB, transform them in
             | interesting ways, and then materialize the result in a view
             | within Elastic or DynamoDB that updates continuously and
             | with millisecond latency.
             | 
             | It will even propagate your sourced SQL schemas into JSON
             | schemas, and from there to, say, an equivalent
             | Elasticsearch schema.
             | 
             | [0]: https://github.com/estuary/flow
        
             | andy_ppp wrote:
             | Postgres with Cassandra built in and scaled separately
             | would be really great.
        
             | grncdr wrote:
             | I think there was a project like this a few years ago
             | (wrapping a relational DB + ElasticSearch into one box) and
             | I _thought_ it was CrateDB, but from looking at their
             | current website I think I'm misremembering.
             | 
             | The concept didn't appeal to me very much then, so I never
             | looked into it further.
             | 
             | ---
             | 
             | To address your larger point, I think Postgres has a better
             | chance of absorbing other datastores (via FDW and/or custom
             | index types) and updating them in sync with its own
             | transactions (as far as those databases support some sort
             | of atomic swap operation) than a new contender has of
             | getting near Postgres' level of reliability and feature
             | richness.
        
               | neuronexmachina wrote:
               | Were you thinking of ZomboDB?
               | https://github.com/zombodb/zombodb
        
             | rmbyrro wrote:
             | I'm afraid it's not feasible to develop a single
             | general-purpose implementation for that.
             | 
             | The amount of complexity needed to guarantee data
             | integrity while covering all possible use cases would be
             | simply unmanageable.
             | 
             | I'd be extremely happy to be proven wrong, though...
        
             | mwarkentin wrote:
             | AWS tried building this with Glue Elastic Views:
             | https://aws.amazon.com/glue/features/elastic-views/
             | 
             | It's been in preview forever though, not sure when it's
             | going to officially launch.
        
           | nine_k wrote:
           | If this is not the only option, what would you suggest
           | instead? How to simplify it?
        
             | urthor wrote:
             | The alternative is to go to GCP and use the big GCP selling
             | point, which is Big Table/Big Query.
             | 
             | Those databases build most of that in, and it's all one
             | fairly excellent distributed monolith.
        
               | onlyrealcuzzo wrote:
               | Wouldn't Spanner be closer to what you're talking about?
        
               | Keyframe wrote:
               | It's still a marriage.
        
         | whalesalad wrote:
         | Dynamo is incredibly hard to use _correctly_.
         | 
         | I'd urge you to start writing a prototype; a lot of your
         | assumptions might get thrown out the window. Dynamo is not
         | necessarily good for high-volume reads: you'll end up needing
         | a parallel scan approach, which is not fast.
        
           | qvrjuec wrote:
           | I'd say Dynamo is extremely good at high-volume reads,
           | given the appropriate access pattern. It's very efficient
           | at retrieving huge amounts of _well partitioned_ data using
           | the data's keys, but scanning isn't so efficient.
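
To make the contrast concrete, a hedged sketch of the two request shapes (table and key names are made up): a Query is scoped to one partition key, so its cost tracks the result size, while a Scan reads the whole table, so its cost grows with table size.

```python
# Parameter builders for the low-level DynamoDB API. "PK" and the
# CUSTOMER# prefix are illustrative, not from any real schema.

def query_params(table, customer_id):
    """Request touching only one partition's items."""
    return {
        "TableName": table,
        "KeyConditionExpression": "PK = :pk",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"CUSTOMER#{customer_id}"}
        },
    }

def scan_params(table):
    """Full-table read: cost is proportional to total table size."""
    return {"TableName": table}
```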
        
           | mythrwy wrote:
           | Also can be _very_ expensive if you do not use it correctly.
        
         | qaq wrote:
         | Save yourself a ton of pain and don't use DynamoDB
        
         | jedberg wrote:
         | > Is the dream of a self-managing, fire-and-forget key value
         | database completely naive?
         | 
         | It's not, if you plan it right. Learn about single table design
         | for DynamoDB before you start. There are a lot of good
         | resources from Amazon and the community.
         | 
         | Here is a very accessible video from the community:
         | 
         | https://www.youtube.com/watch?v=BnDKD_Zv0og
         | 
         | Here is a video from Rick Houlihan, a senior leader from AWS
         | who basically helps companies convert to single table design:
         | 
         | https://www.youtube.com/watch?v=KYy8X8t4MB8
         | 
         | And a good book on the topic:
         | 
         | https://www.dynamodbbook.com
         | 
         | If you use single table design, you can turn on all of the
         | auto-tuning features of DynamoDB and they will work as expected
         | and get better and more efficient with more data.
         | 
         | Some people worry that this breaks the cardinal rule of
         | microservices: One database per service. But the actual rule is
         | never have one service directly access the data of another,
         | always use the API. So as long as your services use different
         | keyspaces and never access each other's data, it can still work
         | (but does require extra discipline).
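
A tiny illustration of that keyspace separation under single table design (all names are hypothetical): each service prefixes its partition keys, so two services sharing one physical table never touch each other's items.

```python
# Composite-key helpers for a shared single table; the service prefix
# acts as the per-service keyspace. Entity names are examples only.

def user_key(service, user_id):
    return {"PK": f"{service}#USER#{user_id}", "SK": "PROFILE"}

def order_key(service, user_id, order_id):
    # Orders sort under their user, so one Query on the PK fetches a
    # user's profile and all of their orders together.
    return {"PK": f"{service}#USER#{user_id}", "SK": f"ORDER#{order_id}"}
```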
        
         | phamilton wrote:
         | I don't know your scaling needs, but I would highly recommend
         | just using Aurora postgresql for read-only workloads. We have
         | some workloads that are essentially K/V store lookups that were
         | previously slated for dynamodb. On an Aurora cluster of
         | 3*r6g.xlarge we easily handle 25k qps with p99 in the single-
         | digit ms range. Aurora can scale up to 15 instances and up to
         | 24xlarge, so it would not be unreasonable to see 100x the read
         | workload with similar latencies.
         | 
         | Happy to talk more. We're actively moving a bunch of workloads
         | away from DynamoDB and to Aurora so this is fresh on our minds.
        
           | afandian wrote:
           | Thanks, that's what I _hope_ will work. I might drop you a
           | mail at some point.
        
         | ignoramous wrote:
         | > _We're at early stages of planning an architecture where we
         | offload pre-rendered JSON views of PostgreSQL onto a key value
         | store optimised for read only high volume._
         | 
         | If possible, put the json in Workers KV, and access it through
         | Cloudflare Workers. You can also optionally cache reads from
         | Workers KV into Cloudflare's zonal caches.
         | 
         | > _To be honest, I'd hoped that it could be a bit more
         | 'magic', like S3_
         | 
         | You could opt to use the slightly more expensive DynamoDB
         | On-Demand, or the free DynamoDB Auto-Scaling modes, which are
         | relatively no-config. For a _very_ read-heavy workload, you'd
         | probably want to add DynamoDB Accelerator (a write-through
         | in-memory cache) in front of your tables. Or, use S3 itself
         | (but an S3 bucket doesn't really like it when you load it
         | with a _tonne_ of small files) accelerated by CloudFront
         | (which is what AWS Hyperplane, the tech underpinning ALB and
         | NLB, does: https://aws.amazon.com/builders-library/reliability-and-
         | cons...)
         | 
         | S3, much like DynamoDB, is a KV store:
         | https://news.ycombinator.com/item?id=11161667 and
         | https://www.allthingsdistributed.com/2009/03/keeping_your_da...
        
         | _pdp_ wrote:
         | DynamoDB is like S3 but with query features. It is not a
         | relational DB; it is a document store. So you need to use it
         | for what it is.
         | 
         | Our entire solution is basically built on top of Lambda and
         | DynamoDB tables and it works really well as long as you don't
         | treat the tables like SQL tables.
        
         | zurn wrote:
         | Your impressions are correct: DynamoDB is quite low-level and
         | more like a DB kit than a ready-to-use DB. For most
         | applications it's better to use something else.
        
         | giaour wrote:
         | If you use the "pay per request" billing model instead of
         | provisioned throughput, DynamoDB scaling is self-managing, and
         | you can treat your DB as a fire-and-forget key/value store. You
         | need to plan how you'll query your data and structure the keys
         | accordingly, but honestly, that applies even more to S3 than it
         | does to Dynamo.
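
For illustration, roughly what that looks like at table-creation time (table and attribute names are assumptions): with on-demand billing you pass `BillingMode="PAY_PER_REQUEST"` and skip provisioned throughput entirely.

```python
# Builds the kwargs for DynamoDB's CreateTable with on-demand billing.
# No ProvisionedThroughput block: there are no RCU/WCU to manage.

def on_demand_table_params(name, pk="PK", sk="SK"):
    return {
        "TableName": name,
        "BillingMode": "PAY_PER_REQUEST",  # on-demand mode
        "KeySchema": [
            {"AttributeName": pk, "KeyType": "HASH"},
            {"AttributeName": sk, "KeyType": "RANGE"},
        ],
        "AttributeDefinitions": [
            {"AttributeName": pk, "AttributeType": "S"},
            {"AttributeName": sk, "AttributeType": "S"},
        ],
    }

# Usage (assumes boto3 and AWS credentials):
#   boto3.client("dynamodb").create_table(**on_demand_table_params("events"))
```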
        
         | pbalau wrote:
         | Thinking like this baffles me, but it also makes me happy,
         | because there will always be a need for people like me:
         | infra. AWS is not a magical tool that will replace your infra
         | team; it is a magical tool that will allow your infra team to
         | do more. I am the infra team of my startup and I estimate
         | that only 50% of my time is spent doing infra work. The rest
         | is supporting my peers, working on frameworky stuff, solving
         | dev-efficiency issues, bla bla.
         | 
         | Let's say that you operate in an AWS-less environment, with
         | everything bare metal, in a datacenter. Your GOOD infra team
         | has to do the following:
         | 
         | Hardware:
         | 
         | - make sure there is a channel to get new hardware, both for
         | capacity increases and spares. What are you going to do? Buy
         | 1 server and 2 spares? If one of the servers has an issue,
         | isn't it quite likely that the other servers, from the same
         | batch, have the same issue? Is this affecting you, or not?
         | Where do you store the spares? In a warehouse somewhere,
         | making them harder to deploy? In the rack with the one in
         | use, wasting rackspace/switch space? Are you going to rely on
         | the datacenter to provide you with the hardware? What if you
         | are one of their smaller customers and your requests get
         | pushed back because some larger customer's requests get
         | higher priority?
         | 
         | - make sure there is a way to deploy said hardware. You don't
         | want to not be able to deploy a new server because there is no
         | space in the rack, or no space in the switch. Where are your
         | spares? In a warehouse miles away from the datacenter? Do you
         | have access to said warehouse at midnight, on Thanksgiving? Oh
         | shit, someone lost the key to your rack! Oh noes, we don't have
         | any spare network cable/connectors/screws...
         | 
         | Software:
         | 
         | - did you patch your servers? did you patch your switches?
         | 
         | - new server, we need to install the os. And a base set of
         | software, including the agent we use to remote manage the
         | server.
         | 
         | - oh, we also need to run and maintain the management infra,
         | say the control plane for k8s.
         | 
         | - oh, we want some read replicas for this db. Not only do we
         | need the hardware to run the replicas on (see above for what
         | that means), now you need to add a bunch of monitoring and
         | have plans in place to handle things like: replicas lagging,
         | network links between master and replicas being full,
         | failover for the above, the master crapping out, yada yada.
         | 
         | I bet there are many other aspects I'm missing.
         | 
         | Choices:
         | 
         | Your GOOD infra team will have to decide things like: how
         | many spares do we need? Is the capacity we have at the moment
         | enough for the launch of our next world-changing feature that
         | half the internet wants to use? Are we lucky enough to
         | survive a few months without spares, or should we get extra
         | capacity in another datacenter? Do we want to have replicas
         | on the west coast, or is the latency acceptable?
         | 
         | These are the main areas of what an infra team is supposed to
         | do: Hardware, Software and Choices. AWS (and most other cloud
         | providers) makes the first 2 areas non-issues. For the last
         | area you can do 2 things: get an infra team (could be a
         | full-fledged team, could be 1 person, it could be you) and
         | theoretically you will get choices tailored to what your
         | business needs, OR let AWS do it for you. *AWS might make
         | these choices based on a metric you disagree with, and this
         | is the main reason people complain*.
        
         | AtlasBarfed wrote:
         | The salespeople always promise magic and handwave CAP away.
         | 
         | But data at scale is about:
         | 
         | 1) knowing your queries ahead of time (since you've
         | presumably reached the limit of PG/maybesql/o-rackle).
         | 
         | 2) dealing with CAP at the application level: distributed
         | transactions, eventual consistency, network partitions.
         | 
         | 3) dealing with a lot more operational complexity, not less.
         | 
         | So if the snake oil salesmen say it will be seamless, they are
         | very very very much lying. Either that, or you are paying a LOT
         | of money for other people to do the hard work.
         | 
         | Which is what happens with managing your own NoSQL vs DynamoDB.
         | You'll pay through the roof for DynamoDB at true big data
         | scales.
        
         | twodayrice wrote:
         | I think this is a good summary, and it gets even more
         | complicated if you start using the DAX cache. Your read/write
         | provisioning for DAX is totally different from that of the
         | underlying DynamoDB tables. The write throughput of DAX is
         | limited by the size of the master node in the cluster. Can
         | you say bottleneck?
        
         | augustl wrote:
         | I have no direct experience with scaling DynamoDB in
         | production, so take this with a grain of salt. But it seems to
         | me that the on-demand scaling mode in DynamoDB has gotten
         | _really_ good over the last couple of years.
         | 
         | For example, you used to have to manually set RCU/WCU to a high
         | number when you expected a spike in traffic, since the ramp-up
         | for on-demand scaling was pretty slow (could take up to 30
         | minutes). But these days, on-demand can handle spikes from 10s
         | of requests a minute to 100s/1000s per second gracefully.
         | 
         | The downside of on-demand is the pricing - it's more expensive
         | if you have continuous load. But it can easily become _much_
         | cheaper if you have naturally spiky load patterns.
         | 
         | Example: https://aws.amazon.com/blogs/database/running-spiky-
         | workload...
        
           | moduspol wrote:
           | > The downside of on-demand is the pricing - it's more
           | expensive if you have continuous load.
           | 
           | True, although you don't have to make that choice
           | permanently. You can switch from provisioned to on demand
           | once every 24 hours.
           | 
           | And you can also set up application autoscaling in
           | provisioned mode, which'll allow you to set parameters under
           | which it'll scale your provisioned capacity up or down for
           | you. This doesn't require any code and works pretty well if
           | you can accept autoscaling adjustments being made in the
           | timeframe of a minute or two.
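
A sketch of the two Application Auto Scaling calls that set this up, written as parameter builders so the request shapes are visible (the table name, capacity bounds, and target utilization are assumptions):

```python
# Parameters for application-autoscaling's RegisterScalableTarget and
# PutScalingPolicy, targeting a DynamoDB table's read capacity.

def scalable_target_params(table, min_rcu=5, max_rcu=500):
    return {
        "ServiceNamespace": "dynamodb",
        "ResourceId": f"table/{table}",
        "ScalableDimension": "dynamodb:table:ReadCapacityUnits",
        "MinCapacity": min_rcu,
        "MaxCapacity": max_rcu,
    }

def target_tracking_policy_params(table, target_utilization=70.0):
    return {
        "PolicyName": f"{table}-read-scaling",
        "PolicyType": "TargetTrackingScaling",
        "ServiceNamespace": "dynamodb",
        "ResourceId": f"table/{table}",
        "ScalableDimension": "dynamodb:table:ReadCapacityUnits",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_utilization,  # % consumed capacity
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
            },
        },
    }

# Usage (assumes boto3 + credentials):
#   aas = boto3.client("application-autoscaling")
#   aas.register_scalable_target(**scalable_target_params("orders"))
#   aas.put_scaling_policy(**target_tracking_policy_params("orders"))
```

The same pair of calls with `WriteCapacityUnits` covers the write side.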
        
             | PaywallBuster wrote:
             | scaling down is limited to 4x a day
        
               | chrisoverzero wrote:
               | It's up to 27 times a day, if you time it well: "4
               | decreases in the first hour, and 1 decrease for each of
               | the subsequent 1-hour windows in a day".
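
The arithmetic behind the 27: four decreases in the first hour, plus one in each of the 23 remaining one-hour windows.

```python
first_hour_decreases = 4
remaining_hour_windows = 23  # a 24-hour day minus the first hour
max_daily_decreases = first_hour_decreases + remaining_hour_windows * 1
assert max_daily_decreases == 27
```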
        
               | PaywallBuster wrote:
               | gotcha, it's been a while since I last looked at that
        
               | tmitchel2 wrote:
               | They upped it when their own autoscaler needed the
               | ability to back it down more :-/
        
           | PaywallBuster wrote:
           | Indeed
           | 
           | We've some regular jobs that require scaling up DynamoDB in
           | advance a few times per day, but then Dynamo is only able
           | to scale down 4x per day, so we're probably paying for
           | overcapacity unnecessarily (10x or more) for a couple of
           | hours a day.
           | 
           | Now we've just moved to on-demand and let them handle it;
           | works fine.
        
         | jpgvm wrote:
         | It is for now but it doesn't have to be. Dynamo's design isn't
         | particularly amenable to dynamic and heterogenous shard
         | topologies however.
         | 
         | There could exist a fantasy database where you still tell it
         | your hash and range keys - which are roughly how you tell the
         | database which data is closely related (and which you may
         | want to scan together) and which isn't - but instead of
         | hard-provisioning shard capacity, it automagically splits
         | shards when they hotspot and doesn't rely on consistent
         | hashing, so that every shard can be sized differently
         | depending on how hot it is.
         | 
         | Right now such a database doesn't exist AFAICT, as most
         | places that need something that scales big enough also
         | generally have the skill to avoid most of the pitfalls that
         | cause problems on simpler databases like Dynamo.
        
       | [deleted]
        
       | jugg1es wrote:
       | Every system I've built on DynamoDB just works. The APIs that use
       | it have had virtually 100% uptime and have never needed database
       | maintenance. It is not a replacement for a RDBMS, but for some
       | use cases, it's a killer service.
        
       | didip wrote:
       | To be honest, as a customer, it is hard for me to justify using
       | DynamoDB. Some of this criticism may be out of date:
       | 
       | 1. DynamoDB is not as convenient. There are a bit too many dials
       | to turn.
       | 
       | 2. DynamoDB does not have a SQL facade on top.
       | 
       | 3. DynamoDB is proprietary, I believe there's no OSS API
       | equivalent if you want to migrate out.
       | 
       | 4. DynamoDB was kind of expensive. But it has been a while
       | since I last checked the pricing page.
       | 
       | It's simply much better to start with PostgreSQL Aurora and
       | move to more scalable storage based on specific use cases
       | later. For example: Cassandra, Elastic, Druid, or CockroachDB.
        
         | Zimnx wrote:
         | Re: 3. A lot of people don't know about it, but there's an
         | open-source, free, DynamoDB-compatible database called
         | ScyllaDB - the API is called Alternator, to be specific.
        
         | nathanfig wrote:
         | That's surprising to me; I consider DynamoDB to be far
         | simpler than any relational DB, including the alternatives
         | you list.
        
           | [deleted]
        
         | arecurrence wrote:
         | I strongly agree that most early-stage businesses should be
         | on Postgres. There's simply too much churn in early-stage
         | data models. Also, unforeseen esoteric needs constantly jump
         | out of the woodwork, and you can knock out a SQL query for
         | them instead of having to build a solution. However, this
         | does assume that your development team has a competent
         | understanding of SQL.
         | 
         | I've been in a couple of startups that went Dynamo first, and
         | development velocity was a pale shadow of velocity with
         | Postgres. When one of those startups dumped Dynamo for
         | Postgres, velocity multiplied immediately - I'd estimate we
         | were moving at around 1000% - and the complete transition
         | took less time than even I expected (about a month). Once the
         | business matures, moving tables onto Dynamo and wrapping them
         | in a microservice makes a lot of sense. Dynamo does solve a
         | lot of problems that become increasingly material as the
         | business evolves.
         | 
         | Eventually, SQL's presence declines and transitions into an
         | analytics system as narrower, but easier for ops, options
         | proliferate.
        
       | sakopov wrote:
       | We landed on DynamoDB when we migrated a monolith to microservice
       | architecture. I have to say that DynamoDB fits fairly well in the
       | microservices world where the service scope is small, query
       | patterns are pretty narrow and don't really change much.
       | Building new things with DynamoDB when query patterns aren't
       | yet known is very painful and requires tedious migration
       | strategies unless you don't mind paying for GSIs.
        
       | ddoolin wrote:
       | As a developer, I really have a love-hate relationship with
       | Dynamo. I love how fast and easy it is to set up and get rolling.
       | 
       | The partitioning scheme came off as confusing and opaque but I
       | think that says more about Amazon's documentation than the scheme
       | itself.
       | 
       | I do not like that there's really no third-party tooling to
       | query it. Their UI in the console is _so freaking terrible_,
       | yet you have no way other than code to query it. This problem
       | is so bad that I will avoid using it where I can, despite it
       | being a good option performance-wise.
        
         | ejb999 wrote:
         | My experience exactly.
        
         | [deleted]
        
         | rmbyrro wrote:
         | Don't think this is a fair reason to avoid DynamoDB. There
         | are reasonably good alternatives:
         | 
         | - NoSQL Workbench for DynamoDB (free, AWS official):
         | https://docs.aws.amazon.com/amazondynamodb/latest/developerg...
         | 
         | - Dynobase (paid, third-party): https://dynobase.dev/
        
           | belter wrote:
           | Plus there is the local install (only for dev and testing
           | purposes...)
           | 
           | "Deploying DynamoDB Locally on Your Computer" https://docs.aw
           | s.amazon.com/amazondynamodb/latest/developerg...
        
         | wnolens wrote:
         | Yes, the console UI is mind-numbing, and it was just made 10x
         | worse in a recent redesign.
         | 
         | But I like the Python boto3 library:
         | https://boto3.amazonaws.com/v1/documentation/api/latest/guid...
         | 
         | Build yourself a few wrappers to make querying more
         | convenient, and I query straight from a Python REPL pretty
         | effectively.
        
       | anonymousDan wrote:
       | Can anyone recommend a good paper (or other resource)
       | describing the designs of DynamoDB and S3? Ideally something in
       | the spirit of the original Dynamo paper (i.e.
       | https://www.allthingsdistributed.com/files/amazon-dynamo-sos... )
        
         | schwarzmx wrote:
         | This one is pretty good for DynamoDB:
         | https://youtu.be/yvBR71D0nAQ
        
       ___________________________________________________________________
       (page generated 2022-01-20 23:01 UTC)