[HN Gopher] The DynamoDB Paper
       ___________________________________________________________________
        
       The DynamoDB Paper
        
       Author : krnaveen14
       Score  : 223 points
       Date   : 2022-07-14 10:09 UTC (12 hours ago)
        
 (HTM) web link (brooker.co.za)
 (TXT) w3m dump (brooker.co.za)
        
       | Patrol8394 wrote:
        | These days I'd probably take a closer look at Spanner. It is
        | a consistent and scalable db. It makes life much easier for
        | developers.
       | 
        | Like Cassandra, DynamoDB requires the data model to be
        | designed very carefully to get the most out of it.
       | 
        | More often than not, that simply adds complexity; people
        | often underestimate how much a sharded MySQL/Postgres can
        | scale.
       | 
       | My default choice for the longest time: Postgres for the data I
       | care about, ES as secondary index and S3 as blob storage.
        
         | deepGem wrote:
          | True. Spanner and its kin (CockroachDB, YugabyteDB) are all
          | strongly consistent and scalable dbs. The greatest
          | advantage IMO is the ability to just use SQL without having
          | to worry about carefully designing a data model. What
          | bothers me, however, is that these data stores are not
          | truly relational data stores: they layer a relational
          | interface on top of a scalable key-value data store.
         | 
          | Is it necessary to use a strongly consistent transactional
          | data store if your needs don't demand transactions (by
          | transactions I mean 2PC)? IMO you are still better off with
          | DynamoDB/Cosmos/MongoDB for eventual-consistency use cases.
          | The reason being, you still have to resort to a careful
          | data model if you don't need the relational layer, in
          | YugabyteDB at least (not sure about Spanner). So why bother
          | with Yugabyte if I'm resorting to a data model anyway?
          | Might as well stick with DynamoDB.
        
           | jd_mongodb wrote:
            | You seem to think MongoDB is eventually consistent, but
            | MongoDB is designed as a strongly consistent database.
            | You can choose to query a secondary, and that will be
            | eventually consistent, but that is not the default
            | behaviour.
        
           | manigandham wrote:
           | What part of SQL requires not having to design a data model?
           | What exactly do you mean by that?
           | 
            | And technically all relational databases are relational
            | layers on top of a key/value subsystem. Splitting that
            | apart and scaling the storage is how most of the NewSQL
            | databases scale, from CRDB to Yugabyte to Neon.
        
           | dboreham wrote:
           | > What bothers me however is that these data stores are not
           | truly relational data stores
           | 
            | Suggests there may be an impossibility theorem lurking
            | somewhere.
        
           | arminiusreturns wrote:
            | I think VoltDB (and SciDB) are worth checking out also.
            | I'm seeing some very impressive ACID-compliant TPS with
            | Elixir connected to VoltDB. I don't like having to pay to
            | get distributed features, however (the open source
            | community edition is feature-limited compared to the paid
            | version).
        
         | _benedict wrote:
         | Global strict serializability is coming to Cassandra very soon
         | [1]
         | 
         | [1]
         | https://cwiki.apache.org/confluence/download/attachments/188...
        
       | ctvo wrote:
        | I've found DDB to be exceptional for use cases where eventual
        | consistency is OK and you have a few well-defined query
        | patterns. This covers a large number of use cases, so it's
        | not too limiting. As the number of query patterns grows,
        | indices grow, and costs grow (or, pray for your soul, you
        | attempt to use DDB transactions to write multiple keys to
        | support differing query patterns). If you need strong
        | consistency, your cost and latency also increase.
       | 
       | Oh, and I'd avoid DAX. Write your own cache layer. The query
       | cache vs. item cache separation[1] in DAX is a giant footgun.
        | It's also very under-supported. There still isn't a DAX
        | client
       | for AWS SDK v2 in Go for example[2].
       | 
       | 1 -
       | https://docs.aws.amazon.com/amazondynamodb/latest/developerg...
       | 
       | 2 - https://github.com/aws/aws-dax-go/issues/2
        
       | dboreham wrote:
       | I'd like to learn more about their MemDS. Afaik nothing has been
       | made public.
        
       | nathas wrote:
       | Nice write-up from Marc. This definitely hits on the most common
       | problems distributed systems face. I haven't read the paper yet
       | but it _is_ pretty cool they published this and talk about
       | changes over time.
       | 
        | 1. Managing 'heat' in the system (or assuming that you'll
        | have a uniform distribution of requests)
       | 
       | 2. Recovering a distributed system from a cold state and what
       | that implies for your caches.
       | 
        | 3. The obvious one that people who do this type of thing
        | spend a lot of time thinking about: CAP theorem shenanigans
        | and using Paxos.
       | 
       | Reminds me of the Grugbrained developer on microservices:
       | https://grugbrain.dev/#grug-on-microservices
       | 
        | Good luck getting every piece working on the first major
        | recovery. My 100% unscientific hunch is that most folks
        | aren't testing their cold-state recovery from a big failure,
        | much like how folks don't test their database restoration
        | solutions (or historically haven't).
        
       | ruoranwang wrote:
        | I wonder how Cassandra is doing? I heard companies are
        | migrating away from it.
        
       | sudhirj wrote:
        | The way that I learnt the ins and outs of DynamoDB (and
        | there is a lot to learn if you want to use it effectively)
        | is by implementing all the Redis data structures and commands
        | on it. That helped me understand both systems in one shot.
       | 
        | The key concept in Dynamo is that you use a partition key on
        | all your bits of data (my mental model is that you get one
        | server per partition) and you can then arrange data using a
        | sort key within that partition. You can then range/inequality
        | query over the sort keys. That's the gist of it.
       | 
       | The power and scalability comes from the fact that each partition
       | can be individually allocated and scaled, so as long as you
       | spread over partitions you have practically no limits.
       | 
       | And you can do quite a bit with that sort key range/inequality
       | thing. I was pleasantly surprised by how much of Redis I could
       | implement: https://github.com/dbProjectRED/redimo.go
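The partition/sort key model described above can be sketched as a toy in-memory table (key formats and items here are made up; a real table would go through boto3, but the query semantics are the same):

```python
from bisect import bisect_left, bisect_right

# Toy model: each partition key maps to a list of (sort_key, item)
# pairs kept in sorted order; one "server" per partition.
table = {}

def put_item(pk, sk, item):
    rows = table.setdefault(pk, [])
    keys = [k for k, _ in rows]
    i = bisect_left(keys, sk)
    if i < len(rows) and rows[i][0] == sk:
        rows[i] = (sk, item)  # overwrite an existing sort key
    else:
        rows.insert(i, (sk, item))

def query_range(pk, lo, hi):
    """Range query over the sort keys within one partition."""
    rows = table.get(pk, [])
    keys = [k for k, _ in rows]
    return [item for _, item in
            rows[bisect_left(keys, lo):bisect_right(keys, hi)]]

put_item("user#42", "order#2022-01-05", {"total": 10})
put_item("user#42", "order#2022-03-17", {"total": 25})
put_item("user#42", "order#2022-07-01", {"total": 40})

# All orders in the first half of 2022:
print(query_range("user#42", "order#2022-01", "order#2022-06~"))
# -> [{'total': 10}, {'total': 25}]
```

Because sort keys compare lexicographically, embedding sortable values (ISO dates, zero-padded numbers) in them is what makes these range queries work.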
        
       | jerryjerryjerry wrote:
        | Good job! But I'm wondering when Amazon will start to
        | contribute to the open source world...
        
         | manigandham wrote:
         | They already do: https://aws.amazon.com/opensource/
        
       | no_wizard wrote:
        | How well does DynamoDB scale when paired with AppSync and
        | GraphQL? The selling point here being you can use GQL as your
        | schema for the DB too and get automatic APIs for free
        
         | ledauphin wrote:
         | i've done this. it works really, really well to start off with
         | - your API basically is your schema, and you're done.
         | 
         | There's definitely more work later on when your API and data
         | model start diverging (which they always will). Overall it was
         | a decent experience, and DynamoDB has made some really
         | important QOL improvements over the last 5 years, too.
         | 
         | It's still not relational, which means it's very different and
         | you'll be committed to a totally different way of thinking
         | about things for a while.
        
         | dboreham wrote:
         | Just fine?
        
           | no_wizard wrote:
            | I should have made it clear: I was hoping to get some
            | folks to talk about their experience using it this way.
            | I haven't found a lot in terms of real-world evaluation
            | of it.
           | 
           | It can also use Aurora Serverless V2, and I am curious about
           | that as well, FWIW
        
       | ledauphin wrote:
        | An underrated part of DynamoDB is its streams. You can
        | subscribe
       | to changes and reliably process those in a distributed way. If
       | you're comfortable with the terms "at-least once delivery" and
       | "eventual consistency", you can build some truly amazing systems
       | by letting events propagate reactively through your system, never
       | touching a data store or messaging broker other than DynamoDB
       | itself.
       | 
       | It's not for everyone, but when you get a team up and running
       | with it, it can be shockingly powerful.
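A sketch of what consuming a stream looks like in practice: a Lambda-style handler over a batch of stream records. The record shape below follows the documented DynamoDB Streams event format; the handler body and attribute names are illustrative:

```python
def handler(event, context):
    """Process one batch of DynamoDB stream records. Delivery is
    at-least-once, so whatever happens here should be idempotent."""
    processed = []
    for record in event["Records"]:
        if record["eventName"] in ("INSERT", "MODIFY"):
            # Stream images use DynamoDB's attribute-value encoding,
            # e.g. {"S": "..."} for strings.
            image = record["dynamodb"]["NewImage"]
            processed.append(image["pk"]["S"])
    return processed

# A sample event, shaped like a real stream batch:
sample = {"Records": [
    {"eventName": "INSERT",
     "dynamodb": {"NewImage": {"pk": {"S": "user#1"}}}},
    {"eventName": "REMOVE",
     "dynamodb": {"OldImage": {"pk": {"S": "user#2"}}}},
]}
print(handler(sample, None))  # -> ['user#1']
```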
        
         | mikhmha wrote:
         | Yeah we make use of streams at my work. Really useful. You can
         | hook up streams to a Lambda and have it process events and flow
         | them downstream to a Data Lake or Data Warehouse for analytic
         | workloads. What works really well is pushing data to an S3
         | bucket with object versioning and replication enabled.
         | 
          | I think DynamoDB streams and Kinesis streams work similarly
          | under the hood? But DynamoDB streams are way cheaper;
          | pricing is on-demand compared to hourly for Kinesis.
        
       | ignoramous wrote:
       | > From the paper [0]: _DynamoDB consists of tens of
       | microservices._
       | 
        | Ha! For folks who think two-pizza teams mean 100s of
        | microservices... this is probably the second most scaled-out
        | storage service at AWS (behind S3?), and it runs _tens_ of
        | microservices (pretty sure these aren't _micro_ the way most
        | folks would presume 'em to be).
       | 
        | > _What's exciting for me about this paper is that it covers
        | DynamoDB's journey..._
       | 
       | Assuming these comments are true [1][2], in a classic Amazon
       | fashion [3], the paper fails to acknowledge a FOSS database
       | (once?) underneath it: MySQL/InnoDB (and references it as B-Tree
       | instead).
       | 
       | [0]
       | https://web.archive.org/web/20220712155558/https://www.useni...
       | 
       | [1] https://news.ycombinator.com/item?id=13173927
       | 
       | [2] https://news.ycombinator.com/item?id=18871854
       | 
       | [3] https://archive.is/T1ZNJ
        
         | deanCommie wrote:
          | Lots can change over the years. Your links are from 2016;
          | is it not conceivable that in the last 6 years Amazon has
          | changed some of the implementation?
        
         | hintymad wrote:
          | I'm not sure about DDB, but I know that in AWS in general,
          | building a new service does not give you credit by default.
          | It's not like the shtick Uber promoted: "Yeah! We have 8000
          | services. Look how great we are!" In fact, people usually
          | question it if someone proposes to create a new service.
          | Working Backwards (i.e., solving real user problems) and
          | Invent and Simplify are indeed two powerful leadership
          | principles. And of course, the sheer amount of work
          | involved in setting up a new service is so much that people
          | have to think twice before starting a new service.
        
       | bistablesulphur wrote:
        | I've been working with DynamoDB daily for a few years now,
        | and whilst I like working with it and the specific scenario
        | it solves for us, I'd still urge anyone thinking about using
        | it to carefully reconsider whether their problem is truly
        | unique enough that a traditional RDBMS couldn't handle it
        | with some tuning. They can be unbelievably performant and
        | give so much stuff for free.
       | 
        | Designing an application specifically for DynamoDB will take
        | _a lot_ of time and effort. I think we could have saved
        | almost a third of our entire development time had we used
        | more of the boring stuff.
        
         | josevalerio wrote:
         | +1
         | 
          | Discovered this while building
          | https://github.com/plutomi/plutomi as I was enamored by
          | Rick's talks and guarantees of `performance at any scale`.
          | In reality, Dynamo was solving scaling issues that we
          | didn't have, and the number of times I've had to rework
          | something to get around some of the quirks of Dynamo led
          | to a lot of lost dev time.
         | 
          | Now that the project is getting more complex, simple things
          | such as "searching" (for our use case) are virtually
          | impossible without hosting an ElasticSearch cluster, where
          | a simple LIKE %email% in Postgres would have sufficed.
         | 
         | Not saying it's a bad DB at all, but you really need to know
         | your access patterns and plan accordingly. Dynamo streams are a
         | godsend and combined with EventBridge you can do some powerful
         | things for asynchronous events. Not paying while it's not
         | running with on demand is awesome, and the performance is truly
         | off the charts. Just please know what you are getting into. In
         | fact, I'd recommend only using Dynamo if you are migrating a
         | "finished" app vs using it for apps that are still evolving
        
         | solatic wrote:
         | > give so much stuff for free
         | 
         | Interesting choice of words. Performance wise, sure. Money
         | wise? I'm still waiting for a SQL database with pay-per-request
         | pricing. The cost difference is enormous, particularly when you
         | remember that you don't need to spend manpower managing the
         | underlying hardware.
         | 
         | Engineering tradeoffs are more complicated than only
         | considering raw scalability performance and "I can run it
         | myself on a cheap Raspberry Pi".
        
           | david38 wrote:
           | Who manages hardware these days? Aurora works quite well.
        
           | 0xthrowaway wrote:
           | >Interesting choice of words. Performance wise, sure. Money
           | wise? I'm still waiting for a SQL database with pay-per-
           | request pricing. The cost difference is enormous,
           | particularly when you remember that you don't need to spend
           | manpower managing the underlying hardware.
           | 
           | I assume you're saying DynamoDB is _less_ expensive than SQL
           | because of pay-per-request.
           | 
            | Working on applications with a modest amount of data (a
            | few TB over a few years), pay-per-request has been
            | incredibly expensive even with scaled provisioning. I
            | would much rather have an SQL database and pay for the
            | server(s). Then I could afford a few more developers!
        
         | alexnewman wrote:
         | totally, or s3
        
         | cnlwsu wrote:
         | To be fair, you can end up spending a lot of time on the boring
         | stuff as well.
        
         | blantonl wrote:
         | _Designing application specifically for DynamoDB will take _a
         | lot_ of time and effort._
         | 
         | If you can write, read, and query a JSON document using an API
         | in your application, it's literally that simple.
         | 
         | The only real time and effort is the architectural decisions
         | you make up front, and that's about it. And there are some
         | great guides out there that cover 99% of those architectural
         | decisions.
         | 
         | As a user of both, I find MySQL replication and clusters to be
         | far more complex and time and effort intensive.
        
           | lumost wrote:
            | It's a question of change resilience. You _can_ implement
            | CRUD on a single object with DDB trivially. You can't
            | trivially implement 5 different list-by-X-property APIs,
            | or filter the objects, or deal with foreign keys...
        
           | i_love_limes wrote:
            | Have to disagree on this one. Something as basic and out
            | of the box as a migration / data backfill is not only
            | complicated but also very expensive (both time- and
            | cost-wise) on Dynamo. Not to mention all the other things
            | that come nicely with a relational db (type checking,
            | auto-increments, uniform data).
        
             | blantonl wrote:
             | To be fair, the parent discusses designing an application
             | to use Dynamo, not data migration.
             | 
             | I'll completely agree with you on migration / backfill.
             | You're going to pay a lot of money to migrate a ton of data
             | into Dynamo, and you'll also definitely increase the
             | complexity in provisioning and setting up that migration
             | pattern.
             | 
              | But my comment stands pretty well considering
              | greenfield application development around Dynamo.
        
           | kdazzle wrote:
           | > The only real time and effort is the architectural
           | decisions you make up front, and that's about it
           | 
            | And don't forget the time spent fixing what could have
            | been caught by types and regular old db constraints (for
            | most applications)
        
           | nkozyra wrote:
           | > If you can write, read, and query a JSON document using an
           | API in your application, it's literally that simple
           | 
            | You could say that of Elasticsearch or Mongo, too. And
            | it might be _technically_ true, but you haven't scratched
            | the surface of mappings, design, limitations, etc.
           | 
           | You can dump a bunch of data into Dynamo very easily, but
           | what about getting data via secondary indices when you can't
           | get your data with the views you've built without scanning?
           | How do you use partition keys in it? And so on.
        
         | codetiger wrote:
            | Is there a specific reason why you say "Designing an
            | application specifically for DynamoDB will take _a lot_
            | of time and effort"? Are you talking about migrating
            | from an RDBMS to DynamoDB? Because my experience with
            | DynamoDB design was very similar to any other NoSQL DB.
        
           | taspeotis wrote:
           | I mean if you go the single table route...
           | https://aws.amazon.com/blogs/compute/creating-a-single-
           | table...
        
           | bistablesulphur wrote:
            | A lot is transferrable to NoSQL and key-value in general,
            | though DDB has plenty of quirks of its own. Understanding
            | your problem really is the key. A lot of problems turn
            | out to be quite relational after all.
            | 
            | You definitely can build just about anything with DDB;
            | it's often just not worth the time when most problems
            | can be solved by existing tools.
        
           | cebert wrote:
            | You really need to consider your access patterns up front
            | with
           | DynamoDB. Any changes of those during application development
           | can be very time consuming. There are limitations on how many
           | local and global secondary indexes you can have. You also
           | can't easily add them to existing tables. However, you can
           | use multiple databases to get the best of both worlds. At my
           | employer, we typically store domain entities in DynamoDB as
           | the source of truth. However, we replicate some entities to
           | secondary databases like OpenSearch when we have access
           | patterns that require adhoc querying.
        
         | lliamander wrote:
         | I think there are good reasons to choose DynamoDB over a RDBMS
         | that have nothing to do with scalability.
         | 
         | I've used DynamoDB several times over the past several years in
         | the context of providing a datastore for a microservice. In all
         | cases it was cheaper and easier than RDS, and the ability to
         | add GSIs has enabled me to adapt to all of the new access
         | patterns I've had to deal with.
         | 
         | For us, DynamoDB has become a 'boring' option.
        
         | fubbyy wrote:
          | I think it also depends on the system you're using it on.
          | I think one of the biggest advantages of DDB is that it
          | scales so well (with good design to avoid hot partitions).
          | Afaik, RDBMS simply cannot scale in the same way due to
          | their design. Yes, they can scale somewhat, but as you said
          | it requires lots of tuning, and you'll still reach a
          | hardish limit.
        
           | grogers wrote:
           | One partition of DDB is incredibly tiny compared to one
           | partition of an RDBMS. You can push that one partition of
           | RDBMS pretty far before you're forced to design sharding into
           | your system. With DDB you are basically forced to design
           | sharding into your partition keys up front or you _will_ have
           | hot partition issues. This is by far the most common problem
            | I see with teams using DDB, so brushing it off as "with
            | good design to avoid hot partitions" is understating the
            | scope of the problem.
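The up-front write sharding grogers describes amounts to baking a shard suffix into the partition key: writes scatter a hot logical key across N physical partitions, and reads fan out across all N and merge. A minimal sketch (shard count and key format are arbitrary choices):

```python
import random

SHARDS = 8  # chosen up front; painful to change later

def sharded_pk(logical_key):
    """Spread writes for one hot logical key over SHARDS keys."""
    return f"{logical_key}#{random.randrange(SHARDS)}"

def all_shard_pks(logical_key):
    """Reads must fan out: query every shard key and merge results."""
    return [f"{logical_key}#{i}" for i in range(SHARDS)]

# Writes land on e.g. "popular-item#3"; reads query all 8 keys.
print(all_shard_pks("popular-item"))
```

The trade is explicit: each write touches one random shard, but every read of the logical key costs N queries.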
        
           | manigandham wrote:
           | All databases scale the same way - by partitioning and
           | sharding the dataspace. RDBMS have harder restrictions due to
           | the features they provide and the performance expectations,
           | but you can just as easily use a bunch of relational servers
           | to partition a table (or several) across them by range or
           | hashes of the primary key.
           | 
           | That's basically what key/value stores like DynamoDB do, and
           | why DynamoDB was even built on MySQL (at least originally).
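Hash-partitioning a table across a set of relational servers, as described above, reduces to routing each primary key through a stable hash. The hostnames are hypothetical; a stable hash such as CRC32 matters because Python's built-in `hash()` is salted per process for strings:

```python
import zlib

SERVERS = ["db-0.internal", "db-1.internal", "db-2.internal"]  # hypothetical

def route(primary_key: str) -> str:
    """Pick a server by a stable hash of the primary key."""
    h = zlib.crc32(primary_key.encode("utf-8"))
    return SERVERS[h % len(SERVERS)]

# The same key always routes to the same server:
assert route("user#42") == route("user#42")
print(route("user#42"))
```

This is the easy half; the hard half (re-partitioning, failover, routing changes) is what the replies below are arguing about.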
        
             | kixiQu wrote:
             | "just as easily" would be the contested part, I'd guess
        
             | dalyons wrote:
              | "can just as easily use a bunch of relational servers
              | to partition a table" is not true at all. Managing,
              | maintaining, and tuning a sharded relational cluster
              | is an astonishing amount of operations work: partition
              | management, re-partitioning, partition failover /
              | promotions / demotions, query routing, shard discovery,
              | upgrades... it goes on and on. All this work is gone if
              | you pick Dynamo. Not saying that Dynamo is always
              | better, but IMHO people very much underestimate the ops
              | cost of running a sharded relational cluster at scale.
        
         | jerf wrote:
         | "I'd still urge anyone thinking about using it to carefully
         | reconsider whether their problem is truly unique enough that a
         | traditional RDBMS couldn't handle it with some tuning."
         | 
          | Lately, the problem I've seen is people who haven't even
          | considered whether their problem is truly unique enough
          | that a traditional RDBMS couldn't handle it _without_ some
          | tuning. (Here I don't count "set up the obvious index" as
          | "tuning", because if you're using a non-RDBMS the same
          | work is encompassed in figuring out what to use as keys.
          | No escaping that one regardless of technology.)
         | 
          | I'm losing track of the number of teams in my company I've
          | seen switching databases after they rolled to production,
          | because it turns out they picked a database that doesn't
          | support the primary access pattern for their data in some
          | cases, or in other cases, a very common secondary access
          | pattern. In all the cases I've seen so far, it's been for
          | quantities of data that an RDBMS would have chewed up and
          | spat out without even noticing. It's amazing how much
          | trouble you can get yourself into with non-relational
          | databases with just a few hundred megabytes of data, or
          | even a few _tens_ of megabytes, if you fall particularly
          | hard for the "it's fast and easy!" hype and end up
          | accidentally writing a pessimal schema because you thought
          | using a non-relational database meant you got to think
          | less about your schema than with a relational DB.
         | 
         | That is precisely backwards; NoSQL-type DBs get their power
         | from you spending a lot more time and care in thinking about
         | exactly how you plan on accessing data. Many NoSQL databases
         | loosen the constraints on _what_ you can store in a given
         | record, but in return they are a great deal more fussy about
         | _how_ you access records. If you want to skip careful design of
         | _how_ you access records, you want the relational DB. And
         | nowadays, tossing a JSON field into a relational row is quite
         | cheap and effective for those  "catch alls" in the schema.
         | 
         | There's some interesting hybrids out there now if you want a
         | bit of both worlds. For instance, Clickhouse is not an SQL
         | database, but it more gracefully handles a lot of SQL-esque
         | workloads than many other NoSQL-esque databases. You can get
         | much farther with "I need a NoSQL-style database, but every
         | once in a while I need an SQL-like bit of functionality", than
         | you can in something like Cassandra.
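The "JSON field in a relational row" catch-all jerf mentions is cheap to try even in SQLite, whose JSON1 functions ship with modern Python builds (the schema and data here are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, extra TEXT)"
)
conn.execute(
    "INSERT INTO users (email, extra) VALUES (?, ?)",
    ("a@example.com", '{"plan": "pro", "tags": ["beta"]}'),
)

# Relational columns stay queryable as usual; the JSON blob is a
# schema-less catch-all you can still reach into when needed.
plan = conn.execute(
    "SELECT json_extract(extra, '$.plan') FROM users WHERE email = ?",
    ("a@example.com",),
).fetchone()[0]
print(plan)  # -> pro
```

Postgres's `jsonb` gives the same shape of escape hatch with indexing on top.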
        
         | ramraj07 wrote:
          | Could you elaborate on your (or a hypothetical) use case
          | where DynamoDB makes sense? I for one can never come up
          | with a case that isn't better served by an RDBMS or S3.
        
           | cogman10 wrote:
            | Lots of records (billions), low/no relational linkage,
            | the need to query/update records in different ways (i.e.,
            | you need indexes), the need for HA and scaling (e.g.,
            | workloads that are VERY bursty and read-heavy).
            | 
            | It's not one-size-fits-all, but at least in my line of
            | work there are a few instances where it's a pretty good
            | fit.
        
           | ignoramous wrote:
            | When you require point and range queries. For example:
            | given a cart-id, fetch the skus; given an authz-token,
            | fetch scopes; given a user-id and a time-range, fetch a
            | list of pending order-ids.
           | 
            | There's a lot more you could do, though; DynamoDB is,
            | after all, a wide-column KV store. See this re:Invent
            | talk from 2018:
            | https://www.youtube-nocookie.com/embed/HaEPXoXVf2k
           | 
           | Apart from being fully-managed, the key selling points of
           | DynamoDB are its consistent performance for a given query
           | type, read-your-writes consistency semantics, auto-
           | replication, auto-disaster recovery.
           | 
           | See also: https://martinfowler.com/bliki/AggregateOrientedDat
           | abase.htm... (mirror: https://archive.is/lc2eO)
        
             | ramraj07 wrote:
              | The AWS re:Invent lecture was great and answered
              | exactly when to use DynamoDB. I might seriously
              | consider it for some of my applications for sure.
        
           | abd12 wrote:
           | I always tell people there are two clear areas where DynamoDB
           | has some major benefits:
           | 
           | - Very high scale applications that can be tough for an RDBMS
           | to handle
           | 
            | - Serverless applications (e.g. w/ AWS Lambda), due to
            | how the connection model (and other factors) fits better
            | with serverless.
           | 
           | Then, for about 80% of OLTP applications, you can choose
           | either DynamoDB or RDBMS, and it really comes down to which
           | tradeoffs you prefer.
           | 
           | DynamoDB will give you consistent, predictable performance
           | basically forever, and there's not the long-term maintenance
           | drag of tuning your database as your usage grows. The
           | downside, as others have mentioned, is more planning upfront
           | and some loss of flexibility.
        
           | blantonl wrote:
           | I'll give you two use cases that I use for DynamoDB, where
           | otherwise I'm primarily a MySQL shop
           | 
           | 1) Simple: I have a system that constantly records and stores
           | 30 minute MP3 files of audio streams (1000's of them) in S3.
           | We write the referencing metadata to a table in DynamoDB
            | where users can query by date/time. Given the sheer
            | number of items (hundreds of millions), we saw far worse
            | performance vs. cost on MySQL than on Dynamo.
           | 
           | 2) Complex: I have a system that ingests thousands of tiny
           | MP3 files a minute into S3 and writes the associated metadata
           | to DynamoDB. DynamoDB then has a stream associated with it
           | that runs a lambda to consolidate statistics to another table
           | _and_ stream that metadata to clients via other lambdas or
           | data streams.
           | 
           | Those are two great use cases where we saw better usage
           | patterns with Dynamo vs MySQL.
        
         | magic_hamster wrote:
         | If you have a database access layer then structuring your
         | application shouldn't be that different. I wouldn't deal with
         | the database directly unless I had a really good reason or the
         | abstraction layer didn't support the query I was trying to run.
        
           | Dobbs wrote:
            | DynamoDB usage is heavily based around correctly
            | structuring your keys, allowing you to do things like
            | query subsets easily. This in turn means you need to know
            | what your usage patterns will be like so you can
            | correctly structure your keys.
            | 
            | God help you if you need to make major changes to this
            | down the road.
            | 
            | A database access layer can't do this for you; that just
            | isn't what it does.
        
           | manigandham wrote:
           | An access layer doesn't change your access patterns, which is
           | what actually determines the database model to use.
           | 
           | DynamoDB (and other similar key/value stores) make very big
           | trade-offs for speed and scale that most applications don't
           | need.
        
           | ciberado wrote:
            | DynamoDB is amazing, but not very flexible once you have
            | designed your database. No abstraction layer will let
            | you run ad-hoc queries in a performant way.
        
             | whalesalad wrote:
              | It's true. 400 KB max item size, too. 1 MB max query
              | (response page) size, I believe. Good luck grabbing a shit
              | load of data at once without a parallel scan.
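              | A parallel scan splits the key space into segments so
              | several workers can read at once. A minimal sketch of the
              | Scan request parameters (the table name is a placeholder):

```python
def parallel_scan_params(table_name: str, total_segments: int) -> list:
    """Build one Scan request per segment; each worker takes one.

    TotalSegments tells DynamoDB how many slices to split the key
    space into, and Segment picks which slice a worker reads.
    """
    return [
        {"TableName": table_name, "Segment": i, "TotalSegments": total_segments}
        for i in range(total_segments)
    ]
```

              | Each dict would be passed to a separate worker's `Scan`
              | call, e.g. via boto3.
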
             | 
             | Dynamo is a precision tool and it's great at those specific
             | workloads but it's not a one size fits all by any means.
        
               | shepherdjerred wrote:
                | 400 KB is the max item size; the pattern to get around
                | that is to store objects in S3 and put URLs/keys to
                | those objects in DDB
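                | That pattern can be sketched as a helper that inlines
                | small payloads and writes an S3 pointer for large ones.
                | The 400 KB limit is DynamoDB's; the key scheme and
                | headroom are made-up assumptions:

```python
MAX_ITEM_BYTES = 400 * 1024  # DynamoDB's hard per-item limit

def payload_item(pk: str, payload: bytes) -> dict:
    """Return a DynamoDB item; oversized payloads become S3 pointers.

    A real write path would also s3.put_object(...) the payload;
    here we only build the item to show the shape of the pattern.
    """
    if len(payload) < MAX_ITEM_BYTES - 1024:  # headroom for other attributes
        return {"PK": {"S": pk}, "Body": {"B": payload}}
    return {"PK": {"S": pk}, "S3Key": {"S": f"payloads/{pk}"}}
```
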
        
               | [deleted]
        
             | eatonphil wrote:
             | > No abstraction layer will allow you to run queries ad-hoc
             | in a performant way.
             | 
             | Depends on the size of the data. Run analytics queries
             | (i.e. things that return summary data not all rows) on 10GB
             | of data through clickhouse or duckdb or datafusion and
             | they'll generally return in milliseconds.
        
               | nightpool wrote:
               | What does this have to do with DynamoDB? The point is
               | that once you've gotten your data into DynamoDB, you're
               | strongly limited in how you can use it until you load it
               | into something else.
        
               | eatonphil wrote:
               | I didn't see an obvious connection between the two
               | sentences.
        
         | kumarvvr wrote:
         | > Designing application specifically for DynamoDB will take _a
         | lot_ of time and effort
         | 
         | Disagree with this. Your team could think of it as a document
         | database, and you can have utility libraries that filter and
         | sort based on PK / SK combinations to provide a seamless
         | experience.
        
           | shepherdjerred wrote:
           | If you want your DynamoDB table to scale well you'll have to
           | put in a lot of upfront effort.
        
       | kumarvvr wrote:
        | A word of caution: the default limit on the number of DynamoDB
        | tables per AWS account is 2,500.
       | 
       | Tables are a scarce resource and you want to use single table
       | designs for each app.
       | 
       | The design of tables with DDB is fascinating. Once you understand
       | the PK / SK / GSI dance, design becomes so intuitive.
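        | The "dance" boils down to composite keys. A hypothetical
        | single-table sketch (the entity prefixes, attribute names, and
        | table name here are invented for illustration):

```python
def order_item_key(customer_id: str, order_id: str) -> dict:
    # Entity-type prefixes keep different item kinds apart in one table.
    return {"PK": f"CUSTOMER#{customer_id}", "SK": f"ORDER#{order_id}"}

def orders_for_customer_query(table: str, customer_id: str) -> dict:
    # One Query call fetches all of a customer's orders via begins_with
    # on the sort key; no scan, no join.
    return {
        "TableName": table,
        "KeyConditionExpression": "PK = :pk AND begins_with(SK, :sk)",
        "ExpressionAttributeValues": {
            ":pk": {"S": f"CUSTOMER#{customer_id}"},
            ":sk": {"S": "ORDER#"},
        },
    }
```
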
        
       | revicon wrote:
        | One big benefit of DynamoDB over RDS on AWS is that the access
        | layer is API-based, so you don't have issues with held-open
        | connections when accessing via AWS Lambda.
        
         | coredog64 wrote:
         | RDS proxy should fix this, but the proxy team is out of sync
         | with the RDS team. I've seen RDS ahead of proxy by two major
         | versions.
        
       | mabbo wrote:
        | Rick Houlihan did a talk a few years ago about designing the
        | data layer for an application using DynamoDB. The most common
        | reaction I get from people I show it to (most of them Amazon
        | SDEs who operate services that use DynamoDB) is "Holy shit,
        | what is this wizardry?!"
       | 
       | https://youtu.be/HaEPXoXVf2k
       | 
       | One of the biggest mistakes people make with dynamo is thinking
       | that it's just a relational database with no relations. It's not.
       | 
       | It's an incredible system, but it requires a lot of deep
       | knowledge to get the full benefits, and it requires you, often,
       | to design your data layer very well up-front. I actually don't
       | recommend using it for a system that hasn't mostly stabilized in
       | design.
       | 
       | But when used right, it's an incredibly performant beast of a
       | data store.
        
         | ngc248 wrote:
          | Indeed, with GSIs etc. you can implement a priority queue,
          | store data in the order you want, and so on. Once you are
          | clear on the access patterns of your app, DynamoDB is amazing
          | to model for and will scale with your app. But if you are not
          | clear about your app's access patterns or need ad-hoc
          | queries, then DynamoDB is not a good fit.
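          | A priority queue on a GSI can be sketched like this. The
          | index name, key attributes, and prefixes are assumptions for
          | illustration, not a standard layout:

```python
def queue_task_item(task_id: str, priority: int, queue: str = "default") -> dict:
    # GSI1PK is constant per queue, so one index partition holds the
    # whole queue (a hot-partition risk at high volume). GSI1SK sorts
    # by zero-padded priority, with the task id as a tiebreaker.
    return {
        "PK": {"S": f"TASK#{task_id}"},
        "GSI1PK": {"S": f"QUEUE#{queue}"},
        "GSI1SK": {"S": f"{priority:05d}#{task_id}"},
    }

def peek_query(table: str, queue: str = "default") -> dict:
    # Ascending sort + Limit 1 returns the lowest (most urgent) priority.
    return {
        "TableName": table,
        "IndexName": "GSI1",
        "KeyConditionExpression": "GSI1PK = :q",
        "ExpressionAttributeValues": {":q": {"S": f"QUEUE#{queue}"}},
        "ScanIndexForward": True,
        "Limit": 1,
    }
```
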
        
         | LAC-Tech wrote:
          | Thanks, bookmarked this. It's good to see a proper take on
          | data modelling on document stores instead of just "throw any
          | old JSON in there, it'll be fine!!!"
        
         | itsmemattchung wrote:
         | Definitely one of my favorite talks by Rick and I apply lessons
         | learned in that video on a daily basis.
         | 
          | Must have watched that video about 4-5 times before I really
          | grasped the topics, since the start of my career burned the
          | concept of relational databases into my head. Breaking from
          | that pattern of thought was difficult, initially.
        
         | time0ut wrote:
         | I also recommend Alex DeBrie's "The DynamoDB Book"
         | (https://www.dynamodbbook.com/). It is a great resource that
         | talks about these design patterns in depth. It has served me
         | and my team well over the past few years.
        
           | aarondf wrote:
           | Seconded! Alex DeBrie is a great teacher.
        
         | [deleted]
        
         | 0xbadcafebee wrote:
          | _Can_ be performant, nowadays anyway. Worked with a team who
          | built their own implementation because Amazon's was too slow
          | and expensive.
         | 
         | It's a weird model. Too small of a dataset and it doesn't quite
         | make sense to use Dynamo. Too big of a dataset and it's full of
         | footguns. Medium-sized may be too expensive.
        
           | coredog64 wrote:
           | Too-small seems to be the perfect use case for DDB. I need
           | someplace to stash stuff and look it up by key. A full RDS is
           | overkill, as is anything else that requires nodes that charge
           | by the hour.
        
         | ronjouch wrote:
         | For explicitness & searchability, commenting with the title of
          | this talk, which is indeed _excellent_, not limited to
         | DynamoDB, and which was kind of a revelation after years of
         | using DynamoDB suboptimally:
         | 
         | Rick Houlihan - AWS re:Invent 2018: Amazon DynamoDB Deep Dive:
         | Advanced Design Patterns for DynamoDB (DAT401) ,
         | https://www.youtube.com/watch?v=HaEPXoXVf2k
         | 
         | It should be watched along with reading the associated doc:
         | https://docs.aws.amazon.com/amazondynamodb/latest/developerg...
        
         | davidjfelix wrote:
          | It's worth noting that a lot of the early database design
          | guidance, including this 2018 video, pre-dates some dramatic
          | improvements to DynamoDB usability.
         | 
         | I think the biggest ones were:
         | 
         | - an increase in the number of GSIs you can create (Dec 2018)
         | [1]
         | 
         | - making on-demand possible [2]
         | 
         | - an increase in the default limit for number of tables you can
         | create (Mar 2022) [3]
         | 
         | I don't think these new features necessarily make the single-
         | table, overloaded GSI strategy that's discussed in the video
         | obsolete, but they enable applications which are growing to
         | adopt an incremental GSI approach and use multiple tables as
         | their data access patterns mature.
         | 
         | Some other posters have recommended Alex DeBrie's dynamodb book
         | and I also think that's an excellent resource, but I'd caution
         | people who are getting into dynamodb not to be scared by the
         | claims that dynamodb is inflexible to data access changes,
         | since AWS has been adding a lot of functionality to support
         | multi-table, unknown access patterns, emerging secondary
         | indexes, etc.
         | 
         | - [1] https://aws.amazon.com/about-aws/whats-
         | new/2018/12/amazon-dy...
         | 
         | - [2] https://aws.amazon.com/blogs/aws/amazon-dynamodb-on-
         | demand-n...
         | 
         | - [3] https://aws.amazon.com/about-aws/whats-
         | new/2022/03/amazon-dy...
        
           | pdhborges wrote:
            | People don't need to be scared; they just need to do their
            | homework.
            | 
            | In my opinion, having more tables and more GSIs available
            | won't help you very much if you started with a flawed data
            | model (unless you kept making the same design mistake 256
            | times). A team that tries to claw back from a flawed table
            | design by piling up GSIs is just in for a world of pain.
           | 
            | So if you are planning to go with Dynamo:
            | 
            | - Read about the data modeling techniques
            | 
            | - Figure out your access patterns
            | 
            | - Check if your application and model can withstand the
            | eventual consistency of GSIs
            | 
            | - Have a plan to rework your data model if requirements
            | change: Are you going to incrementally rewrite your table?
            | Are you going to export it and bulk load a fixed data
            | model? How much is that going to cost?
        
           | Twirrim wrote:
            | Something else important to mention is that DynamoDB now
            | re-consolidates partitions.
           | 
            | This is a lousy explanation, but: read/write quota is split
            | evenly over all partitions. Partitions are created based on
            | the hash key, and there's an upper limit on how much data
            | can be stored in any given partition. So if you end up with
            | a hot hash key with lots of data in it, that data gets
            | split over more and more partitions, and the effective
            | throughput goes _down_ (quota is split evenly over
            | partitions).
           | 
           | I believe this is still a general risk, and you need to be
           | extremely canny about your use of hash key to avoid it, but
           | historically they couldn't reconsolidate partitions. So you'd
           | end up with a table in a terrible state with quota having to
           | be sky high to still get effective performance. The only
           | option then was to completely rotate tables. New table with a
           | better hash-key, migrate data (or whatever else you needed to
           | do).
           | 
           | Now at least, once the data is gone, the partitions will
           | reconsolidate, so an entire table isn't a complete loss.
        
             | GauntletWizard wrote:
              | This bit me badly. An application that did significant
              | autoscaling hit a peak of 30,000 read/write requests per
              | second, but typically did more like 300.
             | 
              | The Amazon support engineer told us that we had over a
              | hundred partitions (which even he admitted was high for
              | that request rate), so our quota was effectively giving
              | us ~0 IOPS per partition. This obviously didn't work, and
              | their only solution was "scale it back up, copy
              | everything to a new table". Which we did, but it was an
              | engineering effort I'd rather have avoided.
        
       | 0xthrowaway wrote:
        | DynamoDB is (edit: can be) _extremely_ expensive compared to
        | alternatives (e.g. self-hosted SQL).
        | 
        | Make sure the benefits (performance, managed, scale) outweigh
        | the costs!
        
       ___________________________________________________________________
       (page generated 2022-07-14 23:01 UTC)