[HN Gopher] The DynamoDB Paper
___________________________________________________________________
The DynamoDB Paper
Author : krnaveen14
Score : 223 points
Date : 2022-07-14 10:09 UTC (12 hours ago)
(HTM) web link (brooker.co.za)
(TXT) w3m dump (brooker.co.za)
| Patrol8394 wrote:
| These days I'd probably take a closer look at Spanner. It is a
| consistent and scalable db. It makes life much easier for
| developers.
|
| Like Cassandra, DynamoDB requires the data model to be designed
| very carefully to be able to get the max out of it.
|
| More often than not, that simply adds more complexity; people
| often underestimate how far a sharded MySQL/Postgres can scale.
|
| My default choice for the longest time: Postgres for the data I
| care about, ES as secondary index and S3 as blob storage.
| deepGem wrote:
| True. Spanner and its peers, CockroachDB and YugabyteDB, are
| all strongly consistent and scalable dbs. The greatest
| advantage IMO is the ability to just use SQL without having to
| worry about carefully designing a data model. What bothers me,
| however, is that these data stores are not truly relational
| data stores. They layer a relational interface on top of a
| scalable key-value data store.
|
| Is it necessary to use a strongly consistent transactional data
| store if your needs don't demand transactions (by transactions
| I mean 2PC)? IMO you are still better off with
| DynamoDB/Cosmos/MongoDB for eventual-consistency use cases. The
| reason being, you still have to resort to a data model if you
| don't need the relational layer, in YugabyteDB at least; not
| sure about Spanner. So why bother with YugabyteDB if I'm
| resorting to a data model anyway? Might as well stick with
| DynamoDB.
| jd_mongodb wrote:
| You seem to think MongoDB is eventually consistent. MongoDB
| is designed as a strongly consistent database. You can choose
| to query a secondary, and that will be eventually consistent,
| but that is not the default behaviour.
| manigandham wrote:
| What part of SQL requires not having to design a data model?
| What exactly do you mean by that?
|
| And technically all relational databases are relational
| layers on top of a key/value subsystem. Splitting that apart
| and scaling the storage is how most of the NewSQL databases
| scale, from CRDB to Yugabyte to Neon.
| dboreham wrote:
| > What bothers me however is that these data stores are not
| truly relational data stores
|
| Suggests there may be an impossibility theorem lurking
| somewhere.
| arminiusreturns wrote:
| I think VoltDB (and SciDB) are worth checking out also. I'm
| seeing some very impressive ACID-compliant TPS with Elixir
| connected to VoltDB. I don't like having to pay to get
| distributed features, however (the open source community
| edition is feature-gimped compared to the paid one).
| _benedict wrote:
| Global strict serializability is coming to Cassandra very soon
| [1]
|
| [1]
| https://cwiki.apache.org/confluence/download/attachments/188...
| ctvo wrote:
| I've found DDB to be exceptional for use cases where eventual
| consistency is OK and you have a few well defined query patterns.
| This is a large number of use cases so it's not too limiting. As
| the number of query patterns grow, indices grow, and costs grow
| (or pray for your soul you attempt to use DDB transactions to
| write multiple keys to support differing query patterns). If you
| need strong consistency, your cost and latency also increase.
|
| Oh, and I'd avoid DAX. Write your own cache layer. The query
| cache vs. item cache separation[1] in DAX is a giant footgun.
| It's also very under supported. There still isn't a DAX client
| for AWS SDK v2 in Go for example[2].
|
| 1 -
| https://docs.aws.amazon.com/amazondynamodb/latest/developerg...
|
| 2 - https://github.com/aws/aws-dax-go/issues/2
| dboreham wrote:
| I'd like to learn more about their MemDS. Afaik nothing has been
| made public.
| nathas wrote:
| Nice write-up from Marc. This definitely hits on the most common
| problems distributed systems face. I haven't read the paper yet
| but it _is_ pretty cool they published this and talk about
| changes over time.
|
| 1. Managing 'heat' in the system (or assuming that you'll have a
| uniform distribution of requests)
|
| 2. Recovering a distributed system from a cold state and what
| that implies for your caches.
|
| 3. The obvious one that people who do this type of thing spend a
| lot of time thinking about: CAP theorem shenanigans and using
| Paxos.
|
| Reminds me of the Grugbrained developer on microservices:
| https://grugbrain.dev/#grug-on-microservices
|
| Good luck getting every piece working on the first major
| recovery. My 100% unscientific hunch is that most folks aren't
| testing their cold-state recovery from a big failure, much as
| folks don't test their database restoration solutions (or
| historically haven't).
| ruoranwang wrote:
| I wonder how Cassandra is doing? I heard companies are migrating
| away from it.
| sudhirj wrote:
| The way that I learnt the ins and outs of DynamoDB (and there is
| a lot to learn if you want to use it effectively) is by
| implementing all the Redis data structures and commands on it.
| That helped me understand both systems in one shot.
|
| The key concept in Dynamo is that you use a partition key on all
| your bits of data (my mental model is that you get one server per
| partition) and you then can arrange data using a sort key in that
| partition. You can then range/inequality query over the sort
| keys. That's the gist of it.
|
| The power and scalability comes from the fact that each partition
| can be individually allocated and scaled, so as long as you
| spread over partitions you have practically no limits.
|
| And you can do quite a bit with that sort key range/inequality
| thing. I was pleasantly surprised by how much of Redis I could
| implement: https://github.com/dbProjectRED/redimo.go
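 The partition-key/sort-key pattern described above can be sketched as a plain request builder. This is a minimal sketch, not from the comment itself: the table name and the attribute names `pk`/`sk` are hypothetical, and the dict it builds is the parameter shape you would hand to a DynamoDB `Query` call (e.g. via boto3) without actually calling AWS:

```python
def build_range_query(table, pk_value, sk_prefix):
    """Build DynamoDB Query parameters: an exact match on the
    partition key plus a begins_with range condition on the sort key."""
    return {
        "TableName": table,
        "KeyConditionExpression": "pk = :pk AND begins_with(sk, :prefix)",
        "ExpressionAttributeValues": {
            ":pk": {"S": pk_value},
            ":prefix": {"S": sk_prefix},
        },
    }

# All of one user's orders placed in July 2022, selected purely by
# sort-key prefix -- no scan and no secondary index needed.
params = build_range_query("app-table", "USER#42", "ORDER#2022-07")
```

 A real query would then be `boto3.client("dynamodb").query(**params)`; the point of the comment is that every access path has to be expressible as one partition plus a sort-key range like this.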
| jerryjerryjerry wrote:
| Good job! But I'm wondering when Amazon can start to contribute
| to open source world...
| manigandham wrote:
| They already do: https://aws.amazon.com/opensource/
| no_wizard wrote:
| How well does DynamoDB scale when paired with AppSync and
| GraphQL? The selling point here being you can use GQL as your
| schema for the DB too and get automatic APIs for free
| ledauphin wrote:
| I've done this. It works really, really well to start off with
| - your API basically is your schema, and you're done.
|
| There's definitely more work later on when your API and data
| model start diverging (which they always will). Overall it was
| a decent experience, and DynamoDB has made some really
| important QOL improvements over the last 5 years, too.
|
| It's still not relational, which means it's very different and
| you'll be committed to a totally different way of thinking
| about things for a while.
| dboreham wrote:
| Just fine?
| no_wizard wrote:
| I should have made it clear: I was hoping to get some folks
| to talk about their experience using it this way. I haven't
| found a lot in terms of real-world evaluation of it.
|
| It can also use Aurora Serverless V2, and I am curious about
| that as well, FWIW
| ledauphin wrote:
| An underrated part of DynamoDB is its streams. You can subscribe
| to changes and reliably process those in a distributed way. If
| you're comfortable with the terms "at-least once delivery" and
| "eventual consistency", you can build some truly amazing systems
| by letting events propagate reactively through your system, never
| touching a data store or messaging broker other than DynamoDB
| itself.
|
| It's not for everyone, but when you get a team up and running
| with it, it can be shockingly powerful.
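 A sketch of what consuming a stream looks like, assuming a Lambda-style handler and the DynamoDB Streams record shape (`eventName` plus a `dynamodb` section carrying item images); the "processing" here is just collecting the new images, a stand-in for whatever the real pipeline does:

```python
def handle_stream(event, context=None):
    """Process one DynamoDB Streams batch: keep the new item image
    for every INSERT/MODIFY record. REMOVE records carry no NewImage,
    only the deleted item's keys."""
    images = []
    for record in event.get("Records", []):
        if record["eventName"] in ("INSERT", "MODIFY"):
            images.append(record["dynamodb"]["NewImage"])
    return images

# A toy two-record batch: one insert, one delete.
batch = {"Records": [
    {"eventName": "INSERT",
     "dynamodb": {"NewImage": {"pk": {"S": "USER#1"},
                               "plan": {"S": "pro"}}}},
    {"eventName": "REMOVE",
     "dynamodb": {"Keys": {"pk": {"S": "USER#2"}}}},
]}
```

 Because delivery is at-least-once, the real handler body has to be idempotent: reprocessing the same image twice must be safe.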
| mikhmha wrote:
| Yeah we make use of streams at my work. Really useful. You can
| hook up streams to a Lambda and have it process events and flow
| them downstream to a Data Lake or Data Warehouse for analytic
| workloads. What works really well is pushing data to an S3
| bucket with object versioning and replication enabled.
|
| I think DynamoDB streams and Kinesis streams work similarly
| under the hood? But DynamoDB streams are way cheaper; pricing
| is on-demand compared to hourly for Kinesis.
| ignoramous wrote:
| > From the paper [0]: _DynamoDB consists of tens of
| microservices._
|
| Ha! For folks who think two-pizza teams mean 100s of
| microservices... this is probably the second most scaled-out
| storage service at AWS (behind S3?), and it runs _tens_ of
| microservices (pretty sure these aren't _micro_ the way most
| folks would presume 'em to be).
|
| > _What's exciting for me about this paper is that it covers
| DynamoDB's journey..._
|
| Assuming these comments are true [1][2], in a classic Amazon
| fashion [3], the paper fails to acknowledge a FOSS database
| (once?) underneath it: MySQL/InnoDB (and references it as B-Tree
| instead).
|
| [0]
| https://web.archive.org/web/20220712155558/https://www.useni...
|
| [1] https://news.ycombinator.com/item?id=13173927
|
| [2] https://news.ycombinator.com/item?id=18871854
|
| [3] https://archive.is/T1ZNJ
| deanCommie wrote:
| Lots can change over the years. Your links are from 2016 - is
| it not conceivable that in the last 6 years Amazon has changed
| some of the implementation?
| hintymad wrote:
| I'm not sure about DDB, but I know in AWS in general building a
| new service does not give you credit by default. It's not like
| the shit Uber promoted: Yeh! We have 8000 services. Look how
| great we are! In fact, people usually question if someone
| proposes to create a new service. Working Backwards (i.e.,
| solving real user problems) and Invent and Simplify are indeed
| two powerful leadership principles. And of course, the sheer
| amount of work involved in setting up a new service is so much
| that people have to think twice before starting a new service.
| bistablesulphur wrote:
| I've been working with DynamoDB daily for a few years now, and
| whilst I like working with it and the specific scenario it solves
| for us, I'd still urge anyone thinking about using it to
| carefully reconsider whether their problem is truly unique enough
| that a traditional RDBMS couldn't handle it with some tuning.
| They can be unbelievably performant and give so much stuff for
| free.
|
| Designing an application specifically for DynamoDB will take _a
| lot_ of time and effort. I think we could have saved almost a
| third of our entire development time had we used more of the
| boring stuff.
| josevalerio wrote:
| +1
|
| Discovered this while building
| https://github.com/plutomi/plutomi as I was enamored by Rick's
| talks and guarantees of `performance at any scale`. In reality,
| Dynamo was solving scaling issues that we didn't have, and the
| number of times I've had to rework something to get around some
| of the quirks of Dynamo led to a lot of lost dev time.
|
| Now that the project is getting more complex, doing simple
| things such as "searching" (for our use case) is virtually
| impossible without hosting an ElasticSearch cluster, where a
| simple LIKE '%email%' in Postgres would have sufficed.
|
| Not saying it's a bad DB at all, but you really need to know
| your access patterns and plan accordingly. Dynamo streams are a
| godsend and combined with EventBridge you can do some powerful
| things for asynchronous events. Not paying while it's not
| running with on demand is awesome, and the performance is truly
| off the charts. Just please know what you are getting into. In
| fact, I'd recommend only using Dynamo if you are migrating a
| "finished" app vs. using it for apps that are still evolving.
| solatic wrote:
| > give so much stuff for free
|
| Interesting choice of words. Performance wise, sure. Money
| wise? I'm still waiting for a SQL database with pay-per-request
| pricing. The cost difference is enormous, particularly when you
| remember that you don't need to spend manpower managing the
| underlying hardware.
|
| Engineering tradeoffs are more complicated than only
| considering raw scalability performance and "I can run it
| myself on a cheap Raspberry Pi".
| david38 wrote:
| Who manages hardware these days? Aurora works quite well.
| 0xthrowaway wrote:
| >Interesting choice of words. Performance wise, sure. Money
| wise? I'm still waiting for a SQL database with pay-per-
| request pricing. The cost difference is enormous,
| particularly when you remember that you don't need to spend
| manpower managing the underlying hardware.
|
| I assume you're saying DynamoDB is _less_ expensive than SQL
| because of pay-per-request.
|
| Working on applications with a modest amount of data (a few
| TB over a few years), pay-per-request has been incredibly
| expensive even with scaled provisioning. I would much rather
| have an SQL database and pay for the server/s. Then I could
| afford a few more developers!
| alexnewman wrote:
| totally, or s3
| cnlwsu wrote:
| To be fair, you can end up spending a lot of time on the boring
| stuff as well.
| blantonl wrote:
| _Designing an application specifically for DynamoDB will take a
| lot of time and effort._
|
| If you can write, read, and query a JSON document using an API
| in your application, it's literally that simple.
|
| The only real time and effort is the architectural decisions
| you make up front, and that's about it. And there are some
| great guides out there that cover 99% of those architectural
| decisions.
|
| As a user of both, I find MySQL replication and clusters to be
| far more complex and time and effort intensive.
| lumost wrote:
| It's a question of change resilience. You _can_ implement
| CRUD on a single object with DDB trivially. You can't
| trivially implement 5 different list-by-X-property APIs, or
| filter the objects, or deal with foreign keys...
| i_love_limes wrote:
| Have to disagree on this one. Something as basic and out of
| the box as a migration / data backfill is not only
| complicated but also very expensive (both time and cost wise)
| on Dynamo. Not to mention all the other things that come
| nicely with a relational db (type checking, auto-increments,
| uniform data).
| blantonl wrote:
| To be fair, the parent discusses designing an application
| to use Dynamo, not data migration.
|
| I'll completely agree with you on migration / backfill.
| You're going to pay a lot of money to migrate a ton of data
| into Dynamo, and you'll also definitely increase the
| complexity in provisioning and setting up that migration
| pattern.
|
| But my comment stands pretty well considering greenfield
| application development around Dynamo.
| kdazzle wrote:
| > The only real time and effort is the architectural
| decisions you make up front, and that's about it
|
| And don't forget about the time spent fixing what could have
| been caught by types and regular old db constraints (for most
| applications)
| nkozyra wrote:
| > If you can write, read, and query a JSON document using an
| API in your application, it's literally that simple
|
| You could say that of Elasticsearch or Mongo, too. And it
| might be _technically_ true, but you haven't scratched the
| surface of mappings, design, limitations, etc.
|
| You can dump a bunch of data into Dynamo very easily, but
| what about getting data via secondary indices when you can't
| get your data with the views you've built without scanning?
| How do you use partition keys in it? And so on.
| codetiger wrote:
| Is there a specific reason why you say "Designing application
| specifically for DynamoDB will take _a lot_ of time and
| effort". Are you talking about migrating from an RDBMS to
| DynamoDB? Because my experience designing for DynamoDB was
| very similar to any other NoSQL DB.
| taspeotis wrote:
| I mean if you go the single table route...
| https://aws.amazon.com/blogs/compute/creating-a-single-table...
| bistablesulphur wrote:
| A lot is transferrable to NoSQL and key-value stores in
| general, though DDB has plenty of quirks of its own.
| Understanding your problem really is the key. A lot of
| problems turn out to be quite relational after all.
|
| You definitely can build just about anything with DDB; it's
| often just not worth the time when most problems can be
| solved by existing tools.
| cebert wrote:
| You really need to consider your access patterns up front
| with DynamoDB. Any changes to those during application
| development can be very time-consuming. There are limitations
| on how many local and global secondary indexes you can have.
| You also can't easily add them to existing tables. However,
| you can
| use multiple databases to get the best of both worlds. At my
| employer, we typically store domain entities in DynamoDB as
| the source of truth. However, we replicate some entities to
| secondary databases like OpenSearch when we have access
| patterns that require adhoc querying.
| lliamander wrote:
| I think there are good reasons to choose DynamoDB over a RDBMS
| that have nothing to do with scalability.
|
| I've used DynamoDB several times over the past several years in
| the context of providing a datastore for a microservice. In all
| cases it was cheaper and easier than RDS, and the ability to
| add GSIs has enabled me to adapt to all of the new access
| patterns I've had to deal with.
|
| For us, DynamoDB has become a 'boring' option.
| fubbyy wrote:
| I think it also depends on the system you're using it on. I
| think one of the biggest advantages of DDB is that it scales so
| well (with good design to avoid hot partitions). Afaik, RDBMS
| simply cannot scale in the same way due to their design. Yes,
| they can scale somewhat, but as you said it requires lots of
| tuning, and you'll still reach a hardish limit.
| grogers wrote:
| One partition of DDB is incredibly tiny compared to one
| partition of an RDBMS. You can push that one partition of
| RDBMS pretty far before you're forced to design sharding into
| your system. With DDB you are basically forced to design
| sharding into your partition keys up front or you _will_ have
| hot partition issues. This is by far the most common problem
| I see with teams using DDB, so brushing it off as "with good
| design to avoid hot partitions" is understating the scope of
| the problem.
| manigandham wrote:
| All databases scale the same way - by partitioning and
| sharding the dataspace. RDBMS have harder restrictions due to
| the features they provide and the performance expectations,
| but you can just as easily use a bunch of relational servers
| to partition a table (or several) across them by range or
| hashes of the primary key.
|
| That's basically what key/value stores like DynamoDB do, and
| why DynamoDB was even built on MySQL (at least originally).
| kixiQu wrote:
| "just as easily" would be the contested part, I'd guess
| dalyons wrote:
| "can just as easily use a bunch of relational servers to
| partition a table" is not true at all. Managing,
| maintaining and tuning a sharded relational cluster is an
| astonishing amount of operations work: partition
| management, re-partitioning, partition failovers / promotions
| / demotions, query routing, shard discovery, upgrades... it
| goes on and on. All this work is gone if you pick Dynamo.
| Not saying that dynamo is always better, but IMHO people
| very much underestimate the ops cost of running a sharded
| relational cluster at scale.
| jerf wrote:
| "I'd still urge anyone thinking about using it to carefully
| reconsider whether their problem is truly unique enough that a
| traditional RDBMS couldn't handle it with some tuning."
|
| Lately, the problem I've seen is people who haven't even
| considered whether their problem is truly unique enough that a
| traditional RDBMS couldn't handle it _without_ some tuning.
| (Here I don 't count "set up the obvious index" as "tuning",
| because if you're using a non-RDBMS the same work is
| encompassed in figuring out what to use as keys. No escaping
| that one regardless of technology.)
|
| I'm losing track of the number of teams in my company I've seen
| switching databases after they rolled to production because it
| turns out they picked a database that doesn't support the
| primary access pattern for their data in some cases, or in
| other cases, a very common secondary access pattern. In all the
| cases I've seen so far, it's been for quantities of data that
| an RDBMS would have chewed up and spat out without even
| noticing. It's amazing how much trouble you can get yourself
| into with non-relational databases with just a few hundred
| megabytes of data, or even a few _tens_ of megabytes of data if
| you fall particularly hard for the "it's fast and easy!" hype
| and end up accidentally writing a pessimal schema
| because you thought using a non-relational database meant you
| got to think less about your schema than a relational DB.
|
| That is precisely backwards; NoSQL-type DBs get their power
| from you spending a lot more time and care in thinking about
| exactly how you plan on accessing data. Many NoSQL databases
| loosen the constraints on _what_ you can store in a given
| record, but in return they are a great deal more fussy about
| _how_ you access records. If you want to skip careful design of
| _how_ you access records, you want the relational DB. And
| nowadays, tossing a JSON field into a relational row is quite
| cheap and effective for those "catch alls" in the schema.
|
| There's some interesting hybrids out there now if you want a
| bit of both worlds. For instance, Clickhouse is not an SQL
| database, but it more gracefully handles a lot of SQL-esque
| workloads than many other NoSQL-esque databases. You can get
| much farther with "I need a NoSQL-style database, but every
| once in a while I need an SQL-like bit of functionality", than
| you can in something like Cassandra.
| ramraj07 wrote:
| Could you elaborate on your (or a hypothetical) use case where
| DynamoDB makes sense? I for one can never come up with a case
| that isn't better served by an RDBMS or S3.
| cogman10 wrote:
| Lots of records (billions), low/no relational linkage, the
| need to query/update records in different ways (i.e., you need
| indexes), the need for HA and scaling (i.e., perhaps you are
| VERY bursty and read-heavy).
|
| It's not one size fits all, but at least in my line of work
| there are a few instances where it's a pretty good fit.
| ignoramous wrote:
| When you require point and range queries. For example, given
| a cart-id, fetch the SKUs; given an authz-token, fetch scopes;
| given a user-id and a time-range, fetch a list of pending
| order-ids.
|
| There's a lot more you could do though; DynamoDB is, after
| all, a wide-column KV store. Ref this re:Invent talk from
| 2018: https://www.youtube-nocookie.com/embed/HaEPXoXVf2k
|
| Apart from being fully-managed, the key selling points of
| DynamoDB are its consistent performance for a given query
| type, read-your-writes consistency semantics, auto-
| replication, auto-disaster recovery.
|
| See also:
| https://martinfowler.com/bliki/AggregateOrientedDatabase.htm...
| (mirror: https://archive.is/lc2eO)
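 The "user-id plus time-range" example above maps directly onto a sort-key BETWEEN condition. A hedged sketch, assuming order items keyed as `ORDER#<ISO timestamp>` with a status attribute (the table, key, and attribute names are all hypothetical, not from the comment):

```python
def pending_orders_query(user_id, start_iso, end_iso):
    """Query parameters for 'pending orders for a user in a time
    range': the key condition bounds the sort key, and a filter
    expression narrows the result to pending orders."""
    return {
        "TableName": "orders",
        "KeyConditionExpression": "pk = :u AND sk BETWEEN :lo AND :hi",
        "FilterExpression": "order_status = :pending",
        "ExpressionAttributeValues": {
            ":u": {"S": f"USER#{user_id}"},
            ":lo": {"S": f"ORDER#{start_iso}"},
            ":hi": {"S": f"ORDER#{end_iso}"},
            ":pending": {"S": "PENDING"},
        },
    }

# Pending orders for user 42 over a two-week window.
q = pending_orders_query("42", "2022-07-01", "2022-07-14")
```

 Note the asymmetry: the key condition does the cheap work (one partition, one sort-key range), while the filter runs after the read, so capacity is still consumed for non-pending orders inside the range.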
| ramraj07 wrote:
| The AWS re:Invent lecture was great and answered exactly
| when to use DynamoDB. I might seriously consider it for
| some of my applications for sure.
| abd12 wrote:
| I always tell people there are two clear areas where DynamoDB
| has some major benefits:
|
| - Very high scale applications that can be tough for an RDBMS
| to handle
|
| - Serverless applications (e.g. w/ AWS Lambda), due to how the
| connection model (and other factors) works better in that
| environment.
|
| Then, for about 80% of OLTP applications, you can choose
| either DynamoDB or RDBMS, and it really comes down to which
| tradeoffs you prefer.
|
| DynamoDB will give you consistent, predictable performance
| basically forever, and there's not the long-term maintenance
| drag of tuning your database as your usage grows. The
| downside, as others have mentioned, is more planning upfront
| and some loss of flexibility.
| blantonl wrote:
| I'll give you two use cases that I use for DynamoDB, where
| otherwise I'm primarily a MySQL shop
|
| 1) Simple: I have a system that constantly records and stores
| 30 minute MP3 files of audio streams (1000's of them) in S3.
| We write the referencing metadata to a table in DynamoDB
| where users can query by date/time. Given the sheer amount of
| items (hundreds of millions), we saw a far worse performance-
| to-cost ratio on MySQL vs. Dynamo.
|
| 2) Complex: I have a system that ingests thousands of tiny
| MP3 files a minute into S3 and writes the associated metadata
| to DynamoDB. DynamoDB then has a stream associated with it
| that runs a lambda to consolidate statistics to another table
| _and_ stream that metadata to clients via other lambdas or
| data streams.
|
| Those are two great use cases where we saw better usage
| patterns with Dynamo vs MySQL.
| magic_hamster wrote:
| If you have a database access layer then structuring your
| application shouldn't be that different. I wouldn't deal with
| the database directly unless I had a really good reason or the
| abstraction layer didn't support the query I was trying to run.
| Dobbs wrote:
| DynamoDB usage is heavily based around correctly structuring
| your keys, which allows you to do things like query subsets
| easily. This in turn means you need to know what your usage
| patterns will be so you can correctly structure your
| keys.
|
| God help you if you need to make major changes to this down
| the road.
|
| A database access layer can't do this for you; that just
| isn't what it does.
| manigandham wrote:
| An access layer doesn't change your access patterns, which is
| what actually determines the database model to use.
|
| DynamoDB (and other similar key/value stores) make very big
| trade-offs for speed and scale that most applications don't
| need.
| ciberado wrote:
| DynamoDB is amazing, but not very flexible once you have
| designed your database. No abstraction layer will allow you
| to run ad-hoc queries in a performant way.
| whalesalad wrote:
| It's true. 400kb max item size, too. 1mb max query size I
| believe. Good luck grabbing a shit load of data at once
| without a parallel scan.
|
| Dynamo is a precision tool and it's great at those specific
| workloads but it's not a one size fits all by any means.
| shepherdjerred wrote:
| 400kb is the max item size, the pattern to get around
| that is to store objects in S3 and URLs/keys to those
| objects in DDB
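 That overflow pattern can be sketched as a small write-side branch. This is a simplification I'm adding for illustration: the attribute names are hypothetical, and the size check here only counts the payload, whereas DynamoDB's real 400 KB limit counts attribute names and every value in the item:

```python
MAX_ITEM_BYTES = 400 * 1024  # DynamoDB's hard per-item size limit

def make_item(pk, payload: bytes, s3_key: str):
    """Inline the payload when it fits under the item limit;
    otherwise store only an S3 pointer (the payload itself would be
    uploaded to S3 in a separate PutObject call)."""
    if len(payload) <= MAX_ITEM_BYTES:
        return {"pk": {"S": pk}, "body": {"B": payload}}
    return {"pk": {"S": pk}, "body_s3": {"S": s3_key}}

small = make_item("DOC#1", b"x" * 1024, "bucket/doc-1")          # inlined
large = make_item("DOC#2", b"x" * (500 * 1024), "bucket/doc-2")  # pointer
```

 Readers then have to follow the pointer, so this trades one read for two on large items; the win is that the table itself never hits the item-size ceiling.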
| [deleted]
| eatonphil wrote:
| > No abstraction layer will allow you to run queries ad-hoc
| in a performant way.
|
| Depends on the size of the data. Run analytics queries
| (i.e. things that return summary data not all rows) on 10GB
| of data through clickhouse or duckdb or datafusion and
| they'll generally return in milliseconds.
| nightpool wrote:
| What does this have to do with DynamoDB? The point is
| that once you've gotten your data into DynamoDB, you're
| strongly limited in how you can use it until you load it
| into something else.
| eatonphil wrote:
| I didn't see an obvious connection between the two
| sentences.
| kumarvvr wrote:
| > Designing an application specifically for DynamoDB will take
| _a lot_ of time and effort
|
| Disagree with this. Your team could think of it as a document
| database, and you can have utility libraries that filter and
| sort based on PK / SK combinations to provide a seamless
| experience.
| shepherdjerred wrote:
| If you want your DynamoDB table to scale well you'll have to
| put in a lot of upfront effort.
| kumarvvr wrote:
| A word of caution: the default limit for the number of DynamoDB
| tables per AWS account is 2,500.
|
| Tables are a scarce resource and you want to use single table
| designs for each app.
|
| The design of tables with DDB is fascinating. Once you understand
| the PK / SK / GSI dance, design becomes so intuitive.
| revicon wrote:
| One big benefit of DynamoDB over RDS on AWS is that the access
| layer is API based so you don't have issues with held open
| connections when accessing via AWS Lambda.
| coredog64 wrote:
| RDS proxy should fix this, but the proxy team is out of sync
| with the RDS team. I've seen RDS ahead of proxy by two major
| versions.
| mabbo wrote:
| Rick Houlihan did a talk a few years ago about designing the data
| layer for an application using DynamoDB. The most common reaction
| I get from people I show it to - most of them Amazon SDEs who
| operate services that use DynamoDB - is "Holy shit, what is this
| wizardry?!"
|
| https://youtu.be/HaEPXoXVf2k
|
| One of the biggest mistakes people make with dynamo is thinking
| that it's just a relational database with no relations. It's not.
|
| It's an incredible system, but it requires a lot of deep
| knowledge to get the full benefits, and it requires you, often,
| to design your data layer very well up-front. I actually don't
| recommend using it for a system that hasn't mostly stabilized in
| design.
|
| But when used right, it's an incredibly performant beast of a
| data store.
| ngc248 wrote:
| Indeed, with GSIs etc. you can implement a priority queue or
| store data in the order you want. Once you are clear on the
| access patterns of your app DynamoDB is amazing to model for
| and will scale with your app. But if you are not clear about
| your app's access patterns or need adhoc queries, then dynamoDB
| is not a good fit.
| LAC-Tech wrote:
| Thanks, bookmarked this. It's good to see a proper take on data
| modelling on document stores instead of just "throw any old
| JSON in there, it'll be fine!!!"
| itsmemattchung wrote:
| Definitely one of my favorite talks by Rick and I apply lessons
| learned in that video on a daily basis.
|
| Must have watched that video about 4-5 times before I really
| grasped the topics, since I started my career in a way that
| burned the concept of relational databases into my head.
| Breaking from that pattern of thought was difficult, initially.
| time0ut wrote:
| I also recommend Alex DeBrie's "The DynamoDB Book"
| (https://www.dynamodbbook.com/). It is a great resource that
| talks about these design patterns in depth. It has served me
| and my team well over the past few years.
| aarondf wrote:
| Seconded! Alex DeBrie is a great teacher.
| [deleted]
| 0xbadcafebee wrote:
| _Can_ be performant, nowadays anyway. Worked with a team who
| built their own implementation because Amazon's was too slow
| and expensive.
|
| It's a weird model. Too small of a dataset and it doesn't quite
| make sense to use Dynamo. Too big of a dataset and it's full of
| footguns. Medium-sized may be too expensive.
| coredog64 wrote:
| Too-small seems to be the perfect use case for DDB. I need
| someplace to stash stuff and look it up by key. A full RDS is
| overkill, as is anything else that requires nodes that charge
| by the hour.
| ronjouch wrote:
| For explicitness & searchability, commenting with the title of
| this talk, which is indeed _excellent_, not limited to
| DynamoDB, and which was kind of a revelation after years of
| using DynamoDB suboptimally:
|
| Rick Houlihan - AWS re:Invent 2018: Amazon DynamoDB Deep Dive:
| Advanced Design Patterns for DynamoDB (DAT401) ,
| https://www.youtube.com/watch?v=HaEPXoXVf2k
|
| It should be watched along with reading the associated doc:
| https://docs.aws.amazon.com/amazondynamodb/latest/developerg...
| davidjfelix wrote:
| It's worth noting that a lot of the early database designs,
| including this 2018 video pre-date some dramatic improvements
| to dynamodb usability.
|
| I think the biggest ones were:
|
| - an increase in the number of GSIs you can create (Dec 2018)
| [1]
|
| - making on-demand possible [2]
|
| - an increase in the default limit for number of tables you can
| create (Mar 2022) [3]
|
| I don't think these new features necessarily make the single-
| table, overloaded GSI strategy that's discussed in the video
| obsolete, but they enable applications which are growing to
| adopt an incremental GSI approach and use multiple tables as
| their data access patterns mature.
|
| Some other posters have recommended Alex DeBrie's dynamodb book
| and I also think that's an excellent resource, but I'd caution
| people who are getting into dynamodb not to be scared by the
| claims that dynamodb is inflexible to data access changes,
| since AWS has been adding a lot of functionality to support
| multi-table, unknown access patterns, emerging secondary
| indexes, etc.
|
| - [1] https://aws.amazon.com/about-aws/whats-new/2018/12/amazon-dy...
|
| - [2] https://aws.amazon.com/blogs/aws/amazon-dynamodb-on-demand-n...
|
| - [3] https://aws.amazon.com/about-aws/whats-new/2022/03/amazon-dy...
| pdhborges wrote:
| People don't need to be scared; they just need to do their
| homework.
|
| In my opinion, having more tables and more GSIs available
| won't help you very much if you started with a flawed data
| model (unless you kept making the same design mistake 256
| times). A team that tries to claw back from a flawed table
| design by piling up GSIs is just in for a world of pain.
|
| So if you are planning to go with Dynamo:
|
| - Read about the data modeling techniques
|
| - Figure out your access patterns
|
| - Check if your application and model can withstand the
| eventual consistency of GSIs
|
| - Have a plan to rework your data model if requirements
| change: Are you going to incrementally rewrite your table?
| Are you going to export it and bulk-load a fixed data model?
| How much is that going to cost?
| Twirrim wrote:
| Something else important to mention is that dynamodb now re-
| consolidates tables.
|
| This is a lousy explanation, but Read/Write quota is split
| evenly over all partitions. Each partition is created based
| on the hash-key used, and there's an upper limit on how much
| data can be stored in any given partition. So if you end up
| with a hot hash-key with lots of stuff in it, that data gets
| split over more and more partitions, and the overall
| throughput goes _down_ (quota is split evenly over
| partitions).
|
| I believe this is still a general risk, and you need to be
| extremely canny about your use of hash key to avoid it, but
| historically they couldn't reconsolidate partitions. So you'd
| end up with a table in a terrible state with quota having to
| be sky high to still get effective performance. The only
| option then was to completely rotate tables. New table with a
| better hash-key, migrate data (or whatever else you needed to
| do).
|
| Now at least, once the data is gone, the partitions will
| reconsolidate, so an entire table isn't a complete loss.
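 The throughput dilution described above is just division (the numbers below are illustrative, not AWS's exact partition-splitting rules):

```python
def per_partition_capacity(table_rcus: float, partitions: int) -> float:
    """Provisioned capacity is divided evenly across a table's
    partitions, so a table that splits into more partitions can
    serve *less* traffic per partition."""
    return table_rcus / partitions

# The same 3,000 provisioned RCUs:
#   over 10 partitions  -> 300 RCUs each
#   over 100 partitions ->  30 RCUs each, even though nothing else
#                           about the table or workload changed
```

 This is why a hot hash-key that forces repeated splits made throughput go _down_: the quota stayed fixed while the divisor kept growing.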
| GauntletWizard wrote:
| This bit me badly - an application that did significant
| autoscaling, and hit a peak of 30,000 read/write requests
| per second but typically did more like 300.
|
| The conversation with the Amazon support engineer told us
| that we had over a hundred partitions (which even he
| admitted was high for that request volume), so our quota was
| effectively giving us 0 iops per partition. This obviously
| didn't work, and their only solution was "scale it back up,
| copy everything to a new table". Which we did, but was an
| engineering effort I'd rather have avoided.
| 0xthrowaway wrote:
| DynamoDB is (edit: can be) _extremely_ expensive compared to
| alternatives (e.g. self hosted SQL).
|
| Make sure the benefits (performance, managed, scale) outweigh
| the costs!
___________________________________________________________________
(page generated 2022-07-14 23:01 UTC)