[HN Gopher] DynamoDB 10 years later
___________________________________________________________________
DynamoDB 10 years later
Author : mariuz
Score : 212 points
Date : 2022-01-20 11:26 UTC (11 hours ago)
(HTM) web link (www.amazon.science)
(TXT) w3m dump (www.amazon.science)
| graderjs wrote:
| The in-the-trenches, technical, battle-tested/battle-scarred
| comments on this thread are why I come to HN for the comments.
| tmitchel2 wrote:
| DynamoDB for me is the perfect database for my serverless /
| GraphQL API. My only gripe is the 25-item limit on
| transactions. I've had to resort to layering another
| transaction management system on top of it.
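| A minimal sketch (boto3; hypothetical helper name) of the
| chunking that limit forces; note that the 25-item chunks are
| not atomic with respect to each other, which is exactly why a
| higher-level transaction manager becomes necessary:
|
|     import boto3
|
|     client = boto3.client("dynamodb")
|
|     def transact_put_all(table, items, chunk=25):
|         # Each TransactWriteItems call is atomic, but only
|         # within its own chunk of <= 25 items (DynamoDB typed
|         # JSON format expected for each item).
|         for i in range(0, len(items), chunk):
|             client.transact_write_items(
|                 TransactItems=[
|                     {"Put": {"TableName": table, "Item": it}}
|                     for it in items[i:i + chunk]
|                 ]
|             )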
| salil999 wrote:
| I can safely say that the team members working in DynamoDB are
| very skilled and they care deeply about the product. They really
| work hard and think of interesting solutions to a lot of problems
| that their biggest customers face which is great from a product
| standpoint. There are some pretty smart people working there.
|
| Engineering, however, was a disaster story. Code is horribly
| written and very few tests are maintained to make sure
| deployments go without issues. There was too much emphasis on
| deployment and getting fixes/features out over making sure
| they wouldn't break anything else. It was common to release a
| new feature and put duct tape all around it to make sure it
| "works". And way too many operational issues. There are a lot of
| ways to break DynamoDB :)
|
| Overall, though, the product is very solid and it's one of the
| few databases that you can say "just works" when it comes to
| scalability and reliability (as most AWS services are).
|
| I worked at DynamoDB for over 2 years.
| deanCommie wrote:
| If the developers are happy about the code and testing quality
| of a project, then you waited too long to ship.
|
| If the customers don't have any feedback or missed feature asks
| at launch, you waited too long to ship.
|
| You know who has great internal code and test quality? Google.
| Which is why Google doesn't ship. They're a wealth distribution
| charity for talented engineers. And their competitive advantage
| is that they lure talented people away from other companies
| where they might actually ship something and compete with
| Google, to instead park them, distract them with toys, beer
| kegs, readability reviews, and monorepo upgrades.
| geodel wrote:
| Very interesting!
|
| To me the takeaway is that large/interesting/challenging
| engineering projects are generally pretty close to disasters.
| Sometimes they actually do become disasters.
|
| On the other hand, if a project looks cleanly designed up
| front, neatly put into JIRA stories, with developers
| delivering code consistently week after week, then it may be
| a successfully planned and delivered project. But it would
| mostly be doing stuff that has already been done many times
| over, and likely by the same people on the team.
|
| At least this has been my experience while working on
| standardized / templated projects vs something new.
| notyourwork wrote:
| Challenging the cutting edge of your product domain is what
| I get from this. Easy things are easy and predictable. Hard
| things and unpredictably evolving requirements are in
| tension with the initial system design, which is the
| foundation of your code base. The larger projects get over
| time, the further they tend to deviate from the original
| design. If you could predict it all up front, in many cases
| it's not that interesting or challenging a problem. Duct
| tape is fine to use as long as you understand when you've
| gone too far and might want to re-design from scratch based
| on prior learnings.
| otterley wrote:
| Customers care about the outcome, not the internal process.
| Besides, I've never worked at any sizable company in my
| 20+-year-long career where I didn't conclude, "it's a miracle
| this garbage works at all."
|
| Enjoy the sausage, but if you have a weak stomach, don't watch
| how it's made.
|
| (I work for AWS but not on the DynamoDB team and I have no
| first-hand knowledge of the above claim. Opinions are my own
| and not those of my employer.)
| listenallyall wrote:
| Just curious, why do you mention you work at AWS if you're
| just disclaiming that fact in the next sentence? Besides,
| nothing you stated is specific to AWS or any of its products.
| sokoloff wrote:
| I don't work at Amazon, but our company's social media
| policy requires us to be transparent about a possible
| conflict of interest when speaking about things "close to"
| our company/our position in the industry, and also to be
| clear about whether we're speaking in an official capacity
| or in a personal capacity.
|
| This is designed to reduce the chances of eager employees
| going out and astro-turfing or otherwise acting in trust-
| damaging ways while thinking they're "helping".
| sealjam wrote:
| Crudely speaking, the fact that they work at AWS means that
| it's in their best interests for AWS to be perceived
| positively.
|
| When this is the case it's often nice to state this
| conflict of interest, so others can take your appraisal in
| the appropriate context.
|
| I'm not implying anything about the post, just stating what
| I assume to be the reason for the disclosure.
| chrisfosterelli wrote:
| > Customers care about the outcome, not the internal process.
|
| This is true though there's only so much technical debt and
| internal process chaos you can create before it affects the
| outcome. It's a leading indicator, so by the time customers
| are feeling that pain you've got a lot of work in front of
| you before you can turn it around, if at all, and customers
| are not going to be happy for that duration.
|
| Technical debt is not something to completely defeat or
| completely ignore, instead it's a tradeoff to manage.
| jsdalton wrote:
| This article from Martin Fowler explores your point in
| greater depth. It's a good read:
| https://martinfowler.com/articles/is-quality-worth-cost.html
|
| One concrete problem with technical debt the article
| highlights is that it negatively impacts the time to
| deliver new features. Customers today usually expect not
| only a great initial feature set from a product, but also a
| steady stream of improvements and growth, along with
| responsiveness to feedback and pain points.
| salil999 wrote:
| Exactly this. I was too young at the time to grasp this idea.
| wnolens wrote:
| > Customers care about the outcome, not the internal process
|
| Additionally, the business cares about the outcome, not the
| internal process.
|
| Ostensibly, the business should care about process but it
| actually doesn't matter as long as the product is just good
| enough to obtain/retain customers, and the people spending
| the money (managers) aren't incentivized to make costs any
| lower than previously promised (status quo).
| vasili111 wrote:
| >Engineering, however, was a disaster story. Code is horribly
| written and very few tests are maintained to make sure
| deployments go without issues. There was too much emphasis on
| deployment and getting fixes/features out over making sure
| they wouldn't break anything else. It was common to release a
| new feature and put duct tape all around it to make sure it
| "works". And way too many operational issues. There are a lot
| of ways to break DynamoDB :)
|
| >Overall, though, the product is very solid and it's one of
| the few databases that you can say "just works" when it comes
| to scalability and reliability (as most AWS services are).
|
| How can those two coexist?
| [deleted]
| badhombres wrote:
| I mean eventually enough duct tape can be solid like a tank
| :)
| jeffreygoesto wrote:
| Or a bridge...
|
| https://www.popularmechanics.com/science/a5732/mythbusters-
| b...
| bastardoperator wrote:
| Or a boat...
|
| https://flexsealproducts.com/products/flex-tape
| meepmorp wrote:
| You need the really good duct tape.
| 0xbadcafebee wrote:
| You throw bodies at it. A small bunch of people will be
| overworked, stressed, constantly fighting fires and
| struggling to fight technical debt, implement features, and
| keep the thing afloat. Production is always a hair away from
| falling over but luck and grit keeps it running. To the team
| it's a nightmare, to the business everything is fine.
| dustingetz wrote:
| literally every co
|
| if you want to know why capitalism causes this, start a
| startup and prioritize quality, do not get to market, do
| not raise money, do not pass go, watch dumpster fires with
| millions of betrayed and angry users raise their series d
| wnolens wrote:
| this is the answer.
|
| source: currently being burned out on an adjacent aws
| team..
| 0xbadcafebee wrote:
| That sucks, man. If they won't move you to another team,
| just get out of there. We don't benefit by suffering for
| them, and they're not gonna change.
| indogooner wrote:
| It sounds incredible, but I have heard similar things about
| Oracle. Maybe a large dev team can duct-tape enough so that
| the product is solid.
| gfd wrote:
| You're probably thinking of this comment, oracle and 25
| million lines of c code:
| https://news.ycombinator.com/item?id=18442941
| eternalban wrote:
| They both likely have _solid_ 80% solutions (design) and
| incrementally cover the 20% gap as need arises. This in
| turn adds to operational complexity.
|
| The alternative would be to attempt a near 'perfect' solution
| for the product requirements and that may either hit an
| impossibility wall or may require substantial long term
| effort that would impede product development cycles. So
| likely the former approach is the smarter choice.
| AtlasBarfed wrote:
| So they just rolled out global replication, and I can't for the
| life of me figure out how they resolve write conflicts without
| cell timestamps or any other obvious CRDT measures.
|
| Questions were handwaved away with the usual Amazon black-box
| non-answers, which always smell like they are hiding problems.
|
| Any ideas how this is working? It seems bolt-on and not well
| thought out, and I doubt they'll ever pay for Aphyr to put it
| through his torture tests.
| greiskul wrote:
| From: https://aws.amazon.com/dynamodb/global-tables/
|
| Consistency and conflict resolution
|
| Any changes made to any item in any replica table are
| replicated to all the other replicas within the same global
| table. In a global table, a newly written item is usually
| propagated to all replica tables within a second. With a
| global table, each replica table stores the same set of data
| items. DynamoDB does not support partial replication of only
| some of the items. If applications update the same item in
| different Regions at about the same time, conflicts can
| arise. To help ensure eventual consistency, DynamoDB global
| tables use a last-writer-wins reconciliation between
| concurrent updates, in which DynamoDB makes a best effort to
| determine the last writer. With this conflict resolution
| mechanism, all replicas agree on the latest update and
| converge toward a state in which they all have identical
| data.
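| AWS doesn't document the mechanism beyond "best effort to
| determine the last writer", but the general shape of last-
| writer-wins is simple; a minimal sketch (assumed timestamp
| and region fields, not DynamoDB's actual internals):
|
|     def reconcile(a, b):
|         # Higher timestamp wins; tie-break on region name so
|         # every replica picks the same deterministic winner.
|         return max(a, b, key=lambda v: (v["ts"], v["region"]))
|
|     v1 = {"ts": 1642678001, "region": "us-east-1", "v": "x"}
|     v2 = {"ts": 1642678002, "region": "eu-west-1", "v": "y"}
|     assert reconcile(v1, v2) is v2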
| luhn wrote:
| Honestly your expectations are too high. Conflict resolution
| is row-level last-write-wins. It's not a globally distributed
| database, it's just a pile of regional DynamoDB tables duct
| taped together... They're not going to hire Aphyr for testing
| because there's nothing for him to test.
| vp8989 wrote:
| "And way too many operational issues."
|
| I've seen this kind of thing mentioned many times; pretty
| baffling TBH given Dynamo's good reputation in the industry.
| Are these mostly confined to the stateless components of the
| product, or do they ever see data loss?
| salil999 wrote:
| I can't say, at the risk of violating some NDA, but a lot of
| it is internal stuff that customers will never even be aware
| of, or that would require too much effort for them to break.
|
| There are times when bad deployments happen and customers
| were impacted.
| digitalgangsta wrote:
| I've worked at 5 different tech companies now - this is par
| for the course. And at every single one, people wished they
| could go back and do it again, but by that point the product
| was too successful, so they ran with it.
| 0xbadcafebee wrote:
| I worked at a company that re-implemented the entire Dynamo
| paper and API, and it was exactly the same story. Completely
| eliminated all my illusions about the supposed superiority of
| distributed systems. It was a mound of tires held together with
| duct tape, with a tiki torch in each tire.
| yftsui wrote:
| Dynamo paper and DynamoDB are two very different things...
| AtlasBarfed wrote:
| Did they have a spare 100 million hanging around to burn?
| That seems pretty ridiculous. Why did they not just run
| Cassandra?
| 0xbadcafebee wrote:
| They did have 100 million to burn, but my mostly-wild-guess
| is it was closer to $1.5M/yr. But that gives you an in-
| house SaaS DB used across a hundred other
| teams/products/services, so it actually saved money (and
| nothing else matched its performance/CAP/functionality).
|
| Cassandra is too opinionated and its CAP behavior wasn't
| great for a service like this, so they built on top of
| Riak. (This also eliminated any thoughts I had about Erlang
| being some uber-language for distributed systems, as there
| were (are?) tons of bugs and missing edge cases in Riak)
| staticassertion wrote:
| Erlang gives you great primitives for building reliable
| protocols, but they're just primitives, and there are
| tons of footguns since building protocols is hard.
| spookthesunset wrote:
| > Why did they not just run cassandra?
|
| Not Invented Here can run very deep in some branches of an
| organization. Depending on how engineering performance
| evaluations work, writing a homebrew database could totally
| be something that aligns with the company incentives. It
| might not make a single bit of sense from a business
| standpoint but hey, if the company rewards such behavior
| don't be surprised when engineers flush millions down the
| tube "innovating" a brand new wheel.
| sam0x17 wrote:
| It's a shame they don't open source it. It's funny too: being
| AWS, they really don't have to worry about AWS running a
| cheaper service, so at that point why not open source it?
| [deleted]
| js4ever wrote:
| There is a compatible open source alternative here,
| https://www.scylladb.com/alternator/
| nexuist wrote:
| They probably view it as a competitive advantage that Azure
| or GCP would try to copy if they figured out the "secret
| sauce."
| staticassertion wrote:
| I kinda doubt it. It's probably just that open sourcing it
| won't provide much utility (I bet lots of code is aws
| specific) and just adds a new maintenance burden for them.
| sam0x17 wrote:
| Azure has Cosmos and Google has Datastore. They would
| never.
| blowski wrote:
| Azure has Cosmos which is arguably better than DynamoDB for
| a lot of use cases.
| dabfiend19 wrote:
| knocking off mongo... 10 years later they still haven't caught
| up.
| artembugara wrote:
| We realized how great Dynamo was only after we migrated off AWS.
|
| Dynamo was a key factor for us when we were releasing the MVP
| of our News API [0]. We used Dynamo, ElasticSearch, and
| Lambda, and could get it running in 60 days while being
| employed full-time.
|
| Also, the best tech talk I've seen was given by Rick Houlihan
| at re:Invent [1].
|
| I highly recommend every engineer watch it: it's a great
| overview of SQL vs NoSQL.
|
| [0] https://newscatcherapi.com/blog/how-we-built-a-news-api-
| beta...
|
| [1] https://www.youtube.com/watch?v=HaEPXoXVf2k
| pier25 wrote:
| BTW Rick Houlihan left AWS recently to work for Mongo.
|
| https://twitter.com/houlihan_rick/status/1472969503575265283
|
| On that thread he criticizes AWS regarding DynamoDB openly.
|
| > _I will always love DynamoDB, but the fact is it is losing
| ground fast because AWS focuses most of their resources on the
| half baked #builtfornopurpose database strategy. I always hated
| that idea, I just bit my tongue instead of saying it._
|
| > _The problem is the other half-baked database services that
| all compete for the same business. DocumentDB, Keyspaces,
| Timestream, Neptune, etc. Databases take decades to optimize,
| the idea that you can pump them out like web apps is silly._
|
| > _I was very tired of explaining over and over again that
| DynamoDB is actually not the dumbed down Key-Value store that
| the marketing message implied. When AWS created 6 different
| NoSQL databases they had to make up reasons for each one and
| the messaging makes no sense._
| manishsharan wrote:
| Quite a few of the teams that were early adopters of AWS DynamoDB
| were not prepared for the pricing nuances that had to be taken
| into consideration when building their solutions.
| xtracto wrote:
| I remember trying DynamoDB around 2015/2016: you had to
| specify your expected read and write throughput and you would
| be billed for that. At the time we had a pretty spiky traffic
| use case, which made using DynamoDB efficiently impossible.
| musingsole wrote:
| I had a similar experience, but ultimately wrote a service to
| monitor our workloads and request increased provisioning
| during spikes. You could reduce your provisioning like 10
| times a day, but after that you could only increase it and
| would be stuck with the higher rate for a time.
|
| And then on-demand provisioning was released and it was cheap
| enough to be worth simplifying our workflows.
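| A minimal sketch of the kind of capacity bump such a monitor
| would request (boto3; hypothetical table name and numbers):
|
|     import boto3
|
|     client = boto3.client("dynamodb")
|
|     def bump_capacity(table, read_units, write_units):
|         # Raise provisioned throughput ahead of a spike.
|         # Decreases were rationed per table per day, so the
|         # monitor had to spend them carefully.
|         client.update_table(
|             TableName=table,
|             ProvisionedThroughput={
|                 "ReadCapacityUnits": read_units,
|                 "WriteCapacityUnits": write_units,
|             },
|         )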
| jvanvleet wrote:
| I was one of these. However I now understand that the pricing
| nuances reflected a reality that I appreciate. We used DDB in a
| way that was not the best fit and the cost was a reflection of
| this.
| balls187 wrote:
| DynamoDB is one of my favorite AWS products.
| victor106 wrote:
| We tried to implement an application on DynamoDB about 2 years
| ago.
|
| We really struggled with implementing ad-hoc queries/search,
| e.g. SELECT * FROM employees WHERE name = X AND city = Y.
|
| Any improvements in DynamoDB that make it easier to implement
| such queries?
| manigandham wrote:
| DynamoDB (and other dynamo-like systems like Cassandra,
| Bigtable) are just advanced key/value stores. They support
| multiple levels of keys->values but fundamentally you need the
| key to find the associated value.
|
| If you want to search by parameters that aren't keys then you
| need to store your data that way. Most of these systems have
| secondary indexes now, and that's basically what they do for
| you automatically in the backend, storing another copy of your
| records using a different key.
|
| If you need adhoc relational queries then you should use a
| relational database.
| ignoramous wrote:
| > _If you want to search by parameters that aren 't keys then
| you need to store your data that way._
|
| Not that I recommend it, but by using space-filling curves,
| one could index multiple dimensions onto DynamoDB's two-
| dimensional (hash-key, range-key) primary index:
| https://aws.amazon.com/blogs/database/z-order-indexing-for-m...
| and
| https://web.archive.org/web/20220120151929/https://citeseerx...
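| For the curious, the core trick is just bit interleaving; a
| minimal sketch (plain Python, hypothetical grid coordinates):
|
|     def morton_key(x, y, bits=32):
|         # Interleave the bits of x and y into one integer, so
|         # items close in 2-D space tend to be close in the
|         # resulting 1-D range-key space.
|         z = 0
|         for i in range(bits):
|             z |= ((x >> i) & 1) << (2 * i)      # even bits: x
|             z |= ((y >> i) & 1) << (2 * i + 1)  # odd bits: y
|         return z
|
|     # Map e.g. (lat, lon) onto non-negative grid coordinates
|     # first, then store morton_key(x, y) as the range key.
|     assert morton_key(0b11, 0b01) == 0b0111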
| Jupe wrote:
| DynamoDB is not meant for ad-hoc query patterns; as others have
| said, plan your indexes around your access patterns.
|
| However, so long as you add a global secondary index (GSI)
| with (name, city) as the key, you can certainly do such
| things (see the sketch below). But be aware for large-scale
| solutions:
|
| 1. There's a limit of 20 GSIs per table. You can increase with
| a call to AWS support.
|
| 2. GSIs are updated asynchronously; read-after-write is not
| guaranteed, and there is no "consistent read" option on a GSI
| like there is with tables.
|
| 3. WCUs on GSIs should match (or surpass) the WCUs on the
| original table, else throughput limit exceeded exceptions will
| occur. So, 3 GSIs on a table means you pay 4x+ in WCU costs.
|
| 4. The keys of the GSI should be evenly distributed, just like
| the PK on a main table. If not, there is additional opportunity
| for hot partitions on write.
|
| Ref: https://aws.amazon.com/premiumsupport/knowledge-
| center/dynam...
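| A minimal boto3 sketch of the query above, assuming a
| hypothetical table "employees" with a GSI named
| "name-city-index" keyed on (name, city):
|
|     import boto3
|     from boto3.dynamodb.conditions import Key
|
|     table = boto3.resource("dynamodb").Table("employees")
|
|     resp = table.query(
|         IndexName="name-city-index",
|         KeyConditionExpression=(
|             Key("name").eq("X") & Key("city").eq("Y")
|         ),
|     )
|     items = resp["Items"]  # eventually consistent (point 2)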
| [deleted]
| itsmemattchung wrote:
| With DynamoDB, you can now execute SQL queries using PartiQL:
|
| https://docs.aws.amazon.com/amazondynamodb/latest/developerg...
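| A minimal sketch (boto3; hypothetical table name); under the
| hood this maps onto the same query/scan machinery:
|
|     import boto3
|
|     client = boto3.client("dynamodb")
|     resp = client.execute_statement(
|         Statement='SELECT * FROM "employees"'
|                   ' WHERE "name" = ? AND "city" = ?',
|         Parameters=[{"S": "X"}, {"S": "Y"}],
|     )
|     items = resp["Items"]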
| augustl wrote:
| Note that this is just a new syntax for the existing querying
| capabilities. If you query something that's not in the
| hash/sort key, you still need to filter on the "client" once
| you hit the 1 MB result-set size limit, etc.
| _hao wrote:
| I'm really happy that Cosmos DB has this -
| https://docs.microsoft.com/en-us/azure/cosmos-db/sql/sql-
| que...
|
| I haven't used DynamoDB in a couple of years, so I'd be
| curious to know how querying compares if anyone who has used
| both Cosmos and Dynamo recently can shed some light.
| unfunco wrote:
| I struggled at first but I watched Advanced Design Patterns for
| DynamoDB[0] a few times and it clicked. As other responses have
| suggested, generally you define your access patterns first and
| then structure the data later to fit those access patterns.
|
| [0]: https://www.youtube.com/watch?v=HaEPXoXVf2k
| garydevenay wrote:
| The most reliable way to build a system with DynamoDB is to
| plan queries upfront. Trying to use it like a SQL database and
| make ad-hoc queries won't work because it's not a SQL DB.
|
| Data should be stored in the fashion you wish for it to be
| read, and storing the same data in more than one configuration
| is acceptable.
|
| Good resource:
| https://docs.aws.amazon.com/amazondynamodb/latest/developerg...
| jugg1es wrote:
| That's not what DynamoDB is for. If you need to run queries
| like that, you should be using an RDBMS. DynamoDB should only
| really be used for use cases where the queries are known up-
| front. There are ways to design your data model in Dynamo so
| that you could actually run queries like that, but you would
| have had to do that work from day 1. You won't be able to
| retroactively support queries like that.
| jon-wood wrote:
| This will sound flippant, but that's not what Dynamo is for. If
| you want to do freeform relational queries like that then put
| it in a relational database.
|
| Dynamo is primarily designed for high volume storage/querying
| on well understood data sets with a few query patterns. If you
| want to be able to query information on employees based on
| their name and city you'll need to build another index keyed on
| name and city (in practice Dynamo makes that reasonably simple
| by adding a secondary index).
| owenmarshall wrote:
| Alternatively, practice single table design: structure your
| table keys in such a way that they can represent all (or at
| least most) of the queries you need to run.
|
| This is often easier said than done, but it can be far less
| expensive and more performant than adding an index for each
| search.
| ellimilial wrote:
| It's always great fun compounding new, manual 'indexes'
| when you discover you need another query.
| willcipriano wrote:
| Amazon has a perfect use case for this. You click on a
| product in the search results, that url contains a UUID, that
| UUID is used to search Dynamo and returns an object that has
| all the information on the product, from that you build the
| page.
|
| If what you are trying to do looks more like "Give me all the
| customers that live in Cuba and have spent more than $10 and
| have green eyes", Dynamo isn't for you. You can query that
| way but after you put all the work in to get it up and
| running, you'd probably be better off with Postgres.
| 8note wrote:
| If that's one of 12 or fewer query patterns you need, I can
| write you a simple Dynamo table for it. Dynamo's limitation
| is that it can only support n different query patterns, and
| you have to hand-craft an index for each one (well, sometimes
| you can get multiple on one index).
| gigatexal wrote:
| If you follow Rick Houlihan (@houlihan_rick), then all the
| accolades AWS gets for DynamoDB pale in comparison to its
| current team and execution: the company seems to not be
| investing in it, so much so that Rick left to join MongoDB.
| vslira wrote:
| Man I love Rick's talks as much as anyone but let's be real, he
| likely left AWS not for his love of first class geographical
| indexes but because Mongo offered a giant pile of money for him
| to evangelize their tech. Though I have no doubt that he
| actually had a lot of reservations around Dynamo's DX before;
| he likely has some around MongoDB too, but those won't be the
| bulk of his content.
| gigatexal wrote:
| At his rank at AWS I don't know if money was such an issue.
| He strikes me as a person who cares deeply about the
| underlying tech. But I have no idea one way or the other.
| awsthro00945 wrote:
| I think I've seen you post something similar on r/aws about
| how Rick was "top DynamoDb person at AWS" (apologies if
| that wasn't you). I think you are overestimating Rick's
| "rank".
|
| I just looked him up (I had not heard of him before seeing
| his name mentioned on r/aws a few days ago) and he was an
| L7 TPM/Practice Manager in AWS's sales organization. That's
| not really a notably high position, and in the grand scheme
| of Amazon pay scales, isn't that high up. An L7 TPM gets
| paid about the same as, or sometimes less than, an L6
| software dev (L6 is "senior", which is ~5-10 years of
| experience).
|
| Also, him being in the sales org means he had practically
| nothing to do with the engineering of the service. AWS
| Sales is a revolving door of people. I mean no offense
| towards Rick (again, I didn't know him or even know of him
| before I read his name in a comment a few days ago), but I
| would not read anything at all into the fact that an L7
| Sales TPM left for another company.
| belter wrote:
| You never heard of Rick Houlihan? He does 90% of the
| DynamoDB evangelism... At the same time, you are able to
| do these internal lookups? Do you work with DynamoDB?
|
| AWS re:Invent 2018: Amazon DynamoDB Deep Dive: Advanced
| Design Patterns for DynamoDB (DAT401)
| https://youtu.be/HaEPXoXVf2k
|
| AWS re:Invent 2019: [REPEAT 1] Amazon DynamoDB deep dive:
| Advanced design patterns (DAT403-R1)
| https://youtu.be/6yqfmXiZTlM
|
| AWS re:Invent 2020: Amazon DynamoDB advanced design
| patterns - Part 1 https://youtu.be/MF9a1UNOAQo
|
| AWS re:Invent 2020: Amazon DynamoDB advanced design
| patterns - Part 2 https://youtu.be/_KNrRdWD25M
|
| AWS re:Invent 2021 - DynamoDB deep dive: Advanced design
| patterns https://youtu.be/xfxBhvGpoa0
|
| Amazon DynamoDB | Office Hours with Rick Houlihan:
| Breaking down the design process for NoSQL applications
| https://www.twitch.tv/videos/761425806
| amzn-throw wrote:
| Do you expect the engineers on your team to know the top
| sales person at your company?
|
| This person might be responsible for the majority of
| evangelism and revenue for the company. Do you expect the
| SDEs to know about him?
|
| Again, no shot against Rick - he is amazing,
| smart, technical, competent, and a deep owner.
|
| But the average SDE on the team won't know about these or
| watch these talks. There are too many deep internal
| engineering challenges to solve.
| belter wrote:
| Are you calling the person who did the core DynamoDB
| Technical Deep Dive sessions at reInvent, for the last 4
| years in a row, a sales person?
| amzn-throw wrote:
| What do you think Solutions Architects and Developer
| Advocates (between the two groups who do most Re:invent
| sessions) are?
|
| Hell, what do you think re:Invent is? It's a sales
| conference.
|
| In any company you have two groups of people: Those that
| build the product, and those that sell it. Ultimately,
| solutions architects and developer advocates are there to
| help sell the product.
|
| Of course Amazon is customer obsessed. And genuinely
| interested in ensuring customers have a good experience,
| and their technical needs are met - through education,
| support, and architectural guidance. But ultimately,
| that's what it is.
| gigatexal wrote:
| Maybe that was the problem. He cited that there was
| seemingly not enough effort going into making DynamoDB
| better, as evidenced by the many other nearly-overlapping
| DBs that AWS promotes. If Rick had his ear to the ground
| listening to customers and sending back feedback, but it
| was falling on deaf ears, that's enough ground for someone
| as high up and as influential and productive as him to
| leave. It also speaks to inner AWS turmoil, at least at
| DynamoDB.
| amzn-throw wrote:
| Based on what I know, that's not the case.
|
| DDB is a steady ship. The explanation on
| https://news.ycombinator.com/item?id=30009611 is likely
| the best explanation. L7 TPMs make the same money as L6
| SDEs.
|
| Getting promoted to L8 - director - is a monumental
| effort and likely seemed much harder than pursuing a
| comparable position at MongoDB.
|
| Good for him for doing it, and for making Amazon take a
| long hard look at every way they failed in not keeping
| him.
| gigatexal wrote:
| was not me at r/aws
|
| unless he posts here about it we can't really know -- we
| can only speculate but I think he had a higher amount of
| influence than his title/rank might suggest. I think
| Rick's influence with respect to DynamoDB is akin to that
| of Kelsey Hightower's influence over k8s at Google.
| tybit wrote:
| For anyone else expecting this to be a paper given the domain
| name, it's not. It's a non-technical interview with a couple
| of the original paper's authors. Not bad, just not as
| exciting as a paper detailing what they've learnt, from a
| distributed systems perspective, operating Dynamo and then
| DynamoDB for so long would be.
| uvdn7 wrote:
| https://brooker.co.za/blog/2022/01/19/predictability.html This
| might be something you are looking for.
| mjb wrote:
| We don't have a paper on DynamoDB's internals (yet?), but
| here's a talk you might find interesting from one of the folks
| who built and ran DDB for a long time:
| https://www.youtube.com/watch?v=yvBR71D0nAQ
|
| And Doug Terry talking through the details of how DynamoDB's
| transaction protocol works:
| https://www.usenix.org/conference/fast19/presentation/terry
|
| If we did publish more about the internals of DDB, what would
| you be looking to learn? Architecture? Operational experience?
| Developer experience? There's a lot of material we could share,
| and it's useful to hear where people would like us to focus.
| pow_pp_-1_v wrote:
| All of it - architecture, operational experience, best
| practices etc.
| ldrndll wrote:
| Just want to second this. All of the above sounds really
| interesting to me!
| cmollis wrote:
| We use DynamoDB like a big hash table of S3 file locations:
| we look up these locations via a key (at the time, it sounded
| like a pretty good use-case for it). I suppose we could have
| used some other managed redis or memcached thing, but being
| an AWS shop, it was, and is, pretty useful. I have to say,
| it's been pretty effortless to configure. Read/write units
| are really the only thing we've had to configure (other than
| the base index); the rest of it has been easy. It has about
| 100 million entries that are read or written pretty quickly.
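| A minimal sketch of that lookup path (boto3; hypothetical
| table and attribute names):
|
|     import boto3
|
|     locations = boto3.resource("dynamodb").Table(
|         "file-locations")
|     s3 = boto3.client("s3")
|
|     # Key-based lookup in DynamoDB, then fetch from S3.
|     item = locations.get_item(
|         Key={"file_id": "abc-123"})["Item"]
|     obj = s3.get_object(Bucket=item["bucket"],
|                         Key=item["s3_key"])
|     data = obj["Body"].read()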
| Jach wrote:
| I remember talking to someone who was playing with AWS stuff
| for the first time and they had a similar architecture, using
| Dynamo for a lookup store. It still seems a bit odd to me
| though. It's been a long time since I've worked with the S3
| API, so maybe it just doesn't support the same sort of thing,
| but wouldn't it be nicer to just query S3 with some key and get
| back either the path/URL to render a link, or the content
| itself? Why the Dynamo intermediary? (And on the other side, if
| you don't need to render a link to serve the content, why not
| use Dynamo as the actual document store and skip S3? Storage
| cost?)
| 0xbadcafebee wrote:
| I haven't run into anyone who uses Dynamo for anything other than
| managing Terraform backend state locking. And I think that's
| still the best use case for it: you just want to store a couple
| random key-values somewhere and have more functionality than AWS
| Parameter Store. Trying to build anything large-scale with it
| will probably leave you wanting.
| thekozmo wrote:
| Indeed quite a journey. If you love DynamoDB and like open
| source, give Scylla a try (disclosure me==founder):
| https://www.scylladb.com/alternator/
| minaor wrote:
| nathanfig wrote:
| I wonder if DynamoDB would be met with less criticism had it
| simply been named Dynamo Document Store.
| afandian wrote:
| We're at early stages of planning an architecture where we
| offload pre-rendered JSON views of PostgreSQL onto a key value
| store optimised for read only high volume. Considering DynamoDB,
| S3, Elastic, etc. (We'll probably start without the pre-render
| bit, or store it in PostgreSQL until it becomes a problem).
|
| When looking at DynamoDB I noticed that there was a surprising
| amount of discussion around the requirement for provisioning,
| considering node read/write ratios, data characteristics, etc.
| Basically, worrying about all the stuff you'd have to worry about
| with a traditional database.
|
| To be honest, I'd hoped that it could be a bit more 'magic',
| like S3, and that AWS would take care of provisioning,
| scaling, sharding, etc. But it seemed, disappointingly, that
| you'd have to focus on proactively worrying about operations
| and provisioning.
|
| Is that sense correct? Is the dream of a self-managing, fire-and-
| forget key value database completely naive?
| eknkc wrote:
| I believe it used to be static provisioning: you'd set the
| read and write capacity limits beforehand. Then obviously
| there is autoscaling of those, but it is still steps of
| capacity being provisioned.
|
| They now have a dynamic provisioning scheme where you simply
| don't care, but it is more expensive, so if you have
| predictable requirements it is still better to use static
| capacity provisioning. There is an option, though.
|
| DynamoDB also requires the developer to know about its data
| storage model. While this is generally a good practice for any
| data storage solution, I feel like Dynamo requires a lot more
| careful planning.
|
| I also think that most of the best practices, articles etc
| apply to giant datasets with huge scale issues etc. If you are
| running a moderately active app, you probably can get away with
| a lot of stupid design decisions.
| paulgb wrote:
| My experience with dynamic provisioning has been that it is
| pretty inelastic, at least at the lower range of capacity.
| E.g. if you have a few read units and then try to export the
| data using AWS's cli client, you can pretty quickly hit the
| capacity limit and have to start the export over again. Last
| time, I ended up manually bumping the capacity way up,
| waiting a few minutes for the new capacity to kick in, and
| then exporting. Not what I had in mind when I wanted a
| serverless database!
| moduspol wrote:
| I understand it's not really your point, but if you're
| actually looking to export all the data from the table,
| they've got an API call you can give to have DynamoDB write
| the whole table to S3. This doesn't use any of your
| available capacity.
|
| https://docs.aws.amazon.com/amazondynamodb/latest/developer
| g...
|
| Beyond that, though, it's really not designed for that kind
| of use case.
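| A minimal sketch of that call (boto3; hypothetical ARN and
| bucket; the table needs point-in-time recovery enabled):
|
|     import boto3
|
|     client = boto3.client("dynamodb")
|     client.export_table_to_point_in_time(
|         TableArn="arn:aws:dynamodb:us-east-1:"
|                  "123456789012:table/mytable",
|         S3Bucket="my-export-bucket",
|         ExportFormat="DYNAMODB_JSON",
|     )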
| paulgb wrote:
| Ah, fair point. Somehow I didn't encounter that when I
| was trying to export, even though it existed at the time.
| But it would have solved my problem.
| restlake wrote:
| If you know and understand S3 pretty well, and you purely need
| to generate, store, and read materialized static views, I
| highly recommend S3 for this use case. I say this as someone
| who really likes working with DDB daily and understands the
| tradeoffs with Dynamo. You can always layer on Athena or
| (simpler) S3 Select later if a SQL query model is a better fit
| than KV object lookups. S3 is loosely the fire-and-forget KV
| DB you're describing, IMO, depending on your use case.
| k__ wrote:
| After looking into solutions like Fauna, Upstash, and
| Planetscale I don't understand why anyone is bothering with DDB
| anymore.
|
| I read "the dynamodb book" and almost had a stroke. So many
| idiosyncrasies, for what?!
| manigandham wrote:
| Plenty of options already exist. DynamoDB has both autoscaling
| and serverless modes. AWS also has managed Cassandra (runs on
| top of DynamoDB) which doesn't need instance management.
|
| Azure has CosmosDB, GCP has Cloud Datastore/Firestore, and
| there are many DB vendors like Planetscale (mysql), CockroachDB
| (postgres), FaunaDB (custom document/relational) that have
| "serverless" options.
| lkrubner wrote:
| Exactly. This has been my experience with several AWS
| technologies. Like with their ElasticSearch service, where I
| had to constantly fine-tune various parameters, such as memory.
| I was curious why they couldn't auto-scale the memory, why I
| had to do that manually. There are several AWS services that
| should be a bit more magical, but they are not.
| Tehnix wrote:
| A lot of things that used to be a concern (hot partitions, etc)
| are not a concern anymore and most have been solved these days
| :)
|
| Put it on on-demand pricing (it'll be better and cheaper for
| you most likely), and it will handle any load you throw at it.
| Can you get it to throttle? Sure, if you absolutely blast it
| without ever having had that high of a need before (and it can
| actually be avoided[0]).
|
| You will need to understand how to model things for the NoSQL
| paradigm that DynamoDB uses, but that's a question of
| familiarity and not much else (you didn't magically know SQL
| either).
|
| My experience comes from scaling DynamoDB in production for
| several years, handling both massive IoT data ingestion and
| user data. We were able to completely replace all the things
| we _thought_ we would need a relational database for.
|
| My comparison with a traditional RDS setup:
|
| - DynamoDB issues? 0. Seriously. The only thing you need to
| monitor is billing.
|
| - RDS? Oh boy: need to provision for peak capacity, need to
| monitor replica lag, need to monitor the replicas themselves,
| constant monitoring and scaling of IOPS, suddenly queries get
| slow as data increases, worrying about indexes and data size,
| and much more...
|
| [0]: https://theburningmonk.com/2019/03/understanding-the-
| scaling...
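| For reference, on-demand mode is a single flag at table
| creation (boto3 sketch; hypothetical table/key names):
|
|     import boto3
|
|     boto3.client("dynamodb").create_table(
|         TableName="iot-events",
|         AttributeDefinitions=[
|             {"AttributeName": "pk", "AttributeType": "S"},
|         ],
|         KeySchema=[
|             {"AttributeName": "pk", "KeyType": "HASH"},
|         ],
|         # PAY_PER_REQUEST == on-demand: no capacity planning.
|         BillingMode="PAY_PER_REQUEST",
|     )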
| jugg1es wrote:
| I do not recommend starting off with a decision to use DynamoDB
| before you have worked with it directly for some time to
| understand it. You could spend months trying to shoehorn your
| use case into it before realizing you made a mistake. That
| said, DynamoDB can be incredibly powerful and inexpensive tool
| if used right.
| rmbyrro wrote:
| I think this can be said about any technology, really...
| jugg1es wrote:
| Yea, probably, but it is _especially_ true for DynamoDB
| because it can initially appear as though your use cases
| are all supported, but that is only because you haven't
| internalized how it works yet. By the time you realize you
| made a mistake, you are way too far in the weeds and have
| to start over from scratch. I would venture that more than
| 50% of DynamoDB users have had this happen to them early
| on. Anecdotally, just look at the comments on this post.
| There are so many horror stories with DynamoDB, but they're
| basically all people who decided to use it before they
| really understood it.
| nesarkvechnep wrote:
| If it's possible in your situation, instead of vendor lock-in,
| invest in cacheability of your service and leverage HTTP cache
| as much as possible.
| Marazan wrote:
| DynamoDB is pretty much the opposite of magic.
|
| It is a resource that can often be the right tool for the job
| but you really have to understand what the job is and carefully
| measure Dynamo up for what you are doing.
|
| It is _easy_ to misunderstand or miss something that would make
| Dynamo hideously expensive for your use case.
| uberdru wrote:
| What use cases would likely make it hideously expensive, in
| your view? Like, what are the red flags?
| Marazan wrote:
| Hot keys are the primary one. They destroy your "average"
| calculations for your throughput.
|
| Bulk loading data is the other gotcha I've run into. I had a
| beautiful use case for steady read performance of a batch
| dataset that was incredibly economical on Dynamo, but the
| cost/time for loading the dataset into Dynamo was totally
| prohibitive.
|
| Basically, Dynamo is great for constant reads/writes of very
| small, randomly distributed documents. Once you are out of
| that zone, things can get dicey fast.
| rmbyrro wrote:
| Hot keys are much less of an issue nowadays. They had been a
| big one in old DDB architectures.
|
| I'd say requiring scans or filters as opposed to queries is
| one of the biggest issues that can bite your pocket.
|
| Think carefully about how you'll access your data later.
| You won't be able to change it drastically and cheaply
| later.
| aneil wrote:
| Exactly my experience. I got sucked into using it more than
| once, thinking it would be better next time, but there are
| just so many sharp edges.
|
| At one company, someone accidentally set the write rate very
| high to transfer data into the db. This had the effect of
| permanently increasing the shard count to a huge number,
| basically making the DB useless.
| gonzo41 wrote:
| There's not really magic with S3; you still need to name
| things with coherent prefixes to spread around the load.
|
| DynamoDB is almost simple enough to learn in a day. And if
| you're doing nothing with it, you're only really paying for
| storage. Good luck with your decisions.
| ralusek wrote:
| S3 naming no longer matters for performance. Rejoice.
| PaywallBuster wrote:
| Prefixes are not needed in 90% of use cases
| brodouevencode wrote:
| I'm not going to speculate on the accuracy of 90% value,
| but I will say that appropriately prefixed objects
| substantially help with performance when you have tons of
| small-ish files. Maybe most orgs don't have that need, but
| in operational realms doing this with your logs makes the
| response faster.
| snorkel wrote:
| If you don't need data persistence then consider redis instead
| (which can also do persistence if you enable AOF)
| amzn-throw wrote:
| The key benefit with DDB is predictability:
| https://brooker.co.za/blog/2022/01/19/predictability.html
|
| Yes, you have to learn about all these things upfront. But once
| you figure it out, test it, and configure it - it will work as
| you expect. No surprises.
|
| Whereas relational databases work until they don't. A
| developer makes a tiny (even a no-op) change to a query or
| stored procedure, a different SQL plan gets chosen, and
| suddenly your performance/latency dramatically degrades, and
| you have no easy way to roll it back through source
| control/deployment pipelines. You have to page a DBA who has
| to go look under the hood.
|
| With services like DDB, you maintain control.
| redwood wrote:
| Your example really summarizes the challenge with the AWS
| paradigm: namely that they want you to believe that the thing
| to do is to spread the the backend of your application across a
| large number of distinct data systems. No one uses DynamoDB
| alone: they bolt it onto Postgres after realizing they have
| availability or scale needs beyond what a relational database
| can do, then they bolt on Elasticsearch to enable querying, and
| then they bolt on Redis to make the disjointed backend feel
| fast. And I'm just talking operational use cases; ignoring
| analytics here. Honestly it doesn't need to be these particular
| technologies but this is the general phenomenon you see in so
| many companies that adopt a relational database, key/value
| store (could be Cassandra instead of DynamoDB eg like what
| Netflix does), a search engine, and a caching layer because
| they think that that's the only option
|
| This inherently leads to a complexity debt explosion,
| fragmentation in the experience, and an operationally brittle
| posture that becomes very difficult to dig out of (this is
| probably why AWS loves the paradigm).
| ndm000 wrote:
| > they bolt it onto Postgres
|
| I am working with a company that is redesigning an enterprise
| transactional system, currently backed by an Oracle database
| with 3000 tables. It's B2B so loads are predictable and are
| expected to grow no more than 10% per year.
|
| They want to use DynamoDB as their primary data store, with
| Postgres for edge cases. It seems to me the opposite would be
| more beneficial.
|
| At what point does DynamoDB become a better choice than
| Postgres? I know that at certain scales Postgres breaks down,
| but what are those thresholds?
| picardo wrote:
| You can make Postgres scale, but there is an operational
| cost to it. DynamoDB does that for you out of the box. (So
| does Aurora, to be honest, but there is also an overhead to
| setting up an Aurora cluster to the needs of your
| business.)
|
| I've found also that in Postgres the query performance does
| not keep up with bursts of traffic -- you need to
| overprovision your db servers to cope with the highest
| traffic days. DynamoDB, in contrast, scales instantly.
| (It's a bit more complicated than that, but the effect of
| it is nearly instantaneous.) And what's really great about
| DynamoDB is after the traffic levels go down, it does not
| scale down your table and maintains it at the same capacity
| at no additional cost to you, so if you receive a burst of
| traffic at the same throughput, you can handle it even
| faster.
|
| DynamoDB does a lot of magic under the hood, as well. My
| favorite is auto-sharding, i.e. it automatically moves your
| hot keys around so the demand is evenly distributed across
| your table.
|
| So DynamoDB is pretty great. But to get the best
| experience from DynamoDB, you need to have a stable
| codebase, and design your tables around your access
| patterns. Because joining two tables isn't fun.
| rmbyrro wrote:
| Using more than one DynamoDB table is a bad idea in the
| first place.
| eropple wrote:
| _> So DynamoDB is pretty great. But to get the best
| experience from DynamoDB, you need to have a stable
| codebase, and design your tables around your access
| patterns. Because joining two tables isn't fun._
|
| More than just joining--you're in the unenviable place of
| reinventing (in most environments, anyway) a _lot_ of
| what are just online problems in the SQL universe. Stuff
| you'd do with a case statement in Postgres becomes some
| on-the-worker shenanigans, stuff you'd do with a
| materialized view in Postgres becomes a batch process
| that itself has to be babysat and managed and introduces
| new and exciting flavors of contention.
|
| There are really good reasons to use DynamoDB out there,
| but there are also an absolute ton of land mines. If your
| data model isn't _trivial_, DynamoDB's best use case is
| in making faster subsets of your data model that you can
| _make_ trivial.
| vosper wrote:
| They should be looking at Aurora, not Dynamo. Using Dynamo
| as the primary store for relational data (3000 tables!)
| sounds like an awful idea to me. I'd rather stay on Oracle.
|
| https://aws.amazon.com/rds/aurora/?aurora-whats-new.sort-
| by=...
| rmbyrro wrote:
| It really depends much more on the access patterns than
| data shape.
|
| Certain access patterns can do pretty well with 3,000
| relational tables denormalized to a single DynamoDB
| table.
| sebastialonso wrote:
| > they bolt it onto Postgres after realizing they have
| availability or scale needs beyond what a relational database
| can do, then they bolt on Elasticsearch to enable querying,
| and then they bolt on Redis to make the disjointed backend
| feel fast.
|
| This made my head explode. Why would you explicitly join two
| systems made to solve different issues together? This sounds
| rather like a lack of architectural vision. Postgres's
| design, which assumes no predefined access patterns,
| inherently clashes with DynamoDB's; same goes for the
| ElasticSearch scenario: DynamoDB was not made to query
| everything, it's made to query specifically what you designed
| to be queried and nothing else. Redis sort of makes sense to
| gain a bit of speed for some particular access, but you still
| lack collection-level querying with it.
|
| In my experience, leave DynamoDB alone and it will work
| great. Automatic scaling is cheaper eventually if you've done
| your homework about knowing your traffic.
| 300bps wrote:
| _In my experience, leave DynamoDB alone and it will work
| great._
|
| My experience agrees with yours and I'm likewise puzzled by
| the grandparent comment. But just a shout out to DAX
| (DynamoDB Accelerator), which makes it scale through the
| roof:
|
| https://aws.amazon.com/dynamodb/dax/
| jamesblonde wrote:
| If you add DAX you are not guaranteed to read your writes.
| Terrible consistency model.
| https://docs.aws.amazon.com/amazondynamodb/latest/developerg...
| 300bps wrote:
| _Terrible consistency model._
|
| Judging a consistency model as "terrible" implies that it
| does not fit any use case and therefore is objectively
| bad.
|
| On the contrary, there are plenty of use cases where
| "eventually consistent writes" is a perfect fit. To judge
| this as true, you only have to look and see that
| every major database server offers this as an option -
| just one example:
|
| https://www.compose.com/articles/postgresql-and-per-
| connecti...
| tmitchel2 wrote:
| You choose your consistency on reads. However, DAX won't
| help you much on a write-heavy workload.
| rmbyrro wrote:
| I think the main advantage of DDB is being serverless.
| Adding a server-based layer on top of it doesn't make
| sense to me.
|
| I have a theory it would be better to have multiple
| table-replicas for read access. At application level, you
| randomize access to those tables according to your read
| scale needs.
|
| Use main table streams and lambda to keep replicas in
| sync.
|
| Depending on your traffic, this might end up more expensive
| than DAX, but you remain fully serverless, using the
| exact same technology model, and have control over the
| consistency model.
|
| Haven't had the chance to test this in practice, though.
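| For what it's worth, the sync part would be a small Lambda;
| a minimal sketch of the theory above (boto3; hypothetical
| replica table name):
|
|     import boto3
|     from boto3.dynamodb.types import TypeDeserializer
|
|     deser = TypeDeserializer()
|     replica = boto3.resource("dynamodb").Table(
|         "main-replica-1")
|
|     def handler(event, context):
|         # Triggered by the main table's DynamoDB stream.
|         for rec in event["Records"]:
|             if rec["eventName"] in ("INSERT", "MODIFY"):
|                 img = rec["dynamodb"]["NewImage"]
|                 item = {k: deser.deserialize(v)
|                         for k, v in img.items()}
|                 replica.put_item(Item=item)
|             elif rec["eventName"] == "REMOVE":
|                 keys = rec["dynamodb"]["Keys"]
|                 key = {k: deser.deserialize(v)
|                        for k, v in keys.items()}
|                 replica.delete_item(Key=key)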
| SomeCallMeTim wrote:
| In my experience, NoSQL is almost never the right answer.
|
| And DynamoDB is worse than most.
|
| My prediction is that the future is in scalable SQL:
| CockroachDB or YugabyteDB or similar.
|
| NoSQL actually causes more problems than it solves, in my
| experience.
| tokamak-teapot wrote:
| We use DynamoDB alone. Microservices generally use one or two
| tables each.
| awsthro00945 wrote:
| >No one uses DynamoDB alone
|
| Almost every single team at Amazon that I can think of off
| the top of my head uses DynamoDB (or DDB + S3) as its sole
| data store. I know that there _are_ teams out there using
| relational DBs as well (especially in analytics), but in my
| day-to-day working with a constantly changing variety of
| teams that run customer-facing apps, I haven't seen
| RDS/Redis/etc being used in months.
| goostavos wrote:
| The thing about Amazon is that it is _massive_. In my neck
| of the woods, I've got the complete opposite experience.
| So many teams have the exact DDB-induced infrastructure
| sprawl described by the GP (e.g. supplemental RDBMS,
| Elastic, caching layers, etc.).
|
| Which says nothing of DDB. It's a god-tier tool if what
| you need matches what it's selling. However, I see too many
| teams reach for it by default without doing any actual
| analysis (including young me!), thus leading to the "oh
| shit, how will we...?" soup of ad-hoc supporting infra. Big
| machines look great on the promo doc, though. So, I don't
| expect it to stop.
| [deleted]
| jerf wrote:
| It seems to me that what this is saying is that storage has
| become so cheap that if one database provides even slight
| advantages over another for some workload, it is likely to be
| deployed and have all the data copied over to it.
|
| HN entrepreneurs take note, this also suggests to me that
| there _may_ be a market for a database (or a "metadatabase")
| that takes care of this for you. I'd love to be able to have
| a "relational database" that is also some "NoSQL" databases
| (since there's a few major useful paradigms there) that just
| takes care of this for me. I imagine I'd have to declare my
| schemas, but I'd love it if that's all I had to do and then
| the DB handled keeping sync and such. Bonus points if you can
| give me cross-paradigm transactionality, especially in terms
| of coherent insert sets (so "today's load of data" appears in
| one lump instantly from clients point of view and they don't
| see the load in progress).
|
| At least at first, this wouldn't have to be best-of-breed
| necessarily at anything. I'd need good SQL joining support,
| but I think I wouldn't need every last feature Postgres has
| ever had out of the box.
|
| If such a product exists, I'm all ears. Though I am thinking
| of this as a unified database, not a collection of databases
| and products that merely manages data migrations and such.
| I'm looking to run "CREATE CASSANDRA-LIKE VIEW gotta_go_fast
| ON SELECT a.x, a.y, b.z FROM ...", maybe it takes some time
| of course but that's all I really have to do to keep things
| in sync. (Barring resource overconsumption.)
| jgraettinger1 wrote:
| > I'd love to be able to have a "relational database" that
| is also some "NoSQL" databases (since there's a few major
| useful paradigms there) that just takes care of this for
| me. I imagine I'd have to declare my schemas, but I'd love
| it if that's all I had to do and then the DB handled
| keeping sync and such.
|
| You might be interested in what we're building [0]
|
| It synchronizes your data systems so that, for example, you
| can CDC tables from your Postgres DB, transform them in
| interesting ways, and then materialize the result in a view
| within Elastic or DynamoDB that updates continuously and
| with millisecond latency.
|
| It will even propagate your sourced SQL schemas into JSON
| schemas, and from there to, say, an equivalent Elasticsearch
| schema.
|
| [0]: https://github.com/estuary/flow
| andy_ppp wrote:
| Postgres with Cassandra built in and scaled separately
| would be really great.
| grncdr wrote:
| I think there was a project like this a few years ago
| (wrapping a relational DB + ElasticSearch into one box) and
| I _thought_ it was CrateDB, but from looking at their
| current website I think I'm misremembering.
|
| The concept didn't appeal to me very much then, so I never
| looked into it further.
|
| ---
|
| To address your larger point, I think Postgres has a better
| chance of absorbing other datastores (via FDW and/or custom
| index types) and updating them in sync with its own
| transactions (as far as those databases support some sort
| of atomic swap operation) than a new contender has of
| getting near Postgres' level of reliability and feature
| richness.
| neuronexmachina wrote:
| Were you thinking of ZomboDB?
| https://github.com/zombodb/zombodb
| rmbyrro wrote:
| I'm afraid it's not feasible to develop a single general
| purpose implementation for that.
|
| The amount of complexity to guarantee data integrity while
| covering all possible use cases would be just unmanageable.
|
| I'd be extremely happy to be proven wrong, though...
| mwarkentin wrote:
| AWS tried building this with Glue Elastic Views:
| https://aws.amazon.com/glue/features/elastic-views/
|
| It's been in preview forever though, not sure when it's
| going to officially launch.
| nine_k wrote:
| If this is not the only option, what would you suggest
| instead? How to simplify it?
| urthor wrote:
| The alternative is to go to GCP and use the big GCP selling
| point, which is Bigtable/BigQuery.
|
| Those databases build most of that in, and it's all one
| fairly excellent distributed monolith.
| onlyrealcuzzo wrote:
| Wouldn't Spanner be closer to what you're talking about?
| Keyframe wrote:
| It's still a marriage.
| whalesalad wrote:
| Dynamo is incredibly hard to use _correctly_
|
| I'd urge you to start writing a prototype, a lot of your
| assumptions might get thrown out the window. Dynamo is not
| necessarily good for reading high volume. You'll end up needing
| to use a parallel scan approach which is not fast.
| qvrjuec wrote:
| I'd say Dynamo is extremely good at reading high volume, with
| the appropriate access pattern. It's very efficient at
| retrieving huge amounts of _well partitioned_ data using the
| data's keys, but scanning isn't so efficient.
| mythrwy wrote:
| Also can be _very_ expensive if you do not use it correctly.
| qaq wrote:
| Save yourself a ton of pain and don't use DynamoDB
| jedberg wrote:
| > Is the dream of a self-managing, fire-and-forget key value
| database completely naive?
|
| It's not, if you plan it right. Learn about single table design
| for DynamoDB before you start. There are a lot of good
| resources from Amazon and the community.
|
| Here is a very accessible video from the community:
|
| https://www.youtube.com/watch?v=BnDKD_Zv0og
|
| Here is a video from Rick Houlihan, a senior leader from AWS
| who basically helps companies convert to single table design:
|
| https://www.youtube.com/watch?v=KYy8X8t4MB8
|
| And a good book on the topic:
|
| https://www.dynamodbbook.com
|
| If you use single table design, you can turn on all of the
| auto-tuning features of DynamoDB and they will work as expected
| and get better and more efficient with more data.
|
| Some people worry that this breaks the cardinal rule of
| microservices: one database per service. But the actual rule
| is to never have one service directly access the data of
| another; always use the API. So as long as your services use
| different keyspaces and never access each other's data, it
| can still work (but it does require extra discipline).
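|
| The gist of single table design, as a minimal boto3 sketch
| (the table name and key conventions here are hypothetical):
|
|     import boto3
|     from boto3.dynamodb.conditions import Key
|
|     table = boto3.resource("dynamodb").Table("app")
|
|     # Different entity types share one table, separated by
|     # key prefixes (one keyspace per entity or service):
|     table.put_item(Item={"PK": "USER#123", "SK": "PROFILE",
|                          "name": "Ada"})
|     table.put_item(Item={"PK": "USER#123",
|                          "SK": "ORDER#2022-01-20", "total": 99})
|
|     # One query returns the user and all their orders in a
|     # single round trip:
|     resp = table.query(
|         KeyConditionExpression=Key("PK").eq("USER#123"))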
| phamilton wrote:
| I don't know your scaling needs, but I would highly recommend
| just using Aurora PostgreSQL for read-only workloads. We have
| some workloads that are essentially K/V store lookups that were
| previously slated for dynamodb. On an Aurora cluster of
| 3*r6g.xlarge we easily handle 25k qps with p99 in the single-
| digit ms range. Aurora can scale up to 15 instances and up to
| 24xlarge, so it would not be unreasonable to see 100x the read
| workload with similar latencies.
|
| Happy to talk more. We're actively moving a bunch of workloads
| away from DynamoDB and to Aurora so this is fresh on our minds.
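|
| For reference, the K/V-style lookups in question are just
| primary-key point reads; a rough psycopg2 sketch (the
| connection string and table are hypothetical):
|
|     import psycopg2
|
|     conn = psycopg2.connect("postgresql://reader-endpoint/db")
|
|     def get(key):
|         # A single-row point read on an indexed primary key;
|         # this is the shape of query Aurora serves at ~25k qps.
|         with conn.cursor() as cur:
|             cur.execute("SELECT value FROM kv WHERE key = %s",
|                         (key,))
|             row = cur.fetchone()
|             return row[0] if row else None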
| afandian wrote:
| Thanks, that's what I _hope_ will work. I might drop you a
| mail at some point.
| ignoramous wrote:
| > _We're at early stages of planning an architecture where we
| offload pre-rendered JSON views of PostgreSQL onto a key value
| store optimised for read only high volume._
|
| If possible, put the json in Workers KV, and access it through
| Cloudflare Workers. You can also optionally cache reads from
| Workers KV into Cloudflare's zonal caches.
|
| > _To be honest, I'd hoped that it could be a bit more
| 'magic', like S3_
|
| You could opt to use the slightly more expensive DynamoDB On-
| Demand mode, or the free DynamoDB Auto-Scaling mode, both of
| which are relatively no-config. For a _very_ read-heavy
| workload, you'd probably want to add DynamoDB Accelerator (a
| write-through in-memory cache) in front of your tables. Or,
| use S3 itself (though an S3 bucket doesn't really like it
| when you load it with a _tonne_ of small files), accelerated
| by CloudFront (which is what AWS Hyperplane, the tech
| underpinning ALB and NLB, does:
| https://aws.amazon.com/builders-library/reliability-and-
| cons...)
|
| S3, much like DynamoDB, is a KV store:
| https://news.ycombinator.com/item?id=11161667 and
| https://www.allthingsdistributed.com/2009/03/keeping_your_da...
| _pdp_ wrote:
| DynamoDB is like S3 but with query features. It is not a
| relational DB. It is a document store. So you need to use it
| for what it is.
|
| Our entire solution is basically built on top of Lambda and
| DynamoDB tables, and it works really well as long as you
| don't treat the tables like SQL tables.
| zurn wrote:
| Your impressions are correct: DynamoDB is quite low-level and
| more like a DB kit than a ready-to-use DB; for most
| applications it's better to use something else.
| giaour wrote:
| If you use the "pay per request" billing model instead of
| provisioned throughput, DynamoDB scaling is self-managing, and
| you can treat your DB as a fire-and-forget key/value store. You
| need to plan how you'll query your data and structure the keys
| accordingly, but honestly, that applies even more to S3 than it
| does to Dynamo.
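|
| Enabling that is a one-liner at table creation; a minimal
| boto3 sketch (table and key names are hypothetical):
|
|     import boto3
|
|     ddb = boto3.client("dynamodb")
|     ddb.create_table(
|         TableName="events",
|         AttributeDefinitions=[
|             {"AttributeName": "pk", "AttributeType": "S"}],
|         KeySchema=[{"AttributeName": "pk", "KeyType": "HASH"}],
|         # On-demand billing: no RCU/WCU provisioning at all.
|         BillingMode="PAY_PER_REQUEST",
|     )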
| pbalau wrote:
| Thinking like this baffles me, but it also makes me happy,
| because there will always be a need for people like me:
| infra. AWS is not a magical tool that will replace your infra
| team; it is a magical tool that will allow your infra team to
| do more. I am the infra team of my startup, and I estimate
| that only 50% of my time goes to infra work. The rest is
| supporting my peers, working on frameworky stuff, solving dev
| efficiency issues, bla bla.
|
| Let's say that you operate in an AWS-less environment, with
| everything on bare metal in a datacenter. Your GOOD infra
| team has to do the following:
|
| Hardware:
|
| - make sure there is a channel to get new hardware, both for
| capacity increases and spares. What are you going to do? Buy 1
| server and 2 spares? If one of the servers has an issue, isn't
| it quite likely that the other servers, from the same batch,
| will have the same issue? Does this affect you, or not? Where
| do you store the spares? In a warehouse somewhere, making them
| harder to deploy? In the rack with the one in use, wasting
| rackspace/switch space? Are you going to rely on the
| datacenter to provide you with the hardware? What if you are
| one of their smaller customers and your requests get pushed
| back because some larger customer's requests get higher
| priority?
|
| - make sure there is a way to deploy said hardware. You don't
| want to be unable to deploy a new server because there is no
| space in the rack, or no space in the switch. Where are your
| spares? In a warehouse miles away from the datacenter? Do you
| have access to said warehouse at midnight, on Thanksgiving? Oh
| shit, someone lost the key to your rack! Oh noes, we don't
| have any spare network cables/connectors/screws...
|
| Software:
|
| - did you patch your servers? did you patch your switches?
|
| - new server? We need to install the OS, and a base set of
| software, including the agent we use to remotely manage the
| server.
|
| - oh, we also need to run and maintain the management infra,
| say, the control plane for k8s.
|
| - oh, we want some read replicas for this DB? Not only do we
| need the hardware to run the replicas on (see above for what
| that means), we now need to add a bunch of monitoring and
| have plans in place to handle things like: replicas lagging,
| network links between master and replicas being saturated,
| failover for the above, the master crapping out, yada yada.
|
| I bet there are many other aspects I'm missing.
|
| Choices:
|
| Your GOOD infra team will have to decide things like: how many
| spares do we need? Is the capacity we have atm enough for the
| launch of our next world-changing feature that half the
| internet wants to use? Are we lucky enough to survive a few
| months without spares, or should we get extra capacity in
| another datacenter? Do we want to have replicas on the west
| coast, or is the latency acceptable?
|
| These are the main areas of what an infra team is supposed to
| do: Hardware, Software, and Choices. AWS (and most other
| cloud providers) makes the first 2 areas non-issues. For the
| last area you can do 2 things: get an infra team (could be a
| full-fledged team, could be 1 person, you could do it) and
| theoretically you will get choices tailored to what your
| business needs, OR let AWS do it for you. *AWS might make
| these choices based on a metric you disagree with, and this
| is the main reason people complain*.
| AtlasBarfed wrote:
| The salespeople always promise magic and handwave CAP away.
|
| But data at scale is about:
|
| 1) knowing your queries ahead of time (since you've presumably
| reached the limit of PG/maybesql/o-rackle).
|
| 2) dealing with CAP at the application level: distributed
| transactions, eventual consistency, network partitions.
|
| 3) dealing with a lot more operational complexity, not less.
|
| So if the snake oil salesmen say it will be seamless, they are
| very very very much lying. Either that, or you are paying a LOT
| of money for other people to do the hard work.
|
| Which is what happens with managing your own NoSQL vs DynamoDB.
| You'll pay through the roof for DynamoDB at true big data
| scales.
| twodayrice wrote:
| I think this is a good summary, and it gets even more
| complicated if you start using the DAX cache. Your read/write
| provisioning for DAX is totally different from that of the
| underlying DynamoDB tables. The write throughput for DAX is
| limited by the size of the master node in the cluster. Can
| you say bottleneck?
| augustl wrote:
| I have no direct experience with scaling DynamoDB in
| production, so take this with a grain of salt. But it seems to
| me that the on-demand scaling mode in DynamoDB has gotten
| _really_ good over the last couple of years.
|
| For example, you used to have to manually set RCU/WCU to a high
| number when you expected a spike in traffic, since the ramp-up
| for on-demand scaling was pretty slow (could take up to 30
| minutes). But these days, on-demand can handle spikes from 10s
| of requests a minute to 100s/1000s per second gracefully.
|
| The downside of on-demand is the pricing - it's more expensive
| if you have continuous load. But it can easily become _much_
| cheaper if you have naturally spiky load patterns.
|
| Example: https://aws.amazon.com/blogs/database/running-spiky-
| workload...
| moduspol wrote:
| > The downside of on-demand is the pricing - it's more
| expensive if you have continuous load.
|
| True, although you don't have to make that choice
| permanently. You can switch from provisioned to on-demand
| once every 24 hours.
|
| And you can also set up application autoscaling in
| provisioned mode, which'll allow you to set parameters under
| which it'll scale your provisioned capacity up or down for
| you. This doesn't require any code and works pretty well if
| you can accept autoscaling adjustments being made in the
| timeframe of a minute or two.
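|
| A minimal boto3 sketch of such a target-tracking setup (the
| table name, capacity bounds, and target value are
| hypothetical):
|
|     import boto3
|
|     aas = boto3.client("application-autoscaling")
|     aas.register_scalable_target(
|         ServiceNamespace="dynamodb",
|         ResourceId="table/events",
|         ScalableDimension="dynamodb:table:ReadCapacityUnits",
|         MinCapacity=5,
|         MaxCapacity=500,
|     )
|     aas.put_scaling_policy(
|         PolicyName="events-read-tracking",
|         ServiceNamespace="dynamodb",
|         ResourceId="table/events",
|         ScalableDimension="dynamodb:table:ReadCapacityUnits",
|         PolicyType="TargetTrackingScaling",
|         TargetTrackingScalingPolicyConfiguration={
|             # Keep consumed capacity near 70% of provisioned.
|             "TargetValue": 70.0,
|             "PredefinedMetricSpecification": {
|                 "PredefinedMetricType":
|                     "DynamoDBReadCapacityUtilization"
|             },
|         },
|     )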
| PaywallBuster wrote:
| scaling down is limited to 4x a day
| chrisoverzero wrote:
| It's up to 27 times a day, if you time it well: "4
| decreases in the first hour, and 1 decrease for each of
| the subsequent 1-hour windows in a day".
| PaywallBuster wrote:
| gotcha, it's been a while since I was looking at that
| tmitchel2 wrote:
| They upped it when their own autoscaler needed the
| ability to back it down more :-/
| PaywallBuster wrote:
| Indeed
|
| We have some regular jobs that require scaling up DynamoDB in
| advance a few times per day, but then Dynamo is only able to
| scale down 4x per day, so we were probably paying for
| unnecessary overcapacity (10x or more) for a couple of hours
| a day.
|
| Now we've just moved to on-demand and let them handle it;
| works fine.
| jpgvm wrote:
| It is for now, but it doesn't have to be. Dynamo's design
| isn't particularly amenable to dynamic and heterogeneous
| shard topologies, however.
|
| There could exist a fantasy database where you still tell it
| your hash and range keys - which are roughly how you tell the
| database which data is closely related (and which you may
| want to scan) and which data isn't - but instead of hard-
| provisioning shard capacity, it automagically splits shards
| when they hotspot and doesn't rely on consistent hashing, so
| that every shard can be sized differently depending on how
| hot it is.
|
| Right now such a database doesn't exist AFAICT, as most
| places that need something that scales big enough also
| generally have the skill to avoid most of the pitfalls that
| cause problems on simple databases like Dynamo.
| [deleted]
| jugg1es wrote:
| Every system I've built on DynamoDB just works. The APIs that use
| it have had virtually 100% uptime and have never needed database
| maintenance. It is not a replacement for a RDBMS, but for some
| use cases, it's a killer service.
| didip wrote:
| To be honest, as a customer, it is hard for me to justify using
| DynamoDB. Some of this criticism may be out of date:
|
| 1. DynamoDB is not as convenient. There are a bit too many dials
| to turn.
|
| 2. DynamoDB does not have a SQL facade on top.
|
| 3. DynamoDB is proprietary; I believe there's no OSS API
| equivalent if you want to migrate out.
|
| 4. DynamoDB was kind of expensive. But it has been a while
| since I last checked the pricing page.
|
| It's simply much better to start with PostgreSQL Aurora and
| move to more scalable storage based on specific use cases
| later. For example: Cassandra, Elastic, Druid, or CockroachDB.
| Zimnx wrote:
| Re 3: A lot of people don't know about it, but there's an
| open-source, free, DynamoDB-compatible database called
| ScyllaDB - the API is called Alternator, to be specific.
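|
| Since Alternator speaks the DynamoDB wire API, existing boto3
| code can in principle just be pointed at a Scylla node; a
| hedged sketch (the endpoint and credentials are hypothetical):
|
|     import boto3
|
|     # Same client, different backend: Alternator listens on
|     # port 8000 by default.
|     scylla = boto3.resource(
|         "dynamodb",
|         endpoint_url="http://scylla-host:8000",
|         region_name="none",
|         aws_access_key_id="none",
|         aws_secret_access_key="none",
|     )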
| nathanfig wrote:
| That's surprising to me; I consider DynamoDB to be far simpler
| than any relational DB, including the alternatives you list.
| [deleted]
| arecurrence wrote:
| I strongly agree that most early-stage businesses should be on
| Postgres. There's simply too much churn in early-stage data
| models. Also, unforeseen esoteric needs constantly jump out of
| the woodwork, and you can knock out a SQL query for them
| instead of having to build a solution. However, this does
| assume that your development team has a competent
| understanding of SQL.
|
| I've been in a couple of startups that went Dynamo-first, and
| development velocity was a pale shadow of velocity with
| Postgres. When one of those startups dumped Dynamo for
| Postgres, velocity multiplied immediately. I'd estimate we
| were moving at around 1000% of our previous pace, and the
| complete transition took less time than even I expected
| (about a month). Once the business matures, moving tables
| onto Dynamo and wrapping them in a microservice makes a lot
| of sense. Dynamo does solve a lot of problems that become
| increasingly material as the business evolves.
|
| Eventually, SQL's presence declines and it transitions into
| an analytics role as narrower, but operationally easier,
| options proliferate.
| sakopov wrote:
| We landed on DynamoDB when we migrated a monolith to a
| microservice architecture. I have to say that DynamoDB fits
| fairly well in the microservices world, where the service
| scope is small and query patterns are pretty narrow and don't
| really change much. Building new things with DynamoDB when
| query patterns aren't necessarily known is very painful and
| requires tedious migration strategies, unless you don't mind
| paying for GSIs.
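|
| Retrofitting a new query pattern usually means backfilling a
| GSI onto the table; a minimal boto3 sketch (names are
| hypothetical, and an on-demand table is assumed so no
| ProvisionedThroughput is needed):
|
|     import boto3
|
|     ddb = boto3.client("dynamodb")
|     ddb.update_table(
|         TableName="orders",
|         AttributeDefinitions=[
|             {"AttributeName": "status", "AttributeType": "S"}],
|         GlobalSecondaryIndexUpdates=[{
|             "Create": {
|                 "IndexName": "by-status",
|                 "KeySchema": [{"AttributeName": "status",
|                                "KeyType": "HASH"}],
|                 "Projection": {"ProjectionType": "ALL"},
|             }
|         }],
|     )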
| ddoolin wrote:
| As a developer, I really have a love-hate relationship with
| Dynamo. I love how fast and easy it is to set up and get rolling.
|
| The partitioning scheme came off as confusing and opaque, but I
| think that says more about Amazon's documentation than the scheme
| itself.
|
| I do not like that there's really no third-party tooling
| integration for querying. Their UI in the console is _so
| freaking terrible_, yet you have no way other than code to
| query it. This problem is so bad that I will avoid using
| Dynamo where I can, despite it being a good option,
| performance-wise.
| ejb999 wrote:
| My experience exactly.
| [deleted]
| rmbyrro wrote:
| I don't think this is a fair reason to avoid DynamoDB. There
| are reasonable alternatives:
|
| - DynamoDB Workbench (free, AWS official):
| https://docs.aws.amazon.com/amazondynamodb/latest/developerg...
|
| - Dynobase (paid, third-party): https://dynobase.dev/
| belter wrote:
| Plus there is the local install (only for dev and testing
| purposes...)
|
| "Deploying DynamoDB Locally on Your Computer"
| https://docs.aws.amazon.com/amazondynamodb/latest/developerg...
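|
| A minimal boto3 sketch of pointing a client at the local
| instance (the port and dummy credentials are assumptions
| based on the defaults):
|
|     import boto3
|
|     # DynamoDB Local listens on port 8000 by default and
|     # ignores the credential values.
|     local = boto3.resource(
|         "dynamodb",
|         endpoint_url="http://localhost:8000",
|         region_name="us-east-1",
|         aws_access_key_id="fake",
|         aws_secret_access_key="fake",
|     )
|     print(list(local.tables.all()))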
| wnolens wrote:
| Yes, the console UI is mind-numbing, and it was just made 10x
| worse in a recent redesign.
|
| But I like the Python boto3 library:
| https://boto3.amazonaws.com/v1/documentation/api/latest/guid...
|
| Build yourself a few wrappers to make querying more
| convenient, and I query straight from a Python REPL pretty
| effectively.
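|
| For instance, a tiny hypothetical wrapper along these lines
| makes ad-hoc queries one-liners (generic PK/SK attribute
| names are assumed):
|
|     import boto3
|     from boto3.dynamodb.conditions import Key
|
|     def q(table, pk, sk_prefix=None):
|         t = boto3.resource("dynamodb").Table(table)
|         cond = Key("PK").eq(pk)
|         if sk_prefix:
|             cond = cond & Key("SK").begins_with(sk_prefix)
|         return t.query(KeyConditionExpression=cond)["Items"]
|
|     # >>> q("app", "USER#123", sk_prefix="ORDER#")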
| anonymousDan wrote:
| Can anyone recommend a good paper (or other resource)
| describing the designs of DynamoDB and S3? Ideally something
| in the spirit of the original Dynamo paper (i.e.
| https://www.allthingsdistributed.com/files/amazon-dynamo-sos... )
| schwarzmx wrote:
| This one is pretty good for DynamoDB:
| https://youtu.be/yvBR71D0nAQ
___________________________________________________________________