[HN Gopher] Diving Deep on S3 Consistency
       ___________________________________________________________________
        
       Diving Deep on S3 Consistency
        
       Author : themarkers
       Score  : 119 points
       Date   : 2021-04-28 12:36 UTC (10 hours ago)
        
 (HTM) web link (www.allthingsdistributed.com)
 (TXT) w3m dump (www.allthingsdistributed.com)
        
       | [deleted]
        
       | whydoineedthis wrote:
       | I'm confused...did you fix the caching issue in S3 or not?
       | 
       | The article seems to explain why there is a caching issue, and
       | that's understandable, but it also reads as if you wanted to fix
        | it. I would think it would be the headline, in bold font, if
        | it was actually fixed.
       | 
       | For those curious, the problem is that S3 is "eventually
       | consistent", which is normally not a problem. But consider a
       | scenario where you store a config file on S3, update that config
       | file, and redeploy your app. The way things are today you can
       | (and yes, sometimes do) get a cached version. So now there would
       | be uncertainty of what was actually released. Even worse, some of
       | your redeployed apps could get the new config and others the old
       | config.
       | 
       | Personally, I would be happy if there was simply an extra fee for
       | cache-busting the S3 objects on demand. That would prevent folks
       | from abusing it but also give the option when needed.
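The stale-config scenario described above can be sketched as a toy simulation (purely illustrative; not S3's actual internals) — a write lands in the backing store, but a read may still be served from an unrefreshed cache:

```python
# Toy model of an eventually consistent read path: writes update the
# backing store, but reads prefer a possibly stale cached copy.
class EventuallyConsistentStore:
    def __init__(self):
        self._store = {}   # authoritative copies
        self._cache = {}   # possibly stale cached copies

    def put(self, key, value):
        self._store[key] = value      # the write lands; cache is untouched

    def get(self, key):
        # A cached (stale) copy wins if present -- the hazard described
        # above for redeployed apps pulling a config file.
        if key in self._cache:
            return self._cache[key]
        value = self._store[key]
        self._cache[key] = value
        return value

s3 = EventuallyConsistentStore()
s3.put("app/config.json", '{"release": "v1"}')
assert s3.get("app/config.json") == '{"release": "v1"}'   # first read caches v1
s3.put("app/config.json", '{"release": "v2"}')
assert s3.get("app/config.json") == '{"release": "v1"}'   # stale cached read
```

With strong consistency (or on-demand cache-busting, as wished for above), the second read after the update would return v2 instead.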
        
         | tyingq wrote:
         | It is supposedly fixed.
         | 
         |  _" After a successful write of a new object, or an overwrite
         | or delete of an existing object, any subsequent read request
         | immediately receives the latest version of the object."_
         | 
         | https://aws.amazon.com/s3/consistency/
        
         | cldellow wrote:
         | It was fixed in December of 2020. Announcement blog post:
         | https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-rea...
        
         | JDTech123 wrote:
          | In that example, do you not see using S3 for that purpose as
          | using the wrong tool for the task at hand? AWS SSM Parameter
          | Store [0] seems like a tool designed to fit that purpose
          | nicely.
         | 
         | [0] https://docs.aws.amazon.com/systems-
         | manager/latest/userguide...
        
           | nelsonenzo wrote:
           | Complex config files suck in paramstore. Also, I've used this
           | for mobile app configs that are pulled from s3, so paramstore
           | wouldn't be an option.
        
         | jeffbarr wrote:
         | Yes, see my December 2020 post at
         | https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-rea...
         | :
         | 
         | "Effective immediately, all S3 GET, PUT, and LIST operations,
         | as well as operations that change object tags, ACLs, or
         | metadata, are now strongly consistent. What you write is what
         | you will read, and the results of a LIST will be an accurate
         | reflection of what's in the bucket. This applies to all
         | existing and new S3 objects, works in all regions, and is
         | available to you at no extra charge! There's no impact on
         | performance, you can update an object hundreds of times per
         | second if you'd like, and there are no global dependencies."
        
           | nelsonenzo wrote:
           | Oh, awesome, I missed that!
        
           | nindalf wrote:
           | Thanks for the link, it made the change being talked about
           | clearer. However, I still don't understand how it was
           | achieved. The explanation in the link appears truncated -
           | lots of talk about the problem, then something about a cache
           | and that's it. Is there an alternate link that talks about
           | the mechanics of the change?
        
         | jasonpeacock wrote:
         | This is a general problem in all distributed systems, not just
         | when pulling configuration from S3.
         | 
         | Let's assume you had strong consistency in S3. If your app is
         | distributed (tens, hundreds, or thousands of instances running)
         | then all instances are not going to update at the same time,
         | atomically.
         | 
         | You still need to design flexibility into your app to handle
         | the case where they are not all running the same config (or
         | software) version at the same time.
         | 
         | Thus, once you've built a distributed system that is able to
         | handle a phased rollout of software/config versions (and
         | rollback), then having cache inconsistency in S3 is no big
         | deal.
         | 
         | If you really need atomic updates across a distributed system
         | then you're looking at more expensive solutions, like DynamoDB
         | (which does offer consistent reads), or other distributed
         | caches.
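The tolerance described above — handling instances that are not all on the same config version during a phased rollout — can be as simple as a loader that accepts more than one schema version (field names here are hypothetical, for illustration only):

```python
def load_config(raw):
    """Accept both an old (v1) and a new (v2) config schema.

    Schema details are hypothetical, for illustration only.
    """
    schema = raw.get("schema_version", 1)
    if schema == 1:
        # Old schema: a single endpoint string, implicit default timeout.
        return {"endpoints": [raw["endpoint"]], "timeout_s": 30}
    if schema == 2:
        # New schema: explicit endpoint list and timeout.
        return {"endpoints": raw["endpoints"], "timeout_s": raw["timeout_s"]}
    raise ValueError(f"unsupported schema_version: {schema}")

# During a phased rollout, some instances still see the old config...
old = load_config({"endpoint": "https://a.example"})
# ...while others already see the new one -- both keep working.
new = load_config({"schema_version": 2,
                   "endpoints": ["https://a.example", "https://b.example"],
                   "timeout_s": 10})
assert old["endpoints"] == ["https://a.example"]
assert new["timeout_s"] == 10
```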
        
           | mgdev wrote:
           | The deeper in your stack you fix the consistency problem, the
           | simpler the rest of your system needs to be. If you use S3 as
           | a canonical store for some use case, that's pretty deep in
           | the stack.
           | 
           | > Thus, once you've built a distributed system that is able
           | to handle a phased rollout of software/config versions (and
           | rollback), then having cache inconsistency in S3 is no big
           | deal.
           | 
           | But this would also mean you can't use S3 as your source of
           | truth for config, which is precisely what a lot of people
           | want to do.
        
           | nelsonenzo wrote:
           | What I need is that when I make a call to a service, it gives
           | back consistent results. Ergo, when the app does do a rolling
           | deploy, it will get the right config on startup, not some
           | random version.
           | 
           | It looks like it does exactly that now, it just wasn't clear
           | from the article.
        
       | swyx wrote:
        | i find this very light on the actual "diving deep" part promised
        | in the title. there's a lot of self-congratulatory chest
        | thumping, not a lot of technical detail. Werner of course
        | doesn't owe us any explanation whatsoever. i just don't find
        | this particularly deep.
        
       | MeteorMarc wrote:
        | And for those who use MinIO server, the self-hosted S3-compatible
        | storage: it has strong consistency, too.
        
         | vergessenmir wrote:
          | I have wondered if anyone is using MinIO at really large
          | scale, or if there are any examples of production use at its
          | limits.
        
       | nhoughto wrote:
       | Would love a dive (hopefully deep) into IAM, the innards of that
       | must be some impressive wizardry. Surprising there isn't more
       | around about the amazing technical workings of these foundational
       | AWS products.
        
       | iblaine wrote:
       | Anyone else still seeing consistency problems w/S3 & EMR? The
       | latest AWS re:Invent made it sound like this would be fixed but
       | as of yesterday I was still using emrfs to correct S3 consistency
       | problems.
        
         | 8note wrote:
         | Yeah, yesterday as well
        
       | juancampa wrote:
       | Can someone elaborate on this Witness system OP talks about?
       | 
       | I'm picturing a replicated, in-memory KV store where the value is
       | some sort of version or timestamp representing the last time the
       | object was modified. Cached reads can verify they are fresh by
       | checking against this version/timestamp, which is acceptable
       | because it's a network+RAM read. Is this somewhat accurate?
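The picture above can be sketched roughly like this (purely speculative, not S3's actual implementation): a small in-memory map from key to latest version number, consulted before any cached copy is served:

```python
# Speculative sketch of a "witness": it tracks only the latest version
# number per key, so its state is tiny and fits in memory.
class Witness:
    def __init__(self):
        self._latest = {}  # key -> latest committed version number

    def record_write(self, key, version):
        self._latest[key] = version

    def is_fresh(self, key, version):
        return self._latest.get(key) == version

witness = Witness()
store = {}   # authoritative store: key -> (version, value)
cache = {}   # read cache:          key -> (version, value)

def put(key, value):
    version = store.get(key, (0, None))[0] + 1
    store[key] = (version, value)
    witness.record_write(key, version)  # tiny, in-memory state update

def get(key):
    # Fast path: serve from cache only if the witness confirms freshness
    # (a cheap network + RAM check, as guessed above).
    if key in cache and witness.is_fresh(key, cache[key][0]):
        return cache[key][1]
    cache[key] = store[key]             # stale or missing: refill from store
    return cache[key][1]

put("obj", "v1")
assert get("obj") == "v1"   # first read fills the cache
put("obj", "v2")
assert get("obj") == "v2"   # witness mismatch forces a cache refill
```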
        
         | skynet-9000 wrote:
         | I'm picturing the same, but my guess is that it's using a time-
         | synced serializability graph or MVCC in some way.
         | 
         | However, even a "basic" distributed lock system (like a
         | consistently-hashed in-memory DB, sharded across reliable
         | servers) might provide both the scale and single source of
         | truth that's needed. The difficulty arises when one of those
         | servers has a hiccup.
         | 
         | It'd be a delicious irony if it was based on hardware like an
         | old-school mainframe or something like that.
        
       | pawelmi wrote:
        | So it is both available and consistent (but perhaps only in a
        | read-your-own-writes way?). What about resilience to network
        | partitions, referring to the CAP theorem? Did they build a
        | super-reliable global network, so this is never a real issue?
        
         | somethingAlex wrote:
         | The consistency level seems to be Causal Consistency, which
         | does include read-your-writes. S3 doesn't provide ACID
         | transactions, so stricter consistency models aren't really
         | needed.
         | 
         | From what I've read, if a network issue occurs which would
         | impair consistency, S3 sacrifices availability. The write would
         | just fail.
         | 
         | But this isn't your 5-node distributed system. Like they
         | mention in the article, the witness system can remove and add
         | nodes very quickly and it's highly redundant. A network issue
         | that would actually cause split-brain or make it difficult to
         | reach consensus would be few and far between.
        
       | rossmohax wrote:
       | Recent S3 consistency improvements are welcome, but S3 still
       | falls behind Google GCS until they support conditional PUTs.
       | 
          | GCS allows an object to be replaced conditionally with the
          | `x-goog-if-generation-match` header, which can sometimes be
          | quite useful.
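The compare-and-swap semantics of `x-goog-if-generation-match` can be modeled locally (this simulates the semantics only; it is not the GCS client API): the write succeeds only if the object's current generation matches the one the caller last observed.

```python
# Local simulation of conditional-PUT (compare-and-swap) semantics.
class PreconditionFailed(Exception):
    pass

class Bucket:
    def __init__(self):
        self._objects = {}  # name -> (generation, data)

    def put(self, name, data, if_generation_match=None):
        current_gen = self._objects.get(name, (0, None))[0]
        if if_generation_match is not None and if_generation_match != current_gen:
            # A concurrent writer changed the object since we read it.
            raise PreconditionFailed(
                f"generation {current_gen} != {if_generation_match}")
        self._objects[name] = (current_gen + 1, data)
        return current_gen + 1

bucket = Bucket()
gen = bucket.put("counter", "1")                      # create -> generation 1
bucket.put("counter", "2", if_generation_match=gen)   # matches, succeeds
try:
    # A writer holding the stale generation loses the race:
    bucket.put("counter", "99", if_generation_match=gen)
except PreconditionFailed:
    pass
```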
        
          | ithkuil wrote:
          | There is a conditional CopyObject though (x-amz-copy-source-
          | if...)
          | 
          | It can cover some of the use cases.
        
           | ryanworl wrote:
           | Can you explain how this is useful? It seems like the
           | destination is the important thing here not the source.
        
         | ignoramous wrote:
         | Vogels spoke briefly about why AWS prefers versioned objects
         | instead here: https://queue.acm.org/detail.cfm?id=3434573
         | 
         | BTW, DynamoDB supports conditional PUTs if your data can fit
         | under 400 KiB.
        
           | skynet-9000 wrote:
           | Why would AWS provide a feature that makes additional
           | transaction and storage charges (as well as subsequent reads
           | to see which is the correct version) irrelevant?
        
             | the_reformation wrote:
              | S3 (and most AWS services) are extremely price elastic;
              | i.e., the lower you make them cost, the more people use
              | them (a la electricity). That's why they've done stuff
              | like drop from 100ms billing to 1ms billing, etc.
        
               | ece wrote:
               | They could still offer it as a client library feature,
               | just tell the users what kind of r/w amplification and
               | guarantees they can expect, and it's something they could
               | optimize later or not.
        
           | CodesInChaos wrote:
           | How do versioned objects make conditional puts unnecessary? I
           | see little relation between them, except that you could use
           | the version identifier in the condition.
        
             | jjoonathan wrote:
             | Because they let AWS offload the hard part to you, which is
             | what AWS does best :)
        
               | ryeguy wrote:
               | This doesn't answer the question being asked.
        
         | [deleted]
        
       | crashocaster wrote:
       | I would have been interested to hear more about the verification
       | techniques and tools they used for this project.
        
         | jeffbarr wrote:
         | Check out https://cacm.acm.org/magazines/2015/4/184701-how-
         | amazon-web-... ("How Amazon Web Services Uses Formal Methods")
         | and
         | https://d1.awsstatic.com/Security/pdfs/One_Click_Formal_Meth...
         | ("One-Click Formal Methods") for more info.
        
         | [deleted]
        
         | sidereal wrote:
         | The folks involved gave a neat talk about the verification
         | techniques and tools they used as part of AWS Pi Week recently:
         | https://www.twitch.tv/videos/951537246?t=1h10m10s
        
       | wolf550e wrote:
       | AWS fixed S3 consistency in December 2020:
       | 
       | https://aws.amazon.com/about-aws/whats-new/2020/12/amazon-s3...
        
         | deepsun wrote:
         | By the way, Google's GCS had it from the beginning.
        
           | CommanderHux wrote:
           | Azure had it from the beginning
        
           | BrandonY wrote:
           | Hi, GCS engineer here. GCS offered a lot of consistency from
           | the beginning, but we didn't have strong object listing
           | consistency at the beginning. We got that somewhere around
           | 2017 when we moved object metadata to Spanner. See
           | https://cloud.google.com/blog/products/gcp/how-google-
           | cloud-...
        
         | swyx wrote:
          | that's... what the post is about...
        
       | valenterry wrote:
       | Here's what I take away from this post:
       | 
       | > We built automation that can respond rapidly to load
       | concentration and individual server failure. Because the
       | consistency witness tracks minimal state and only in-memory, we
       | are able to replace them quickly without waiting for lengthy
       | state transfers.
       | 
       | So this means that the "system" that contains the witness(es) is
       | a single point of truth and failure (otherwise we would lose
       | consistency again), but because it does not have to store a lot
       | of information, it can be kept in-memory and can be exchanged
       | quickly in case of failure.
       | 
       | Or in other words: minimize the amount of information that is
       | strictly necessary to keep a system consistent and then make that
       | part its own in-memory and quickly failover-able system which is
       | then the bar for the HA component.
       | 
       | Is that what they did?
        
         | mgdev wrote:
         | They've basically bolted on causal consistency.
         | 
         | It's a great change.
        
       ___________________________________________________________________
       (page generated 2021-04-28 23:01 UTC)