[HN Gopher] Continuous reinvention: A brief history of block sto...
       ___________________________________________________________________
        
       Continuous reinvention: A brief history of block storage at AWS
        
       Author : riv991
       Score  : 87 points
       Date   : 2024-08-22 14:59 UTC (2 hours ago)
        
 (HTM) web link (www.allthingsdistributed.com)
 (TXT) w3m dump (www.allthingsdistributed.com)
        
       | mjb wrote:
       | Super cool to see this here. If you're at all interested in big
       | systems, you should read this.
       | 
       | > Compounding this latency, hard drive performance is also
       | variable depending on the other transactions in the queue.
       | Smaller requests that are scattered randomly on the media take
       | longer to find and access than several large requests that are
       | all next to each other. This random performance led to wildly
       | inconsistent behavior.
       | 
       | The effect of this can be huge! Given a reasonably sequential
       | workload, modern magnetic drives can do >100MB/s of reads or
       | writes. Given an entirely random 4kB workload, they can be
       | limited to as little as 400kB/s of reads or writes. Queuing and
       | scheduling can help avoid the truly bad end of this, but real-
       | world performance still varies by over 100x depending on
       | workload. That's really hard for a multi-tenant system to deal
       | with (especially with reads, where you can't do the "just write
       | it somewhere else" trick).
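        | 
        | To make that 100x concrete, here's a back-of-the-envelope
        | model (the ~4ms seek and 7200rpm numbers are my own
        | illustrative assumptions, not from the post):
        | 
        |   # Rough HDD throughput model; latency numbers are
        |   # illustrative assumptions for a 7200rpm drive.
        |   SEEK_MS = 4.0                    # average seek time
        |   ROTATE_MS = 60_000 / 7200 / 2    # half a rotation, ~4.2ms
        |   SEQ_MBPS = 100.0                 # sequential transfer rate
        | 
        |   def random_4k_kbps():
        |       # Every scattered 4kB request pays a full seek plus
        |       # rotational delay before any data moves.
        |       ms_per_io = SEEK_MS + ROTATE_MS      # ~8.2ms
        |       return (1000.0 / ms_per_io) * 4      # ~490 kB/s
        | 
        |   ratio = SEQ_MBPS * 1000 / random_4k_kbps()
        |   print(f"random 4kB: ~{random_4k_kbps():.0f} kB/s")
        |   print(f"sequential is ~{ratio:.0f}x faster")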
       | 
       | > To know what to fix, we had to know what was broken, and then
       | prioritize those fixes based on effort and rewards.
       | 
       | This was the biggest thing I learned from Marc in my career (so
       | far). He'd spend time working on visualizations of latency (like
       | the histogram time series in this post) which were much richer
       | than any of the telemetry we had, then tell a story using those
       | visualizations, and completely change the team's perspective on
        | the work that needed to be done. Each peak in the histogram came
        | with its own story, and its own work to optimize. Really diving into
       | performance data - and looking at that data in multiple ways -
       | unlocks efficiencies and opportunities that are invisible without
       | that work and investment.
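        | 
        | The shape of that analysis is simple enough to sketch: bucket
        | request latencies per time window into log-spaced bins, so
        | each mode stays visible instead of averaging away. A minimal
        | sketch (names and sample data invented, not EBS tooling):
        | 
        |   import math
        |   from collections import Counter, defaultdict
        | 
        |   WINDOW_S = 60  # one histogram per minute of samples
        | 
        |   def log_bin(latency_ms):
        |       # log2-spaced bins: 0.5ms, 1ms, 2ms, 4ms, ...
        |       return 2 ** math.ceil(math.log2(max(latency_ms, 0.5)))
        | 
        |   def histogram_series(samples):
        |       # samples: [(timestamp_sec, latency_ms), ...]
        |       series = defaultdict(Counter)
        |       for ts, lat in samples:
        |           series[ts // WINDOW_S][log_bin(lat)] += 1
        |       return series
        | 
        |   samples = [(5, 0.8), (12, 0.9), (30, 9.5), (70, 1.1), (95, 33.0)]
        |   for w, hist in sorted(histogram_series(samples).items()):
        |       print(w * WINDOW_S, dict(sorted(hist.items())))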
       | 
       | > Armed with this knowledge, and a lot of human effort, over the
       | course of a few months in 2013, EBS was able to put a single SSD
       | into each and every one of those thousands of servers.
       | 
       | This retrofit project is one of my favorite AWS stories.
       | 
       | > The thing that made this possible is that we designed our
       | system from the start with non-disruptive maintenance events in
       | mind. We could retarget EBS volumes to new storage servers, and
       | update software or rebuild the empty servers as needed.
       | 
        | This is a great reminder that building distributed systems
        | isn't just for scale. Here, we see how building the system to
        | seamlessly tolerate the failure of a server, and to move data
        | around without loss, makes large-scale operations possible
        | (everything from day-to-day software upgrades to a massive
        | hardware retrofit project). A "simpler" architecture would
        | make those operations much harder, sometimes impossible, and
        | in turn make the end-to-end problem we're trying to solve for
        | the customer harder.
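        | 
        | The "retarget EBS volumes" mechanism can be pictured as a
        | placement map that clients resolve a volume through; a toy
        | version (invented names, not EBS internals; the real system
        | would replicate the data before flipping the pointer):
        | 
        |   from dataclasses import dataclass, field
        | 
        |   @dataclass
        |   class PlacementMap:
        |       # volume id -> storage server currently hosting it
        |       placements: dict = field(default_factory=dict)
        | 
        |       def retarget(self, volume, new_server):
        |           # Real system: replicate to new_server, then flip.
        |           self.placements[volume] = new_server
        | 
        |       def drain(self, server, spares):
        |           # Move every volume off `server` so it can be
        |           # upgraded or rebuilt without disrupting volumes.
        |           for vol, srv in list(self.placements.items()):
        |               if srv == server:
        |                   self.retarget(vol, spares[hash(vol) % len(spares)])
        | 
        |   pm = PlacementMap({"vol-1": "a", "vol-2": "a", "vol-3": "b"})
        |   pm.drain("a", spares=["c", "d"])
        |   print(pm.placements)  # vol-1 and vol-2 now on c or d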
        
       | tw04 wrote:
       | I think the most fascinating thing is watching them relearn every
       | lesson the storage industry already knew about a decade earlier.
       | Feels like most of this could have been solved by either hiring
       | storage industry experts or just acquiring one of the major
       | vendors.
        
         | jeeyoungk wrote:
          | What is there to learn from a "storage industry expert" or
          | major vendors? Network-attached block-level storage at
          | AWS's scale hadn't been done before.
        
       | simonebrunozzi wrote:
       | If you're curious, this is a talk I gave back in 2009 [0] about
       | Amazon S3 internals. It was created from internal assets by the
       | S3 team, and a lot in there influenced how EBS was developed.
       | 
       | [0]: https://vimeo.com/7330740
        
       ___________________________________________________________________
       (page generated 2024-08-22 17:00 UTC)