[HN Gopher] Understanding Google's File System (2020)
       ___________________________________________________________________
        
       Understanding Google's File System (2020)
        
       Author : tosh
       Score  : 83 points
       Date   : 2024-03-19 14:48 UTC (8 hours ago)
        
 (HTM) web link (www.micahlerner.com)
 (TXT) w3m dump (www.micahlerner.com)
        
       | jeffbee wrote:
       | (2020) isn't quite enough to clue the reader in that GFS was
       | extirpated at Google more than a decade ago. Historical trivia
       | only.
        
         | tivert wrote:
         | > (2020) isn't quite enough to clue the reader in that GFS was
         | extirpated at Google more than a decade ago. Historical trivia
         | only.
         | 
         | Do you have more info about that? I was just Googling and found
         | this:
         | 
         | https://pdos.csail.mit.edu/6.824/papers/gfs-faq.txt
         | 
         | > Q: Does Google still use GFS?
         | 
         | > A: Rumor has it that GFS has been replaced by something
         | called Colossus, with the same overall goals, but improvements
         | in master performance and fault-tolerance. In addition, many
         | applications within Google have switched to more database-like
         | storage systems such as BigTable and Spanner. However, much of
         | the GFS design lives on in HDFS, the storage system for the
         | Hadoop open-source MapReduce.
         | 
         | > https://cloud.google.com/blog/products/storage-data-
         | transfer...
        
           | jeffbee wrote:
           | There is almost nothing public about Colossus. Here's some
           | marketing fluff, but it's clear that Colossus is the
           | successor to GFS and they compare scale (CFS >> GFS).
           | https://cloud.google.com/blog/products/storage-data-
           | transfer...
           | 
           | Also this slide deck https://www.pdsw.org/pdsw-
           | discs17/slides/PDSW-DISCS-Google-K...
        
           | nosefrog wrote:
           | > In addition, many applications within Google have switched
           | to more database-like storage systems such as BigTable and
           | Spanner.
           | 
           | And where do BigTable and Spanner store their data? ;)
        
             | summerlight wrote:
             | Interestingly, Colossus was initially designed to be the
              | next-generation BigTable storage. It turned out that other
             | teams had a similar problem, so it has become a more
             | general file system.
        
               | ithkuil wrote:
                | Turns out that BigTable is also useful for implementing
                | the storage layer for BigTable itself. Luckily turtles
                | can be stacked on top of each other.
        
             | dist-epoch wrote:
              | BigTable stores its data in Colossus, but Colossus stores
              | its metadata in BigTable :)
        
               | kristjansson wrote:
               | And Colossus stores its metametametadata in Chubby :)
               | 
               | https://www.pdsw.org/pdsw-discs17/slides/PDSW-DISCS-
               | Google-K...
        
           | kccqzy wrote:
           | At Google the project to migrate GFS to Colossus was called
           | Moonshot. It was endorsed by Eric Schmidt and became a top-
            | down mandate. This goes into details of the migration:
            | https://sre.google/static/pdf/CaseStudiesInfrastructureChang...
            | (scroll to Chapter 1)
        
             | rkagerer wrote:
             | Thanks. This stat from the section on Diskless is
             | interesting:
             | 
             |  _On shared machines, spinning disk reliability decreased
             | compute reliability; 25%-30% of production task deaths were
             | attributable to disk failure._
        
       | dekhn wrote:
        | GFS was terrible. It may have been great in the early days of
        | Google when they couldn't build an index, but I recall having to
       | bribe hwops people (with single-malt) to upgrade RAM on the GFS
       | masters, and a few times, my team (Ads) threatened to run our own
       | filesystems that didn't fail so badly.
       | 
       | Colossus, on the other hand, is really good.
        
         | the-rc wrote:
         | Among other things, GFS ran outside of Borg, so you couldn't
         | just donate quota to let it run with more resources, as you
         | found out. Conversely, given that Colossus ran on Borg, part of
         | the migration involved an elaborate system that minted CPU/RAM
         | quota for the storage system users whenever the CFS cell was
          | grown. Then there were all the issues on the low-end machines
          | (hello, Nocona) that didn't have enough resources to run the
         | storage stack AND user jobs. Stuff wouldn't schedule (and the
         | cluster suddenly appeared to be missing some disk), but at
         | least that was a lot more explicit than the performance issues
         | you witnessed on GFS.
        
           | robertlagrant wrote:
           | > part of the migration involved an elaborate system that
           | minted CPU/RAM quota for the storage system users whenever
           | the CFS cell was grown
           | 
           | I've no idea what this is, but it sounds fascinating. Was
            | this a way to auto-allocate more resources based on rules?
        
             | marklar423 wrote:
              | What I think the GP means is that previously, GFS ran on
              | its own machines and didn't need to participate in the
              | Google-wide quota system for CPU/RAM.
             | 
             | However Colossus _does_ run on the general Google compute
             | infrastructure, and so needed a way to get CPU/RAM quota
             | where none existed (because it was all used by existing,
             | non-storage users).
        
         | tytso wrote:
          | Single malt was the currency of choice when bribing and/or
          | placating SREs and hwops folks. For example, if a SWE botched
          | a rollout that caused multiple SREs to get paged at 3am, a
          | bottle of single malt donated to the SRE bar was considered a
          | way of apologizing.
        
       | alt227 wrote:
        | If anything, the interview this was based on is from 2009 and
        | was a much more interesting read for me:
       | 
       | https://queue.acm.org/detail.cfm?id=1594206
        
       | btilly wrote:
        | The key thing to understand about GFS and its relatives is how
        | essential they are for distributed computation.
       | 
       | It is natural for us to look at a big file and think, "I need to
       | do something complicated, so I'll do a distributed calculation,
       | then get a new file." This makes the file you start with, and the
       | file you wind up with, both bottlenecks to the calculation.
       | 
       | Instead you really want to have data distributed across machines.
        | Now do a distributed map-reduce calculation, and get another
        | distributed dataset. Then create a pipeline of such calculations,
        | leading to a distributed calculation with no bottlenecks.
        | Anywhere.
       | 
        | (Of course this is a low-level view of how it works. We may want
        | to build that pipeline automatically from, say, a SQL statement,
        | to get a distributed query against a datastore.)
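        | 
        | A minimal sketch of the pipeline idea in plain Python (a toy
        | word count, not Google's MapReduce API; the in-memory "shards"
        | below stand in for GFS chunks): each stage reads sharded input
        | and produces sharded output, so nothing funnels through a
        | single file.
        | 
        |     # toy word count; stages read and write sharded data
        |     from collections import defaultdict
        |     from concurrent.futures import ProcessPoolExecutor
        | 
        |     def map_stage(shard):
        |         # emit (word, 1) pairs from one input shard
        |         return [(w, 1) for line in shard
        |                 for w in line.split()]
        | 
        |     def shuffle(mapped, n_reducers):
        |         # hash-partition pairs onto reducer shards
        |         buckets = [defaultdict(list)
        |                    for _ in range(n_reducers)]
        |         for pairs in mapped:
        |             for k, v in pairs:
        |                 buckets[hash(k) % n_reducers][k].append(v)
        |         return buckets
        | 
        |     def reduce_stage(bucket):
        |         # sum counts per word within one reducer shard
        |         return {k: sum(vs) for k, vs in bucket.items()}
        | 
        |     if __name__ == "__main__":
        |         shards = [["the quick brown fox"],
        |                   ["the lazy dog the fox"]]
        |         with ProcessPoolExecutor() as pool:
        |             mapped = list(pool.map(map_stage, shards))
        |             reduced = list(pool.map(reduce_stage,
        |                                     shuffle(mapped, 2)))
        |         # 'reduced' is itself sharded, ready to feed the
        |         # next stage without merging into one file
        |         print(reduced)
        | 
        | Chaining another map/shuffle/reduce onto 'reduced' gives the
        | pipeline described above, with every intermediate dataset
        | staying distributed.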
       | 
        | Absolutely essential for this is the ability to have a
        | distributed, replicated filesystem. This is what GFS was. When
       | you try to scale a system, you focus on your current bottlenecks,
       | and accept that there will be later ones. GFS did not distribute
       | metadata. It did not play well with later systems created for
       | managing distributed machines. But it was an important step
       | towards a distributed world. And the fact that Google was so far
       | ahead of everyone else on this was a giant piece of their secret
       | sauce.
       | 
        | True story. I went to Google at about the same time that eBay
        | shut down their first datacenter. So I was reading an article
        | from eBay about how much work it took to shut down a datacenter
        | without disrupting operations, and how proud they were of
        | succeeding. And then I saw the best practices that Google was
        | using so that users wouldn't notice if a random datacenter went
        | offline without warning, a capability that Google regularly
        | tested. Because if you're afraid to test your emergency systems,
        | they probably don't work.
        | 
        | What eBay was proud of doing with a lot of work was entirely
        | taken for granted and automated at Google.
        
       ___________________________________________________________________
       (page generated 2024-03-19 23:00 UTC)