[HN Gopher] Understanding Google's File System (2020)
___________________________________________________________________
Understanding Google's File System (2020)
Author : tosh
Score : 83 points
Date : 2024-03-19 14:48 UTC (8 hours ago)
(HTM) web link (www.micahlerner.com)
(TXT) w3m dump (www.micahlerner.com)
| jeffbee wrote:
| (2020) isn't quite enough to clue the reader in that GFS was
| extirpated at Google more than a decade ago. Historical trivia
| only.
| tivert wrote:
| > (2020) isn't quite enough to clue the reader in that GFS was
| extirpated at Google more than a decade ago. Historical trivia
| only.
|
| Do you have more info about that? I was just Googling and found
| this:
|
| https://pdos.csail.mit.edu/6.824/papers/gfs-faq.txt
|
| > Q: Does Google still use GFS?
|
| > A: Rumor has it that GFS has been replaced by something
| called Colossus, with the same overall goals, but improvements
| in master performance and fault-tolerance. In addition, many
| applications within Google have switched to more database-like
| storage systems such as BigTable and Spanner. However, much of
| the GFS design lives on in HDFS, the storage system for the
| Hadoop open-source MapReduce.
|
| > https://cloud.google.com/blog/products/storage-data-
| transfer...
| jeffbee wrote:
| There is almost nothing public about Colossus. Here's some
| marketing fluff, but it's clear that Colossus is the
| successor to GFS and they compare scale (CFS >> GFS).
| https://cloud.google.com/blog/products/storage-data-
| transfer...
|
| Also this slide deck https://www.pdsw.org/pdsw-
| discs17/slides/PDSW-DISCS-Google-K...
| nosefrog wrote:
| > In addition, many applications within Google have switched
| to more database-like storage systems such as BigTable and
| Spanner.
|
| And where do BigTable and Spanner store their data? ;)
| summerlight wrote:
| Interestingly, Colossus was initially designed to be the
| next-generation BigTable storage. It turned out that other
| teams had a similar problem, so it became a more general
| file system.
| ithkuil wrote:
| Turns out that BigTable is also useful for implementing the
| storage layer for BigTable itself. Luckily turtles can be
| stacked on top of each other.
| dist-epoch wrote:
| BigTable stores its data in Colossus, but Colossus stores
| its metadata in BigTable :)
| kristjansson wrote:
| And Colossus stores its metametametadata in Chubby :)
|
| https://www.pdsw.org/pdsw-discs17/slides/PDSW-DISCS-
| Google-K...
| kccqzy wrote:
| At Google the project to migrate GFS to Colossus was called
| Moonshot. It was endorsed by Eric Schmidt and became a top-
| down mandate. This goes into detail about the migration:
| https://sre.google/static/pdf/CaseStudiesInfrastructureChang...
| (scroll to Chapter 1)
| rkagerer wrote:
| Thanks. This stat from the section on Diskless is
| interesting:
|
| _On shared machines, spinning disk reliability decreased
| compute reliability; 25%-30% of production task deaths were
| attributable to disk failure._
| dekhn wrote:
| GFS was terrible. It may have been great in the early days of
| Google, when they couldn't build an index, but I recall having
| to bribe hwops people (with single malt) to upgrade RAM on the
| GFS masters, and a few times, my team (Ads) threatened to run
| our own filesystems that didn't fail so badly.
|
| Colossus, on the other hand, is really good.
| the-rc wrote:
| Among other things, GFS ran outside of Borg, so you couldn't
| just donate quota to let it run with more resources, as you
| found out. Conversely, given that Colossus ran on Borg, part of
| the migration involved an elaborate system that minted CPU/RAM
| quota for the storage system users whenever the CFS cell was
| grown. Then there were all the issues on the low-end machines
| (hello, Nocona) that didn't have enough resources to run the
| storage stack AND user jobs. Stuff wouldn't schedule (and the
| cluster suddenly appeared to be missing some disk), but at
| least that was a lot more explicit than the performance issues
| you witnessed on GFS.
| robertlagrant wrote:
| > part of the migration involved an elaborate system that
| minted CPU/RAM quota for the storage system users whenever
| the CFS cell was grown
|
| I've no idea what this is, but it sounds fascinating. Was
| this a way to auto-allocate more resources based on rules?
| marklar423 wrote:
| What I think the GP means is that previously, GFS ran on its
| own machines and didn't need to participate in the Google-
| wide quota system for CPU/RAM.
|
| However Colossus _does_ run on the general Google compute
| infrastructure, and so needed a way to get CPU/RAM quota
| where none existed (because it was all used by existing,
| non-storage users).
| tytso wrote:
| Single malt was the currency of choice when bribing and/or
| placating SREs and hwops folks. For example, if a SWE botched
| a rollout that caused multiple SREs to get paged at 3am, a
| bottle of single malt donated to the SRE bar was considered a
| way of apologizing.
| alt227 wrote:
| If anything, the interview that this was based on is from 2009
| and was a much more interesting read for me:
|
| https://queue.acm.org/detail.cfm?id=1594206
| btilly wrote:
| The key thing to understand about GFS and its relatives is how
| essential they are for distributed computation.
|
| It is natural for us to look at a big file and think, "I need to
| do something complicated, so I'll do a distributed calculation,
| then get a new file." This makes the file you start with, and the
| file you wind up with, both bottlenecks to the calculation.
|
| Instead you really want the data distributed across machines.
| Now do a distributed map-reduce calculation, and get another
| distributed dataset. Then create a pipeline of such
| calculations, leading to a distributed calculation with no
| bottlenecks. Anywhere.
|
| (Of course this is a low-level view of how it works. We may
| want to build that pipeline automatically from, say, a SQL
| statement, to get a distributed query against a datastore.)
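|
| Very roughly, one stage of such a pipeline has this shape. What
| follows is only a toy Python sketch of map/shuffle/reduce over
| data that is already split into shards, to show why nothing has
| to funnel through a single file; none of the names below are
| Google APIs, they're made up for illustration:
|
|     from collections import defaultdict
|
|     def map_stage(shards, map_fn):
|         # Apply map_fn to every record of every shard; the
|         # output stays sharded, so no single file becomes a
|         # bottleneck between stages.
|         return [[kv for rec in shard for kv in map_fn(rec)]
|                 for shard in shards]
|
|     def shuffle(mapped_shards, num_reducers):
|         # Route each key to a reducer shard by hashing it.
|         buckets = [defaultdict(list) for _ in range(num_reducers)]
|         for shard in mapped_shards:
|             for key, value in shard:
|                 buckets[hash(key) % num_reducers][key].append(value)
|         return buckets
|
|     def reduce_stage(buckets, reduce_fn):
|         # Reduce each key's values; the output is again a
|         # sharded dataset that can feed the next stage.
|         return [[(k, reduce_fn(k, vs)) for k, vs in bucket.items()]
|                 for bucket in buckets]
|
|     # Word count across three "machines", chained stage to stage.
|     def count_words(line):
|         return [(word, 1) for word in line.split()]
|
|     shards = [["the quick fox"], ["the lazy dog"], ["the fox"]]
|     mapped = map_stage(shards, count_words)
|     counts = reduce_stage(shuffle(mapped, 2),
|                           lambda key, vals: sum(vals))
|     print(counts)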
|
| Absolutely essential for this is the ability to have a
| distributed, replicated filesystem. This is what GFS was. When
| you try to scale a system, you focus on your current bottlenecks,
| and accept that there will be later ones. GFS did not distribute
| metadata. It did not play well with later systems created for
| managing distributed machines. But it was an important step
| towards a distributed world. And the fact that Google was so far
| ahead of everyone else on this was a giant piece of their secret
| sauce.
|
| True story. I went to Google at about the same time that eBay
| shut down their first datacenter. So I was reading an article
| from eBay about how much work it took to shut down a datacenter
| without disrupting operations. And how proud they were of
| succeeding. And then I saw the best practices that Google was
| using so that users wouldn't notice if a random datacenter went
| offline without warning. Which was a capacity that Google
| regularly tested. Because if you're afraid to test your emergency
| systems, they probably don't work.
|
| What eBay was proud of doing with a lot of work was entirely
| taken for granted and automated at Google.
___________________________________________________________________
(page generated 2024-03-19 23:00 UTC)