[HN Gopher] Fixing Ext4 Under Pressure
       ___________________________________________________________________
        
       Fixing Ext4 Under Pressure
        
       Author : vitplister
       Score  : 90 points
       Date   : 2024-03-01 20:07 UTC (1 day ago)
        
 (HTM) web link (sdomi.pl)
 (TXT) w3m dump (sdomi.pl)
        
       | umanwizard wrote:
       | 0x10000 in the mentioned bitfield is
       | EXT4_FEATURE_RO_COMPAT_ORPHAN_PRESENT which is documented here:
       | https://www.kernel.org/doc/Documentation/filesystems/ext4/or...
       | 
       | If I'm understanding this correctly, probably the only effect of
       | having accidentally turned this off is leaking some blocks.
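
A minimal sketch of how one could decode an ext4 `ro_compat` bitfield like the one discussed above. The flag table is a subset written from my reading of the kernel headers, so treat the values as assumptions to verify against `fs/ext4/ext4.h`; only the 0x10000 mapping is confirmed by the comment itself. The helper name `decode_ro_compat` and the sample value are hypothetical.

```python
# Subset of EXT4_FEATURE_RO_COMPAT_* bits (assumed values; check the kernel
# headers before relying on them). 0x10000 is the bit named in the comment.
RO_COMPAT_FLAGS = {
    0x0001: "SPARSE_SUPER",
    0x0002: "LARGE_FILE",
    0x0008: "HUGE_FILE",
    0x0010: "GDT_CSUM",
    0x0020: "DIR_NLINK",
    0x0040: "EXTRA_ISIZE",
    0x0100: "QUOTA",
    0x0200: "BIGALLOC",
    0x0400: "METADATA_CSUM",
    0x1000: "READONLY",
    0x2000: "PROJECT",
    0x8000: "VERITY",
    0x10000: "ORPHAN_PRESENT",
}


def decode_ro_compat(value):
    """Return (names of known set bits, mask of set bits not in the table)."""
    names = [name for bit, name in sorted(RO_COMPAT_FLAGS.items()) if value & bit]
    known_mask = 0
    for bit in RO_COMPAT_FLAGS:
        known_mask |= bit
    return names, value & ~known_mask


if __name__ == "__main__":
    # Hypothetical field value, not taken from the article.
    names, unknown = decode_ro_compat(0x1046A)
    print(names, hex(unknown))
```

Returning the unknown bits separately matters here: a tool that silently drops bits it does not recognise is one way a flag like this could get cleared by accident.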
        
       | fragmede wrote:
       | And people say "just do it on-prem, it's cheaper than cloud! It's
       | not all that hard." Work like this is impressive as hell, but
       | finding wizards who can do this kind of work is difficult and
       | expensive. I've sat there and pored over hexdumps trying to get
       | it to reveal its secrets (Kaitai is pretty cool). But this kind
       | of nitty gritty went away with cloud, to be replaced with a
       | different kind of problem, distributed systems. My point is,
       | that's pretty cool, but there's less demand for that kind of work
       | these days, which is unfortunate, because I kind of liked it.
        
         | chgs wrote:
         | Or he could have made backups like we've been doing for decades
         | 
         | > I didn't have any recent PostgreSQL backups.
        
           | LordN00b wrote:
           | That would have been the easiest/best solution here.
           | However...haven't they uncovered a limitation in the
           | filesystem here? The superblock data WAS fine, only the
           | checksum was at fault. They found a way around the issue,
           | wrote up their findings, suspicion about the flushing after a
           | resize, and asked for more tooling support. This is
           | classically a good blog post.
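
To make the "data fine, checksum wrong" situation concrete, here is a rough sketch of checking a superblock checksum by hand. This is not the author's workaround. It rests on assumptions from my reading of the ext4 on-disk format: the superblock is 1024 bytes, `s_checksum` is a little-endian 32-bit field at offset 0x3FC, and it holds a CRC-32C over the preceding bytes, seeded with all ones and stored without the usual final inversion. Compare against `dumpe2fs` output before trusting any of it. The function names are hypothetical.

```python
import struct

CRC32C_POLY = 0x82F63B78   # reflected Castagnoli polynomial
SB_SIZE = 1024             # assumed superblock size in bytes
SB_CSUM_OFFSET = 0x3FC     # assumed offset of s_checksum in the superblock


def crc32c_raw(data, crc=0xFFFFFFFF):
    """Bitwise CRC-32C that returns the raw register (no final XOR)."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (CRC32C_POLY if crc & 1 else 0)
    return crc


def superblock_checksum(sb):
    """Checksum over everything before the s_checksum field."""
    if len(sb) != SB_SIZE:
        raise ValueError("expected a 1024-byte superblock")
    return crc32c_raw(sb[:SB_CSUM_OFFSET])


def checksum_matches(sb):
    """Compare the stored little-endian checksum with a recomputed one."""
    (stored,) = struct.unpack_from("<I", sb, SB_CSUM_OFFSET)
    return stored == superblock_checksum(sb)
```

On a real image the primary superblock would normally be read from byte offset 1024 of the device (again an assumption to verify), and any experiment like this belongs on a copy rather than the live disk.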
        
             | chgs wrote:
             | Oh certainly, but it's not required of anyone self hosting
             | who has a backup (and you need backups when someone else
             | hosts for you anyway)
        
         | lathiat wrote:
         | Except the same thing might happen to your ext4 cloud instance
         | and they'll just tell you it's lost.
        
           | awskinda wrote:
           | As with most things, I think the answer is "it depends." I
           | have seen some very efficient storage systems designed around
           | S3/R2 object storage where applications run in ephemeral
           | containers, and storage is abstracted.
           | 
           | I have also played a role in running an on-prem SAN that
           | worked well mostly.
        
           | fragmede wrote:
           | The RDS database in the cloud that is backed by ext4 is
           | trivially backed up (at great cost) to S3 which isn't going
           | anywhere. S3 has 99.999999999% durability, that is to say
           | eleven nines. They're not going to tell you it's lost.
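
As a back-of-the-envelope illustration of what eleven nines means, the sketch below treats the figure as an independent, per-object, per-year survival probability. That interpretation is an assumption made for the arithmetic; the number is a design target rather than a guarantee, and it says nothing about losses from accidental deletion or misconfiguration. The function names are hypothetical.

```python
from fractions import Fraction

# 99.999999999% durability -> annual loss probability of 1 in 10**11 per object
# (assumed to be independent per object, for illustration only).
ANNUAL_LOSS_PROBABILITY = 1 - Fraction("99.999999999") / 100


def expected_annual_losses(object_count):
    """Expected number of objects lost per year under the assumption above."""
    return object_count * ANNUAL_LOSS_PROBABILITY


def years_per_lost_object(object_count):
    """Average number of years between single-object losses."""
    return 1 / expected_annual_losses(object_count)


if __name__ == "__main__":
    # Ten million stored objects is an arbitrary example size.
    print(float(expected_annual_losses(10_000_000)))
    print(years_per_lost_object(10_000_000))
```

`Fraction` is used instead of floats so that subtracting from 1 does not lose the eleventh nine to rounding.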
        
         | mistrial9 wrote:
         | Management systematically replaced people like that as a)
         | expensive and b) rivals in power. The idea of building operations
         | out of replaceable-part workers, instead of university-level
         | super-admins, was by design and implemented with plenty of
         | capital behind it.
        
           | applied_heat wrote:
           | This is what any business does and what you would do with
           | your own growing business that you one day will no longer own
           | too. You need defined roles with responsibilities and
           | required skills that you can hire people to fill as people
           | come and go in their own lives. Sucks to be a cog but cogs we
           | are
        
             | mistrial9 wrote:
             | > what any business does
             | 
             | absolute statements are by definition False, since "any"
             | business that does otherwise, even one, disproves the
             | statement.
             | 
             | no need to invoke concepts of stability, social contract or
             | law; maybe just add some ref to obsequiousness for lit
             | value here
        
               | applied_heat wrote:
               | Thank you for the assistance, "a mature" business would
               | have been more accurate than "any"
        
       | mike_hock wrote:
       | Let me guess, the server has been running with metadata
       | checksumming (and the mystery feature) disabled since that
       | incident and it still hasn't been switched on again.
        
       | captn3m0 wrote:
       | I would have considered re-compiling tune2fs with the check
       | disabled, but this is a cool hack.
        
       ___________________________________________________________________
       (page generated 2024-03-02 23:02 UTC)