[HN Gopher] Fixing Ext4 Under Pressure
___________________________________________________________________
Fixing Ext4 Under Pressure
Author : vitplister
Score : 90 points
Date : 2024-03-01 20:07 UTC (1 days ago)
(HTM) web link (sdomi.pl)
(TXT) w3m dump (sdomi.pl)
| umanwizard wrote:
| 0x10000 in the mentioned bitfield is
| EXT4_FEATURE_RO_COMPAT_ORPHAN_PRESENT which is documented here:
| https://www.kernel.org/doc/Documentation/filesystems/ext4/or...
|
| If I'm understanding this correctly, probably the only effect of
| having accidentally turned this off is leaking some blocks.
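As an illustration of the bitfield reading in the comment above, here is a minimal Python sketch that decodes an ext4 `s_feature_ro_compat` mask into flag names. The table is a partial subset of the flags in the kernel's `fs/ext4/ext4.h`, the function name `decode_ro_compat` is hypothetical, and the example mask is made up for illustration; it is not the value from the article.

```python
# Partial table of ext4 read-only-compatible feature bits
# (a subset of the EXT4_FEATURE_RO_COMPAT_* constants; not exhaustive).
RO_COMPAT_FLAGS = {
    0x0001: "SPARSE_SUPER",
    0x0002: "LARGE_FILE",
    0x0008: "HUGE_FILE",
    0x0010: "GDT_CSUM",
    0x0020: "DIR_NLINK",
    0x0040: "EXTRA_ISIZE",
    0x0100: "QUOTA",
    0x0200: "BIGALLOC",
    0x0400: "METADATA_CSUM",
    0x1000: "READONLY",
    0x2000: "PROJECT",
    0x8000: "VERITY",
    0x10000: "ORPHAN_PRESENT",
}


def decode_ro_compat(mask):
    """Return (names of known flags set in mask, leftover bits not in the table)."""
    names = [name for bit, name in sorted(RO_COMPAT_FLAGS.items()) if mask & bit]
    known = 0
    for bit in RO_COMPAT_FLAGS:
        known |= bit
    return names, mask & ~known


if __name__ == "__main__":
    example = 0x10402  # made-up mask: LARGE_FILE | METADATA_CSUM | ORPHAN_PRESENT
    print(decode_ro_compat(example))
    # Clearing the orphan-present bit leaves the other flags untouched.
    print(decode_ro_compat(example & ~0x10000))
```

Any bits returned in the second element are flags this partial table does not know about, so they should be looked up rather than assumed harmless.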
| fragmede wrote:
| And people say "just do it on-prem, it's cheaper than cloud! It's
| not all that hard." Work like this is impressive as hell, but
| finding wizards who can do this kind of work is difficult and
| expensive. I've sat there and pored over hexdumps trying to get
| it to reveal its secrets (Kaitai is pretty cool). But this kind
| of nitty gritty went away with cloud, to be replaced with a
| different kind of problem, distributed systems. My point is,
| that's pretty cool, but there's less demand for that kind of work
| these days, which is unfortunate, because I kind of liked it.
| chgs wrote:
| Or he could have made backups like we've been doing for decades
|
| > I didn't have any recent PostgreSQL backups.
| LordN00b wrote:
| That would have been the easiest/best solution here.
| However...haven't they uncovered a limitation in the
| filesystem here? The superblock data WAS fine, only the
| checksum was at fault. They found a way around the issue,
| wrote up their findings and their suspicion about flushing after
| a resize, and asked for more tooling support. This is
| classically a good blog post.
| chgs wrote:
| Oh certainly, but it's not required of anyone self hosting
| who has a backup (and you need backups when someone else
| hosts for you anyway)
| lathiat wrote:
| Except the same thing might happen to your ext4 cloud instance
| and they'll just tell you it's lost.
| awskinda wrote:
| As with most things, I think the answer is "it depends." I
| have seen some very efficient storage systems designed around
| S3/R2 object storage where applications run in ephemeral
| containers, and storage is abstracted.
|
| I have also played a role in running an on-prem SAN that
| worked well mostly.
| fragmede wrote:
| The RDS database in the cloud that is backed by ext4 is
| trivially backed up (at great cost) to S3 which isn't going
| anywhere. S3 has 99.999999999% durability, that is to say,
| eleven nines. They're not going to tell you it's lost.
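To put "eleven nines" in concrete terms, here is a rough Python sketch of the arithmetic. It assumes the figure is an annual per-object durability and that losses are independent, which is a simplification of a design target rather than a guarantee; the function name and the object count are illustrative only.

```python
from fractions import Fraction

# Eleven nines of durability, read as an annual per-object loss
# probability of 1 in 10**11 (an assumption about what the figure means).
ANNUAL_LOSS_PROBABILITY = Fraction(1, 10**11)


def expected_annual_losses(object_count):
    """Expected number of objects lost per year under the independence assumption."""
    return object_count * ANNUAL_LOSS_PROBABILITY


if __name__ == "__main__":
    losses = expected_annual_losses(10_000_000)
    # Exact fractions avoid the rounding error of computing 1 - 0.99999999999 in floats.
    print(f"expected losses per year: {losses}")
    print(f"years per expected single loss: {1 / losses}")
```

Under these assumptions, ten million stored objects work out to an expected loss of one object roughly every 10,000 years. That says nothing about losses caused by misconfiguration or accidental deletion, which durability figures do not cover.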
| mistrial9 wrote:
| management systematically replaced people like that as a)
| expensive b) rivals in power. The idea to build replaceable-
| parts operations by workers, instead of University skill level
| super-admins, was by design and implemented with plenty of
| capital behind it.
| applied_heat wrote:
| This is what any business does and what you would do with
| your own growing business that you one day will no longer own
| too. You need defined roles with responsibilities and
| required skills that you can hire people to fill as people
| come and go in their own lives. Sucks to be a cog but cogs we
| are
| mistrial9 wrote:
| > what any business does
|
| absolute statements are by definition False, since "any"
| business that does otherwise, even one, disproves the
| statement.
|
| no need to invoke concepts of stability, social contract or
| law; maybe just add some ref to obsequiousness for lit
| value here
| applied_heat wrote:
| Thank you for the assistance, "a mature" business would
| have been more accurate than "any"
| mike_hock wrote:
| Let me guess, the server has been running with metadata
| checksumming (and the mystery feature) disabled since that
| incident and it still hasn't been switched on again.
| captn3m0 wrote:
| I would have considered re-compiling tune2fs with the check
| disabled, but this is a cool hack.
___________________________________________________________________
(page generated 2024-03-02 23:02 UTC)