Post 9ySUlAu8AB2RunzFaa by smortex@mamot.fr
 (DIR) Post #9ySUl9HoBAn2tf38DY by raichoo@chaos.social
       2020-08-24T09:13:30Z
       
       0 likes, 0 repeats
       
       Some #ZFS myth busting. No, you don't need ECC RAM. #FreeBSD #Linux #BSD https://youtu.be/pN7OLChclH8?t=2195
       
 (DIR) Post #9ySUlAu8AB2RunzFaa by smortex@mamot.fr
       2020-08-24T20:13:35Z
       
       1 likes, 0 repeats
       
       @raichoo … really depends on your usage. If you have real backups (not based on ZFS clones), it can be okay. If you want reliable storage, mirroring without ECC RAM will behave fine for a long time before you realize it's already too late. When you're lucky, bit flips happen in unused memory, in application or kernel memory (causing a crash / reboot), or in cached data (causing strange behavior until the next reboot).
       
 (DIR) Post #9ySUlD4U6eVcdUqY0O by smortex@mamot.fr
       2020-08-24T20:15:57Z
       
       0 likes, 0 repeats
       
       @raichoo When you're unlucky, they happen to data before it's written to disk (data loss). Or worse: corruption can happen in ZFS metadata. In that case, you are basically doomed…
       
 (DIR) Post #9ySUlEERn64YEgW1Am by feld@bikeshed.party
       2020-08-24T20:35:06.235318Z
       
       0 likes, 0 repeats
       
       @smortex @raichoo Yep, you don't *need* ECC RAM, but if the in-memory copy of the updated uberblock/MOS gets corrupted just before it's written to disk, you're boned: say goodbye to your pool.
       
 (DIR) Post #9yU39LWHwDFXO7UZPc by raichoo@chaos.social
       2020-08-25T08:12:51Z
       
       0 likes, 0 repeats
       
       @feld @smortex Given that the toot's intention was to call out the "ZFS without ECC is harmful" myth, it feels like we have gone off the rails a bit here. Anyway, since the uberblock is stored across 4 vdev labels (2 at the front and 2 at the back) with 128 copies each, wouldn't ZFS just pick the latest one with the highest TX number and a valid checksum? I'm sure there are still ways to mess that up somehow; I'm not claiming it's impossible to lose data, as some might read into the original toot.
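The selection rule raichoo describes (highest TX number among copies whose checksum still validates) can be sketched as follows. This is a minimal illustration, not ZFS code: `UberblockCopy`, `compute_checksum`, `make_copy`, and `select_active` are hypothetical names, and SHA-256 over a toy payload stands in for the real on-disk layout and checksum.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class UberblockCopy:
    txg: int         # transaction number recorded in this copy
    payload: bytes   # stand-in for the rest of the uberblock contents
    checksum: bytes  # checksum stored alongside the copy


def compute_checksum(txg: int, payload: bytes) -> bytes:
    # Assumption: SHA-256 over txg + payload, purely for illustration.
    return hashlib.sha256(txg.to_bytes(8, "little") + payload).digest()


def make_copy(txg: int, payload: bytes) -> UberblockCopy:
    # Build a copy whose stored checksum matches its contents.
    return UberblockCopy(txg, payload, compute_checksum(txg, payload))


def select_active(copies: Iterable[UberblockCopy]) -> Optional[UberblockCopy]:
    # Discard copies whose stored checksum no longer matches their contents,
    # then take the one with the highest transaction number.
    valid = [c for c in copies
             if compute_checksum(c.txg, c.payload) == c.checksum]
    return max(valid, key=lambda c: c.txg, default=None)


copies = [make_copy(t, f"root-{t}".encode()) for t in (100, 101, 102)]
# Damage the newest copy after its checksum was stored.
damaged = copies[:2] + [UberblockCopy(102, b"garbage", copies[2].checksum)]
fallback = select_active(damaged)  # falls back to the copy with txg 101
```

The fallback only helps when the damage happens after the checksum was computed, so that the stored checksum and the contents disagree.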
       
 (DIR) Post #9yU39MVyEsacTQLpYW by raichoo@chaos.social
       2020-08-25T08:13:47Z
       
       0 likes, 0 repeats
       
       @feld @smortex Happy to see pointers to the code and to ways this might blow up, even though we are not talking about the specific need for ECC anymore.
       
 (DIR) Post #9yU39NPysdOPH8YYrI by feld@bikeshed.party
       2020-08-25T14:35:11.008830Z
       
       0 likes, 0 repeats
       
       @raichoo @smortex We're still talking about ECC because this is the one area where ECC helps you with ZFS...
       
 (DIR) Post #9yU3IPDFZpx3RouIlM by feld@bikeshed.party
       2020-08-25T14:36:50.451288Z
       
       0 likes, 0 repeats
       
       @raichoo @smortex > wouldn't ZFS just pick the latest one with the highest TX number and valid checksum? You are missing the part where the data was corrupted in memory before it was written to disk. That is also the time window before the checksum has been calculated. The checksum would be calculated on the bad version of the data, not the good one.
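The timing argument feld makes can be shown with a small sketch. It assumes a generic "checksum at write time, verify at read time" scheme with SHA-256 as a stand-in; `flip_bit`, `write_block`, and `verify` are hypothetical helpers, not ZFS functions.

```python
import hashlib


def flip_bit(buf: bytes, bit_index: int) -> bytes:
    # Return a copy of buf with exactly one bit inverted.
    out = bytearray(buf)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def write_block(buf: bytes) -> tuple[bytes, bytes]:
    # The checksum is computed over whatever is in memory at write time.
    return buf, hashlib.sha256(buf).digest()


def verify(buf: bytes, checksum: bytes) -> bool:
    # Read-time check: does the data still match the stored checksum?
    return hashlib.sha256(buf).digest() == checksum


intended = b"uberblock txg=102"

# Case 1: the bit flips in RAM before the checksum is computed.
# The bad data is checksummed, so verification passes.
stored, cksum = write_block(flip_bit(intended, 3))
undetected = verify(stored, cksum) and stored != intended

# Case 2: the bit flips after the checksum was computed.
# Verification fails, so the corruption is detected.
stored, cksum = write_block(intended)
detected = not verify(flip_bit(stored, 3), cksum)
```

In the first case the checksum faithfully describes the wrong bytes, which is why it cannot flag them.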
       
 (DIR) Post #9yU56IHLjO4gNNB1kW by raichoo@chaos.social
       2020-08-25T14:48:01Z
       
       0 likes, 0 repeats
       
       @feld @smortex Okay, now I think I understand where this conversation went wrong. I was never arguing against the benefits of ECC or what it can protect you against. I was arguing against "you have to have ECC, otherwise ZFS is more dangerous than a regular FS".
       
 (DIR) Post #9yU56K6QwwX0265L0q by feld@bikeshed.party
       2020-08-25T14:57:02.126981Z
       
       0 likes, 0 repeats
       
       @raichoo @smortex It depends on what your risk profile is. Other filesystems can recover with some fsck; you might end up with a little bit rot, but if you're vigilant you can work around it. You won't lose everything. With ZFS, you can lose everything. So if you don't believe that bit flips and silent corruption from hardware defects are a threat, you're probably OK... But I'm not gambling with that. The data does not look good. Just ask @SlicerDicer how fun bit flips are -- he's had several while up at high altitudes. Gamma rays are increasing YoY. Google, 2009: "Across the entire fleet, 8.2% of all DIMMs are affected by correctable errors and an average DIMM experiences nearly 4000 correctable errors per year." https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/35162.pdf How many DIMMs in your computer? 4000 correctable errors per stick per year. That should make you feel uneasy.
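The two figures in the Google quote can be turned into rough per-machine numbers. The sketch below uses only the quoted values (8.2% of DIMMs affected, roughly 4000 correctable errors per DIMM per year on average) and assumes, as a simplification, that DIMMs fail independently of each other. The function names are made up for this example.

```python
def expected_correctable_errors(n_dimms: int,
                                mean_per_dimm_per_year: float = 4000.0) -> float:
    # Fleet-average expectation: number of DIMMs times the quoted mean.
    return n_dimms * mean_per_dimm_per_year


def prob_any_dimm_affected(n_dimms: int, p_affected: float = 0.082) -> float:
    # Chance that at least one of n DIMMs is among the affected ones,
    # assuming independence between DIMMs (a simplification).
    return 1.0 - (1.0 - p_affected) ** n_dimms


four_stick_errors = expected_correctable_errors(4)
four_stick_risk = prob_any_dimm_affected(4)
```

Since the quote puts only 8.2% of DIMMs in the affected group, the 4000 figure reads as a fleet-wide mean rather than a prediction for each individual stick. The per-machine probability is therefore probably the more useful of the two numbers when judging personal risk.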
       
 (DIR) Post #9yU6DSnX7TfmyvDHEm by raichoo@chaos.social
       2020-08-25T15:06:43Z
       
       0 likes, 0 repeats
       
       @feld @SlicerDicer @smortex What exactly are you trying to convince me of that I don't already know? Hardware can fail? Yes I know that. Bit flips can and will happen? I know that as well. You seem to be arguing against some fictional version of me that is completely oblivious, and it's a bit exhausting.
       
 (DIR) Post #9yU6DTNKyMJSlxxs2q by feld@bikeshed.party
       2020-08-25T15:09:32.683839Z
       
       0 likes, 0 repeats
       
       @raichoo @SlicerDicer @smortex You seem to be unable to accept the scope of the danger here. The answer is that yes, ZFS without ECC is more volatile than other filesystems if a specific failure scenario occurs. The incident rate is not negligibly low.