We had a minor crash in our data center. It's called "minor" because it only affected about a dozen of our more than 700 systems. Still, we had data loss and possibly some silent corruption, and that silent corruption is the really big issue.

Since we use ZFS for almost all application data, we could simply scrub those pools. No checksum errors? Perfect, the data is fine then. But what about the operating systems? We run full-blown Linux virtual machines, so every customer gets his own /, /usr, /lib and so on. What about all that data? Did it get corrupted as well? Very sadly, we still use ext4 there. On some systems we got lucky: bad superblocks. That means more work for me (because I have to re-build those systems -- which is not *that* much work, though, since we use config management for virtually everything), but at least I can be sure that these systems indeed *are* affected. Other systems just crashed and successfully rebooted. Now what?

I'm pretty much fed up with this situation. All filesystems should have checksums in 2017. Fuck performance. Performance is worth nothing if you operate on faulty data. I'm currently in the process of writing a tool that checksums files and stores the checksums in extended attributes. This is *far* from satisfactory, but in scenarios like the one above it would at least let us manually "scrub" our data to further assess the situation.
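To illustrate the idea (this is *not* my actual tool, just a minimal sketch): hash each file, stash the digest in an extended attribute, and later re-hash and compare to "scrub" it. The attribute name `user.sha256` and the choice of SHA-256 are assumptions made for this example.

```python
#!/usr/bin/env python3
"""Sketch: store and verify per-file checksums in extended attributes (Linux)."""
import hashlib
import os
import sys

XATTR_NAME = "user.sha256"  # assumed attribute name for this sketch


def file_sha256(path):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def store(path):
    """Compute the checksum and save it as an extended attribute."""
    os.setxattr(path, XATTR_NAME, file_sha256(path).encode())


def verify(path):
    """Re-hash the file and compare against the stored digest ("scrub")."""
    try:
        stored = os.getxattr(path, XATTR_NAME).decode()
    except OSError:
        return None  # no checksum stored yet
    return stored == file_sha256(path)


if __name__ == "__main__":
    mode = sys.argv[1]  # "store" or "verify"
    for p in sys.argv[2:]:
        if mode == "store":
            store(p)
        else:
            ok = verify(p)
            state = "OK" if ok else ("MISSING" if ok is None else "MISMATCH")
            print(f"{p}: {state}")
```

Run as `./xsum.py store FILE...` after a known-good state, and `./xsum.py verify FILE...` after an incident. Of course this only catches corruption that happened *after* the checksum was stored, and it says nothing about metadata -- which is exactly why it's far from satisfactory compared to checksums in the filesystem itself.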