[HN Gopher] ZFS: Apple's New Filesystem that wasn't (2016)
       ___________________________________________________________________
        
       ZFS: Apple's New Filesystem that wasn't (2016)
        
       Author : jitl
       Score  : 128 points
       Date   : 2025-04-27 09:25 UTC (13 hours ago)
        
 (HTM) web link (ahl.dtrace.org)
 (TXT) w3m dump (ahl.dtrace.org)
        
       | jitl wrote:
       | Besides the licensing issue, I wonder if optimizing ZFS for low
       | latency + low RAM + low power on iPhone was an uphill battle or
        | if it's easy. My experience running ZFS years ago was poor
        | latency and large RAM use on my NAS, but that hardware and
        | drive configuration was optimized for low $ per GB stored and
        | used parity RAID.
        
         | zoky wrote:
         | If it were an issue it would hardly be an insurmountable one. I
         | just can't imagine a scenario where Apple engineers go "Yep,
         | we've eked out all of the performance we possibly can from this
         | phone, the only thing left to do is change out the filesystem."
        
           | klodolph wrote:
           | Does it matter if it's insurmountable? At some point, the
           | benefits of a new FS outweigh the drawbacks. This happens
           | earlier than you might think, because of weird factors like
           | "this lets us retain top filesystem experts on staff".
        
             | karlgkk wrote:
             | It's worth remembering that the filesystem they were
             | looking to replace was HFS+. It was introduced in the 90s
             | as a modernization of HFS, itself introduced in the 80s.
             | 
             | Now, old does not necessarily mean bad, but in this
             | case....
        
         | twoodfin wrote:
         | This seems like an early application of the Tim Cook doctrine:
         | Why would Apple want to surrender control of this key bit of
         | technology for their platforms?
         | 
         | The rollout of APFS a decade later validated this concern.
         | There's just no way that flawless transition happens so rapidly
          | without a filesystem made to order for Apple's needs from Day 0.
        
           | TheNewsIsHere wrote:
           | (Edit: My comment is simply about the logistics and work
           | involved in a very well executed filesystem migration. Not
           | about whether ZFS is good for embedded or memory constrained
           | devices.)
           | 
           | What you describe hits my ear as more NIH syndrome than
           | technical reality.
           | 
           | Apple's transition to APFS was managed like you'd manage any
           | kind of mass scale filesystem migration. I can't imagine
            | they'd have done anything differently if they'd adopted
           | ZFS.
           | 
           | Which isn't to say they wouldn't have modified ZFS.
           | 
           | But with proper driver support and testing it wouldn't have
           | made much difference whether they wrote their own file system
           | or adopted an existing one. They have done a fantastic job of
           | compartmentalizing and rationalizing their OS and user data
           | partitions and structures. It's not like every iPhone model
           | has a production run that has different filesystem needs that
           | they'd have to sort out.
           | 
           | There was an interesting talk given at WWDC a few years ago
            | on this. The rollout of APFS came after they'd already
           | tested the filesystem conversion for randomized groups of
           | devices and then eventually every single device that upgraded
           | to one of the point releases prior to iOS 10.3. The way they
           | did this was to basically run the conversion in memory as a
           | logic test against real data. At the end they'd have the
            | superblock for the new APFS volume, and on a successful exit
            | they simply discarded it instead of writing it to persistent
            | storage. If it errored, it would send a trace back to Apple.
           | 
            | Huge amounts of testing, and consistency in OS and user data
            | partitioning and directory structures, are a big part of why
            | that migration worked so flawlessly.
        
           | jeroenhd wrote:
           | I don't see why ZFS wouldn't have gone over equally
           | flawlessly. None of the features that make ZFS special were
           | in HFS(+), so conversion wouldn't be too hard. The only
           | challenge would be maintaining the legacy compression
           | algorithms, but ZFS is configurable enough that Apple
           | could've added their custom compression to it quite easily.
           | 
           | There are probably good reasons for Apple to reinvent ZFS as
           | APFS a decade later, but none of them technical.
           | 
           | I also wouldn't call the rollout of APFS flawless, per se.
           | It's still a terrible fit for (external) hard drives and
            | their own products don't auto-convert to APFS in some cases.
           | There was also plenty of breakage when case-sensitivity
           | flipped on people and software, but as far as I can tell
           | Apple just never bothered to address that.
        
             | jonhohle wrote:
             | HFS compression, AFAICT, is all done in user space with
             | metadata and extended attributes.
        
           | kmeisthax wrote:
            | To be clear, BTRFS _also_ supports in-place upgrade. It's
           | not a uniquely Apple feature; any copy-on-write filesystem
           | with flexibility as to where data is located can be made to
           | fit inside of the free blocks of another filesystem. Once you
           | can do that, then you can do test runs[0] of the filesystem
           | upgrade before committing to wiping the superblock.
           | 
            | I don't know for certain if they could have done it with ZFS,
            | but I can imagine it would at least have been doable with some
           | Apple extensions that would only have to exist during test /
           | upgrade time.
           | 
           | [0] Part of why the APFS upgrade was so flawless was that
           | Apple had done a test upgrade in a prior iOS update. They'd
           | run the updater, log any errors, and then revert the upgrade
           | and ship the error log back to Apple for analysis.
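            | 
            | For the curious: the stock Linux tool for this pattern is
            | btrfs-convert, which builds the new filesystem inside the
            | free blocks of an existing ext4 volume and keeps the old
            | filesystem image around as a subvolume until you commit. A
            | minimal sketch (device and mount point are illustrative):
            | 
            |     # convert in place; the original ext4 image is preserved
            |     btrfs-convert /dev/sdb1
            |     # test the result, then either roll back...
            |     btrfs-convert -r /dev/sdb1
            |     # ...or commit by deleting the saved image subvolume
            |     btrfs subvolume delete /mnt/ext2_saved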
        
           | toast0 wrote:
           | Using ZFS isn't surrendering control. Same as using parts of
            | FreeBSD. Apple retains control because they don't have an
            | obligation to (or track record of) following upstream.
           | 
            | For ZFS, there have been a lot of improvements over the years,
            | but if they had forked, adapted, and then left it
           | alone, their fork would continue to work without outside
           | control. They could pull in things from outside if they want,
           | when they want; some parts easier than others.
        
         | hs86 wrote:
         | While its deduplication feature clearly demands more memory, my
         | understanding is that the ZFS ARC is treated by the kernel as a
         | driver with a massive, persistent memory allocation that cannot
         | be swapped out ("wired" pages). Unlike the regular file system
         | cache, ARC's eviction is not directly managed by the kernel.
         | Instead, ZFS itself is responsible for deciding when and how to
         | shrink the ARC.
         | 
         | This can lead to problems under sudden memory pressure. Because
         | the ARC does not immediately release memory when the system
         | needs it, userland pages might get swapped out instead. This
         | behavior is more noticeable on personal computers, where memory
         | usage patterns are highly dynamic (applications are constantly
         | being started, used, and closed). On servers, where workloads
         | are more static and predictable, the impact is usually less
         | severe.
         | 
         | I do wonder if this is also the case on Solaris or illumos,
         | where there is no intermediate SPL between ZFS and the kernel.
         | If so, I don't think that a hypothetical native integration of
         | ZFS on macOS (or even Linux) would adopt the ARC in its current
         | form.
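          | 
          | As a concrete sketch (Linux OpenZFS; the 4 GiB cap is just an
          | example), you can watch the ARC and limit it at runtime:
          | 
          |     # current ARC size and ceiling, in bytes
          |     awk '$1 == "size" || $1 == "c_max"' \
          |         /proc/spl/kstat/zfs/arcstats
          |     # cap the ARC at 4 GiB without a reboot
          |     echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_max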
        
           | dizhn wrote:
            | Max ARC size is configurable, and ZFS does not need the
            | mythical 1 GB per TB to function well.
        
           | ryao wrote:
           | The ZFS driver will release memory if the kernel requests it.
           | The only integration level issue is that the free command
           | does not show ARC as a buffer/cache, so it misrepresents
           | reality, but as far as I know, this is an issue with caches
           | used by various filesystems (e.g. extent caches). It is only
           | obvious in the case of ZFS because the ARC can be so large.
           | That is a feature, not a bug, since unused memory is wasted
           | memory.
        
             | pseudalopex wrote:
             | > The ZFS driver will release memory if the kernel requests
             | it.
             | 
              | Not always fast enough.
        
           | netbsdusers wrote:
           | Solaris achieved some kind of integration between the ARC and
           | the VM subsystem as part of the VM2 project. I don't know any
           | more details than that.
        
             | ryao wrote:
             | I assume that the VM2 project achieved something similar to
             | the ABD changes that were done in OpenZFS. ABD replaced the
             | use of SLAB buffers for ARC with lists of pages. The issue
             | with SLAB buffers is that absurd amounts of work could be
             | done to free memory, and a single long lived SLAB object
             | would prevent any of it from mattering. Long lived slab
             | objects caused excessive reclaim, slowed down the process
             | of freeing enough memory to satisfy system needs and in
             | some cases, prevented enough memory from being freed to
             | satisfy system needs entirely. Switching to linked lists of
             | pages fixed that since the memory being freed from ARC upon
             | request would immediately become free rather than be
             | deferred to when all of the objects in the SLAB had been
             | freed.
        
         | fweimer wrote:
         | If I recall correctly, ZFS error recovery was still "restore
         | from backup" at the time, and iCloud acceptance was more
         | limited. (ZFS basically gave up if an error was encountered
         | after the checksum showed that the data was read correctly from
         | storage media.) That's fine for deployments where the
         | individual system does not matter (or you have dedicated staff
         | to recover systems if necessary), but phones aren't like that.
         | At least not from the user perspective.
        
           | ryao wrote:
            | ZFS has ditto blocks that allow it to self-heal in the case
           | of corrupt metadata as long as a good copy remains (and there
           | would be at least 2 copies by default). ZFS only ever needs
           | you to restore from backup if the damage is so severe that
           | there is no making sense of things.
           | 
           | Minor things like the indirect blocks being missing for a
           | regular file only affect that file. Major things like all 3
           | copies of the MOS (the equivalent to a superblock) being gone
           | for all uberblock entries would require recovery from backup.
           | 
           | If all copies of any other filesystem's superblock were gone
           | too, that filesystem would be equally irrecoverable and would
           | require restoring from backup.
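            | 
            | Metadata gets those extra ditto copies automatically; for
            | user data you can opt in per dataset. A sketch, with a
            | hypothetical dataset name:
            | 
            |     # keep two on-disk copies of this dataset's user data
            |     zfs set copies=2 tank/important
            |     zfs get copies tank/important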
        
             | fweimer wrote:
             | As far as I understand it, ditto blocks were only used if
             | the corruption was detected due to checksum mismatch. If
             | the checksum was correct, but metadata turned out to be
             | unusable later (say because it was corrupted in memory, and
              | the checksum was computed after the corruption
             | happened), that was treated as a fatal error.
        
         | alwillis wrote:
         | Apple wanted one operating system that ran on everything from a
         | Mac Pro to an Apple Watch and there's no way ZFS could have
         | done that.
        
           | ryao wrote:
           | ZFS would be quite comfortable with the 512MB of RAM on an
           | Apple Watch:
           | 
           | https://iosref.com/ram-processor
           | 
           | People have run operating systems using ZFS on less.
        
       | volemo wrote:
       | It was just yesterday I relistened to the contemporary
       | Hypercritical episode on the topic:
       | https://hypercritical.fireside.fm/56
        
         | mrkstu wrote:
         | Wow, John's voice has changed a LOT from back then
        
       | jeroenhd wrote:
       | I wonder what ZFS in the iPhone would've looked like. As far as I
       | recall, the iPhone didn't have error correcting memory, and ZFS
       | is notorious for corrupting itself when bit flips hit it and
       | break the checksum on disk. ZFS' RAM-hungry nature would've also
       | forced Apple to add more memory to their phone.
        
         | amarshall wrote:
         | > ZFS is notorious for corrupting itself when bit flips hit it
         | and break the checksum on disk
         | 
         | ZFS does not need or benefit from ECC memory any more than any
         | other FS. The bitflip corrupted the data, regardless of ZFS.
          | Any other FS is just oblivious; ZFS will at least tell you your
         | data is corrupt but happily keep operating.
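          | 
          | Concretely, the "tell you" part looks like this (pool name
          | illustrative):
          | 
          |     zpool scrub tank        # re-read and verify every block
          |     zpool status -v tank    # lists any files with errors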
         | 
         | > ZFS' RAM-hungry nature
         | 
         | ZFS is not really RAM-hungry, unless one uses deduplication
         | (which is not enabled by default, nor generally recommended).
         | It can often seem RAM hungry on Linux because the ARC is not
         | counted as "cache" like the page cache is.
         | 
         | ---
         | 
         | ZFS docs say as much as well:
         | https://openzfs.github.io/openzfs-docs/Project%20and%20Commu...
        
           | williamstein wrote:
           | And even dedup was finally rewritten to be significantly more
           | memory efficient, as of the new 2.3 release of ZFS:
           | https://github.com/openzfs/zfs/discussions/15896
        
         | Dylan16807 wrote:
         | > ZFS is notorious for corrupting itself when bit flips hit it
         | and break the checksum on disk
         | 
         | I don't think it is. I've never heard of that happening, or
         | seen any evidence ZFS is more likely to break than any random
         | filesystem. I've only seen people spreading paranoid rumors
         | based on a couple pages saying ECC memory is important to fully
         | get the benefits of ZFS.
        
           | thfuran wrote:
           | They also insist that you need about 10 TB RAM per TB disk
           | space or something like that.
        
             | yjftsjthsd-h wrote:
             | There is a rule of thumb that you should have at least 1 GB
              | of RAM per TB of disk _when using deduplication_.
              | That's.... Different.
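              | 
              | And if you do enable dedup, you can measure what the table
              | actually costs instead of guessing (pool name illustrative):
              | 
              |     # histogram of dedup table entries and their sizes
              |     zpool status -D tank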
        
               | williamstein wrote:
               | Fortunately, this has significantly improved since dedup
               | was rewritten as part of the new ZFS 2.3 release. Search
               | for zfs "fast dedup".
        
               | thfuran wrote:
               | So you've never seen the people saying you should steer
               | clear of ZFS unless you're going to have an enormous ARC
               | even when talking about personal media servers?
        
               | amarshall wrote:
               | Even then you obviously need L2ARC as well!! /s
        
               | thfuran wrote:
                | But on Optane. Because obviously you need an all-flash
               | main array for streaming a movie.
        
               | toast0 wrote:
               | People, especially those on the Internet, say a lot of
               | things.
               | 
               | Some of the things they say aren't credible, even if
               | they're said often.
               | 
               | You don't need an enormous amount of ram to run zfs
               | unless you have dedupe enabled. A lot of people thought
               | they wanted dedupe enabled though. (2024's fast dedupe
               | may help, but probably the right answer for most people
               | is not to use dedupe)
               | 
               | It's the same thing with the "need" for ECC. If your ram
               | is bad, you're going to end up with bad data in your
               | filesystem. With ZFS, you're likely to find out your
               | filesystem is corrupt (although, if the data is corrupted
               | before the checksum is calculated, then the checksum
               | doesn't help); with a non-checksumming filesystem, you
                | may get lucky and not have metadata get corrupted and
               | the OS keeps going, just some of your files are wrong.
                | Having ECC would be better, but there are tradeoffs so it
               | never made sense for me to use it at home; zfs still
               | works and is protecting me from disk contents changing,
               | even if what was written could be wrong.
        
               | yjftsjthsd-h wrote:
               | Not that I recall? And it's worked fine for me...
        
               | ryao wrote:
               | I have seen people say such things, and none of it was
               | based on reality. They just misinterpreted the
               | performance cliff that data deduplication had to mean you
               | must have absurd amounts of memory even though data
               | deduplication is off by default. I suspect few of the
               | people peddling this nonsense even used ZFS and the few
               | who did, had not looked very deeply into it.
        
             | amarshall wrote:
             | It's unfortunate some folks are missing the tongue-in-cheek
             | nature of your comment.
        
         | mrkeen wrote:
         | > ZFS is notorious for corrupting itself when bit flips hit it
         | and break the checksum on disk.
         | 
         | What's a bit flip?
        
           | zie wrote:
           | Basically it's that memory changes out from under you. As we
            | know, computers use binary, so everything boils down to it
            | being a 0 or a 1. A bit flip is changing what was, say, a 0
           | into a 1.
           | 
           | Usually attributed to "cosmic rays", but really can happen
           | for any number of less exciting sounding reasons.
           | 
           | Basically, there is zero double checking in your computer for
           | almost _everything_ except stuff that goes across the
           | network. Memory and disks are not checked for correctness,
            | basically ever on any machine anywhere. Many servers (but
            | certainly not all) are the rare exception when it comes to
            | memory safety. They usually have ECC (Error Correction Code)
            | memory, basically a checksum on the memory to ensure that if
            | memory is corrupted, it's noticed and fixed.
           | 
            | Essentially every filesystem everywhere does zero data
            | integrity checking:
            | 
            |     MacOS APFS: Nope
            |     Windows NTFS: Nope
            |     Linux EXT4: Nope
            |     BSD's UFS: Nope
            |     Your mobile phone: Nope
           | 
           | ZFS is the rare exception for file systems that actually
           | double check the data you save to it is the data you get back
           | from it. Every other filesystem is just a big ball of unknown
            | data. You probably get back what you put in, but there are
            | zero promises or guarantees.
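            | 
            | A toy illustration (file names arbitrary): flipping the low
            | bit of 'A' (0x41) yields '@' (0x40), and a stored digest no
            | longer verifies.
            | 
            |     printf 'A' > f && sha256sum f > f.sum
            |     printf '@' > f       # simulate a one-bit flip
            |     sha256sum -c f.sum   # reports: f: FAILED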
        
             | crazygringo wrote:
             | > _disks are not checked for correctness, basically ever on
             | any machine anywhere._
             | 
             | I'm not sure that's really accurate -- all modern hard
             | drives and SSD's use error-correcting codes, as far as I
             | know.
             | 
             | That's different from implementing _additional_ integrity
             | checking at the filesystem level. But it 's definitely
             | there to begin with.
        
               | tpetry wrote:
                | But SSDs (to my knowledge) only implement checksums for
                | the data transfer. It's a requirement of the protocol. So
                | you can be sure that the stuff in memory and the checksum
                | computed by the CPU arrive exactly like that at the SSD.
                | In the past this was a common error source with faulty
                | hardware RAID.
               | 
                | But there is ABSOLUTELY NO checksum for the bits stored
                | on an SSD. So bit rot in the cells of the SSD goes
                | undetected.
        
               | lgg wrote:
               | That is ABSOLUTELY incorrect. SSDs have enormous amounts
                | of error detection and correction built in explicitly
               | because errors on the raw medium are so common that
               | without it you would never be able to read correct data
               | from the device.
               | 
               | It has been years since I was familiar enough with the
               | insides of SSDs to tell you exactly what they are doing
               | now, but even ~10-15 years ago it was normal for each raw
               | 2k block to actually be ~2176+ bytes and use at least 128
               | bytes for LDPC codes. Since then the block sizes have
               | gone up (which reduces the number of bytes you need to
               | achieve equivalent protection) and the lithography has
               | shrunk (which increases the raw error rate).
               | 
               | Where exactly the error correction is implemented
               | (individual dies, SSD controller, etc) and how it is
               | reported can vary depending on the application, but I can
               | say with assurance that there is no chance your OS sees
               | uncorrected bits from your flash dies.
        
               | zie wrote:
               | > I can say with assurance that there is no chance your
               | OS sees uncorrected bits from your flash dies.
               | 
                | While true, there are zero promises that what you meant to
               | save and what gets saved are the same things. All the
               | drive mostly promises is that if the drive safely wrote
               | XYZ to the disk and you come back later, you should
               | expect to get XYZ back.
               | 
               | There are lots of weasel words there on purpose. There is
               | generally zero guarantee in reality and drives lie all
               | the time about data being safely written to disk, even if
               | it wasn't actually safely written to disk yet. This means
               | on power failure/interruption the outcome of being able
               | to read XYZ back is 100% unknown. Drive Manufacturers
               | make zero promises here.
               | 
                | On most consumer compute, there are no promises or
                | guarantees that what you wrote on day 1 will be there on
                | day 2+. It mostly works, and the chances are better than
                | even that your data will be mostly safe on day 2+, but
                | there are zero promises or guarantees. We know how to
                | guarantee it, we just don't bother (usually).
               | 
               | You can buy laptops and desktops with ECC RAM and use
                | ZFS (or another checksumming FS), but basically nobody does.
               | I'm not aware of any mobile phones that offer either
               | option.
        
               | crazygringo wrote:
               | > _While true, there is zero promises that what you meant
               | to save and what gets saved are the same things. All the
               | drive mostly promises is that if the drive safely wrote
               | XYZ to the disk and you come back later, you should
               | expect to get XYZ back._
               | 
               | I'm not really sure what point you're trying to make.
               | It's using ECC, so they should be the same.
               | 
               | There isn't infinite reliability, but nothing has
               | infinite reliability. File checksums don't provide
               | infinite reliability either, because the checksum itself
               | can be corrupted.
               | 
               | You keep talking about promises and guarantees, but there
               | aren't any. All there is are statistical rates of
               | reliability. Even ECC RAM or file checksums don't offer
               | perfect guarantees.
               | 
               | For daily consumer use, the level of ECC built into disks
               | is generally far more than sufficient.
        
               | o11c wrote:
                | All MLC SSDs absolutely do data checksums and error
                | recovery; otherwise they would lose your data much more
                | often than they do.
               | 
               | You can see some stats using `smartctl`.
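                | 
                | For NVMe drives, for example, the health log includes a
                | "Media and Data Integrity Errors" count of uncorrectable
                | events (device name illustrative):
                | 
                |     sudo smartctl -A /dev/nvme0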
        
               | zie wrote:
               | Yes, the disk mostly promises what you write there will
               | be read back correctly, but that's at the disk level
               | only. The OS, Filesystem and Memory generally do no
               | checking, so any errors at those levels will propagate.
               | We know it happens, we just mostly choose to not do
               | anything about it.
               | 
                | My point was, on most consumer compute, there are no
                | promises or guarantees that what you see on day 1 will be
                | there on day 2. It mostly works, and the chances are
                | better than even that your data will be mostly safe on
                | day 2, but there are zero promises or guarantees, even
               | though we know how to do it. Some systems do, those with
               | ECC memory and ZFS for example. Other filesystems also
               | support checksumming, like BTRFS being the most common
               | counter-example to ZFS. Even though parts of BTRFS are
                | still completely broken (see their status page for
               | details).
        
             | amarshall wrote:
             | Btrfs and bcachefs both have data checksumming. I think
             | ReFS does as well.
        
               | zie wrote:
               | Yes, ZFS is not the only filesystem with data
               | checksumming and guarantees, but it's one of the very
               | rare exceptions that do.
               | 
                | ZFS has been in production workloads since 2005, 20
               | years now. It's proven to be very safe.
               | 
               | BTRFS has known fundamental issues past one disk. It is
               | however improving. I will say BTRFS is fine for a single
                | drive. Even the developers, last I checked (a few years
                | ago), don't really recommend it past a single drive,
               | though hopefully that's changing over time.
               | 
               | I'm not familiar enough with bcachefs to comment.
        
           | ahl wrote:
           | Sometimes data on disk and in memory are randomly corrupted.
           | For a pretty amazing example, check out
           | "bitsquatting"[1]--it's like domain name squatting, but
            | instead of typos, you squat on domains that would be looked
            | up in the case of random bit flips. These can occur, e.g., due
            | to cosmic rays. On-disk, HDDs and SSDs can produce the wrong
           | data. It's uncommon to see actual invalid data rather than
           | have an IO fail on ECC, but it certainly can happen (e.g. due
           | to firmware bugs).
           | 
           | [1]: https://en.wikipedia.org/wiki/Bitsquatting
        
         | terlisimo wrote:
         | > ZFS is notorious for corrupting itself when bit flips
         | 
         | That is a notorious myth.
         | 
         | https://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-y...
        
         | ahl wrote:
         | It's very amusing that this kind of legend has persisted! ZFS
         | is notorious for *noticing* when bits flip, something APFS
         | designers claimed was rare given the robustness of Apple
         | hardware.[1][2] What would ZFS on iPhone have looked like? Hard
         | to know, and that certainly wasn't the design center.
         | 
         | Neither here nor there, but DTrace _was_ ported to iPhone--it
         | was shown to me in hushed tones in the back of an auditorium
         | once...
         | 
         | [1]: https://arstechnica.com/gadgets/2016/06/a-zfs-developers-
         | ana...
         | 
         | [2]: https://ahl.dtrace.org/2016/06/19/apfs-part5/#checksums
        
           | ryao wrote:
           | I did early ZFSOnLinux development on hardware that did not
           | have ECC memory. I once had a situation where a bit flip
           | happened in the ARC buffer for libpython.so and all python
           | software started crashing. Initially, I thought I had hit
            | some sort of bizarre bug in ZFS, so I started debugging. At
           | that time, opening a ZFS snapshot would fetch a duplicate
           | from disk into a redundant ARC buffer, so while debugging, I
           | ran cmp on libpython.so between the live copy and a snapshot
           | copy. It showed the exact bit that had flipped. After seeing
           | that and convincing myself the bitflip was not actually on
           | stable storage, I did a reboot, and all was well. Soon
           | afterward, I got a new development machine that had ECC so
           | that I would not waste my time chasing phantom bugs caused by
           | bit flips.
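            | 
            | (The same trick works on any ZFS system: each dataset exposes
            | its snapshots under a hidden .zfs directory, so you can diff
            | a live file against an older copy. Paths illustrative.)
            | 
            |     cmp -l /tank/lib/libpython.so \
            |         /tank/.zfs/snapshot/snap1/lib/libpython.so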
        
         | Modified3019 wrote:
         | ZFS _detects_ corruption.
         | 
          | A very long time ago, someone named cyberjock was a prolific and
         | opinionated proponent of ZFS, who wrote many things about ZFS
         | during a time when the hobbyist community was tiny and not very
         | familiar with how to use it and how it worked. Unfortunately,
         | some of their most misguided and/or outdated thoughts still
         | haunt modern consciousness like an egregore.
         | 
         | What you are probably thinking of is the proposed doomsday
         | scenario where bad ram could _theoretically_ kill a ZFS pool
         | during a scrub.
         | 
         | This article does a good job of explaining how that might
         | happen, and why being concerned about it is tilting at
         | windmills: https://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-
         | ram-kill-y...
         | 
         | I have _never once_ heard of this happening in real life.
         | 
         | Hell, I've never even had bad ram. I have had bad sata/sas
         | cables, and a bad disk though. ZFS faithfully informed me there
         | was a problem, which no other file system would have done. I've
         | seen other people that start getting corruption when sata/sas
         | controllers go bad or overheat, which again is detected by ZFS.
         | 
         | What actually destroys pools is user error, followed very
         | distantly by plain old fashioned ZFS bugs that someone with an
         | unlucky edge case ran into.
        
           | tmoertel wrote:
           | > Hell, I've never even had bad ram.
           | 
           | To what degree can you separate this claim from "I've never
           | _noticed_ RAM failures "?
        
             | wtallis wrote:
             | It isn't hard to run memtest on all your computers, and
             | that _will_ catch the kind of bad RAM that the
             | aforementioned doomsday scenario requires.
        
             | Modified3019 wrote:
             | You can take that as meaning "I've never had a noticed
             | issue that was detected by extensive ram testing, or solved
             | by replacing ram".
             | 
              | I got into overclocking both regular and ECC DDR4 RAM for a
              | while when AMD's 1st gen Ryzen stuff came out, thanks to
              | ASRock's X399 motherboard, which unofficially supported
              | ECC, allowing both its function and the reporting of errors
              | (produced when overclocking).
             | 
             | Based on my own testing and issues seen from others,
             | regular memory has quite a bit of leeway before it becomes
             | unstable, and memory that's generating errors tends to
             | constantly crash the system, or do so under certain
             | workloads.
             | 
             | Of course, without ECC you can't _prove_ every single
              | operation has been fault-free, but at some point you call
             | it close enough.
             | 
             | I am of the opinion that ECC memory is _the best_ memory to
             | overclock, precisely because you can prove stability simply
             | by using the system.
             | 
             | All that said, as things become smaller with tighter
             | specifications to squeeze out faster performance, I do grow
             | more leery of intermittent single errors that occur on the
             | order of weeks or months in newer generations of hardware.
             | I was once able to overclock my memory to the edge of what
             | I thought was stability as it passed all tests for days,
              | but about every month or two there'd be a few corrected
              | errors showing up in my logs. Typically, any sort of
              | instability is caught by manual tests within minutes or an
              | hour.
        
           | wtallis wrote:
           | To me, the most implausible thing about ZFS-without-ECC
           | doomsaying is the presumption that the failure mode of RAM is
           | a persistently stuck bit. That's _way_ less common than
           | transient errors, and way more likely to be noticed, since it
           | will destabilize any piece of software that uses that address
           | range. And now that all modern high-density DRAM includes on-
           | die ECC, transient data corruption on the link between DRAM
           | and CPU seems overwhelmingly more likely than a stuck bit.
        
       | rrdharan wrote:
       | Kind of odd that the blog states that "The architect for ZFS at
       | Apple had left" and links to the LinkedIn profile of someone who
       | doesn't have any Apple work experience listed on their resume. I
       | assume the author linked to the wrong profile?
        
         | nikhizzle wrote:
         | Ex-Apple File System engineer here who shared an office with
         | the other ZFS lead at the time. Can confirm they link to the
         | wrong profile for Don Brady.
         | 
         | This is the correct person: https://github.com/don-brady
         | 
         | Also can confirm Don is one of the kindest, nicest principal
         | engineer level people I've worked with in my career. Always had
         | time to mentor and assist.
        
           | ahl wrote:
           | Not sure how I fat-fingered Don's LinkedIn, but I'm updating
           | that 9-year-old typo. Agreed that Don is a delight. In the
           | years after this article I got to collaborate more with him,
           | but left Delphix before he joined to work on ZFS.
        
           | whitepoplar wrote:
           | Given your expertise, any chance you can comment on the risk
           | of data corruption on APFS given that it only checksums
           | metadata?
        
             | nikhizzle wrote:
             | I moved out of the kernel in 2008 and never went back, so
             | don't have a wise opinion here which would be current.
        
       | smittywerben wrote:
        | Thanks for sharing. I was just looking for what happened to Sun.
        | I like the second-hand quote comparing IBM and HP to "garbage
        | trucks colliding", plus the inclusion of blog posts with links to
        | the court filings.
       | 
       | Is it fair to say ZFS made most sense on Solaris using Solaris
        | Containers on SPARC?
        
         | ahl wrote:
         | ZFS was developed in Solaris, and at the time we were mostly
         | selling SPARC systems. That changed rapidly and the biggest
         | commercial push was in the form of the ZFS Storage Appliance
         | that our team (known as Fishworks) built at Sun. Those systems
         | were based on AMD servers that Sun was making at the time such
          | as Thumper [1]. Also in 2016, Ubuntu leaned into the use of ZFS
         | for containers [2]. There was nothing that specific about
         | Solaris that made sense for ZFS, and even less of a connection
         | to the SPARC architecture.
         | 
         | [1]: https://www.theregister.com/2005/11/16/sun_thumper/
         | 
         | [2]: https://ubuntu.com/blog/zfs-is-the-fs-for-containers-in-
         | ubun...
        
           | ghaff wrote:
            | Yeah, I think if it hadn't been for the combination of Oracle
            | and CDDL, Red Hat would have been more interested in it for
            | Linux. As it was, they basically went with XFS and volume
            | management. Fedora did eventually go with btrfs, but I don't
            | know if there are any plans for a copy-on-write FS for RHEL
            | at any point.
        
             | m4rtink wrote:
             | Fedora Server uses XFS on LVM by default & you can do CoW
             | with any modern filesystem on top of an LVM thin pool.
             | 
             | And there is also the Stratis project Red Hat is involved
             | in: https://stratis-storage.github.io/
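              | 
              | The thin-pool route looks roughly like this (volume group
              | and names illustrative):
              | 
              |     lvcreate --type thin-pool -L 100G -n tpool vg0
              |     lvcreate -V 50G --thin -n data vg0/tpool
              |     lvcreate -s --name data-snap vg0/data  # CoW snapshot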
        
               | ghaff wrote:
               | It looks like btrfs is/was the default for just Fedora
               | Workstation. I'm less connected to Red Hat filesystem
               | details than I used to be.
        
               | curt15 wrote:
               | TIL Stratis is still alive. I thought it basically went
               | on life support after the lead dev left Red Hat.
               | 
               | Still no checksumming though...
        
             | ryao wrote:
             | RedHat's policy is no out of tree kernel modules, so it
             | would not have made a difference.
        
               | ghaff wrote:
               | It's not like Red Hat had/has no influence over what
               | makes it into mainline. But the options for copy on write
               | were either relatively immature or had license issues in
               | their view.
        
               | ryao wrote:
               | Their view is that if it is out of tree, they will not
               | support it. This supersedes any discussion of license.
               | Even out of tree GPL drivers are not supported by RedHat.
        
           | thyristan wrote:
           | We had those things at work as fileservers, so no containers
           | or anything fancy.
           | 
           | Sun salespeople tried to sell us the idea of "zfs filesystems
           | are very cheap, you can create many of them, you don't need
           | quota" (which ZFS didn't have at the time), which we tried
           | out. It was abysmally slow. It was even slow with just one
           | filesystem on it. We scrapped the whole idea, just put Linux
           | on them and suddenly fileserver performance doubled. Which is
            | something we weren't used to with older Solaris/SPARC/UFS or
            | VxFS systems.
           | 
           | We never tried another generation of those, and soon after
           | Sun was bought by Oracle anyways.
        
             | kjellsbells wrote:
             | I had a combination uh-oh/wow! moment back in those days
             | when the hacked up NFS server I built on a Dell with Linux
             | and XFS absolutely torched the Solaris and UFS system we'd
              | been using for development. Yeah, it wasn't apples to
             | apples. Yes, maybe ZFS would have helped. But XFS was
             | proven at SGI and it was obvious that the business would
             | save thousands overnight by moving to Linux on Dell instead
             | of sticking with Sun E450s. That was the death knell for my
             | time as a Solaris sysadmin, to be honest.
        
               | thyristan wrote:
               | ZFS probably wouldn't have helped. One of my points is,
                | ZFS was slower than UFS in our setup. And both were
               | slower than Linux on the same hardware.
        
           | ryao wrote:
           | > There was nothing that specific about Solaris that made
           | sense for ZFS, and even less of a connection to the SPARC
           | architecture.
           | 
           | Although it does not change the answer to the original
           | question, I have long been under the impression that part of
           | the design of ZFS had been influenced by the Niagara
           | processor. The heavily threaded ZIO pipeline had been so
           | forward thinking that it is difficult to imagine anyone
           | devising it unless they were thinking of the future that the
           | Niagara processor represented.
           | 
            | Am I correct to think that, or did knowledge of the upcoming
           | Niagara processor not shape design decisions at all?
           | 
           | By the way, why did Thumper use an AMD Opteron over the
           | UltraSPARC T1 (Niagara)? That decision seems contrary to idea
           | of putting all of the wood behind one arrow.
        
             | ahl wrote:
             | I don't recall that being the case. Bonwick had been
             | thinking about ZFS for at least a couple of years. Matt
             | Ahrens joined Sun (with me) in 2001. The Afara acquisition
             | didn't close until 2002. Niagara certainly was tantalizing
             | but it wasn't a primary design consideration. As I recall,
             | AMD was head and shoulders above everything else in terms
             | of IO capacity. Sun was never very good (during my tenure
             | there) at coordination or holistic strategy.
        
             | bcantrill wrote:
             | Niagara did not shape design decisions at all -- remember
             | that Niagara was really only doing on a single socket what
             | we had already done on large SMP machines (e.g.,
             | Starfire/Starcat). What _did_ shape design decisions -- or
             | at least informed thinking -- was a belief that all main
             | memory would be non-volatile within the lifespan of ZFS.
             | (Still possible, of course!) I don 't know that there are
             | any true artifacts of that within ZFS, but I would say that
             | it affected thinking much more than Niagara.
             | 
             | As for Thumper using Opteron over Niagara: that was due to
             | many reasons, both technological (Niagara was interesting
             | but not world-beating) and organizational (Thumper was a
             | result of the acquisition of Kealia, which was
             | independently developing on AMD).
        
               | ryao wrote:
               | Thanks. I had been unaware of the Starfire/Starcat
               | machines.
        
           | smittywerben wrote:
            | Thanks. Also, the Thumper looks awesome, like a max-level
           | MMORPG character that would kill the level-1 consumer
           | Synology NAS character in one hit.
        
       | jFriedensreich wrote:
        | The death of ZFS in macOS was a huge shift in the industry. It
        | has to be seen in the context of Microsoft killing its hugely
        | ambitious WinFS; in combination, the two felt like the death of
        | desktop innovation.
        
         | thyristan wrote:
         | Both are imho linked to "offline desktop use cases are not
         | important anymore". Both companies saw their future gains
         | elsewhere, in internet-related functions and what became known
         | as "cloud". No need to have a fancy, featurefull and expensive
         | filesystem when it is only to be used as a cache for remote
         | cloud stuff.
        
           | em500 wrote:
           | Linux or FreeBSD developers are free to adopt ZFS as their
            | primary file system. But it appears that the practical benefits
           | are not really evident to most users.
        
             | thyristan wrote:
             | Lots of ZFS users are enthusiasts who heard about that one
             | magic thing that does it all in one tidy box. Whereas
              | usually you would have to know all the minutiae of
             | LVM/mdadm/cryptsetup/nbd and mkfs.whatever to get to the
             | same point. So while ZFS is the nicer-dicer of volume
             | management and filesystems, the latter is your whole chef's
             | knife set. And while you can dice with both, the user
             | groups are not the same. And enthusiasts with the right
             | usecases are very few.
             | 
             | And for the thin-provisioned snapshotted subvolume usecase,
             | btrfs is currently eating ZFS's lunch due to far better
             | Linux integration. Think snapshots at every update, and
             | having a/b boot to get back to a known-working config after
             | an update. So widespread adoption through the distro route
             | is out of the question.
        
               | queenkjuul wrote:
                | Ubuntu's ZFS-on-root with zsys auto snapshots has been
               | working excellently on my server for 5 years. It
               | automatically takes snapshots on every update and adds
               | entries to grub so rolling back to the last good state is
               | just a reboot away.
        
             | wkat4242 wrote:
             | That's called marketing. Give it a snazzy name, like say
             | "TimeMachine" and users will jump on it.
             | 
             | Also, ZFS has a bad name within the Linux community due to
             | some licensing stuff. I find that most BSD users don't
             | really care about such legalese and most people I know that
             | run FreeBSD are running ZFS on root. Which works amazingly
             | well I might add.
             | 
             | Especially with something like sanoid added to it, it
             | basically does the same as timemachine on mac, a feature
             | that users love. Albeit stored on the same drive (but with
             | syncoid or just manually rolled zfs send/recv scripts you
             | can do that on another location too).
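              | 
              | The hand-rolled version is only a couple of lines (dataset,
              | snapshot, and host names illustrative):
              | 
              |     zfs snapshot tank/home@2025-04-27
              |     zfs send tank/home@2025-04-27 | \
              |         ssh backuphost zfs receive -u backup/home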
        
               | cherryteastain wrote:
               | > ZFS has a bad name within the Linux community due to
               | some licensing stuff
               | 
                | This is out of an abundance of caution. Canonical bundles
                | ZFS in the Ubuntu kernel and no one has sued them (yet).
        
               | wkat4242 wrote:
                | True, and I understand the caution considering Oracle is
                | involved, which is an awful company to deal with (and
                | their takeover of Sun was a disaster).
               | 
               | But really, this is a concern for distros. Not for end
               | users. Yet many of the Linux users I speak to are somehow
               | worried about this. Most can't even describe the
               | provisions of the GPL so I don't really know what that's
               | about. Just something they picked up, I guess.
        
               | thyristan wrote:
               | Licensing concerns that prevent distros from using ZFS
               | will sooner or later also have adverse effects on end
               | users. Actually those effects are already there: The
               | constant need to adapt a large patchset to the current
               | kernel, meaning updates are a hassle. The lack of
               | packaging in distributions, meaning updates are a hassle.
               | And the lack of integration and related tooling, meaning
               | many features can not be used (like a/b boots from
               | snapshots after updates) easily, and installers won't
               | know about ZFS so you have to install manually.
               | 
               | None of this is a worry about being sued as an end user.
                | But all of those are worries that your life will be harder
               | with ZFS, and a lot harder as soon as the first lawsuits
               | hit anyone, because all the current (small) efforts to
               | keep it working will cease immediately.
        
               | ryao wrote:
               | Unlike other out of tree filesystems such as Reiser4, the
               | ZFS driver does not patch the kernel sources.
        
               | thyristan wrote:
               | That is due to licensing reasons, yes. It makes
               | maintaining the codebase even more complicated because
               | when the kernel module API changes (which it very
               | frequently does) you cannot just adapt it to your needs,
               | you have to work around all the new changes that are
               | there in the new version.
        
               | ryao wrote:
               | You have things backward. Licensing has nothing to do
               | with it. Changes to the kernel are unnecessary.
               | Maintaining the code base is also simplified by
               | supporting the various kernel versions the way that they
               | are currently supported.
        
               | yjftsjthsd-h wrote:
               | > I find that most BSD users don't really care about such
               | legalese and most people I know that run FreeBSD are
               | running ZFS on root.
               | 
                | I don't think it's that they don't _care_, it's that the
               | CDDL and BSD-ish licenses are generally believed to just
               | not have the conflict that CDDL and GPL might. (IANAL,
               | make your own conclusions about whether either of those
               | are true)
        
               | gruturo wrote:
               | >I find that most BSD users don't really care about such
               | legalese and most people I know that run FreeBSD are
               | running ZFS on root.
               | 
               | What a weird take. BSD's license is compatible with ZFS,
               | that's why. "Don't really care?" Really? Come on.
        
               | badc0ffee wrote:
               | Time Machine was released 17 years ago, and I wish
               | Windows had anything that good. And they're on their 3rd
               | backup system since then.
        
               | mdaniel wrote:
               | > I wish Windows had anything that good
               | 
               | I can't readily tell how much of the dumbness is from the
               | filesystem and how much from the kernel but the end
                | result is that until it gets away from the 1980s version of
               | file locking there's no prayer. Imagine having to explain
               | to your boss that your .docx wasn't backed up because you
               | left Word open over the weekend. A just catastrophically
               | idiotic design
        
             | m4rtink wrote:
             | The ZFS license makes it impossible to include in upstream
             | Linux kernel, which makes it much less usable as primary
             | filesystem.
        
               | ryao wrote:
                | Linux's sign-off policy makes that impossible. Linus
                | Torvalds would need Larry Ellison's sign-off before even
               | considering it. Linus told me this by email around 2013
               | (if I recall correctly) when I emailed him to discuss
               | user requests for upstream inclusion. He had no concerns
               | about the license being different at the time.
        
             | Gud wrote:
              | ZFS is a first-class citizen in FreeBSD and has been for at
              | least a decade (probably longer). Not at all like in most
             | Linux distros.
        
             | toast0 wrote:
             | ZFS on FreeBSD is quite nice. System tools like freebsd-
             | update integrate well. UFS continues to work as well, and
             | may be more appropriate for some use cases where ZFS isn't
              | a good fit; copy-on-write is sometimes very expensive.
             | 
             | Afaik, the FreeBSD position is both ZFS and UFS are fully
             | supported and neither is secondary to the other; the
             | installer asks what you want from ZFS, UFS, Manual (with a
              | menu-based tool), or Shell and you do whatever; in that
              | order, so maybe a slight preference towards ZFS.
        
             | lotharcable wrote:
             | OpenZFS exists and there is a port of it for Mac OS X.
             | 
             | The problem is that it is still owned by Oracle. And
             | Solaris ZFS is incompatible with OpenZFS. Not that people
             | really use Solaris anymore.
             | 
             | It is really unfortunate. Linux has adopted file systems
              | from other operating systems before. It is just that nobody
              | trusts Oracle.
        
           | 8fingerlouie wrote:
           | Exactly this.
           | 
           | The business case for providing a robust desktop filesystem
           | simply doesn't exist anymore.
           | 
           | 20 years ago, (regular) people stored their data on computers
           | and those needed to be dependable. Phones existed, but not to
           | the extent they do today.
           | 
           | Fast forward 20 years, and many people don't even own a
           | computer (in the traditional sense, many have consoles).
           | People now have their entire life on their phones, backed up
           | and/or stored in the cloud.
           | 
           | SSDs also became "large enough" that HDDs are mostly a thing
           | of the past in consumer computers.
           | 
           | Instead you today have high reliability hardware and software
           | in the cloud, which arguably is much more resilient than
           | anything you could reasonably cook up at home. Besides the
           | hardware (power, internet, fire suppression, physical
           | security, etc), you're also typically looking at multi
           | geographical redundancy across multiple data centers using
           | reed-Solomon erasure coding, but that's nothing the ordinary
           | user needs to know about.
           | 
           | Most cloud services also offer some kind of snapshot
            | functionality as malware protection (i.e. OneDrive offers
           | unlimited snapshots for 30 days rolling).
           | 
           | Truth is that most people are way better off just storing
           | their data in the cloud and making a backup at home, though
           | many people seem to ignore the latter, and Apple makes it
           | exceptionally hard to automate.
        
             | ryao wrote:
             | What do you do when you discover that something you have
             | not touched in a long time, but suddenly need, is corrupted
             | and all of your backups are corrupt because the corruption
             | happened prior to your 30-day window at OneDrive?
             | 
             | You would have early warning with ZFS. You have data loss
             | with your plan.
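             | 
             | A minimal sketch of how such a check could be automated,
             | assuming the standard zpool CLI is installed; the script
             | and its alerting are illustrative, not the only way:
             | 
             |     # check_pools.py - nag early, while good backups exist
             |     import subprocess
             | 
             |     # `zpool status -x` prints "all pools are healthy"
             |     # when no pool has errors; anything else means a
             |     # scrub or a normal read already caught corruption.
             |     out = subprocess.run(["zpool", "status", "-x"],
             |                          capture_output=True,
             |                          text=True).stdout
             |     if "all pools are healthy" not in out:
             |         print("WARNING: ZFS reports errors; restore now,"
             |               " before snapshots rotate out")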
        
             | sho_hn wrote:
             | Workstation use cases exist. Data archival is not the only
             | application of file systems.
        
           | GeekyBear wrote:
           | Internet connections of the day didn't yet offer enough speed
           | for cloud storage.
           | 
           | Apple was already working to integrate ZFS when Oracle bought
           | Sun.
           | 
           | From TFA:
           | 
           | > ZFS was featured in the keynotes, it was on the developer
           | disc handed out to attendees, and it was even mentioned on
           | the Mac OS X Server website. Apple had been working on its
           | port since 2006 and now it was functional enough to be put on
           | full display.
           | 
           | However, once Oracle bought Sun, the deal was off.
           | 
           | Again from TFA:
           | 
           | > The Apple-ZFS deal was brought for Larry Ellison's
           | approval, the first-born child of the conquered land brought
           | to be blessed by the new king. "I'll tell you about doing
           | business with my best friend Steve Jobs," he apparently said,
           | "I don't do business with my best friend Steve Jobs."
           | 
           | And that was the end.
        
             | mixmastamyk wrote:
             | Was it not open source at that point?
        
               | ahl wrote:
               | It was! And Apple seemed fine with including DTrace
               | under the CDDL. I'm not sure why Apple wanted some
               | additional arrangement, but they did.
        
           | wenc wrote:
           | I remember eagerly anticipating ZFS for desktop hard disks.
           | I seem to remember it never took off because the memory
           | requirements were too high and the payoffs were insufficient
           | to justify the trade-off.
        
         | deburo wrote:
         | APFS? That still happened.
        
       | ahl wrote:
       | Back in 2016, Ars Technica picked up this piece from my blog [1]
       | as well as a longer piece reviewing the newly announced APFS [2]
       | [3]. Glad it's still finding an audience!
       | 
       | [1]: https://arstechnica.com/gadgets/2016/06/zfs-the-other-new-
       | ap...
       | 
       | [2]: https://ahl.dtrace.org/2016/06/19/apfs-part1/
       | 
       | [3]: https://arstechnica.com/gadgets/2016/06/a-zfs-developers-
       | ana...
        
       | throw0101b wrote:
       | Apple and Sun couldn't agree on a 'support contract'. From Jeff
       | Bonwick, one of the co-creators of ZFS:
       | 
       | >> _Apple can currently just take the ZFS CDDL code and
       | incorporate it (like they did with DTrace), but it may be that
       | they wanted a "private license" from Sun (with appropriate
       | technical support and indemnification), and the two entities
       | couldn't come to mutually agreeable terms._
       | 
       | > _I cannot disclose details, but that is the essence of it._
       | 
       | * https://archive.is/http://mail.opensolaris.org/pipermail/zfs...
       | 
       | Apple took DTrace, licensed via the CDDL just like ZFS, and put
       | it into the kernel without issue. Of course, a file system is
       | much more central to an operating system, so they wanted much
       | more of a CYA for that.
        
       | secabeen wrote:
       | ZFS remains an excellent filesystem for bulk storage on spinning
       | rust, but were I Apple at the time, I would probably want to
       | focus on something built for the coming era of flash and NVMe
       | storage. There are a number of axioms built into ZFS that come
       | out of the spinning-disk era and still hold it back on flash-
       | only storage.
        
         | ahl wrote:
         | Certainly one would build something different starting in 2025
         | rather than 2001, but do you have specific examples of how
         | ZFS's design holds it back? I think it has been adapted
         | extremely well for the changing ecosystem.
        
       | whartung wrote:
       | As a desktop user, I am content with APFS. The only feature from
       | ZFS that I would like is corruption detection. I honestly don't
       | know how robust the image and video formats are to bit
       | corruption. On the one hand, potentially "very" robust. But on
       | the other, I would think that there are some very special bits
       | that, if toggled, can potentially "ruin" the entire file. But I
       | don't know.
       | 
       | However, I can say, every time I've tried ZFS on my iMac, it was
       | simply a disaster.
       | 
       | Just trying to set it up on a single USB drive, or setting it up
       | to mirror a pair. The net effect was that it CRUSHED the
       | performance on my machine. It became unusable. We're talking
       | "move the mouse, watch the pointer crawl behind" unusable. "Let's
       | type at 300 baud" unusable. Interactive performance was shot.
       | 
       | After I remove it, all is right again.
        
         | ryao wrote:
         | > I honestly don't know how robust the image and video formats
         | are to bit corruption.
         | 
         | It depends on the format. A BMP image would limit the damage
         | to one pixel, while a JPEG could propagate the damage to
         | potentially the entire image. There is an example of a bit
         | flip damaging a picture here:
         | 
         | https://arstechnica.com/information-technology/2014/01/bitro...
         | 
         | That single bit flip ruined about half of the image.
         | 
         | As for video, that depends on how far apart I-frames are. Any
         | damage from a bit flip would likely be isolated to the section
         | of video from the bit flip until the next I-frame. As for how
         | bad it could be, it depends on how the encoding works.
         | 
         | > On the one hand, potentially, "very" robust.
         | 
         | Only in uncompressed files.
         | 
         | > But on the other, I would think that there are some very
         | special bits that if toggled can potentially "ruin" the entire
         | file. But I don't know.
         | 
         | The way that image compression works means that a single bit
         | flip prior to decompression can affect a great many pixels, as
         | shown at Ars Technica.
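         | 
         | A quick way to see this for yourself is to flip a single bit
         | in a copy of a photo and open the result; a minimal sketch,
         | with hypothetical filenames and an arbitrary offset:
         | 
         |     # bitflip.py - corrupt one bit of a copied image
         |     import shutil
         | 
         |     src, dst = "photo.jpg", "photo_flipped.jpg"  # hypothetical
         |     shutil.copyfile(src, dst)
         | 
         |     with open(dst, "r+b") as f:
         |         f.seek(5000)          # arbitrary offset in the data
         |         byte = f.read(1)[0]
         |         f.seek(5000)
         |         f.write(bytes([byte ^ 0x01]))  # flip the lowest bit
         | 
         |     # Viewing photo_flipped.jpg typically shows damage
         |     # spreading from the flip point, since JPEG's entropy
         |     # coding desynchronizes after the corrupted byte.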
         | 
         | > However, I can say, every time I've tried ZFS on my iMac, it
         | was simply a disaster.
         | 
         | Did you file an issue? I am not sure what the current status of
         | the macOS driver's production readiness is, but it will be
         | difficult to see it improve if people do not report issues that
         | they have.
        
         | mdaniel wrote:
         | > Just trying to set it up on a single USB drive
         | 
         | That's the fault of macOS: I also experienced 100% CPU, load
         | off the charts, and kernel_task jammed up by USB. Once I used
         | a Thunderbolt enclosure it started to be sane. This experience
         | was the same across multiple non-Apple filesystems, as I was
         | trying a bunch to see which one was best at cross-OS
         | compatibility.
         | 
         | Also, separately, ZFS says "don't run ZFS on USB". I didn't
         | have problems with it, but I knew I was rolling the dice.
        
           | queenkjuul wrote:
           | Yeah, they do say that, but anecdotally my Plex server has
           | run ZFS over USB 3 since 2020 with zero problems (using
           | Ubuntu 20.04).
           | 
           | Anyway, I'm only bringing it up to reinforce that it is
           | probably a macOS problem.
        
       | ewuhic wrote:
       | What's the current state of ZFS on macOS? As far as I'm aware,
       | there's a supported fork.
        
       | ein0p wrote:
       | ZFS sort of moved inside the NVMe controller: it also checksums
       | and scrubs things all the time; you just don't see it. This
       | does not, however, support multi-device redundant storage, but
       | that is not a concern for Apple, since the vast majority of
       | their devices have only one storage device.
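       | 
       | The core idea, in either place, is storing a checksum with every
       | block and verifying it on read. A toy sketch of that mechanism
       | (the class and names are hypothetical, and real systems tend to
       | use faster checksums than SHA-256):
       | 
       |     # checksummed_store.py - verify-on-read, ZFS-style
       |     import hashlib
       | 
       |     class ChecksummedStore:
       |         def __init__(self):
       |             self.blocks = {}  # addr -> (data, digest)
       | 
       |         def write(self, addr, data):
       |             digest = hashlib.sha256(data).digest()
       |             self.blocks[addr] = (data, digest)
       | 
       |         def read(self, addr):
       |             data, digest = self.blocks[addr]
       |             if hashlib.sha256(data).digest() != digest:
       |                 # corruption is reported, never silently returned
       |                 raise IOError("checksum mismatch at block %d" % addr)
       |             return data
       | 
       |     store = ChecksummedStore()
       |     store.write(0, b"hello")
       |     assert store.read(0) == b"hello"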
        
       ___________________________________________________________________
       (page generated 2025-04-27 23:01 UTC)