[HN Gopher] Copy-on-write performance and debugging
       ___________________________________________________________________
        
       Copy-on-write performance and debugging
        
       Author : meysamazad
       Score  : 60 points
       Date   : 2024-06-24 03:01 UTC (19 hours ago)
        
 (HTM) web link (devblogs.microsoft.com)
 (TXT) w3m dump (devblogs.microsoft.com)
        
       | bhouston wrote:
       | I had to read an earlier blog post to figure out what it was:
       | 
       | https://devblogs.microsoft.com/engineering-at-microsoft/dev-...
       | 
       | "Copy-on-write (CoW) linking, also known as block cloning in the
       | Windows API documentation, avoids fully copying a file by
       | creating a metadata reference to the original data on-disk. CoW
       | links are like hardlinks but are safe to write to, as the
       | filesystem lazily copies the original data into the link as
       | needed when opened for append or random-access write. With a CoW
       | link you save disk space and time since the link consists of a
       | small amount of metadata and they write fast."
       | 
       | It seems there is a macOS implementation:
       | https://github.com/dotnet/runtime/pull/79243
       | 
       | But it seems that this is .NET-specific and not something that
       | would speed up other build systems? It is unclear whether this
       | can apply to build technologies other than .NET. Can it speed
       | up TypeScript/JavaScript builds? Can it speed up Rust builds?
       | Also, what are the speedups on other platforms like macOS and
       | Linux?
       | 
       | Is this something that all build systems and all OSes would
       | benefit from?
       | 
       | I guess this blog post for me raises more questions than it
       | answers.
        
         | supriyo-biswas wrote:
         | Certain filesystems like XFS do support CoW copying, and ZFS
         | also does chunk-based deduplication. You'd typically use it
         | through `cp --reflink` and similar.
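For anyone wondering what a reflink copy might look like from inside a program rather than via `cp --reflink`, here is a minimal sketch, assuming Linux and its `FICLONE` ioctl. The function name `clone_or_copy` is made up for illustration, and the request number is hard-coded from `<linux/fs.h>` because Python's standard library does not export it. On filesystems without reflink support the call fails and the sketch falls back to an ordinary byte copy.

```python
import fcntl
import shutil

# FICLONE request number from <linux/fs.h>: _IOW(0x94, 9, int).
# Hard-coded here as an assumption; it is a Linux-specific value.
FICLONE = 0x40049409


def clone_or_copy(src: str, dst: str) -> str:
    """Try a CoW clone of src into dst; fall back to copying the bytes.

    Returns "reflink" or "copy" so the caller can see which path ran.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the filesystem to make dst share src's data blocks.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return "reflink"
        except OSError:
            # Reflinks unsupported here (or cross-device): copy normally.
            shutil.copyfileobj(fsrc, fdst)
            return "copy"
```

Either way the destination ends up with the same contents; only the time and disk space spent differ.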
        
         | kmeisthax wrote:
         | The block cloning feature in newer versions of Windows is
         | enabled by copy-on-write filesystems. macOS ships with one by
         | default - APFS. Linux also has BTRFS, and before that there was
         | the ZFS-on-Linux project. Microsoft is now shipping a CoW
         | filesystem for Windows that appears to be a ReFS derivative.
         | 
         | This is .NET specific inasmuch as this is getting MSBuild to
         | take advantage of ReFS features; but there's no particular
         | reason why other build systems couldn't take advantage of ReFS
         | in the same way MSBuild can take advantage of APFS. The main
         | question is if the build system needs to make lots of copies of
         | files that may not ever be updated. I imagine anything that
         | does dependency fetching (especially Node/NPM) would benefit.
        
           | bhouston wrote:
           | I know that npm now has a per-user cache in ~/.npm:
           | 
           | https://docs.npmjs.com/cli/v7/commands/npm-cache
           | 
           | I am not sure if it uses CoW to bring those packages into
           | each project. If it did, that would be efficient and speed up
           | "npm install" if the cache was warm.
        
             | derefr wrote:
             | Language package managers don't need copy-on-write, because
             | there's no "write" -- the files that make up dependencies
             | are immutable from the perspective of the projects that
             | they get installed into. There's no advantage to using CoW
             | to "deploy" such files into work trees, over using plain-
             | old hard links to do so. (And hard-linking these files is
             | indeed what all the Node package managers -- other than NPM
             | -- already do.)
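The hard-link approach described above could be sketched roughly as follows. This is an illustration only, not any particular package manager's code: `link_into_project` and the cache layout are hypothetical, and the fallback to a plain copy is an assumption for the case where the cache and the project live on different volumes.

```python
import os
import shutil


def link_into_project(cache_file: str, project_file: str) -> str:
    """Hard-link a cached file into a project tree, copying if linking fails.

    Returns "hardlink" or "copy" so the caller can see which path ran.
    """
    parent = os.path.dirname(project_file) or "."
    os.makedirs(parent, exist_ok=True)
    try:
        # Both names now refer to the same inode; no data is duplicated.
        os.link(cache_file, project_file)
        return "hardlink"
    except OSError:
        # e.g. different filesystems, or links not supported: copy instead.
        shutil.copy2(cache_file, project_file)
        return "copy"
```

Because both names share one inode, a write through either path changes the cached copy too, which is the hazard that CoW links are meant to avoid.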
        
           | nullindividual wrote:
           | Volume Shadow Copies (Win Svr 2003) on NTFS were the first
           | implementation by Microsoft of CoW. But it was limited to VSS
           | snapshots, so not useful for day-to-day storage usage.
           | 
           | DevDrive is not a derivative of ReFS, it is ReFS with some
           | file system filter bits turned off among a couple of other
           | things. DevDrive is a collection of features centered around
           | ReFS for the purposes of speeding up file reads/writes (think
           | node modules).
        
           | adrian_b wrote:
           | Linux XFS also added copy-on-write a few years ago.
           | 
           | Initially I was not aware of this, and I was surprised when I
           | copied a directory with a total size greater than 50 GB and
           | the copy was instantaneous. At first I believed that I had
           | given some wrong command, but then I searched the XFS
           | documentation and saw that this was a new feature at that
           | time.
        
         | ack_complete wrote:
         | It could speed up other build systems, but the .NET build
         | system (MSBuild) has a particular design issue where by default
         | it will copy dependencies local to each project that's using
         | them (Copy Local). This leads to assemblies being copied
         | multiple times throughout the filesystem according to the build
         | process.
        
         | neonsunset wrote:
         | The article talks about CoW as a feature of ReFS, while the
         | linked PR in dotnet/runtime is about adjusting the way the
         | File API issues calls on macOS so they take advantage of
         | APFS's CoW instead.
        
       | mgerdts wrote:
       | I think they use a lot of extra words to say that ReFS will
       | support the equivalent of cp --reflink.
        
       | 42lux wrote:
       | And with all that said WSL2 still buffers file transfers in
       | RAM...
        
       | Joker_vD wrote:
       | You know, I've always been kinda amused that something very
       | simple like "cat a b >c" or even "fa = open("a", O_APPEND |
       | O_WRONLY); fb = open("b", O_RDONLY); sendfile(fa, fb, NULL,
       | 0x7ffff000);" has neither a user-visible specialized API nor
       | under-the-hood speedups in the FS implementations. It's
       | just gluing two files together, it's got to be a very popular
       | operation, about as popular as "prepend the contents of file A to
       | file B". But you can't do it in-place which is kinda annoying
       | when you have to preserve the existing files' attributes.
        
         | jmole wrote:
         | What attributes would be worth preserving that you wouldn't
         | otherwise be able to preserve?
        
           | Joker_vD wrote:
           | Having the same permissions and the owner would be nice,
           | which is a bit annoying to pull off with the "write to a
           | temporary file, then rename it over the original one"
           | approach. Also, mtime/atime. And the xattrs, of course.
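To make the "write to a temporary file, then rename it over the original" approach concrete, here is a minimal sketch of prepending one file to another while trying to carry the attributes across. The name `prepend_file` is hypothetical. It leans on `shutil.copystat`, which copies permission bits and timestamps and, on Linux, extended attributes where possible; restoring the owner is attempted but silently skipped when the process lacks the privilege.

```python
import os
import shutil
import tempfile


def prepend_file(head_path: str, target_path: str) -> None:
    """Replace target with head + target, keeping target's metadata."""
    target_dir = os.path.dirname(os.path.abspath(target_path))
    # Create the temp file next to the target so the rename stays on one
    # filesystem and can be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir)
    try:
        with os.fdopen(fd, "wb") as out:
            for name in (head_path, target_path):
                with open(name, "rb") as src:
                    shutil.copyfileobj(src, out)
        st = os.stat(target_path)
        if hasattr(os, "chown"):
            try:
                os.chown(tmp_path, st.st_uid, st.st_gid)
            except OSError:
                pass  # changing ownership usually needs privileges
        # Mode bits, atime/mtime and (on Linux, where possible) xattrs.
        shutil.copystat(target_path, tmp_path)
        os.replace(tmp_path, target_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Even with all of that, the result is a new inode, so any other hard links to the old file still point at the old contents. That is part of why a true in-place operation would be nicer.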
        
         | derefr wrote:
         | Yes, in theory, any filesystem could trivially add a feature of
         | "ref-counted immutable extents" -- where a special syscall
         | equivalent to `cat a b c > d` could be implemented that creates
         | an inode d that consists of references to the existing extents
         | of a+b+c.
         | 
         | (The shared extents have to be _immutable_, because on non-CoW
         | filesystems, filesystem locks apply to "byte ranges of inodes",
         | not to extents or slices thereof; so extents could only be
         | _safely_ shared between inodes if they forced the inodes
         | referencing them to act as if they were always reader-locked.)
         | 
         | You could even implement this on e.g. Linux ext4 today -- you
         | could consider extents immutable if they're part of an
         | immutable (chattr +i) file that has no additional hard links;
         | and you could prevent any files that are "sharing" immutable
         | extents from being made non-immutable (where in the above, the
         | syscall would create a file that is immediately immutable.)
         | 
         | This would basically result in the same semantics +
         | efficiencies that you get with "composite uploads" in an object
         | store.
         | 
         | ---
         | 
         | Given a CoW filesystem, you could probably extend this concept
         | to allow arbitrary CoW blocks to be explicitly referenced from
         | file A into file B _without_ any need for immutability -- it'd
         | just be an explicit "partial" reflink. (This is already
         | possible for the simple A->B case, by starting with a CoW
         | clone, and then overwriting the blocks that shouldn't be
         | shared. But more complex cases like A+B+C->D above, aren't
         | possible; nor is having those shared blocks be in a different
         | position in the clone than they are in the original; and so
         | forth.)
         | 
         | It wouldn't quite work like you're imagining with sendfile(2),
         | though, because the CoW sharing could only occur at filesystem-
         | block boundaries. You still wouldn't be able to use partial
         | reflinking to optimize the operation of e.g. adding three bytes
         | of header to a file (unless you also added BLKSZ-3 bytes of
         | padding.)
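The "BLKSZ-3 bytes of padding" arithmetic above generalizes to any header length. A tiny sketch, assuming a 4096-byte filesystem block (a common default, but an assumption here) and a hypothetical helper name:

```python
def padding_for_block_alignment(header_len: int, block_size: int = 4096) -> int:
    """Bytes of padding needed after a header of header_len bytes so that
    whatever follows starts exactly on a block boundary."""
    return (-header_len) % block_size


# A 3-byte header needs block_size - 3 bytes of padding.
print(padding_for_block_alignment(3))     # 4093
# A header that already fills whole blocks needs none.
print(padding_for_block_alignment(8192))  # 0
```

So a three-byte header ends up costing a whole block on disk before any existing blocks could be shared.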
        
         | magicalhippo wrote:
         | > about as popular as "prepend the contents of file A to file
         | B"
         | 
         | I've never understood why filesystems don't easily support
         | prepending data or truncating the start of a file.
         | 
         | It should be, as far as I can see, about as trivial to support
         | as appending and truncating the end, and is something that
         | comes up quite often in application code. Even if it is a bit
         | more tricky, I think the benefits would be great in the cases
         | where it's needed.
         | 
         | Instead applications are left having to rewrite the contents.
        
           | kbolino wrote:
           | A queue is simpler to implement than a deque and the same is
           | true of a file system: supporting growing the file in both
           | directions is more complicated than supporting growing the
           | file in one direction. In practice, append is much more
           | common than prepend, so the extra bookkeeping and code
           | doesn't seem to be worth it in general.
        
             | o11c wrote:
             | That's much less of a concern ever since everybody switched
             | to extent-based filesystems.
             | 
             | The real concern is block alignment.
        
         | tedunangst wrote:
         | So if I prepend 17 bytes to a file, where are they stored? And
         | if I prepend another 47 bytes, etc.? How would this be tracked?
        
           | kbolino wrote:
           | Same as going forward, you'd grab a free block and simply
           | fill it backwards from the end instead of forwards from the
           | beginning. But the file system would have to support file
           | data starting mid-block and new blocks getting added to the
           | head of the file. The problem is that there'd be more
           | bookkeeping data to store, more code to implement it, and
           | more edge cases to handle for concurrent writes.
        
             | tedunangst wrote:
             | That'd be hell for mmap, too.
        
         | o11c wrote:
         | On some Linux filesystems you can do this _if_ the chunk to be
         | inserted is a multiple of the block size.
         | 
         | See `FALLOC_FL_COLLAPSE_RANGE` and `FALLOC_FL_INSERT_RANGE` in
         | `fallocate(2)`
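A rough sketch of what calling `fallocate(2)` with `FALLOC_FL_INSERT_RANGE` might look like, assuming 64-bit Linux with glibc. Python has no direct wrapper for the mode argument, so this goes through `ctypes`. The flag values are hard-coded from `<linux/falloc.h>`, and `insert_hole_at_start` is a hypothetical helper name. It returns False rather than raising when the filesystem or the alignment does not allow the operation.

```python
import ctypes
import os

# Flag values from <linux/falloc.h> (hard-coded; an assumption of this sketch).
FALLOC_FL_COLLAPSE_RANGE = 0x08  # remove a block-aligned range from a file
FALLOC_FL_INSERT_RANGE = 0x20    # open up a block-aligned hole in a file


def insert_hole_at_start(path: str, length: int) -> bool:
    """Try to shift a file's contents forward by `length` bytes in place.

    `length` must be a multiple of the filesystem block size. Returns True
    on success, False if the call is unavailable or refused.
    """
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        fallocate = libc.fallocate
    except (OSError, AttributeError, TypeError):
        return False  # no libc fallocate on this platform
    # int fallocate(int fd, int mode, off_t offset, off_t len); off_t is
    # assumed to be 64 bits wide here.
    fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                          ctypes.c_int64, ctypes.c_int64]
    fallocate.restype = ctypes.c_int
    fd = os.open(path, os.O_RDWR)
    try:
        return fallocate(fd, FALLOC_FL_INSERT_RANGE, 0, length) == 0
    finally:
        os.close(fd)
```

On success the first `length` bytes of the file become a hole that reads as zeros, ready to be overwritten with the data being prepended. The old contents follow it without having been rewritten.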
        
         | lmm wrote:
         | > It's just gluing two files together, it's got to be a very
         | popular operation, about as popular as "prepend the contents of
         | file A to file B".
         | 
         | Realistically I don't think that's going to happen very often.
         | What would be the use case? The only case I can think of is
         | something like tar where you pack up a bunch of files as a
         | single file, and you usually do compression at the same time in
         | that case.
        
       | tedunangst wrote:
       | > We ran into this problem on one machine that had run continuous
       | CoW builds for weeks under a prerelease CoW-in-Win32
       | implementation, so we don't expect this to appear in the wild
       | very often.
       | 
       | That's not exactly confidence inspiring.
        
       | whalesalad wrote:
       | Crazy to see Microsoft talking about performance like they have
       | any expertise in the matter.
        
       | forrestthewoods wrote:
       | Oh man. My dream Git Successor combines a Virtual File System
       | with a Copy-on-Write cache to allow repos to trivially commit all
       | their dependencies including compiler toolchains.
       | 
       | Windows having CoW makes my far-fetched dream a possibility.
        
       | tester756 wrote:
       | >Dev Drive was released
       | 
       | I tried that Dev Drive thing and I haven't seen a perf
       | improvement when building C++ code, sadly.
        
       | justinlloyd wrote:
       | Have been running ReFS on a drive on my Windows 10 workstation
       | for about three years, and have been using a dev drive
       | equivalent on Windows 10 for the past two months. Our Unreal
       | Engine project is quite large, 600+GB straight from the P4 depot
       | before building. I need to keep a few separate workspaces around,
       | one for current development work, one for swarm reviews, one for
       | "let me test out a thing that might break" because as we know,
       | branching in Perforce can be quite painful, especially on large
       | depots. At one point I needed to have dozens of workspaces synced
       | to specific changelists whilst we hunted down a bug in one of our
       | levels.
       | 
       | ReFS, with block de-duplication and LZ4 compression, has reduced
       | the per-workspace footprint to around 10% of what it was
       | previously. Decreased build times by around 5% and decreased
       | archive, stage and package times by about 80% by deploying
       | MSBuild SDK CopyOnWrite. I also moved the DDC onto the VHDX where
       | the project resides, which has further reduced the footprint of
       | the project.
       | 
       | Windows 11 canary channel (still in canary I think) has a
       | modified Win32 that supports CoW CopyFileEx. You can get similar
       | gains by other means on Win10 and Win11 by using ReFS CoW-aware
       | utilities.
       | 
       | Have used XFS, BTRFS, APFS and others extensively over the years,
       | so I am glad that Windows is finally getting in on the action.
        
       ___________________________________________________________________
       (page generated 2024-06-24 23:00 UTC)