[HN Gopher] Copy-on-write performance and debugging
___________________________________________________________________
Copy-on-write performance and debugging
Author : meysamazad
Score : 60 points
Date : 2024-06-24 03:01 UTC (19 hours ago)
(HTM) web link (devblogs.microsoft.com)
(TXT) w3m dump (devblogs.microsoft.com)
| bhouston wrote:
| I had to read an earlier blog post to figure out what it was:
|
| https://devblogs.microsoft.com/engineering-at-microsoft/dev-...
|
| "Copy-on-write (CoW) linking, also known as block cloning in the
| Windows API documentation, avoids fully copying a file by
| creating a metadata reference to the original data on-disk. CoW
| links are like hardlinks but are safe to write to, as the
| filesystem lazily copies the original data into the link as
| needed when opened for append or random-access write. With a CoW
| link you save disk space and time since the link consists of a
| small amount of metadata and they write fast."
|
| It seems there is a macOS implementation:
| https://github.com/dotnet/runtime/pull/79243
|
| But it seems that this is .NET-specific and not something that
| would speed up other build systems? It is unclear whether this
| can apply to build technologies other than .NET. Can it speed
| up TypeScript/JavaScript builds? Can it speed up Rust builds?
| Also, what are the speedups on other platforms like macOS and
| Linux?
|
| Is this something that all build systems and all OSes would
| benefit from?
|
| I guess this blog post for me raises more questions than it
| answers.
| supriyo-biswas wrote:
| Certain filesystems like XFS do support CoW copying, and ZFS
| also does chunk-based deduplication. You'd typically use it
| through `cp --reflink` and similar.
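For anyone wondering what `cp --reflink` does under the hood on Linux, here is a minimal sketch. It assumes a Linux kernel and uses the `FICLONE` ioctl from `<linux/fs.h>`; the helper name `clone_or_copy` is made up for illustration, and the fallback to a plain copy is my own choice, not something `cp` is guaranteed to do in every mode.

```python
import fcntl
import os
import shutil
import tempfile

# FICLONE request number from <linux/fs.h>: _IOW(0x94, 9, int).
FICLONE = 0x40049409


def clone_or_copy(src: str, dst: str) -> str:
    """Try a CoW clone of src into dst; fall back to a byte-for-byte copy.

    Returns "reflink" or "copy" so the caller can see which path ran.
    """
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        try:
            # Ask the filesystem to make dst share src's extents.
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
            return "reflink"
        except OSError:
            # No reflink support here (ext4, tmpfs, ...) or src and dst
            # live on different filesystems: copy the data instead.
            shutil.copyfileobj(fsrc, fdst)
            return "copy"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.bin")
        with open(a, "wb") as f:
            f.write(b"x" * 1_000_000)
        print(clone_or_copy(a, os.path.join(d, "b.bin")))
```

Which branch runs depends on the filesystem holding the files; on XFS (with reflink enabled) or Btrfs I would expect the clone path, elsewhere the copy path.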
| kmeisthax wrote:
| The block cloning feature in newer versions of Windows is
| enabled by copy-on-write filesystems. macOS ships with one by
| default - APFS. Linux also has BTRFS, and before that there was
| the ZFS-on-Linux project. Microsoft is now shipping a CoW
| filesystem for Windows that appears to be a ReFS derivative.
|
| This is .NET specific inasmuch as this is getting MSBuild to
| take advantage of ReFS features; but there's no particular
| reason why other build systems couldn't take advantage of ReFS
| in the same way MSBuild can take advantage of APFS. The main
| question is if the build system needs to make lots of copies of
| files that may not ever be updated. I imagine anything that
| does dependency fetching (especially Node/NPM) would benefit.
| bhouston wrote:
| I know that npm now has a per-user cache in ~/.npm:
|
| https://docs.npmjs.com/cli/v7/commands/npm-cache
|
| I am not sure if it uses CoW to bring those packages into
| each project. If it did, that would be efficient and speed up
| "npm install" if the cache was warm.
| derefr wrote:
| Language package managers don't need copy-on-write, because
| there's no "write" -- the files that make up dependencies
| are immutable from the perspective of the projects that
| they get installed into. There's no advantage to using CoW
| to "deploy" such files into work trees, over using plain-
| old hard links to do so. (And hard-linking these files is
| indeed what all the Node package managers -- other than NPM
| -- already do.)
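The hard-link approach described above only stays safe as long as nobody writes to the installed files in place. A small sketch of why, using only `os.link` from the standard library; the file names are hypothetical stand-ins for a package cache entry and its installed copy.

```python
import os
import tempfile


def hardlink_demo() -> tuple:
    """Link a 'cached' file into a 'project' and write through the link.

    Returns (same_inode, cache_was_modified).
    """
    with tempfile.TemporaryDirectory() as tmp:
        cached = os.path.join(tmp, "cache_index.js")       # hypothetical cache entry
        installed = os.path.join(tmp, "project_index.js")  # hypothetical install target
        with open(cached, "w") as f:
            f.write("module.exports = 1;\n")

        # "Install" the dependency without duplicating its data.
        os.link(cached, installed)
        same_inode = os.stat(cached).st_ino == os.stat(installed).st_ino

        # The catch: an in-place write through one name is visible
        # through the other, because both names are the same file.
        with open(installed, "a") as f:
            f.write("// patched\n")
        with open(cached) as f:
            cache_was_modified = "patched" in f.read()

        return same_inode, cache_was_modified


if __name__ == "__main__":
    print(hardlink_demo())
```

With a CoW clone the second value would stay false, since the write would only touch the clone's blocks. That difference is the whole trade-off between the two approaches.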
| nullindividual wrote:
| Volume Shadow Copies (Win Svr 2003) on NTFS were the first
| implementation by Microsoft of CoW. But it was limited to VSS
| snapshots, so not useful for day-to-day storage usage.
|
| DevDrive is not a derivative of ReFS, it is ReFS with some
| file system filter bits turned off among a couple of other
| things. DevDrive is a collection of features centered around
| ReFS for the purposes of speeding up file reads/writes (think
| node modules).
| adrian_b wrote:
| Linux XFS also added copy-on-write a few years ago.
|
| Initially I was not aware of this and I was surprised when I
| copied a directory with a total size greater than 50 GB
| and the copy was instantaneous. At first I believed that I
| had given some wrong command, but then I searched the XFS
| documentation and I saw that this was a new feature at that
| time.
| ack_complete wrote:
| It could speed up other build systems, but the .NET build
| system (MSBuild) has a particular design issue where by default
| it will copy dependencies local to each project that's using
| them (Copy Local). This leads to assemblies being copied
| multiple times throughout the filesystem according to the build
| process.
| neonsunset wrote:
| The article talks about CoW as a feature of ReFS, while the
| linked PR in dotnet/runtime is about adjusting the way File API
| issues calls on macOS so they take advantage of APFS's CoW
| instead.
| mgerdts wrote:
| I think they use a lot of extra words to say that ReFS will
| support the equivalent of cp --reflink.
| 42lux wrote:
| And with all that said WSL2 still buffers file transfers in
| RAM...
| Joker_vD wrote:
| You know, I've always been kinda amused that something very
| simple like "cat a b >c" or even "fa = open("a", O_APPEND |
| O_WRONLY); fb = open("b", O_RDONLY); sendfile(fa, fb, NULL,
| 0x7ffff000);" has neither a user-visible specialized API nor
| under-the-hood speedups in the FS implementations. It's
| just gluing two files together, it's got to be a very popular
| operation, about as popular as "prepend the contents of file A to
| file B". But you can't do it in-place which is kinda annoying
| when you have to preserve the existing files' attributes.
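On Linux the closest thing to an API for the `cat a b >c` case is probably `copy_file_range(2)`, which keeps the copy inside the kernel and gives the filesystem a chance to accelerate it; whether it actually shares blocks depends on the filesystem and on alignment. A rough sketch, assuming Python 3.8+ on Linux; `concat_files` is a made-up helper name, and the read/write fallback is my own addition.

```python
import os


def _copy_all(in_fd: int, out_fd: int) -> None:
    """Copy everything from in_fd to the current offset of out_fd."""
    use_cfr = hasattr(os, "copy_file_range")
    while True:
        if use_cfr:
            try:
                # Offsets are left as None, so both fds advance normally.
                if os.copy_file_range(in_fd, out_fd, 1 << 30) == 0:
                    return  # end of input
                continue
            except OSError:
                # e.g. EXDEV or ENOSYS: fall back to userspace copying
                # from wherever the file offsets currently are.
                use_cfr = False
        chunk = os.read(in_fd, 1 << 20)
        if not chunk:
            return
        view = memoryview(chunk)
        while view:
            view = view[os.write(out_fd, view):]


def concat_files(sources, dest: str) -> None:
    """Write the concatenation of all source files into dest."""
    out_fd = os.open(dest, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for src in sources:
            in_fd = os.open(src, os.O_RDONLY)
            try:
                _copy_all(in_fd, out_fd)
            finally:
                os.close(in_fd)
    finally:
        os.close(out_fd)
```

Note that this still produces a new file `c`; it does not solve the in-place append or the attribute-preservation problem mentioned above.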
| jmole wrote:
| What attributes would be worth preserving that you wouldn't
| otherwise be able to do?
| Joker_vD wrote:
| Having the same permissions and the owner would be nice,
| which is a bit annoying to pull off with the "write to a
| temporary file, then rename it over the original one"
| approach. Also, mtime/atime. And the xattrs, of course.
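To make the annoyance concrete, this is roughly what the "temporary file, then rename" dance looks like once you try to carry the metadata over. A sketch for POSIX systems only; `replace_preserving_attrs` is a hypothetical helper, and it relies on `shutil.copystat`, which copies mode bits, atime/mtime, flags and (on Linux) xattrs where it can.

```python
import os
import shutil
import tempfile


def replace_preserving_attrs(path: str, new_data: bytes) -> None:
    """Rewrite path via temp file + rename, carrying over its metadata."""
    st = os.stat(path)
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # Mode bits, timestamps, flags and xattrs from the old file.
        shutil.copystat(path, tmp_path)
        try:
            os.chown(tmp_path, st.st_uid, st.st_gid)
        except PermissionError:
            pass  # restoring a different owner usually needs privileges
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Even this misses things: the inode number changes, any other hard links keep pointing at the old data, and ACLs are only covered to the extent they are stored as xattrs.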
| derefr wrote:
| Yes, in theory, any filesystem could trivially add a feature of
| "ref-counted immutable extents" -- where a special syscall
| equivalent to `cat a b c > d` could be implemented that creates
| an inode d that consists of references to the existing extents
| of a+b+c.
|
| (The shared extents have to be _immutable_, because on non-CoW
| filesystems, filesystem locks apply to "byte ranges of inodes",
| not to extents or slices thereof; so extents could only be
| _safely_ shared between inodes if they forced the inodes
| referencing them to act as if they were always reader-locked.)
|
| You could even implement this on e.g. Linux ext4 today -- you
| could consider extents immutable if they're part of an
| immutable (chattr +i) file that has no additional hard links;
| and you could prevent any files that are "sharing" immutable
| extents from being made non-immutable (where in the above, the
| syscall would create a file that is immediately immutable.)
|
| This would basically result in the same semantics +
| efficiencies that you get with "composite uploads" in an object
| store.
|
| ---
|
| Given a CoW filesystem, you could probably extend this concept
| to allow arbitrary CoW blocks to be explicitly referenced from
| file A into file B _without_ any need for immutability -- it'd
| just be an explicit "partial" reflink. (This is already
| possible for the simple A->B case, by starting with a CoW
| clone, and then overwriting the blocks that shouldn't be
| shared. But more complex cases like A+B+C->D above, aren't
| possible; nor is having those shared blocks be in a different
| position in the clone than they are in the original; and so
| forth.)
|
| It wouldn't quite work like you're imagining with sendfile(2),
| though, because the CoW sharing could only occur at filesystem-
| block boundaries. You still wouldn't be able to use partial
| reflinking to optimize the operation of e.g. adding three bytes
| of header to a file (unless you also added BLKSZ-3 bytes of
| padding.)
| magicalhippo wrote:
| > about as popular as "prepend the contents of file A to file
| B"
|
| I've never understood why filesystems don't easily support
| prepending data, or truncating the start of a file.
|
| It should be, as far as I can see, about as trivial to support
| as appending and truncating the end, and is something that
| comes up quite often in application code. Even if it is a bit
| more tricky, I think the benefits would be great in the cases
| where it's needed.
|
| Instead applications are left having to rewrite the contents.
| kbolino wrote:
| A queue is simpler to implement than a deque and the same is
| true of a file system: supporting growing the file in both
| directions is more complicated than supporting growing the
| file in one direction. In practice, append is much more
| common than prepend, so the extra bookkeeping and code
| doesn't seem to be worth it in general.
| o11c wrote:
| That's much less of a concern ever since everybody switched
| to extent-based filesystems.
|
| The real concern is block alignment.
| tedunangst wrote:
| So if I prepend 17 bytes to a file, where are they stored? And
| if I prepend another 47 bytes, etc.? How would this be tracked?
| kbolino wrote:
| Same as going forward, you'd grab a free block and simply
| fill it backwards from the end instead of forwards from the
| beginning. But the file system would have to support file
| data starting mid-block and new blocks getting added to the
| head of the file. The problem is that there'd be more
| bookkeeping data to store, more code to implement it, and
| more edge cases to handle for concurrent writes.
| tedunangst wrote:
| That'd be hell for mmap, too.
| o11c wrote:
| On some Linux filesystems you can do this _if_ the chunk to be
| inserted is a multiple of the block size.
|
| See `FALLOC_FL_COLLAPSE_RANGE` and `FALLOC_FL_INSERT_RANGE` in
| `fallocate(2)`
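Python's `os` module does not expose the `mode` argument of `fallocate(2)`, so a sketch has to go through `ctypes`. This assumes 64-bit Linux with glibc or musl; the constant value is taken from `<linux/falloc.h>`, `prepend_block` is a made-up helper, and using `f_bsize` from `statvfs` as the filesystem block size is an assumption that holds on ext4 and XFS as far as I know.

```python
import ctypes
import ctypes.util
import os

FALLOC_FL_INSERT_RANGE = 0x20  # from <linux/falloc.h>


def prepend_block(path: str, payload: bytes) -> bool:
    """Prepend payload by shifting the file's extents instead of rewriting it.

    Returns False when the request is unaligned or the kernel/filesystem
    refuses, so the caller can fall back to rewriting the file.
    """
    block = os.statvfs(path).f_bsize
    size = os.path.getsize(path)
    # INSERT_RANGE wants a block-aligned length and an offset below EOF.
    if not payload or len(payload) % block or size == 0:
        return False

    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    fallocate = getattr(libc, "fallocate", None)
    if fallocate is None:
        return False  # not a Linux libc
    fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                          ctypes.c_int64, ctypes.c_int64]
    fallocate.restype = ctypes.c_int

    fd = os.open(path, os.O_RDWR)
    try:
        # Open a hole of len(payload) bytes at offset 0.
        if fallocate(fd, FALLOC_FL_INSERT_RANGE, 0, len(payload)) != 0:
            return False  # e.g. EOPNOTSUPP on tmpfs
        # Fill the hole with the new data.
        written = 0
        while written < len(payload):
            written += os.pwrite(fd, payload[written:], written)
        return True
    finally:
        os.close(fd)
```

As noted above, this only helps when the data being prepended is a whole number of blocks; a 17-byte header would still need padding or a full rewrite.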
| lmm wrote:
| > It's just gluing two files together, it's got to be a very
| popular operation, about as popular as "prepend the contents of
| file A to file B".
|
| Realistically I don't think that's going to happen very often.
| What would be the use case? The only case I can think of is
| something like tar where you pack up a bunch of files as a
| single file, and you usually do compression at the same time in
| that case.
| tedunangst wrote:
| > We ran into this problem on one machine that had run continuous
| CoW builds for weeks under a prerelease CoW-in-Win32
| implementation, so we don't expect this to appear in the wild
| very often.
|
| That's not exactly confidence inspiring.
| whalesalad wrote:
| Crazy to see Microsoft talking about performance like they have
| any expertise in the matter.
| forrestthewoods wrote:
| Oh man. My dream Git Successor combines a Virtual File System
| with a Copy-on-Write cache to allow repos to trivially commit all
| their dependencies including compiler toolchains.
|
| Windows having CoW makes my far fetched dream a possibility.
| tester756 wrote:
| >Dev Drive was released
|
| I tried that Dev Drive thing and I haven't seen any perf
| improvement when building C++ code, sadly.
| justinlloyd wrote:
| Have been running ReFS on a drive on my Windows 10 workstation
| for about three years, and have been using a dev drive
| equivalent on Windows 10 for the past two months. Our Unreal
| Engine project is quite large, 600+GB straight from the P4 depot
| before building. I need to keep a few separate workspaces around,
| one for current development work, one for swarm reviews, one for
| "let me test out a thing that might break" because as we know,
| branching in Perforce can be quite painful, especially on large
| depots. At one point I needed to have dozens of workspaces synced
| to specific changelists whilst we hunted down a bug in one of our
| levels.
|
| ReFS, with block de-duplication and LZ4 compression, has reduced
| the per-workspace footprint to around 10% of what it was
| previously. Decreased build times by around 5% and decreased
| archive, stage and package times by about 80% by deploying
| MSBuild SDK CopyOnWrite. I also moved the DDC onto the VHDX where
| the project resides which has further reduced the footprint of
| the project.
|
| Windows 11 canary channel (still in canary I think) has a
| modified Win32 that supports CoW CopyFileEx. You can get similar
| gains by other means on Win10 and Win11 by using ReFS CoW aware
| utilities.
|
| Have used XFS, BTRFS, APFS and others extensively over the years,
| so I am glad that Windows is finally getting in on the action.
___________________________________________________________________
(page generated 2024-06-24 23:00 UTC)