[HN Gopher] How useful should copy_file_range be?
___________________________________________________________________
How useful should copy_file_range be?
Author : chmaynard
Score : 42 points
Date : 2021-02-18 22:22 UTC (1 day ago)
(HTM) web link (lwn.net)
(TXT) w3m dump (lwn.net)
| CyberRabbi wrote:
| Why not optimistically copy the file until EOF and report number
| of bytes copied? Why is stat() consulted at all? That seems
| broken.
| tyingq wrote:
| Something to change the start point of an existing file would be
| neat. Sort of like truncate(), but for the start.
| jandrese wrote:
| When you think about how files are stored on the filesystem it
| becomes clear why this functionality doesn't exist. Each file
| is basically a list of blocks and some metadata like the total
| length of the file. What it doesn't have is a length for each
| block--they are assumed to be full length except for the last,
| whose length is implied by the total length of the file.
|
| So if you wanted to add bytes to the front of the file you
| would have to allocate new blocks to store them, but since there
| is no per-block length map you could only shift the data by
| whole blocks. The same goes for shrinking the file by cutting
| off the head: you can't handle sizes other than full blocks.
|
| It's certainly possible to build a filesystem where this would
| work, but when you wrote programs using the feature they
| wouldn't be portable to any other commonly used filesystem.
| People also don't change filesystems very often, so even if you
| got the change into Ext and waited a decade many people would
| still be incompatible.
|
| Finally, it's a feature that is helpful only rarely. So there
| isn't enough demand to push through such a massive change given
| the headwinds it has.
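A toy model of the layout jandrese describes may make the asymmetry concrete. This is a sketch only: `ToyFile`, its methods, and the 4096-byte `BLOCK_SIZE` are hypothetical names and values chosen for illustration, not how any real filesystem is implemented.

```python
BLOCK_SIZE = 4096  # assumed block size, for illustration only


class ToyFile:
    """A file as a list of full-size blocks plus one total length."""

    def __init__(self, data: bytes):
        self.length = len(data)
        # Every block is padded to full size; only self.length says
        # how much of the last block is real data.
        self.blocks = [
            data[i:i + BLOCK_SIZE].ljust(BLOCK_SIZE, b"\0")
            for i in range(0, len(data), BLOCK_SIZE)
        ]

    def read(self) -> bytes:
        return b"".join(self.blocks)[:self.length]

    def truncate(self, new_length: int) -> None:
        # Cutting the tail works at any byte offset: drop the unused
        # blocks and record the new total length.
        if not 0 <= new_length <= self.length:
            raise ValueError("new_length out of range")
        keep = -(-new_length // BLOCK_SIZE)  # ceiling division
        del self.blocks[keep:]
        self.length = new_length

    def drop_head(self, nbytes: int) -> None:
        # Cutting the head only works in whole blocks, because no block
        # other than the last can be marked as partially filled.
        if nbytes % BLOCK_SIZE != 0:
            raise ValueError("head can only be cut in whole blocks")
        if not 0 <= nbytes <= self.length:
            raise ValueError("nbytes out of range")
        del self.blocks[:nbytes // BLOCK_SIZE]
        self.length -= nbytes
```

In this model `truncate(5000)` succeeds at an arbitrary byte offset, while `drop_head(100)` is rejected and `drop_head(4096)` is accepted.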
| the8472 wrote:
| fallocate(..., FALLOC_FL_COLLAPSE_RANGE) will do that but it
| comes with limitations such as alignment requirements and
| limited filesystem support.
|
| You could also try creating a new file and use copy_file_range
| to copy the tail of the file to the new one, then move it over
| the old one. That might reuse a good chunk of the storage on a
| CoW filesystem.
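The second approach the8472 mentions (copy the tail to a new file, then move it over the old one) could look roughly like the following. This is a minimal sketch: `drop_head` is a hypothetical helper name, it falls back to a plain read/write loop when `os.copy_file_range` is missing or fails, and whether any storage is actually shared depends on the filesystem.

```python
import os
import tempfile


def drop_head(path, nbytes, chunk=1 << 20):
    """Rewrite path without its first nbytes (copy the tail, then rename)."""
    src = os.open(path, os.O_RDONLY)
    # Create the temporary file next to the original so that
    # os.replace() is a rename within one filesystem.
    tmp_fd, tmp_path = tempfile.mkstemp(
        dir=os.path.dirname(os.path.abspath(path)))
    try:
        os.lseek(src, nbytes, os.SEEK_SET)  # start copying after the head
        use_cfr = hasattr(os, "copy_file_range")  # Linux, Python 3.8+
        while True:
            if use_cfr:
                try:
                    n = os.copy_file_range(src, tmp_fd, chunk)
                except OSError:
                    use_cfr = False  # not usable here: fall back
                    continue
            else:
                buf = os.read(src, chunk)
                n = len(buf)
                view = memoryview(buf)
                while view:  # os.write may write fewer bytes than asked
                    view = view[os.write(tmp_fd, view):]
            if n == 0:  # treated as end of file
                break
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    finally:
        os.close(src)
        os.close(tmp_fd)
```

Note that the replacement file gets the permissions `mkstemp` assigns, not those of the original; a fuller version would copy ownership and mode as well.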
| callesgg wrote:
| I used to ponder an idea that partial file copies could be
| done with file fragmentation.
| dataflow wrote:
| It sounds wrong to depend on the file length for correctness even
| on physical file systems. What if the file length shrinks during
| the copy? You need to just keep going until you can't anymore...
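The "keep going until you can't" loop can be sketched without ever calling stat(). The helper names `write_all` and `copy_until_eof` are hypothetical, and the sketch assumes that a return value of 0 means end of file. It uses `os.copy_file_range` where available and a read/write loop otherwise.

```python
import os


def write_all(fd, data):
    """os.write may write fewer bytes than asked, so loop until done."""
    view = memoryview(data)
    while view:
        view = view[os.write(fd, view):]


def copy_until_eof(src_fd, dst_fd, chunk=1 << 20):
    """Copy from src_fd's current offset until EOF; return bytes copied.

    The source length is never queried, so a file that grows or shrinks
    during the copy is handled by whatever the next call reports.
    """
    total = 0
    use_cfr = hasattr(os, "copy_file_range")  # Linux, Python 3.8+
    while True:
        if use_cfr:
            try:
                n = os.copy_file_range(src_fd, dst_fd, chunk)
            except OSError:
                use_cfr = False  # not usable here: fall back
                continue
        else:
            buf = os.read(src_fd, chunk)
            write_all(dst_fd, buf)
            n = len(buf)
        if n == 0:  # treated as end of file in this sketch
            return total
        total += n
```

Because both paths advance the descriptors' own file offsets, falling back partway through continues from where the previous call stopped.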
| cpuguy83 wrote:
| That's assuming you want to copy the whole thing. And even
| if you do want to do that, this is what you do with
| copy_file_range as well, just that in many cases you can do it
| with a single call instead of multiple read/write calls in
| addition to being able to take advantage of performance
| optimizations (such as reflinking).
| dataflow wrote:
| > That's assuming you want to copy the whole thing
|
| Right, but if you don't, then you _definitely_ don't need to
| query the file length in the first place...
| jeffrallen wrote:
| As a Go user and former contributor, it makes me pleased that the
| rigor of the Go team occasionally gives the Linux kernel
| developers heartburn. As long as everyone stays professional, the
| end result is better for both groups.
|
| Linux gets feature velocity by playing fast and loose sometimes
| with stability. Demanding users like the Go authors are a
| necessary and welcome counterbalance.
| jandrese wrote:
| It seems kind of strange that this is a kernel function at all.
| It seems like something that should live in libc or the like. Is
| there a performance benefit from having it up in kernel space? It
| seems a bit outside of the scope of the kernel IMHO.
|
| I can understand functions like sendfile() being able to cut down
| on context switches and being helpful for bulk data transfer, but
| is that the case here? How much benefit do you get from
| copy_file_range() vs. a read/write loop?
|
| I do note with some amusement how the kernel developer basically
| went "Why are you using copy_file_range on things that aren't
| actually _files_?"
| 7786655 wrote:
| > The copy_file_range() system call looks like a relatively
| straightforward feature; it allows user space to ask the kernel
| to copy a range of data from one file to another, _hopefully
| applying some optimizations along the way._
| jandrese wrote:
| That's exactly what I was asking. The Go developers are
| knocking themselves out chasing this syscall in the hopes
| that it might improve performance? Has it been benchmarked?
| webstrand wrote:
| Given all of the effort being put into zero-copy read/write in
| the kernel, I would assume there are significant performance
| gains available.
|
| I suspect that some correctly aligned ranges could be copied
| with CoW semantics, thereby skipping the read/write altogether.
| bonzini wrote:
| On NFS you can avoid network traffic completely.
| jandrese wrote:
| Is this theoretical or is there support for it already in
| the kernel and NFS daemons?
|
| Reading the manpage for copy_file_range the notes section
| states:
|
|   copy_file_range() gives filesystems an opportunity to
|   implement "copy acceleration" techniques, such as the use
|   of reflinks (i.e., two or more inodes that share pointers
|   to the same copy-on-write disk blocks) or server-side-copy
|   (in the case of NFS).
|
| But doesn't mention if these techniques are actually used.
| I guess there's some help in future proofing your code.
|
| Edit: I tested this on my Ubuntu 20.04 machine with a 1GB
| file full of random data sitting in the file cache. Using
| copy_file_range I could make a local-local copy in 0.595
| seconds on average. Using a primitive read/write loop took
| 0.600 seconds. But these values are somewhat noisy and the
| difference is within the error margin. It doesn't appear
| that my 5.4.0 kernel on ext4 is employing the reflink
| optimization.
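A comparison along the lines of jandrese's test could be set up as below. This is only a sketch of a timing harness, not a reconstruction of the original test: the function names are hypothetical, the file is freshly written (so it is probably still in the page cache), nothing is synced to disk, and no particular result is implied.

```python
import os
import tempfile
import time


def rw_copy(src_fd, dst_fd, chunk):
    """Plain read/write loop."""
    while True:
        buf = os.read(src_fd, chunk)
        if not buf:
            return
        view = memoryview(buf)
        while view:  # os.write may write fewer bytes than asked
            view = view[os.write(dst_fd, view):]


def cfr_copy(src_fd, dst_fd, chunk):
    """Loop on copy_file_range until it reports 0 bytes copied."""
    while os.copy_file_range(src_fd, dst_fd, chunk) > 0:
        pass


def time_copy(copy_fn, src_path, dst_path, chunk=1 << 20):
    """Return elapsed seconds, or None if copy_fn fails on this system."""
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        start = time.perf_counter()
        copy_fn(src, dst, chunk)
        return time.perf_counter() - start
    except (AttributeError, OSError):
        return None
    finally:
        os.close(src)
        os.close(dst)


def benchmark(size=64 << 20):
    """Time both strategies on a temporary file of random data."""
    with tempfile.TemporaryDirectory() as d:
        src_path = os.path.join(d, "src")
        with open(src_path, "wb") as f:
            f.write(os.urandom(size))
        return {
            "read_write": time_copy(rw_copy, src_path, os.path.join(d, "a")),
            "copy_file_range": time_copy(cfr_copy, src_path,
                                         os.path.join(d, "b")),
        }
```

Calling `benchmark()` several times and averaging would be needed before reading anything into the numbers, since single runs are noisy.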
| dallbee wrote:
| Can't share details, but I've seen the copy_file_range
| optimization represent roughly 25% increased throughput
| for a prior employer. It's more than theoretical.
| bonzini wrote:
| It's totally practical and already implemented by Linux
| on both the server and the client side.
| bonzini wrote:
| ext4 does not support sharing blocks across multiple
| files, indeed. Try btrfs or perhaps xfs.
| the8472 wrote:
| ext4 has no reflink support. btrfs does by default, xfs
| under some configurations. server-side offload is
| supported by nfs and cifs but also needs server-side
| support.
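One way to check whether a given filesystem can share blocks between files is to ask for a reflink explicitly with the Linux `FICLONE` ioctl and see whether it is refused. The sketch below assumes Linux; `try_reflink` is a hypothetical helper name, and the request value is the `_IOW(0x94, 9, int)` encoding from `<linux/fs.h>`.

```python
import fcntl
import os

FICLONE = 0x40049409  # _IOW(0x94, 9, int), from <linux/fs.h>


def try_reflink(src_path, dst_path):
    """Try to make dst_path a reflink clone of src_path.

    Returns True on success. Returns False (and removes the empty
    destination) if the kernel or filesystem refuses the request.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            fcntl.ioctl(dst.fileno(), FICLONE, src.fileno())
            return True
        except OSError:
            pass
    os.unlink(dst_path)
    return False
```

Based on the comments above, one would expect this to return False on ext4 and True on btrfs, but that is an expectation to verify on the system in question rather than a guarantee.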
| gravypod wrote:
| Even if it wasn't smart you'd still get a huge benefit
| from not context switching to/from kernel/user space for
| a standard copy implementation. Far fewer syscalls.
| There's a real performance benefit to be had there.
___________________________________________________________________
(page generated 2021-02-19 23:01 UTC)