Re: New copyfile system call - discuss before LSF?

From: Ric Wheeler
Date: Mon Feb 25 2013 - 16:51:14 EST

On 02/25/2013 04:14 PM, Andy Lutomirski wrote:
On 02/21/2013 02:24 PM, Zach Brown wrote:
On Thu, Feb 21, 2013 at 08:50:27PM +0000, Myklebust, Trond wrote:
On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote:
Il 21/02/2013 15:57, Ric Wheeler ha scritto:
sendfile64() pretty much already has the right arguments for a
"copyfile", however it would be nice to add a 'flags' parameter: the
NFSv4.2 version would use that to specify whether or not to copy file
That would seem to be enough to me and has the advantage that it is a
relatively obvious extension to something that is at least not totally
unknown to developers.

Do we need more than that for non-NFS paths I wonder? What does reflink
need or the SCSI mechanism?
For virt we would like to be able to specify arbitrary block ranges.
Copying an entire file helps some copy operations like storage
migration. However, it is not enough to convert the guest's offloaded
copies to host-side offloaded copies.
So how would a system call based on sendfile64() plus my flag parameter
prevent an underlying implementation from meeting your criterion?
If I'm guessing correctly, sendfile64()+flags would be annoying because
it's missing an out_fd_offset. The host will want to offload the
guest's copies by calling sendfile on block ranges of a guest disk image
file that correspond to the mappings of the in and out files in the

You could make it work with some locking and out_fd seeking to set the
write offset before calling sendfile64()+flags, but ugh.

ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t
out_offset, size_t count, int flags);

That seems closer.
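
For illustration, the "locking and out_fd seeking" workaround mentioned above could look roughly like the sketch below, using only the existing Linux sendfile(2), which takes an input offset but writes at out_fd's current file position. The helper name copy_range_locked and the single process-wide mutex are assumptions made for this example, not anything proposed in the thread; the lock only protects against other threads that go through the same helper.

```c
#include <errno.h>
#include <pthread.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical lock serializing every user of out_fd's file position. */
static pthread_mutex_t out_pos_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical helper: copy count bytes from in_fd at in_off to out_fd
 * at out_off. Returns the number of bytes copied (possibly short if
 * in_fd hits EOF) or -1 with errno set.
 */
ssize_t copy_range_locked(int out_fd, int in_fd, off_t in_off,
                          off_t out_off, size_t count)
{
    ssize_t total = 0;
    int saved_errno = 0;

    pthread_mutex_lock(&out_pos_lock);

    /* sendfile() has no out offset, so set the write position by hand. */
    if (lseek(out_fd, out_off, SEEK_SET) == (off_t)-1) {
        saved_errno = errno;
        total = -1;
        goto unlock;
    }

    while (count > 0) {
        /* in_off is advanced by the kernel on each successful call. */
        ssize_t n = sendfile(out_fd, in_fd, &in_off, count);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            saved_errno = errno;
            total = -1;
            break;
        }
        if (n == 0)     /* EOF on in_fd */
            break;
        total += n;
        count -= (size_t)n;
    }

unlock:
    pthread_mutex_unlock(&out_pos_lock);
    if (total < 0)
        errno = saved_errno;
    return total;
}
```

The seek-then-copy pair is what makes this "ugh": any writer to out_fd that does not take the same lock can move the file position between the two steps, which an explicit out_offset argument would avoid.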

We might also want to pre-emptively offer iovs instead of offsets,
because that's the very first thing that's going to be requested after
people prototype having to iterate calling sendfile() for each
contiguous copy region.
I thought the first thing people would ask for is to atomically create a
new file and copy the old file into it (at least on local file systems).
The idea is that nothing should see an empty destination file, either
by race or by crash. (This feature would perhaps be described as a
pony, but it should be implementable.)

This would be like a better link(2).
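
As a rough userspace approximation of that idea (not the proposed system call), one can copy into a temporary file in the destination directory and only then give it the final name with link(2), so the destination name never refers to an empty or partial file. The function name copy_file_atomically, the ".copy-XXXXXX" template, and the separate dir argument are assumptions made for this sketch. It copies data only (the new file keeps mkstemp's 0600 mode), and it does not fsync the directory, so it is not a complete crash-safety recipe.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy src to dst, where dst lives in directory dir.
 * Fails with EEXIST if dst already exists, like link(2).
 * Returns 0 on success, -1 with errno set on failure.
 */
int copy_file_atomically(const char *src, const char *dir, const char *dst)
{
    char tmp[4096];
    struct stat st;
    off_t off = 0;
    int in_fd, tmp_fd, saved_errno, ret = -1;

    if (snprintf(tmp, sizeof(tmp), "%s/.copy-XXXXXX", dir) >= (int)sizeof(tmp)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    in_fd = open(src, O_RDONLY);
    if (in_fd < 0)
        return -1;

    /* The temporary file sits in the same directory so link() stays
     * within one filesystem. */
    tmp_fd = mkstemp(tmp);
    if (tmp_fd < 0) {
        saved_errno = errno;
        close(in_fd);
        errno = saved_errno;
        return -1;
    }

    if (fstat(in_fd, &st) < 0)
        goto out;

    while (off < st.st_size) {
        ssize_t n = sendfile(tmp_fd, in_fd, &off, (size_t)(st.st_size - off));
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)     /* error, or source shrank underneath us */
            goto out;
    }

    /* Flush the data before the final name becomes visible. */
    if (fsync(tmp_fd) < 0)
        goto out;

    /* Publish: the name dst appears only once the copy is complete. */
    if (link(tmp, dst) < 0)
        goto out;

    ret = 0;
out:
    saved_errno = errno;
    unlink(tmp);
    close(tmp_fd);
    close(in_fd);
    errno = saved_errno;
    return ret;
}
```

Using rename(2) instead of link(2) plus unlink(2) would silently replace an existing destination; link(2) is used here only because the comparison in the message is to link(2).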


Why would this need to be atomic? That would seem to be a very difficult
property to provide across all target types with multi-GB sized files...

