Re: New copyfile system call - discuss before LSF?

From: Myklebust, Trond
Date: Mon Feb 25 2013 - 18:29:00 EST


On Mon, 2013-02-25 at 14:16 -0800, Andy Lutomirski wrote:
> On Mon, Feb 25, 2013 at 1:59 PM, Myklebust, Trond
> <Trond.Myklebust@xxxxxxxxxx> wrote:
> > On Mon, 2013-02-25 at 16:49 -0500, Ric Wheeler wrote:
> >> On 02/25/2013 04:14 PM, Andy Lutomirski wrote:
> >> > On 02/21/2013 02:24 PM, Zach Brown wrote:
> >> >> On Thu, Feb 21, 2013 at 08:50:27PM +0000, Myklebust, Trond wrote:
> >> >>> On Thu, 2013-02-21 at 21:00 +0100, Paolo Bonzini wrote:
> >> >>>> On 21/02/2013 15:57, Ric Wheeler wrote:
> >> >>>>>> sendfile64() pretty much already has the right arguments for a
> >> >>>>>> "copyfile", however it would be nice to add a 'flags' parameter: the
> >> >>>>>> NFSv4.2 version would use that to specify whether or not to copy file
> >> >>>>>> metadata.
> >> >>>>> That would seem to be enough to me and has the advantage that it is a
> >> >>>>> relatively obvious extension to something that is at least not totally
> >> >>>>> unknown to developers.
> >> >>>>>
> >> >>>>> Do we need more than that for non-NFS paths I wonder? What does reflink
> >> >>>>> need or the SCSI mechanism?
> >> >>>> For virt we would like to be able to specify arbitrary block ranges.
> >> >>>> Copying an entire file helps some copy operations like storage
> >> >>>> migration. However, it is not enough to convert the guest's offloaded
> >> >>>> copies to host-side offloaded copies.
> >> >>> So how would a system call based on sendfile64() plus my flag parameter
> >> >>> prevent an underlying implementation from meeting your criterion?
> >> >> If I'm guessing correctly, sendfile64()+flags would be annoying because
> >> >> it's missing an out_fd_offset. The host will want to offload the
> >> >> guest's copies by calling sendfile on block ranges of a guest disk image
> >> >> file that correspond to the mappings of the in and out files in the
> >> >> guest.
> >> >>
> >> >> You could make it work with some locking and out_fd seeking to set the
> >> >> write offset before calling sendfile64()+flags, but ugh.
> >> >>
> >> >> ssize_t sendfile(int out_fd, int in_fd, off_t in_offset, off_t
> >> >> out_offset, size_t count, int flags);
> >> >>
> >> >> That seems closer.
> >> >>
> >> >> We might also want to pre-emptively offer iovs instead of offsets,
> >> >> because that's the very first thing that's going to be requested after
> >> >> people prototype having to iterate calling sendfile() for each
> >> >> contiguous copy region.
> >> > I thought the first thing people would ask for is to atomically create a
> >> > new file and copy the old file into it (at least on local file systems).
> >> > The idea is that nothing should see an empty destination file, either
> >> > by race or by crash. (This feature would perhaps be described as a
> >> > pony, but it should be implementable.)
> >> >
> >> > This would be like a better link(2).
> >> >
> >> > --Andy
> >>
> >> Why would this need to be atomic? That would seem to be a very difficult
> >> property to provide across all target types with multi-GB sized files...
> >
> > Right. It may sound cool, but what's the real-life use case?
> >
>
> Download file from some source and then verify it. Now copyfile it
> into my repository of known-good files.
>
> Admittedly I could link + unlink or rename it there, but I consider
> hard links to be rather evil, especially when cow links are available.

Rename is the right way to do that as it can't corrupt the data after
you have verified it. copyfile can...
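
For illustration only, the rename approach could look roughly like the
sketch below. It is not code from this thread: publish_verified() and both
path arguments are hypothetical names, and it assumes the downloaded file
and the repository sit on the same filesystem (rename() fails with EXDEV
otherwise) and that fsync() on a read-only descriptor is accepted, as it
is on Linux.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: move an already-verified file into its final place.
 * The file contents are never rewritten, so they cannot change between
 * verification and publication. Returns 0 on success, -1 on error.
 */
int publish_verified(const char *tmp_path, const char *final_path)
{
    int fd = open(tmp_path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Push the verified bytes to stable storage before the name changes. */
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    if (close(fd) < 0)
        return -1;

    /* rename() atomically replaces final_path if it already exists. */
    return rename(tmp_path, final_path);
}
```

A copy, by contrast, writes a second set of data blocks after the check
has been done, so the destination would need to be verified again.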

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com