Re: [RFC v0 0/4] sys_copy_range() rough draft

From: Dave Chinner
Date: Tue May 14 2013 - 17:43:03 EST


On Tue, May 14, 2013 at 02:15:22PM -0700, Zach Brown wrote:
> We've been talking about implementing some form of bulk data copy
> offloading for a while now. BTRFS and OCFS2 implement forms of copy
> offloading with ioctls, NFS 4.2 will include a byte-granular COPY
> operation, and the SCSI XCOPY command is being implemented now that
> Windows can issue it.
>
> In the past we've discussed promoting the ocfs2 reflink ioctl into a
> system call that would create a new file and implicitly copy the
> source data into the new file:
> https://lkml.org/lkml/2009/9/14/481
>
> These draft patches take the simpler approach of only copying data
> between existing files. The patches 1) make a system call out of the
> btrfs CLONE_RANGE ioctl, 2) implement the btrfs .copy_range method with
> the ioctl's guts, 3) implement the nfs .copy_range by sending a COPY
> op, and 4) serve the COPY op in nfsd by calling the .copy_range method
> again.
>
> The nfs patch is an untested hack. I'm happy to beat it into shape,
> but I'll need some guidance.
>
> I'd like strong review feedback on the interfaces. Here are some
> possible topics:
>
> a) Hopefully, being able to specify a portion of the data to copy will
> avoid *huge* syscall latencies and, with them, the motivation for new
> async semantics.
>
> b) The BTRFS ioctl and nfs COPY let you specify a count of 0 to copy
> from the start offset to the end of the file. Does anyone have a
> strong feeling about this? I'm leaning towards not bothering with it
> in the syscall interface.
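If the syscall drops the count-of-0 convention, a caller can still copy "to the end of the file" by resolving the count itself before making the call. Here is a minimal sketch, assuming the caller already has the source size (e.g. from fstat()). `resolve_count` is a hypothetical helper and is not part of these patches:

```c
#include <stdint.h>
#include <sys/types.h>

/*
 * Hypothetical helper: turn (src_off, count) into an explicit byte count,
 * treating count == 0 as "from src_off to the end of the source", the
 * convention described for the BTRFS ioctl and NFS COPY.
 * src_size is assumed to come from fstat() on the source file.
 * Returns the number of bytes to copy, or -1 for an invalid request.
 */
static int64_t resolve_count(off_t src_size, off_t src_off, uint64_t count)
{
    if (src_off < 0)
        return -1;
    if (count != 0)
        return count > INT64_MAX ? -1 : (int64_t)count;
    /* count == 0: whatever remains after src_off, possibly nothing. */
    return src_off >= src_size ? 0 : (int64_t)(src_size - src_off);
}
```

With this in place, the syscall only ever sees explicit lengths, and the "copy to EOF" case costs the caller one fstat().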
>
> c) I chose to return partial progress in the ssize_t return code. This
> limits the length of the range: a size_t count argument that is too
> large to be reported in an ssize_t returns an error, much like other
> I/O syscalls. This seemed less awful than an extra argument with a
> pointer to a status value.
>
> d) I'm dreading mentioning a vector of ranges to copy in one syscall
> because I don't want to think about overlapping ranges and file systems
> that use range locks -- xfs for now, but more if Jan gets his way.

XFS doesn't use range locks (yet).

> I'd rather that we get some experience with this simpler syscall before
> taking on that headache.
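Before doing any I/O, a vectored interface would at least need an overlap check across segments. The sketch below uses a deliberately conservative policy. It rejects any overlap between destination ranges. For a copy within one file, it also rejects any source/destination overlap, including a segment overlapping itself. `struct copy_seg` and `vec_has_overlap` are hypothetical and do not come from the patches:

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical element of a vectored copy request. */
struct copy_seg {
    off_t src_off;
    off_t dst_off;
    size_t len;
};

/*
 * Do half-open ranges [a, a + alen) and [b, b + blen) intersect?
 * Assumes offset + length was already checked not to overflow.
 */
static bool ranges_overlap(off_t a, size_t alen, off_t b, size_t blen)
{
    if (alen == 0 || blen == 0)
        return false;
    return a < b + (off_t)blen && b < a + (off_t)alen;
}

/*
 * Conservative policy: reject overlapping destination ranges (the result
 * would depend on copy order), and, when source and destination are the
 * same file, any source range overlapping any destination range.
 */
static bool vec_has_overlap(const struct copy_seg *segs, size_t n,
                            bool same_file)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            if (i != j && ranges_overlap(segs[i].dst_off, segs[i].len,
                                         segs[j].dst_off, segs[j].len))
                return true;
            if (same_file && ranges_overlap(segs[i].src_off, segs[i].len,
                                            segs[j].dst_off, segs[j].len))
                return true;
        }
    }
    return false;
}
```

This check only pins down what a request means. It does nothing about locking, which is the other half of the concern in (d).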
>
> I'm sure I'm forgetting some other details.
>
> I'm going to keep hacking away at this. My next step is to get ext4
> supporting .copy_range, probably with a quick hack to copy the
> contents of bios. Hopefully that'll give enough time to also integrate
> review feedback.

Wouldn't the easiest "support all filesystems" hack just be to add
a destination offset parameter to do_splice_direct() and call that
when the filesystem doesn't supply a ->copy_range method? i.e. use
the mechanisms we already have for copying from one file to another
via the page cache as efficiently as possible?
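From userspace, roughly the same page-cache-to-page-cache path is reachable with splice(2). It accepts an explicit offset for each file as long as the other end is a pipe. Inside the kernel, do_splice_direct() similarly moves data through an internal pipe. The following is a minimal sketch of that fallback shape, not the in-kernel change proposed above. `splice_copy_range` is a hypothetical name:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy 'len' bytes from fd_in at off_in to fd_out at off_out by splicing
 * through a pipe, so data moves between page caches without a userspace
 * buffer. Returns bytes copied (short at source EOF) or -1 with errno set.
 */
static ssize_t splice_copy_range(int fd_in, loff_t off_in,
                                 int fd_out, loff_t off_out, size_t len)
{
    int p[2];
    size_t total = 0;

    if (pipe(p) < 0)
        return -1;

    while (total < len) {
        /* File -> pipe; splice() advances off_in, not the file position. */
        ssize_t in = splice(fd_in, &off_in, p[1], NULL, len - total, 0);
        if (in < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        if (in == 0)
            break;  /* source EOF */

        /* Pipe -> file; drain everything just queued, advancing off_out. */
        ssize_t left = in;
        while (left > 0) {
            ssize_t out = splice(p[0], NULL, fd_out, &off_out,
                                 (size_t)left, 0);
            if (out < 0) {
                if (errno == EINTR)
                    continue;
                goto fail;
            }
            if (out == 0) {
                errno = EIO;  /* no progress; avoid spinning forever */
                goto fail;
            }
            left -= out;
        }
        total += (size_t)in;
    }
    close(p[0]);
    close(p[1]);
    return (ssize_t)total;

fail:
    {
        int saved_errno = errno;
        close(p[0]);
        close(p[1]);
        errno = saved_errno;
    }
    return -1;
}
```

Each round fills the pipe from the source and drains it completely into the destination. The next file-to-pipe splice therefore always starts with an empty pipe and cannot block waiting for space.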

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx