Re: [PATCH RFC 1/3] vfs: add copy_file_range syscall and vfs helper

From: Trond Myklebust
Date: Fri Apr 10 2015 - 20:24:13 EST

On Fri, Apr 10, 2015 at 8:02 PM, Zach Brown <zab@xxxxxxxxxx> wrote:
> On Fri, Apr 10, 2015 at 06:36:41PM -0400, Trond Myklebust wrote:
>> On Fri, Apr 10, 2015 at 6:00 PM, Zach Brown <zab@xxxxxxxxxx> wrote:
>> > +
>> > +/*
>> > + * copy_file_range() differs from regular file read and write in that it
>> > + * specifically allows return partial success. When it does so is up to
>> > + * the copy_file_range method.
>> > + */
>> > +ssize_t vfs_copy_file_range(struct file *file_in, loff_t pos_in,
>> > + struct file *file_out, loff_t pos_out,
>> > + size_t len, int flags)
>> I'm going to repeat a gripe with this interface. I really don't think
>> we should treat copy_file_range() as taking a size_t length, since
>> that is not sufficient to do a full file copy on 32-bit systems w/ LFS
>> support.
> *nod*. The length type is limited by the syscall return type and the
> arbitrary desire to mimic read/write.
> I sympathize with wanting to copy giant files with operations that don't
> scale with file size because files can be enormous but sparse.

The other argument against using a size_t is that there is no memory
buffer involved here. size_t is, after all, a type describing
in-memory objects, not files.

>> Could we perhaps instead of a length, define a 'pos_in_start' and a
>> 'pos_in_end' offset (with the latter being -1 for a full-file copy)
>> and then return an 'loff_t' value stating where the copy ended?
> Well, the resulting offset will be set if the caller provided it. So
> they could already be getting the copied length from that. But they
> might not specify the offsets. Maybe they're just using the results to
> total up a completion indicator.
> Maybe we could make the length a pointer like the offsets that's set to
> the copied length on return.

That works, but why do we care so much about the difference between a
length and an offset as a return value?

To be fair, the NFS copy offload also allows the copy to proceed out
of order, in which case the range of copied data could be
non-contiguous in the case of a failure. However neither the length
nor the offset case will give you the full story in that case. Any
return value can at best be considered to define an offset range whose
contents need to be checked for success/failure.

> This all seems pretty gross. Does anyone else have a vote?
> (And I'll argue strongly against creating magical offset values that
> change behaviour. If we want to ignore arguments and get the length
> from the source file we'd add a flag to do so.)

The '-1' was not intended to be a special/magical value: as far as I'm
concerned any end offset that covers the full range of supported file
lengths would be OK.

>> Note that both btrfs and NFSv4.2 allow for 64-bit lengths, so this
>> interface would be closer to what is already in use anyway.
> Yeah, btrfs doesn't allow partial progress. It returns 0 on success.
> We could also do that but people have expressed an interest in returning
> partial progress.

Returning an end offset would satisfy the partial progress requirement
(with the caveat mentioned above).

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at