Re: File copy system call proposal

From: Daniel Phillips (
Date: Sat Dec 08 2001 - 08:57:17 EST

On December 8, 2001 07:03 am, Quinn Harris wrote:
> On Fri, 2001-12-07 at 21:00, H. Peter Anvin wrote:
> > > Commands like cp and install open the source and destination file using
> > > the open sys call. The data from the source is copied to the
> > > destination by repeatedly calling the read then write sys calls. This
> > > process involves copying the data in the file from kernel memory space
> > > to the user memory space and back again. Note that all this copying is
> > > done by the kernel upon calling read or write. I would expect if this
> > > can be moved completely into the kernel no memory copy operations would
> > > be performed by the processor by using hardware DMA.
> >
> > mmap(source file);
> > write(target file, mmap region);
> mmap will indeed get a file into user mem space without any memcopy
> operation. But as far as I can tell from examining generic_file_write
> (in mm/filemap.c) used by ext2 and I asume many others, a write will
> copy the memory even if it was mapped via mmap. Am I missing
> something? This isn't true if kiobuf is used but as I understand it,
> this bypasses the buffer cache.

> I would like to see a zero-memcopy file copy. A file copy that would
> read the file into the buffer cache if its not already there similar to
> a normal read then write it back to disk from the cache without
> duplicating the pages. This would probably lead to modifying the buffer
> cache to allow multiple buffer_heads to refer to the same data which
                          ^^^^^^^^^^^^---struct page
> might not be worth the overhead.
> One might implement such a thing by attempting in the generic_file_read
> to determine if the memory range is an actual page or pages that can be
> eventually written to the disk without a memcopy.

There's some merit to this idea. As Peter pointed out, an in-kernel cp isn't
needed: mmap+write does the job. The question is, how to avoid the
copy_from_user and double caching of data?

Generic_file_write would have to determine that the source range is an mmap,
then do the required xxx_get_block's (somehow) to determine the physical
destination of the write, then finally use kio to dma the in-memory data to
the destination. (Never mind that get_block isn't a vfs method, that's a
detail ;-)

The obvious problem is that, should somebody subsequently read the file, it's
not in cache. Oops, so much for the performance gain. So some mechanism
would have to be devised to get hold of the original page data by way of the
destination inode, and to keep that consistent through the various
combinations of events that can occur to the two files involved. This is
where things start to diverge quite a lot from the current page cache design,
so if you're interested in pursuing this idea, a good way to start would be
to find out why.

Before you put a lot of energy into it, you might consider measuring the
actual cost of the memory copy versus the two disk transfers.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at
Please read the FAQ at

This archive was generated by hypermail 2b29 : Sat Dec 15 2001 - 21:00:12 EST