Re: PATCH: Raw device IO for 2.1.131

Stephen C. Tweedie (sct@redhat.com)
Mon, 14 Dec 1998 20:13:49 GMT


Hi,

On Sun, 13 Dec 1998 20:57:35 +0100 (CET), MOLNAR Ingo
<mingo@chiara.csoma.elte.hu> said:

> On Sun, 13 Dec 1998, Alan Cox wrote:

>> That's basically the model we use now. And we can mmap() nicely, but
>> read() is seriously hard to do sensibly. read() should be doing the
>> lock/dma to user/unlock return sequence.

> read() is simply not the thing to be used for this. read() does (source =>
> user-space) IO, and we need sendfile() done on behalf of your driver,
> which does (source => pagecache). (it doesn't do it in this direction right
> now, and sendfile() is not yet virtualized to drivers, but this is a
> detail. Maybe it's more correct to call this 'receivefile()'.)

By the same argument, read() is not yet abstract enough to do what you
want, but this is an implementation detail. There's nothing wrong with
the model. Look at /dev/zero: if you read an aligned page from it, the
kernel performs a zeromap_page_range() for you.
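From user space the /dev/zero case is just an ordinary read() into a
page-aligned, page-sized buffer. A minimal sketch follows, assuming a POSIX
system with /dev/zero present; read_zero_page() is a hypothetical helper name,
not anything in the kernel tree. Whether the kernel satisfies the read by
remapping pages or by clearing them is invisible to the caller, which is
exactly the point: the interface stays plain read().

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: read one page-aligned page from /dev/zero.
 * Returns the number of nonzero bytes left in the buffer (0 is the
 * expected result), or -1 on any failure. */
static long read_zero_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    /* Page-aligned, page-sized buffer: the case the kernel can
     * special-case without the caller knowing. */
    void *buf = NULL;
    if (posix_memalign(&buf, (size_t)page, (size_t)page) != 0)
        return -1;
    memset(buf, 0xAA, (size_t)page);   /* dirty it so zeroing is observable */

    int fd = open("/dev/zero", O_RDONLY);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    ssize_t n = read(fd, buf, (size_t)page);
    close(fd);

    long nonzero = -1;
    if (n == (ssize_t)page) {
        const unsigned char *p = buf;
        nonzero = 0;
        for (long i = 0; i < page; i++)
            if (p[i] != 0)
                nonzero++;
    }
    free(buf);
    return nonzero;
}
```

Calling read_zero_page() should return 0 on a system where /dev/zero behaves
as usual.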

There is nothing conceptually better about locking down a set of pages
in the page cache and dma'ing to them via copyfd than locking down a set
of user pages in the swap cache and dma'ing to them via read(). I know
for sure which version looks cleaner to the user. Similarly there is
nothing to stop us doing that read into a mmap()ed region if we want to
write it to disk.
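As an illustration of reading into a mmap()ed region, here is a user-space
sketch, assuming POSIX mmap() and a writable /tmp. The helper name
read_into_mapping() and the temporary file path are made up for the example,
and a pipe stands in for whatever device would be the real data source. The
buffer handed to read() is the shared file mapping itself, so the data ends up
in the file without a separate write() call.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: read() from a pipe directly into a MAP_SHARED
 * file mapping, then confirm through pread() that the file holds the
 * data. Returns 0 on success, -1 on failure. */
static int read_into_mapping(const char *msg)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    size_t len = strlen(msg);
    if (len == 0 || len > (size_t)page)
        return -1;

    char path[] = "/tmp/rawio-sketch-XXXXXX";   /* illustrative path */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                               /* file lives until close() */

    int rc = -1;
    int pfd[2] = { -1, -1 };
    char *map = MAP_FAILED;
    char *check = NULL;

    if (ftruncate(fd, page) != 0)
        goto out;
    map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out;

    /* The pipe is only a stand-in for the real source of the data. */
    if (pipe(pfd) != 0)
        goto out;
    if (write(pfd[1], msg, len) != (ssize_t)len)
        goto out;

    /* The user buffer passed to read() is the file mapping. */
    if (read(pfd[0], map, len) != (ssize_t)len)
        goto out;
    if (msync(map, (size_t)page, MS_SYNC) != 0)
        goto out;

    check = malloc(len);
    if (check == NULL)
        goto out;
    if (pread(fd, check, len, 0) != (ssize_t)len)
        goto out;
    rc = (memcmp(check, msg, len) == 0) ? 0 : -1;

out:
    free(check);
    if (map != MAP_FAILED)
        munmap(map, (size_t)page);
    if (pfd[0] >= 0)
        close(pfd[0]);
    if (pfd[1] >= 0)
        close(pfd[1]);
    close(fd);
    return rc;
}
```

For example, read_into_mapping("log record") returns 0 when the bytes read
through the mapping are the bytes found in the file.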

For regular files, implementing O_DIRECT by transparently mapping the
page cache into user space is not hard. The only functionality required
in the VM to allow that to happen is COW in find_page(): we already have
the necessary COW in the VM itself. O_DIRECT is then nothing more than
a hint to reject unaligned accesses, to write synchronously, and to
uncache the page after the IO.
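The "reject unaligned accesses" part can be pictured as a simple precondition
on the request. The sketch below is user-space C with a hypothetical helper,
direct_io_aligned(); the alignment unit is a parameter because the message
does not say whether it would be the page size or the device block size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical check: a direct IO request is acceptable only if the
 * buffer address, the file offset and the length are all multiples of
 * `align`, which must be a power of two (an assumption of this sketch). */
static bool direct_io_aligned(const void *buf, off_t offset,
                              size_t len, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    size_t mask = align - 1;
    return ((uintptr_t)buf & mask) == 0
        && offset >= 0
        && ((uintmax_t)offset & mask) == 0
        && len != 0
        && (len & mask) == 0;
}
```

A caller would run this check before issuing the IO and fail the request
(with EINVAL, say) when it returns false, rather than silently falling back to
a copying path.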

I think this is a far better model; from the user's perspective it still
looks just like fast read/write and O_DIRECT. copyfd simply cannot cope
with some of the cases we need without copying, like a database sending
contents of a sysV shared memory segment out to a log file on disk.
read/write, on the other hand, has aliasing problems if you try using a
file-mapped region as the user buffer. Either way, there's no problem
with presenting the drivers with simple readpage/writepage interfaces.

--Stephen
