Re: "raw" block devices?

Theodore Y. Ts'o (tytso@mit.edu)
Thu, 17 Oct 1996 13:16:06 -0400


Date: Fri, 18 Oct 1996 01:57:52 +1000 (EST)
From: David Monro <davidm@fuzzbox.psrg.cs.usyd.edu.au>

1) Messing with the raw device doesn't go through the buffer cache,
so programs which basically scan a large device (e.g. fsck) don't trash
the cache. Seems reasonable. Does e2fsck have some nifty way of not
trashing the cache currently? Also this allows e.g. database systems to be
given a slice of disk which they are in complete control of, and can
maybe manage better than the normal buffering (known access patterns
etc.).

When you're running e2fsck at the beginning of the boot sequence, there
really isn't much cache to trash. :-)

2) Because of the above, it should be possible to get data straight from the
device into user memory without any copying. This should be a big win, e.g. for
the above-mentioned database system. Actually it should be possible to
do this anyway using copy-on-write and having the kernel copy the page only
if it is modified by the program (using the same physical memory in both the
cache and the user space). Currently I believe we don't do this;
correct me if I am wrong.

Actually, if you're using mmap, you *can* get the data into the user
memory without doing any copying. That's how the new memory management
system works.
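
As a rough illustration of the mmap route, here is a minimal sketch in C. The helper name sum_file_mmap is made up for this example, error handling is kept to the bare minimum, and it assumes a POSIX system. The point is only that the program touches the file's pages directly through the mapping, with no read() into a separate user buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not from the original post): sum the bytes of a
 * file through a read-only shared mapping.
 * Returns 0 on success, -1 on failure (including an empty file). */
int sum_file_mmap(const char *path, unsigned long *sum_out)
{
    assert(path != NULL && sum_out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return -1;
    }

    /* Map the file; the mapping stays valid after the descriptor is closed. */
    unsigned char *p = mmap(NULL, (size_t)st.st_size, PROT_READ,
                            MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
        return -1;

    /* Access the file contents in place through the mapping. */
    unsigned long sum = 0;
    for (off_t i = 0; i < st.st_size; i++)
        sum += p[i];

    munmap(p, (size_t)st.st_size);
    *sum_out = sum;
    return 0;
}
```

A caller would pass a path and a pointer to an unsigned long, then check for a zero return before using the sum.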

Eventually, it should be possible to do the same with reads, either
filesystem reads or reads from the device; there'll no doubt be
restrictions such as requiring that you read a full page, and that the
target buffer be page aligned, etc., but I know there are people who
were thinking about doing things like this for 2.1.
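
To make the kind of restriction mentioned above concrete, here is a sketch of what a caller would have to set up: a page-aligned buffer and a request for a whole number of pages. The helper name read_pages_aligned is invented for this example. An ordinary read() like this one may still copy the data; the sketch only shows the alignment and sizing a zero-copy read path would presumably require, assuming a POSIX system with posix_memalign.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper (not from the original post): read up to `npages`
 * whole pages from `path` into a freshly allocated, page-aligned buffer.
 * Returns the number of bytes read, or -1 on failure.
 * On success the caller owns *buf_out and must free() it. */
ssize_t read_pages_aligned(const char *path, size_t npages, void **buf_out)
{
    assert(path != NULL && buf_out != NULL && npages > 0);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;
    size_t len = npages * (size_t)page;

    /* The target buffer starts on a page boundary. */
    void *buf = NULL;
    if (posix_memalign(&buf, (size_t)page, len) != 0)
        return -1;

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    /* Request a whole number of pages; a short file yields a short read. */
    ssize_t n = read(fd, buf, len);
    close(fd);
    if (n < 0) {
        free(buf);
        return -1;
    }

    *buf_out = buf;
    return n;
}
```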

You can play similar tricks for network writes; just DMA the data
straight from user memory. One of the ways SGI gets truly
awesome web server benchmarks is by doing zero-copy disk reads from the
filesystem, and zero-copy network writes. All you need the CPU for is
to set up the DMAs, lock down the memory pages, and stay out of the
way. :-)
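
For a sense of what the file-to-network path can look like from user space, here is a sketch using sendfile(2), a Linux-specific call that was added to the kernel after this message was written and is not something the post itself describes. The helper name send_whole_file is invented for this example. It asks the kernel to move the file's data to another descriptor without passing it through a user buffer; whether the transfer is truly zero-copy depends on the kernel and the hardware. EINTR and non-blocking descriptors are not handled here.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not from the original post): send the whole file
 * at `path` to `out_fd` (typically a connected socket) using sendfile(2).
 * Returns the number of bytes sent, or -1 on failure. */
ssize_t send_whole_file(int out_fd, const char *path)
{
    assert(path != NULL);

    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0)
        return -1;

    struct stat st;
    if (fstat(in_fd, &st) < 0) {
        close(in_fd);
        return -1;
    }

    /* sendfile() advances `offset` by the number of bytes it transferred,
     * so loop until the whole file has gone out. */
    off_t offset = 0;
    while (offset < st.st_size) {
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n <= 0) {
            close(in_fd);
            return -1;
        }
    }

    close(in_fd);
    return (ssize_t)offset;
}
```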

- Ted