Re: mmap() versus read()

Erik Corry (erik@arbat.com)
Sun, 8 Mar 1998 15:48:06 +0100


On Sun, Mar 08, 1998 at 01:15:45PM +0000, Chris Evans wrote:
>
> On Sun, 8 Mar 1998, Erik Corry wrote:
>
> > Take a look at madvise for Solaris. You can say for a
> > mmaped area that you are going to read sequentially (do
> > lots of readahead), read randomly (do no readahead at all),
> > that you are going to need an area soon, or that you are
> > (probably) not going to need the area at all any more.
>
> I see no reason for an madvise() -- the kernel should be able to monitor
> faults and if they are sequential, decide for _itself_ that lots of

It seems from Alan's message that there is automatic readahead
on mmapped files in 2.1, but not in 2.0. I don't know whether it
is always on, or whether it is switched on only when sequential
access is detected.

> readahead is a good idea. Calling madvise() is still incurring the
> overhead of a system call too.

That may hold when the faults are sequential, but madvise
is much more powerful than that. If you are reading
sequentially at more than one point in the file, or in an
application-specific but well-defined order, the kernel
cannot guess the pattern and madvise becomes very useful.
One example is a tiled TIFF file: you know the order in
which you will need the tiles, but it probably isn't even
close to linear.
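
To make the tiled case concrete, here is a minimal sketch in
Python (3.8 or later, on a system whose mmap module exposes
madvise). The tile size, the file layout and the name read_tiles
are assumptions made up for the illustration; a real TIFF reader
would take tile offsets from the file's directory.

```python
import mmap

PAGE = mmap.PAGESIZE
TILE_BYTES = 4 * PAGE   # hypothetical tile size, kept page-aligned for madvise


def read_tiles(path, order):
    """Read tiles in an application-defined order, hinting one tile ahead."""
    out = []
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        # We will jump around, so ask for no linear readahead.
        m.madvise(mmap.MADV_RANDOM)
        for i, tile in enumerate(order):
            if i + 1 < len(order):
                # Tell the kernel which tile we will need next.
                m.madvise(mmap.MADV_WILLNEED,
                          order[i + 1] * TILE_BYTES, TILE_BYTES)
            start = tile * TILE_BYTES
            out.append(m[start:start + TILE_BYTES])
            # This tile has been copied out and is (probably) not needed again.
            m.madvise(mmap.MADV_DONTNEED, start, TILE_BYTES)
    return out
```

Calling read_tiles(path, [5, 0, 7, 2]) would fetch four tiles in
that order. Whether the hints help depends entirely on how the
kernel chooses to honour them.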

I find the objection about system call overhead hard
to understand. If the kernel guesses wrong, you will
probably have to wait 3-10ms for the hard disk to seek
to the right place. Compared with that, a system call
is peanuts.
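
Anyone who wants a number for the comparison can measure it. The
rough sketch below times repeated madvise calls on an anonymous
mapping and sets the result against the low end of the 3-10ms
seek estimate. The function name and the call count are
arbitrary, and the timing includes interpreter overhead, so it
overstates the cost of the system call itself.

```python
import mmap
import time

SEEK_SECONDS = 0.003   # low end of the 3-10ms seek estimate


def madvise_cost(calls=10000):
    """Return the average wall-clock seconds spent per madvise() call."""
    # Anonymous mapping, so no file is needed for the measurement.
    with mmap.mmap(-1, 16 * mmap.PAGESIZE) as m:
        t0 = time.perf_counter()
        for _ in range(calls):
            m.madvise(mmap.MADV_SEQUENTIAL)
        return (time.perf_counter() - t0) / calls
```

SEEK_SECONDS / madvise_cost() then gives roughly how many such
calls fit into one short seek on the machine at hand.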

Note also that not everything that looks sequential is
strictly sequential. A search for fadvise on Deja News
reveals that .exe loads from DOS boxes (e.g. for Samba) are
sequential except for a single backwards seek near the
start. This fooled the sequential-access detector in (I
think) FreeBSD into treating the access as random.
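
The failure mode is easy to reproduce with a toy model. Neither
function below is taken from FreeBSD or any other kernel; both
are invented for the illustration, as are the block numbers in
the trace.

```python
def naive_detector(offsets, block=1):
    """Toy detector: sequential until any access is not previous + block."""
    mode = "sequential"
    for prev, cur in zip(offsets, offsets[1:]):
        if cur != prev + block:
            mode = "random"   # one miss and it never recovers
    return mode


def tolerant_detector(offsets, block=1, threshold=0.9):
    """Toy alternative: sequential if most steps advance by one block."""
    steps = list(zip(offsets, offsets[1:]))
    if not steps:
        return "sequential"
    forward = sum(1 for prev, cur in steps if cur == prev + block)
    return "sequential" if forward / len(steps) >= threshold else "random"


# Read blocks 0..3, seek back to block 1 once, then run to the end.
trace = [0, 1, 2, 3] + list(range(1, 100))
```

On this trace naive_detector gives up on readahead while
tolerant_detector does not. The 0.9 threshold is a guess, and
any such heuristic can still be fooled, which is the argument
for letting the application say what it is doing.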

This isn't to say that we shouldn't try to autodetect
common patterns, but an madvise call is genuinely useful
too. One useful pattern to detect might be:

* Sequential access to a file bigger than RAM, which should
  cause the page replacement policy for that file to switch
  to evicting the most recently used pages first, or
  something like it.

This avoids the pathological situation where everything else
is thrown out of RAM and all that remains is the data we just
read and won't be needing any more. That holds even if the file
is read again shortly afterwards: because the file is bigger
than RAM, the start of it has already been evicted by the time
we come back to it.
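
A small simulation shows the effect. It models RAM as a cache of
a fixed number of pages and scans a larger file front to back
several times, evicting either the least or the most recently
used page. It is a sketch of the idea only: the function name and
the page counts are made up, and no real kernel keeps a single
list like this.

```python
from collections import OrderedDict


def scan_hits(policy, file_pages, ram_pages, passes):
    """Count cache hits for repeated sequential scans ('lru' or 'mru')."""
    cache = OrderedDict()   # keys ordered from least to most recently used
    hits = 0
    for _ in range(passes):
        for page in range(file_pages):
            if page in cache:
                hits += 1
                cache.move_to_end(page)
                continue
            if len(cache) >= ram_pages:
                # LRU evicts the oldest entry, MRU evicts the newest.
                cache.popitem(last=(policy == "mru"))
            cache[page] = True
    return hits
```

With a 10-page file, 8 pages of RAM and two passes,
scan_hits("lru", 10, 8, 2) finds nothing in the cache on the
second pass, because every page is evicted shortly before it is
wanted again. scan_hits("mru", 10, 8, 2) keeps most of the start
of the file and scores hits on it.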

If you feel this is too hard to autodetect, then you really
need madvise or fadvise.
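
For the file-descriptor flavour, the sketch below uses
posix_fadvise, which Python exposes as os.posix_fadvise on
systems that provide it. Treat it as a stand-in for the fadvise
discussed here and not as a statement about any particular
kernel. The chunk size and the function name are assumptions.

```python
import os

CHUNK = 1 << 20   # hypothetical 1 MiB read size


def stream_file(path):
    """Read a file once, front to back, hinting that data is not reused."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        # A length of 0 means "from offset to the end of the file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        offset = 0
        while True:
            data = os.pread(fd, CHUNK, offset)
            if not data:
                break
            total += len(data)
            # The pages just consumed can be the first to go.
            os.posix_fadvise(fd, offset, len(data), os.POSIX_FADV_DONTNEED)
            offset += len(data)
    finally:
        os.close(fd)
    return total
```

stream_file returns the number of bytes read. The hints are
advisory, so the kernel is free to ignore them.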

-- 
Erik Corry
