Re: mmap() is slower than read() on SCSI/IDE on 2.0 and 2.1

Rik van Riel (H.H.vanRiel@phys.uu.nl)
Thu, 17 Dec 1998 01:27:05 +0100 (CET)


On Wed, 16 Dec 1998, Stephen C. Tweedie wrote:
> On Tue, 15 Dec 1998 14:42:28 +1300, Chris Wedgwood <cw@ix.net.nz> said:
> > On Mon, Dec 14, 1998 at 08:58:41PM +0000, Stephen C. Tweedie wrote:
> >> It will be pretty simple to add a soft-page-fault fencepost to the
> >> page cache to allow us to prime the next readahead chunk before we
> >> have finished accessing the current one in the case of sequential
> >> access, which will allow us to sustain even faster sequential IO
> >> rates on fast disks.
>
> > But doesn't this assume we'll sequentially access mmap regions?
>
> No. Done correctly, it assumes that we _might_ access them
> sequentially, and that the readahead only kicks in if we do so. The
> heuristic I am currently coding is that if the entire previous VA
> cluster is currently mapped in the page tables, then the processor is
> accessing all of the referenced pages and readahead is enabled.

Once again, you seem to be forgetting about read-behind.

There might be cases where the data is used in the 'other'
direction. We can detect that by testing not only for
presence, but also for the page counts of the pages in
question.

It should be a pretty safe bet to assume that we are
going in the direction of the lowest page counts --
what's already mapped (and has a higher count) is
where we came from.

This might go wrong if multiple processes are using the
same part of the file simultaneously, in opposite
directions -- but in that case we've probably already
read in the data in question :)

> This algorithm has many good properties: in particular, it does
> not assume that there is only one readahead stream per vma. In
> numerical codes it is very common to have multiple large arrays in
> memory, and we may well be accessing all of those arrays
> sequentially at the same time. In that case we have multiple
> readahead "cursors" active in the VM at once, so we can't just use
> heuristics based on the address of the last fault to recognise
> sequential access.

Yes. We want all those properties and we can have them very
cheaply -- let's go for it :)

> > I really don't fully understand why madvise is a bad thing, I don't
> > see how the OS can possibly know better than the application about
> > future access patterns....
>
> The point is that there are many very common access patterns which the
> O/S can, and should, recognise and optimise without the user needing to
> tell us.

I'd say we can catch >95% of all accesses with the algorithm
you mentioned (and my little extension). The remaining <5%
isn't worth optimizing for, since we don't really lose out
with the algorithm enabled vs. disabled, even with these
weird access patterns...

cheers,

Rik -- the flu hits, the flu hits, the flu hits -- MORE
+-------------------------------------------------------------------+
| Linux memory management tour guide.        H.H.vanRiel@phys.uu.nl |
| Scouting Vries cubscout leader.      http://www.phys.uu.nl/~riel/ |
+-------------------------------------------------------------------+

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/