Re: mmap() is slower than read() on SCSI/IDE on 2.0 and 2.1

David S. Miller (davem@dm.cobaltmicro.com)
Mon, 14 Dec 1998 06:07:37 -0800


Date: Mon, 14 Dec 1998 05:43:21 -0800
From: "Jay Nordwick" <nordwick@scam.XCF.Berkeley.EDU>

> How about cluing in the VM to prefetch with madvise()? How would this
> affect performance?
>
>It's a kludge.
>

Why? It is not dirty or strange. Isn't that what madvise() is for... to
hint to the VM how you are going to access the mapping?

Because madvise() is a kludge, which is why we don't have such a beast
in Linux.
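
For reference, what Jay has in mind would look something like the
user-space sketch below, on a system that actually provides madvise()
and MADV_SEQUENTIAL (which we don't):

/* Minimal user-space sketch of the madvise() idea, assuming a system
 * that provides madvise() and MADV_SEQUENTIAL.  The hint tells the VM
 * the mapping will be walked front to back, so it may read ahead
 * aggressively.  Error handling is kept to the bare minimum.
 */
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

int main(int argc, char **argv)
{
	struct stat st;
	char *p;
	off_t i;
	long sum = 0;
	int fd;

	if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0 ||
	    fstat(fd, &st) < 0)
		return 1;

	p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 1;

#ifdef MADV_SEQUENTIAL
	/* The hint: "I will touch this mapping sequentially." */
	madvise(p, st.st_size, MADV_SEQUENTIAL);
#endif

	for (i = 0; i < st.st_size; i++)
		sum += p[i];		/* every new page costs a fault */

	printf("%ld\n", sum);
	munmap(p, st.st_size);
	close(fd);
	return 0;
}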

>You have to "prime" the read prefetcher first by asking for a couple
>big chunks via read() first.

Hmmm... sounds exactly like what madvise() would do.

Not really, read() can act intelligently. See below.
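
(To be concrete about the priming trick, something like the user-space
sketch below; the 256KB chunk size and the helper name are arbitrary
choices for the example, not anything the kernel dictates.)

/* Rough illustration of "priming" the readahead engine: a couple of
 * large read() calls tell the kernel this file is consumed in big
 * pieces before we start walking the mapping fault by fault.
 */
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>

#define PRIME_CHUNK	(256 * 1024)
#define PRIME_CHUNKS	2

char *map_primed(const char *path, size_t *len)
{
	struct stat st;
	char *buf, *p;
	int fd, i;

	fd = open(path, O_RDONLY);
	if (fd < 0 || fstat(fd, &st) < 0)
		return NULL;

	buf = malloc(PRIME_CHUNK);
	for (i = 0; buf && i < PRIME_CHUNKS; i++)
		if (read(fd, buf, PRIME_CHUNK) <= 0)
			break;
	free(buf);

	p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);			/* the mapping survives the close() */
	if (p == MAP_FAILED)
		return NULL;
	*len = st.st_size;
	return p;
}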

When you say read() what exactly do you mean?

I mean the kernel guts which actually do the copies from the page
cache and call out for I/O if needed (mm/filemap.c:generic_readpage
and friends).

(relying on how I assume this works, from the previous paragraph) How big
are the chunks? If a page is sufficient to prime the read prefetcher, then
wouldn't the faults cause the read prefetcher to start fetching ahead?

Read the readahead heuristics in generic_readpage() and friends.

I don't really see the problem: some indecision on an arbitrary limit?
This is the same thing as prefetching with read(), isn't it?

No.

You don't know how much to prefetch with read, do you?

You never know _exactly_, but you know quite a bit.

You know exactly at a read() call:

1) Where in the file.
2) How much the user wants in this request.

For page faults you know exactly where, but your "how much" is constant
per request, namely PAGE_SIZE. This is the core problem.
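
To spell the asymmetry out, schematically (the names and signatures
below are made up for illustration, not the actual interfaces):

/* Schematic only: the names and signatures are illustrative, not the
 * real kernel interfaces.  The point is what each path gets told.
 */
struct file;
struct vm_area_struct;

/* The read() path: the caller hands the kernel both the position and
 * the total amount wanted, so readahead can be sized to the request. */
static long sketch_read(struct file *f, unsigned long pos,
			unsigned long count)
{
	/* the readahead window can be derived directly from 'count' */
	return 0;
}

/* The fault path: all the kernel learns is one faulting address; the
 * "how much" is implicitly PAGE_SIZE, every single time. */
static unsigned long sketch_nopage(struct vm_area_struct *vma,
				   unsigned long address)
{
	/* readahead has to be guessed, from history or not at all */
	return 0;
}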

As a worst case, why not have the same semantics as read() --
prefetch the next page if you have to sleep to read it -- not
optimal, but it will at least bring mmap() to the performance of
read().

This is what mmap() faults currently do: one page of readahead. On
the read() side it is much more aggressive, and more aggressive
readahead leads to fatter and more efficient I/O requests.

I always thought that mmap() should be fast, especially when you
use it for IPC.

It's harder to make fast than read, simply due to the missing
information I mention above. You only know, per request, that one
page is needed.

Now one possible solution is to keep some kind of page fault history
around per VM area. That is what some systems do.

Linux here optimizes for what the paging behavior typically is on a
client: random access.
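
As a sketch of what such per-VMA fault history could look like (all
names below are invented; nothing like this exists in our sources):

/* Hypothetical per-VMA fault history, with stand-in names.  The idea:
 * widen a readahead window while faults look sequential, collapse it
 * when they don't.
 */
#define EX_PAGE_SIZE	4096UL		/* assume 4K pages for the example */
#define EX_MAX_WINDOW	32UL		/* arbitrary cap, in pages */

struct ex_fault_history {
	unsigned long last_fault;	/* address of the previous fault */
	unsigned long window;		/* current readahead, in pages */
};

/* Returns how many pages to read ahead for a fault at 'address'. */
static unsigned long ex_fault_readahead(struct ex_fault_history *h,
					unsigned long address)
{
	if (address == h->last_fault + EX_PAGE_SIZE) {
		/* sequential: widen the window, up to the cap */
		h->window = h->window ? h->window * 2 : 2;
		if (h->window > EX_MAX_WINDOW)
			h->window = EX_MAX_WINDOW;
	} else {
		/* looks random: fall back to a single page */
		h->window = 1;
	}
	h->last_fault = address;
	return h->window;
}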

BTW, there is a neat way you could increase the filemap_nopage()
prefetching if the copy from the user's mmap()'d area happens from the
kernel (i.e. in a non-sendfile() socket write, for example). Here you
know the amount of data you will be copying, so you could add a
per-vma "faultahead" hint value, and then filemap_nopage() uses it
exactly the way generic_readpage() uses the 'len' parameter in its
prefetching heuristics.
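
Roughly, with made-up structure and function names standing in for the
real thing:

/* Sketch of the per-vma "faultahead" hint; the structure, field and
 * helpers are made up for illustration and exist nowhere in the tree.
 */
struct ex_vma {
	unsigned long vm_start, vm_end;
	unsigned long vm_faultahead;	/* hypothetical hint, in bytes */
};

/* The in-kernel copier (e.g. a socket write pulling from a user
 * mapping) knows 'len' up front, so it can publish it as a hint... */
static void ex_copy_from_user_area(struct ex_vma *vma,
				   unsigned long addr, unsigned long len)
{
	vma->vm_faultahead = len;
	/* ... touch [addr, addr + len), taking faults as needed ... */
	vma->vm_faultahead = 0;
}

/* ... and the nopage side consumes it the same way generic_readpage()
 * consumes the read() length when sizing its readahead. */
static unsigned long ex_nopage_readahead(struct ex_vma *vma)
{
	return vma->vm_faultahead ? vma->vm_faultahead : 4096;
}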

I wanted to play around with this for TCP once, but it no longer makes
sense in the TCP send path if sendfile() exists. There it all happens
in kernel space with no faults.

>The problematic case (and a real-life one) is when all of libc has
>been faulted into main memory by various processes; when you start one
>up, do you map in all of libc when it gets mmap'd by the application?
>If not, then which pages, if any, do you choose?

I cannot parse this.

Let me try again.

Programs in your system, as they run, fault in different pages of the
C library, right? After some time, more and more of libc resides in
the page cache and no I/O is needed. So if the optimization is "at
mmap() time, set up the page table entries for pages we already have
in RAM", what is your upper bound on this? The problem is: what if
this is just some short-lived program which only needs one or two
pages of libc to do its work and then exit()? We don't want to spend
all of our time setting up all of its page tables when it will use
only a few.
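
In rough pseudo-kernel terms (every name below is invented for the
illustration), the optimization and the problem with it look like
this:

/* Illustration only: every type and helper here is invented, not the
 * kernel's interfaces.  The idea: at mmap() time, wire up page table
 * entries for file pages already resident in the page cache, but only
 * up to some cap -- and picking that cap is exactly the heuristic
 * that is hard to get right.
 */
#define EX_PAGE_SIZE		4096UL
#define PREMAP_MAX_PAGES	16	/* the contentious upper bound */

struct ex_page;				/* stand-in for struct page */
struct ex_vma {
	unsigned long start, len;
};

/* Stand-ins for page cache lookup and PTE setup. */
extern struct ex_page *ex_find_page(struct ex_vma *vma, unsigned long off);
extern void ex_map_pte(struct ex_vma *vma, unsigned long off,
		       struct ex_page *page);

static void ex_premap_resident(struct ex_vma *vma)
{
	unsigned long off, mapped = 0;

	for (off = 0; off < vma->len; off += EX_PAGE_SIZE) {
		struct ex_page *page = ex_find_page(vma, off);

		if (!page)
			continue;	/* not resident: leave it to faults */
		ex_map_pte(vma, off, page);
		if (++mapped >= PREMAP_MAX_PAGES)
			break;		/* a short-lived exec needs few pages */
	}
}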

However, this is an important optimization, because when it helps, it
helps a lot. I had code which did this once, but because the
heuristic was difficult to come by, I threw that work away. It was
amusing: when the system first came up, benchmarks ran incredibly
fast, but after some time and system usage they degraded horribly.

Later,
David S. Miller
davem@dm.cobaltmicro.com

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/