Now that it has become clear to me that "lat_mmap" is actually testing
the latency of _dirtying_ a memory map, the reason it is slow is very
obvious indeed: this is not something we have ever been very good at,
simply because I haven't ever given it much serious consideration.
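
For readers who want to see what "dirtying a memory map" means in practice, here is a minimal user-space sketch. It is not lmbench's actual lat_mmap code; the function name time_dirty_mmap and the one-store-per-page pattern are assumptions chosen for illustration. It maps a temporary file, writes one byte into each page so the page becomes dirty, and then flushes the mapping so the kernel has to write the dirty pages back.

```python
import mmap
import os
import tempfile
import time

PAGE_SIZE = mmap.PAGESIZE


def time_dirty_mmap(num_pages=64):
    """Map a temporary file, dirty every page, flush, and report the result.

    Returns (elapsed_seconds, pages_found_dirty_in_file).
    Hypothetical helper for illustration, not part of lmbench.
    """
    size = num_pages * PAGE_SIZE
    with tempfile.TemporaryFile() as f:
        f.truncate(size)                  # give the file a size we can map
        fd = f.fileno()
        with mmap.mmap(fd, size) as m:
            start = time.perf_counter()
            for offset in range(0, size, PAGE_SIZE):
                m[offset] = 1             # one store per page dirties that page
            m.flush()                     # ask the kernel to write dirty pages back
            elapsed = time.perf_counter() - start
        # Read the file back through the descriptor to check the stores landed.
        os.lseek(fd, 0, os.SEEK_SET)
        data = b""
        while len(data) < size:
            chunk = os.read(fd, size - len(data))
            if not chunk:
                break
            data += chunk
    dirtied = sum(1 for offset in range(0, len(data), PAGE_SIZE) if data[offset] == 1)
    return elapsed, dirtied
```

Calling time_dirty_mmap(1024) gives a rough per-run time for dirtying and flushing 1024 pages; the absolute number depends heavily on the kernel and filesystem, so treat it only as a way to compare systems.
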
Basically, when writing back a dirty page, Linux internally actually
uses the "write()" logic (there's a "writepage" thing that I originally
thought I'd use, but I never got around to doing it because it just
wasn't a very high priority).
Just to explain how silly this is, when something does a "write()", the
kernel will copy the written data not only to a disk buffer, but also to
any virtual page cache page. Which in this case means that we'll always
do two copies: one to copy to the buffer cache, and then another one
that copies the data to the page cache entry.
The second copy is truly ridiculous: we're actually copying it back to
the place it came from in the first place (as the mmap source was the
page cache page). But the write() logic doesn't know that; it only
knows that somebody is writing to the file.
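
The double copy described above can be modelled with a toy in user space. This is only a sketch of the data flow, not kernel code: the class ToyCache and its method names are hypothetical, and a single bytearray stands in for each cache. The point it shows is that when writeback reuses the write() path, the second copy has the page cache page as both source and destination.

```python
class ToyCache:
    """Toy model: one page cache page and one buffer cache block for the same data."""

    def __init__(self, size=4096):
        self.page = bytearray(size)     # page cache page (what the mmap exposes)
        self.buffer = bytearray(size)   # buffer cache block (what goes to disk)
        self.copies = 0                 # number of data copies performed

    def write(self, data):
        """Model of the write() path: update the buffer cache, then the page cache."""
        self.buffer[:len(data)] = data  # copy 1: into the buffer cache
        self.copies += 1
        self.page[:len(data)] = data    # copy 2: into the page cache entry
        self.copies += 1

    def writeback_dirty_page(self):
        """Model of writing back a dirty mapped page by reusing write()."""
        # The source is the page cache page itself, so copy 2 inside write()
        # puts the data right back where it came from.
        self.write(self.page)
```

After dirtying cache.page and calling cache.writeback_dirty_page(), cache.copies is 2 even though only the copy into cache.buffer did any useful work.
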
This is probably not going to be fixed for 2.2 - the proper fix is to
just do the writes properly in the page cache, something that I was
going to do in 2.3.x anyway. That proper fix is not trivial, though.
Btw, that also explains why we're twice as slow as Solaris: Solaris
probably does just one copy. Which is not very impressive either, as
the proper (2.3.x) way to do it is to do it without a single copy at
all, and just write it out directly from the page cache page.
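
As a loose user-space analogy for "write it out directly from the page cache page", the sketch below hands the page's own memory to the output call instead of copying it into a separate buffer first. The function name write_page_directly is hypothetical, and the temporary file merely stands in for the disk; how the kernel would actually do this in 2.3.x is not shown here.

```python
import os
import tempfile


def write_page_directly(page):
    """Write a page out from its own memory, with no intermediate copy in this model.

    Returns (bytes_written, bytes_read_back). Hypothetical helper for illustration.
    """
    view = memoryview(page)             # shares the page's memory, does not copy it
    assert view.obj is page             # the view refers to the original page
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        written = os.write(fd, view)    # output reads straight from the page
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, len(page))   # read back only to check the result
    return written, data
```

The contrast with the write() path is that nothing here stages the data in a second buffer before it is written out.
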
I just didn't imagine that lmbench would test something as silly as this.
Linus