Re: [rfc] no ZERO_PAGE?

From: Linus Torvalds
Date: Wed Apr 04 2007 - 13:04:24 EST




On Wed, 4 Apr 2007, Eric Dumazet wrote:
>
> But results on an Intel Pentium-M are interesting, in particular 2) & 3)
>
> If a page is first allocated as page_zero then cow to a full rw page, this is more expensive.
> (2660 cycles instead of 2300)

Yes, you have an extra TLB flush there at a minimum (if the page didn't
exist at all before, you don't have to flush).
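
Something like this (untested, and just a sketch of the idea - not
Eric's actual test program; it assumes x86 rdtsc, 4kB pages, and a
kernel that still has ZERO_PAGE) shows the two cases side by side:

#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <sys/mman.h>

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;
	__asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
	volatile char *p;
	uint64_t t0, t1;

	/* Case 1: write first - one fault that allocates and clears. */
	p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	t0 = rdtsc();
	p[0] = 1;		/* write fault: allocate + clear page */
	t1 = rdtsc();
	printf("write-first fault:    %llu cycles\n",
	       (unsigned long long)(t1 - t0));
	munmap((void *)p, 4096);

	/* Case 2: read first (maps the shared zero page read-only),
	 * then write - a COW break plus the extra TLB flush. */
	p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	(void)p[0];		/* read fault: ZERO_PAGE mapped */
	t0 = rdtsc();
	p[0] = 1;		/* write fault: COW + TLB flush */
	t1 = rdtsc();
	printf("cow-after-zero fault: %llu cycles\n",
	       (unsigned long long)(t1 - t0));
	munmap((void *)p, 4096);
	return 0;
}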

That said, the big cost tends to be the clearing of the page, which is
why the "bring in zero page" case is so much faster than anything else -
it's the only case that doesn't need to clear the page.

So you should basically think of your numbers like this:
 - roughly 900 cycles is the cost of the page fault itself and all the
   "basic software" side in the kernel
 - roughly 1400 cycles to actually do the "memset" that clears the page
   (and no, that's *not* the cost of the memory accesses per se - the
   page is very likely already in the L2 cache or similar; we just need
   to clear it, and if the lines weren't marked exclusive we need a bus
   cycle to invalidate them on any other CPUs - see the sketch below).
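
To see just that second component in isolation, something like this
(untested, a sketch only - the serializing cpuid in the rdtsc harness
and the exact numbers are my assumptions, and will vary per CPU) times
a bare memset on a page that has already been faulted in:

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <stdlib.h>

static inline uint64_t rdtsc_serialized(void)
{
	uint32_t lo, hi;
	/* cpuid only serializes, so the memset can't be reordered
	 * around the rdtsc */
	__asm__ __volatile__("cpuid\n\trdtsc"
			     : "=a"(lo), "=d"(hi)
			     : "a"(0)
			     : "ebx", "ecx");
	return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
	char *page;
	uint64_t t0, t1;

	if (posix_memalign((void **)&page, 4096, 4096))
		return 1;
	page[0] = 1;			/* fault the page in first */

	t0 = rdtsc_serialized();
	memset(page, 0, 4096);		/* the "clear the page" component */
	t1 = rdtsc_serialized();
	printf("memset of 4096 bytes: %llu cycles\n",
	       (unsigned long long)(t1 - t0));
	free(page);
	return 0;
}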

There will be small variations depending in particular on what state
the cache was in beforehand - the TLB flush cost, for example, but
also: when you do

> 4) memset 4096 bytes to 0x55:
> Poke_full (addr=0x804f000, len=4096): 2719 cycles

This only adds ~600 cycles to memset the same 4kB that cost ~1400
cycles to clear before, but that's *probably* largely because the page
was by then already dirty in the L2 and possibly the L1. So it's quite
possible that this is really just a cache effect: the lines are now
entirely exclusive in the caches, so you don't need to do any probing
on the bus at all.
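
You can check that from user space by just doing the memset twice on
the same page. Again only a sketch (and note that the first memset
here also pays for the initial page fault, not just the cache misses):

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <stdlib.h>

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;
	__asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
	char *page;
	uint64_t t0, t1, t2;

	if (posix_memalign((void **)&page, 4096, 4096))
		return 1;

	t0 = rdtsc();
	memset(page, 0, 4096);		/* fault + gain ownership of lines */
	t1 = rdtsc();
	memset(page, 0x55, 4096);	/* lines now exclusive in cache */
	t2 = rdtsc();

	printf("first memset:  %llu cycles\n", (unsigned long long)(t1 - t0));
	printf("second memset: %llu cycles\n", (unsigned long long)(t2 - t1));
	free(page);
	return 0;
}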

Also note: in the end, page faults are usually fairly unusual. You take
the fault once, and then use the page a lot after that. That's not
*always* true, of course. Some malloc()/free() patterns on big areas
that aren't kept around for long will easily cause constant mmap/munmap
activity, and a lot of page faults.
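
Something like this shows that pattern with glibc malloc (a sketch -
mallopt() and M_MMAP_THRESHOLD are glibc-specific, the 128kB threshold
is the old default, and newer glibc versions tune it dynamically):

#include <stdlib.h>
#include <string.h>
#include <malloc.h>

int main(void)
{
	int i;

	/* pin the threshold so the behaviour is deterministic */
	mallopt(M_MMAP_THRESHOLD, 128 * 1024);

	for (i = 0; i < 1000; i++) {
		char *buf = malloc(1 << 20);	/* 1MB: mmap-backed */
		if (!buf)
			return 1;
		memset(buf, 0, 1 << 20);	/* fault in all 256 pages */
		free(buf);			/* munmap: thrown away again */
	}
	return 0;
}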

The worst effect of page faults tends to show up in short-lived stuff,
notably things like "system()" that executes a whole shell just to
execute something else. Almost *everything* in that path is basically
"use once, then throw away", so page fault latency really matters
there.

So this is one case where it might be interesting to look at what
lmbench reports for the "fork/exit", "fork/exec" and "shell exec"
numbers before and after the change.

Linus