Re: Page Colouring (was: 2.6.0 Huge pages not working as expected)

From: William Lee Irwin III
Date: Mon Dec 29 2003 - 05:24:34 EST


On Mon, 29 Dec 2003, William Lee Irwin III wrote:
>> I can't say I'm particularly encouraged by what I've heard thus far,

On Mon, Dec 29, 2003 at 01:33:53AM -0800, Linus Torvalds wrote:
> Well, I don't even know what your approach is - mind giving an overview?
> My original plan (and you can see some of it in the fact that
> PAGE_CACHE_SIZE is separate from PAGE_SIZE), was to just have the page
> cache be able to use bigger pages than the "normal" pages, and the
> normal pages would continue to be the hardware page size.
> However, especially with mem_map[] becoming something of a problem, and
> all the problems we'd have if PAGE_SIZE and PAGE_CACHE_SIZE were
> different, I suspect I'd just be happier with increasing PAGE_SIZE
> altogether (and PAGE_CACHE_SIZE with it), and then just teaching the VM
> mapping about "fractional pages".
> What's your approach?

Hmm, I presented on this at KS. Basically, it's identical to Hugh
Dickins' approach from 2000. The only difference is really that it had
to be forward ported (or unfortunately in too many cases reimplemented)
to mix with current code and features.

Basically, elevate PAGE_SIZE, introduce MMUPAGE_SIZE to be a nice macro
representing the hardware pagesize, and the fault handling is done with
some relatively localized complexity. Numerous s/PAGE_SIZE/MMUPAGE_SIZE/
bits are sprinkled around, along with a few more involved changes because
a large number of distributed changes are required to handle oddities
that occur when PAGE_SIZE changes from 4KB. The more involved changes
are often for things such as the only reason it uses PAGE_SIZE is
really that it just expects 4KB and says PAGE_SIZE, or that it wants
some fixed (even across compiles) size and needs updating for more
general PAGE_SIZE numbers, or sometimes that it expects PAGE_SIZE to be
what a pte maps when this is now represented by MMUPAGE_SIZE. I have a
bad feeling the diligence of the original code audit could be bearing
against me (and though I'm trying to be equally diligent, I'm not hugh).

The fact merely elevating PAGE_SIZE breaks numerous things makes me
rather suspicious of claims that minimalistic patches can do likewise.

The only new infrastructures introduced are the MMUPAGE_SIZE and a couple
of related macros (defining numbers, not structures or code) and the
fault handler implementations. The diff size is not small. The memory
footprint is, and demonstrably so (c.f. March 27 2003).

My 2.6 code has been heavily leveraging the pfn abstraction in its favor
to represent physical addresses measured in units of the hardware
pagesize. Generally, my maintenance approach has been incrementally
advancing the state of the thing while keeping it working on as broad a
cross section of i386 systems as I can test or get testers on. It has
been verified to run userspace on Thinkpad T21's and 16x/32GB and
32x/64GB NUMA-Q's at every point release it's been ported to, which
since 2.5.68 or so has been every point release coming out of kernel.org.


-- wli
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/