Re: Page Colouring (was: 2.6.0 Huge pages not working as expected)

From: William Lee Irwin III
Date: Tue Dec 30 2003 - 00:02:53 EST


On Mon, 29 Dec 2003 02:23:19 -0800 William Lee Irwin III wrote:
>> The fact merely elevating PAGE_SIZE breaks numerous things makes me
>> rather suspicious of claims that minimalistic patches can do likewise.

On Tue, Dec 30, 2003 at 01:00:29PM +1100, Rusty Russell wrote:
> Can you give an example?
> One approach is to simply present a larger page size to userspace w/
> getpagesize(). This does break ELF programs which have been laid out assuming
> the old page size (presumably they try to mprotect the read-only sections).
> On PPC, the ELF ABI already insists on a 64k boundary between such sections,
> and maybe for others you could simply round appropriately and pray, or do
> fine-grained protections (ie. on real pagesize) for that one case.

Apps must, of course, be relinked for that, but that's userspace. This
ABI change is largely out of the picture due to legacy binaries, user
virtualspace fragmentation (most likely an issue for 32-bit threading),
and so on. The choice of PAGE_SIZE in such schemes is also restricted
to no larger than whatever choice was used for userspace linking, which
is a relatively ugly dependency. There's also a question of "smooth
transition": the only way to "incrementally deploy" it on a mixture of
"ready" and "unready" userspace is to turn it off. I suppose it has the
minor advantage of being trivial to program.
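
To make the relinking constraint concrete, here is a minimal userspace
sketch, not taken from any patch in this thread. It checks whether a
read-only/read-write section boundary can still be protected exactly
once the page size reported to userspace grows. The helper names and
the example address are hypothetical; only sysconf(_SC_PAGESIZE) is a
real interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Page size as presented to userspace (what getpagesize() would report). */
static inline size_t runtime_page_size(void)
{
	long sz = sysconf(_SC_PAGESIZE);

	return sz > 0 ? (size_t)sz : 4096; /* the 4096 fallback is an assumption */
}

/* Round an address down to a page boundary; pagesize must be a power of two. */
static inline uintptr_t page_align_down(uintptr_t addr, size_t pagesize)
{
	assert(pagesize && (pagesize & (pagesize - 1)) == 0);
	return addr & ~((uintptr_t)pagesize - 1);
}

/* Round an address up to the next page boundary. */
static inline uintptr_t page_align_up(uintptr_t addr, size_t pagesize)
{
	return page_align_down(addr + pagesize - 1, pagesize);
}

/*
 * Nonzero if a section boundary falls exactly on a page boundary, i.e.
 * mprotect() can cover the read-only side without touching the
 * writable side.
 */
static inline int boundary_is_protectable(uintptr_t boundary, size_t pagesize)
{
	return page_align_down(boundary, pagesize) == boundary;
}
```

A binary linked with a boundary at, say, 0x08049000 passes this check
for 4KB pages but fails it for 64KB pages, which is why the page size
chosen at link time caps the page size the kernel can present.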

I had in mind pure kernel internal issues, not ABI.

The issues from raising PAGE_SIZE alone are things like the
interpretation of hardware descriptions in arch code, some shifts
underflowing for things like hashtables, and certain drivers doing
ioremap() and the like either filling up vmallocspace or getting their
math wrong. Other drivers get calculations on physical addresses wrong,
or use PAGE_SIZE to represent some 4KB or other fixed-size memory area
interpreted by hardware. Filesystems assume blocksize == PAGE_SIZE or
assume PAGE_SIZE is less than some particular value (e.g. short offsets
into pages, worst of all being signed shorts), and BUG()'s trip in
ll_rw_blk.c when 512*q->max_sectors < PAGE_SIZE.
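
Three of these failure modes can be shown with a small userspace
sketch, under the assumption that the page shift is passed in as a
parameter rather than fixed at compile time. None of this is kernel
code and all function names are hypothetical; only the
512*max_sectors < PAGE_SIZE condition comes from the description above.

```c
#include <assert.h>
#include <limits.h>

/* Page size in bytes for a given shift (12 -> 4KB, 16 -> 64KB). */
static inline unsigned long page_size_for(unsigned int page_shift)
{
	assert(page_shift < sizeof(unsigned long) * CHAR_BIT);
	return 1UL << page_shift;
}

/*
 * Can every in-page offset be stored in a signed short?
 * True up to 32KB pages, false from 64KB upward.
 */
static inline int offset_fits_in_short(unsigned int page_shift)
{
	return page_size_for(page_shift) - 1 <= (unsigned long)SHRT_MAX;
}

/*
 * Allocation order for a table of (1 << table_shift) bytes. A naive
 * "table_shift - page_shift" wraps around once page_shift exceeds
 * table_shift; this version reports -1 instead of underflowing.
 */
static inline int table_order(unsigned int table_shift, unsigned int page_shift)
{
	if (table_shift < page_shift)
		return -1;
	return (int)(table_shift - page_shift);
}

/*
 * The block-layer condition described above: a queue whose maximum
 * request is smaller than one page is rejected.
 */
static inline int queue_limit_ok(unsigned int max_sectors, unsigned int page_shift)
{
	return 512UL * max_sectors >= page_size_for(page_shift);
}
```

With a 4KB page all three checks pass for typical values (for example
max_sectors of 8); with a 64KB page the signed-short offset, a 16KB
table, and the same max_sectors all fail.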

These issues are the bulk of the work needing to be done for the driver
and fs sweeps. Actual concerns about MMUPAGE_SIZE in drivers/ and fs/
are rather limited in scope, though drivers/char/drm/ was somewhat
painful to get going (Zwane actually did most of this for me, as I have
no DRM/DRI-capable graphics cards at my disposal).
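
As a rough illustration of the distinction the sweeps have to draw,
the sketch below separates the hardware page size from the kernel's
allocation unit. MMUPAGE_SIZE is the name used above; MMUPAGE_SHIFT,
its value of 12, and the helper functions are assumptions made for the
example, not the actual patch.

```c
#include <assert.h>
#include <stdint.h>

#define MMUPAGE_SHIFT	12u			/* assumed: 4KB hardware page */
#define MMUPAGE_SIZE	(1UL << MMUPAGE_SHIFT)

/* Number of hardware (MMU) pages covered by one kernel page. */
static inline unsigned long mmupages_per_page(unsigned int page_shift)
{
	assert(page_shift >= MMUPAGE_SHIFT);
	return 1UL << (page_shift - MMUPAGE_SHIFT);
}

/*
 * Frame number as hardware sees it: always in MMU-page units,
 * regardless of how large the kernel's PAGE_SIZE is.
 */
static inline uint64_t hw_frame(uint64_t phys)
{
	return phys >> MMUPAGE_SHIFT;
}

/*
 * The pattern that breaks: shifting by the kernel page shift. This
 * matches hw_frame() only while page_shift == MMUPAGE_SHIFT.
 */
static inline uint64_t hw_frame_by_page_shift(uint64_t phys, unsigned int page_shift)
{
	return phys >> page_shift;
}
```

Code that hands addresses or sizes to hardware wants the MMU-page
units; code that only manages kernel allocations can keep using the
larger unit.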


-- wli