Re: [Lhms-devel] [PATCH 0/7] Fragmentation Avoidance V19

From: Linus Torvalds
Date: Sun Nov 06 2005 - 10:56:09 EST

On Sat, 5 Nov 2005, Paul Jackson wrote:
>
> It seems to me this is making it harder than it should be. You're
> trying to create a zone that is 100% cleanable, whereas the HPC folks
> only desire 99.8% cleanable.

Well, 99.8% is pretty borderline.

> Unlike the hot(un)plug folks, the HPC folks don't mind a few pages of
> Linus's unmoveable kmalloc memory in their way. They rather expect
> that some modest percentage of each node will have some 'kernel stuff'
> on it that refuses to move.

The thing is, if 99.8% of memory is cleanable, the 0.2% is still enough to
make pretty much _every_ hugepage in the system pinned down.

Besides, right now, it's not 99.8% anyway. Not even close. It's more like
60%, and then horribly horribly ugly hacks that try to do something about
the remaining 40% and usually fail (the hacks might get it closer to 99%,
but they are fragile, expensive, and ugly as hell).

It used to be that HIGHMEM pages were always cleanable on x86, but even
that isn't true any more, since now at least pipe buffers can be there
too.

I agree that HPC people are usually a bit less uptight about things than
database people tend to be, and many of them won't care at all, but if you
want hugetlb, you'll need big areas.

Side note: the exact size of hugetlb is obviously architecture-specific,
and the size matters a lot. On x86, for example, hugetlb pages are either
2MB or 4MB in size (and apparently 2GB may be coming). I assume that's
where you got the 99.8% from (4kB out of 2MB).
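
To make that arithmetic concrete, here is a minimal sketch. It assumes 4kB
base pages and the worst case where exactly one unmovable base page lands in
every hugepage-sized area. The function names are made up for illustration
and are not kernel interfaces.

```python
BASE_PAGE = 4 * 1024  # assumed 4kB base page size (x86)


def pages_per_hugepage(huge_bytes, base_bytes=BASE_PAGE):
    """Number of base pages covered by one huge page."""
    return huge_bytes // base_bytes


def worst_case_cleanable(huge_bytes, base_bytes=BASE_PAGE):
    """Fraction of memory that is still cleanable when exactly one base
    page in every hugepage-sized area is pinned, i.e. when no huge page
    at all can be freed."""
    n = pages_per_hugepage(huge_bytes, base_bytes)
    return (n - 1) / n


if __name__ == "__main__":
    for label, size in (("2MB", 2 << 20), ("4MB", 4 << 20)):
        n = pages_per_hugepage(size)
        pct = 100.0 * worst_case_cleanable(size)
        print(f"{label}: {n} base pages, {pct:.2f}% cleanable, "
              "every hugepage pinned")
```

So under these assumptions, one stray 4kB allocation per 2MB area leaves
roughly 99.8% of memory cleanable and still blocks every hugepage. With 4MB
hugepages the figure is about 99.9%.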

Other platforms have more flexibility, but sometimes want bigger areas
still.

Linus