Re: [00/17] Large Blocksize Support V3

From: Paul Mackerras
Date: Fri Apr 27 2007 - 07:05:53 EST


Andrew Morton writes:

> If x86 had larger pagesize we wouldn't be seeing any of this. It is a workaround
> for present-generation hardware.

Unfortunately, it's not really practical to increase the page size
very much on most systems, because you end up wasting a lot of space
in the page cache. So there is a tension between wanting a small page
size so your page cache uses memory efficiently, and wanting a large
page size so the TLB covers more address space and your programs run
faster (not to mention other benefits such as the kernel having to
manage fewer pages, and I/O being done in bigger chunks).
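
To put rough numbers on that tension, here is a small back-of-envelope
sketch. It is only illustrative: the helper names, the 1k example file
and the 1024-entry TLB are assumptions of mine, not measurements from
any real machine.

```python
KIB = 1024

def cached_bytes(file_size, page_size):
    """Page cache memory needed for one file, rounded up to whole pages."""
    pages = -(-file_size // page_size)  # ceiling division
    return pages * page_size

def wasted_bytes(file_size, page_size):
    """Bytes of page cache allocated to the file but holding no file data."""
    return cached_bytes(file_size, page_size) - file_size

def tlb_reach(entries, page_size):
    """Address space covered by a TLB with the given number of entries."""
    return entries * page_size

# Assumed figures: a 1k file and a hypothetical 1024-entry TLB.
for page_size in (4 * KIB, 64 * KIB):
    waste = wasted_bytes(1 * KIB, page_size)
    reach = tlb_reach(1024, page_size)
    print(f"{page_size // KIB}k pages: {waste} bytes wasted on a 1k file, "
          f"TLB reach {reach // (KIB * KIB)} MiB")
```

The same change that multiplies the TLB reach by 16 also multiplies
the per-file rounding loss for small files by roughly the same factor.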

Thus there is not really any single page size that suits all workloads
and machines. Since distros want to have just a single kernel per
architecture, and the page size is a compile-time constant, we end up
having to pick one size and put up with the fact that it will suck for
some users. We have exactly this situation on ppc64 now that POWER5+
and POWER6 machines have hardware support for 64k pages as well as 4k
pages.

So I can see a few different options:

(a) Keep things more or less as they are now and just wear the fact
that we will continue to show lower performance than certain
proprietary OSes, or

(b) Somehow manage to make the page size a variable rather than a
compile-time constant, and pick a suitable page size at boot time
based on how much memory the machine has, or something. I looked at
implementing this at one point and recoiled in horror. :)

(c) Make the page cache able to use small pages for small files and
large pages for large files. AIUI this is basically what Christoph is
proposing.
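
As a rough illustration of why (c) is attractive, the sketch below
compares the page cache footprint of a fixed 4k page size, a fixed 64k
page size, and a per-file choice between the two. The policy (use 64k
pages once a file is at least 64k long) and all the names are
hypothetical and are not taken from Christoph's patches.

```python
SMALL = 4 * 1024    # 4k pages
LARGE = 64 * 1024   # 64k pages

def pick_page_size(file_size, threshold=LARGE):
    """Hypothetical policy: large pages only for files >= threshold bytes."""
    return LARGE if file_size >= threshold else SMALL

def footprint(file_sizes, chooser):
    """Return (bytes of page cache, pages to manage) for the given files."""
    total_bytes = 0
    total_pages = 0
    for size in file_sizes:
        page_size = chooser(size)
        pages = -(-size // page_size)  # ceiling division
        total_bytes += pages * page_size
        total_pages += pages
    return total_bytes, total_pages

# Assumed workload: two small files and one 1 MiB file.
files = [1 * 1024, 10 * 1024, 1024 * 1024]

print("fixed 4k :", footprint(files, lambda size: SMALL))
print("fixed 64k:", footprint(files, lambda size: LARGE))
print("per-file :", footprint(files, pick_page_size))
```

For this made-up workload the per-file choice uses no more memory than
fixed 4k pages while managing nearly as few pages as fixed 64k pages.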

Option (a) isn't very palatable to me (nor, I expect, to Christoph :)
since it basically says that Linux is very much focussed on the
embedded and desktop end of things and isn't really suitable as a
high-performance OS for large SMP systems. I don't want to believe
that. ;)

Option (b) would be a bit of an ugly hack.

Which leaves option (c) - unless you have a further option. So I have
to say I support Christoph on this, at least as far as the general
principle is concerned.

Regards,
Paul.