Re: [00/17] Large Blocksize Support V3

From: Nick Piggin
Date: Fri Apr 27 2007 - 06:38:56 EST

Christoph Hellwig wrote:
On Thu, Apr 26, 2007 at 04:50:06PM +1000, Nick Piggin wrote:

Improving the buffer layer would be a good way. Of course, that is
a long and difficult task, so nobody wants to do it.

It's also a stupid idea. We got rid of the buffer layer because it's
a complete pain in the ass, and now you want to reintroduce one that's
even more complex, and most likely even slower than the elegant solution?

Well, for those architectures (and this would solve your large block
size and 16TB pagecache size without any core kernel changes), you
can manage 1<<order hardware ptes as a single Linux pte. There is
nothing that says you must implement PAGE_SIZE as a single TLB sized

Well, ppc64 can do that. And guess what, it really painfull for a lot
of workloads. Think of a poor ps3 with 256 from which the broken hypervisor
already takes a lot away and now every file in the pagecache takes
64k, every thread stack takes 64k, etc? It's good to have variable
sized objects in places where it makes sense, and the pagecache is
definitively one of them.

Well, I think 64K is probably too large for a generic Linux config, but
16K is a lot more reasonable and would also nicely cut out most other
higher order allocations too.

The efficiency argument has always been there and is nothing new. Yeah,
I know it can be slightly more efficient to do most hardware operations
in larger chunks, and I didn't ever buy it before or now. Superpages can
also be used to use TLBs more efficiently sometimes, etc etc. This is
coming up now because of some specific big problems that are being
encountered which is absolutely not some fundamental hardware limit we
are seeing approach.

SUSE Labs, Novell Inc.
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at