On Thu, Apr 26, 2007 at 04:50:06PM +1000, Nick Piggin wrote:
Improving the buffer layer would be a good way. Of course, that is
a long and difficult task, so nobody wants to do it.
It's also a stupid idea. We got rid of the buffer layer because it's
a complete pain in the ass, and now you want to reintroduce one that's
even more complex, and most likely even slower than the elegant solution?
Well, for those architectures (and this would solve your large block
size and 16TB pagecache size without any core kernel changes), you
can manage 1<<order hardware ptes as a single Linux pte. There is
nothing that says you must implement PAGE_SIZE as a single TLB sized
page.
Well, ppc64 can do that. And guess what, it really painfull for a lot
of workloads. Think of a poor ps3 with 256 from which the broken hypervisor
already takes a lot away and now every file in the pagecache takes
64k, every thread stack takes 64k, etc? It's good to have variable
sized objects in places where it makes sense, and the pagecache is
definitively one of them.