Re: [00/17] Large Blocksize Support V3

From: Nick Piggin
Date: Thu Apr 26 2007 - 03:15:58 EST

Christoph Lameter wrote:
On Thu, 26 Apr 2007, Nick Piggin wrote:

I am working now and again on some code to do this, it is a big job but
I think it is the right way to do it. But it would take a long time to
get stable and supported by filesystems...

Ummm... We already have a radix tree for this???? What more is needed? You
just need to go through all filesystems and make them use extents.

I'm talking about block size > page size in the buffer layer.

I fail to see the point of adding another layer when you already have a

It isn't another layer. We already have this layer.

mapping through the radix tree. You just need to change the way the filesystem looks up pages.

You didn't think any of the criticisms of higher order page cache size
were valid?

What are the exact requirements you are trying to address?

Block size > page cache size.

You fundamentally cannot address the large blocksize requirements with 4k pages since you simply must have larger contiguous memory.

Large blocksize means that the device can do I/O on blocks of that size.

What can be done is to create some kind of fake linearity. At one level the radix tree and the address space already provide that. The radix tree allows you to find the next page etc. Another approach would be to create a virtual address space that fakes linearity even for the processor.
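
As a rough illustration of the "fake linearity" idea, a block larger than a page can be described as a run of consecutive page cache indices, each of which is looked up separately and need not be physically contiguous. This is only a sketch: the names block_to_pages and struct page_span are hypothetical and not kernel APIs, and it assumes the block size is a whole multiple of the page size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type, for illustration only. */
struct page_span {
    uint64_t first_index;  /* page cache index of the first page of the block */
    uint64_t nr_pages;     /* number of page-sized slots the block covers */
};

/*
 * Map a filesystem block number to the run of page cache indices backing it.
 * Assumes block_size is a non-zero multiple of page_size.
 */
static struct page_span block_to_pages(uint64_t block, uint32_t block_size,
                                       uint32_t page_size)
{
    struct page_span span;

    assert(page_size > 0);
    assert(block_size >= page_size && block_size % page_size == 0);

    span.nr_pages = block_size / page_size;
    span.first_index = block * span.nr_pages;
    return span;
}
```

With a 64K block and 4K pages, block 3 would cover 16 indices starting at index 48. The indices are linear even when the pages behind them are scattered in physical memory.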

Then there are ways with I/O mmus to avoid the issues again.

However, you still have not addressed the underlying problem of the device not being able to do I/O to a larger block of memory.

With iommus and sg lists?

You guys have a couple of problems. Firstly, you need to have ia64
filesystems accessible to x86_64. Secondly, you have these controllers
without enough sg entries for nice sized IOs.
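
To make the sg entry pressure concrete, here is a minimal sketch that counts how many scatter-gather entries an I/O would need when physically adjacent pages are merged into one entry. The function name count_sg_entries and the 4K PAGE_BYTES constant are assumptions for illustration; this is not how the kernel block layer builds its lists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES 4096u  /* assumed page size for this sketch */

/*
 * Count the sg entries needed to describe n pages, given the physical
 * address of each page in I/O order. A page extends the current entry
 * only if it starts exactly where the previous page ended.
 */
static size_t count_sg_entries(const uint64_t *phys, size_t n)
{
    size_t entries = 0;

    assert(phys != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (i == 0 || phys[i] != phys[i - 1] + PAGE_BYTES)
            entries++;  /* discontiguous page: start a new entry */
    }
    return entries;
}
```

Four physically contiguous pages collapse to a single entry, while four scattered pages need four, so a controller with a small sg table can only do large IOs if the memory happens to be contiguous or an iommu remaps it to look that way.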

I sympathise, and higher order pagecache might solve these in a way, but
I don't think it is the right way to go, mainly because of the fragmentation

Increasing PAGE_SIZE, support for block size > page cache size, and getting
io controllers matched to a 4K page size IMO would be some good ways to
solve these problems. I know they are probably harder...

SUSE Labs, Novell Inc.