Re: [00/17] Large Blocksize Support V3

From: Nick Piggin
Date: Thu Apr 26 2007 - 03:45:22 EST

Next message: Tejun Heo: "Re: [DOC] Fix wrong identifier name in Documentation/driver-model/devres.txt"
Previous message: Nigel Cunningham: "Re: suspend2 merge (was Re: [Suspend2-devel] Re: CFS and suspend2:hang in atomic copy)"
In reply to: Christoph Lameter: "Re: [00/17] Large Blocksize Support V3"
Next in thread: Mel Gorman: "Re: [00/17] Large Blocksize Support V3"

Christoph Lameter wrote:

On Thu, 26 Apr 2007, Nick Piggin wrote:

mapping through the radix tree. You just need to change the way the
filesystem looks up pages.

You didn't think any of the criticisms of higher order page cache size
were valid?

They are all known points that have been discussed to death.

I missed the part where you showed that it was a better solution than
the alternatives.

What are the exact requirements you are trying to address?

Block size > page cache size.

But what do you mean by it? A block is no longer a contiguous section of memory. So you have redefined the term.

I don't understand what you mean at all. A block has always been a
contiguous area of disk.

You guys have a couple of problems: firstly, you need to have ia64
filesystems accessible to x86_64, and secondly, you have these controllers
without enough sg entries for nice-sized IOs.

This is not SGI-specific, sorry.

I sympathise, and higher order pagecache might solve these in a way, but
I don't think it is the right way to go, mainly because of the fragmentation
issues.

And you don't care about Mel's work on that level?

I actually don't like it too much because it can't provide a robust
solution. What do you do on systems with small memories, or those that
eventually do get fragmented?

Actually, I don't know why people are so excited about being able to
use higher order allocations (I would rather be more excited about
never having to use them). But for those few places that really need
it, I'd rather see them use a virtually mapped kernel with proper
defragmentation rather than putting hacks all through the core code.

Increasing PAGE_SIZE, support for block size > page cache size, and getting
io controllers matched to a 4K page size IMO would be some good ways to
solve these problems. I know they are probably harder...

No, this has been tried before and does not work. Why should we lose the capability to work with 4k pages just because there is some data that has to be thrown around in quantity? I'd like to have flexibility here.

Is that a big problem? Really? You use 16K pages on your IPF systems,
don't you?

The fragmentation problem is solvable and we already have a solution in mm. So I do not really see a problem there?

I don't think that it is solved, and I think the heuristics that are
there would be put under more stress if they become widely used. And
it isn't only about whether we can get the page or not, but also about
the cost. Look up Linus's arguments about page colouring, which are
similar and I also think are pretty valid.

--
SUSE Labs, Novell Inc.