Re: [00/17] Large Blocksize Support V3

From: Mel Gorman
Date: Thu Apr 26 2007 - 06:48:32 EST


On (26/04/07 17:42), Nick Piggin didst pronounce:
> Christoph Lameter wrote:
> >On Thu, 26 Apr 2007, Nick Piggin wrote:
> >
> >
> >>>mapping through the radix tree. You just need to change the way the
> >>>filesystem looks up pages.
> >>
> >>You didn't think any of the criticisms of higher order page cache size
> >>were valid?
> >
> >
> >They are all known points that have been discussed to death.
>
> I missed the part where you showed that it was a better solution than
> the alternatives.
>
>
> >>>What are the exact requirement you are trying to address?
> >>
> >>Block size > page cache size.
> >
> >
> >But what do you mean with it? A block is no longer a contiguous section of
> >memory. So you have redefined the term.
>
> I don't understand what you mean at all. A block has always been a
> contiguous area of disk.
>

Yes, but what you seem to be proposing is that lower layers be
able to treat non-contiguous pages as one IO artifact - like
non-contiguous-compound-pages. Ultimately both are probably needed, i.e.
use contiguous pages for large blocks where possible, but be able to deal
with a compound page made up of multiple smaller pages, at some cost in
performance, when necessary.

I don't think the approaches Christoph and Nick are proposing are mutually
exclusive. The argument is really "which do we deal with first".

>
> >>You guys have a couple of problems, firstly you need to have ia64
> >>filesystems accessable to x86_64. And secondly you have these controllers
> >>without enough sg entries for nice sized IOs.
> >
> >
> >This is not sgi specific sorry.
> >
> >
> >>I sympathise, and higher order pagecache might solve these in a way, but
> >>I don't think it is the right way to go, mainly because of the
> >>fragmentation
> >>issues.
> >
> >
> >And you dont care about Mel's work on that level?
>
> I actually don't like it too much because it can't provide a robust
> solution. What do you do on systems with small memories, or those that
> eventually do get fragmented?
>

They won't be creating filesystems with large blocks for a start, but
even if they did, they would need your proposal.

Again, I don't think they are mutually exclusive as such.

> Actually, I don't know why people are so excited about being able to
> use higher order allocations (I would rather be more excited about
> never having to use them). But for those few places that really need
> it, I'd rather see them use a virtually mapped kernel with proper
> defragmentation rather than putting hacks all through the core code.
>

That involves creating a vmalloc-like area or breaking the 1:1 physical:virtual
mapping in the kernel address space. Of those two, I think a vmalloc area
would be easier, have less performance impact and might even be a starting
point for non-contiguous-compound-pages.

hmm.

> >>Increasing PAGE_SIZE, support for block size > page cache size, and
> >>getting
> >>io controllers matched to a 4K page size IMO would be some good ways to
> >>solve these problems. I know they are probably harder...
> >
> >
> >No this has been tried before and does not work. Why should we loose the
> >capability to work with 4k pages just because there is some data that
> >has to be thrown around in quantity? I'd like to have flexibility here.
>
> Is that a big problem? Really? You use 16K pages on your IPF systems,
> don't you?
>
>
> >The fragmentation problem is solvable and we already have a solution in
> >mm. So I do not really see a problem there?
>
> I don't think that it is solved, and I think the heuristics that are
> there would be put under more stress if they become widely used. And
> it isn't only about whether we can get the page or not, but also about
> the cost. Look up Linus's arguments about page colouring, which are
> similar and I also think are pretty valid.
>

Page coloring has come up a lot in the past. Can you point me at an
example you have in mind and I'll see if the same arguments really
apply.

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab