Re: [00/41] Large Blocksize Support V7 (adds memmap support)

From: Christoph Lameter
Date: Fri Sep 14 2007 - 14:09:10 EST


On Fri, 14 Sep 2007, Nick Piggin wrote:

> However fsblock can do everything that higher order pagecache can
> do in terms of avoiding vmap and giving contiguous memory to block
> devices by opportunistically allocating higher orders of pages, and falling
> back to vmap if they cannot be satisfied.

fsblock is restricted to the page cache and cannot be used in other
contexts where subsystems can benefit from larger linear memory.
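
The opportunistic scheme described in the quoted text (try a higher order allocation first, fall back to a virtual mapping of single pages) can be sketched in user space as follows. This is only an illustration under stated assumptions: `pick_backing`, `try_alloc_fn` and the two stand-in allocator policies are hypothetical names invented here, not fsblock code. A kernel version would presumably sit on top of alloc_pages() and vmap() instead.

```c
#include <assert.h>
#include <stddef.h>

/* How a large block ends up being backed (hypothetical types). */
enum backing_kind { BACKING_CONTIGUOUS, BACKING_VIRTUAL };

struct backing {
    enum backing_kind kind;
    unsigned int order;   /* order of the pages actually used */
};

/* Hypothetical hook: returns nonzero if 2^order contiguous pages
 * can be allocated right now. */
typedef int (*try_alloc_fn)(unsigned int order);

/* Stand-in policies for illustration: a fragmented system where only
 * order 0 succeeds, and an idle one where every order succeeds. */
static int alloc_order0_only(unsigned int order) { return order == 0; }
static int alloc_always(unsigned int order) { (void)order; return 1; }

/* Try the requested order first; if that fails, fall back to
 * order 0 pages that would be stitched together virtually. */
static struct backing pick_backing(unsigned int order, try_alloc_fn try_alloc)
{
    struct backing b;

    assert(try_alloc != NULL);
    if (try_alloc(order)) {
        b.kind = BACKING_CONTIGUOUS;
        b.order = order;
    } else {
        b.kind = BACKING_VIRTUAL;
        b.order = 0;
    }
    return b;
}
```

With `alloc_always` an order 4 request stays contiguous; with `alloc_order0_only` the same request falls back to the virtual mapping, which is the case the two approaches in this thread handle differently.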

> So if you argue that vmap is a downside, then please tell me how you
> consider the -ENOMEM of your approach to be better?

That is again pretty undifferentiated. Are we talking about low page
orders? There we will reclaim all of the reclaimable memory before getting
an -ENOMEM. Given the quantities of pages on today's machines--a 1 G machine
has 256 thousand 4k pages--and the unmovable ratios we see today, it
would require a very strange setup to get an allocation failure while
still being able to allocate order 0 pages.
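
The page-count arithmetic can be checked directly. A minimal sketch, assuming binary units (1 G = 2^30 bytes, 4k = 4096 bytes); `pages_in` is a hypothetical helper name used only for this illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Number of whole pages of page_size bytes that fit in mem_bytes. */
static uint64_t pages_in(uint64_t mem_bytes, uint64_t page_size)
{
    assert(page_size != 0);   /* guard against division by zero */
    return mem_bytes / page_size;
}
```

Under these assumptions `pages_in(1ULL << 30, 4096)` gives 262144 pages for 1 GiB, and `pages_in(1ULL << 40, 4096)` gives 268435456 pages for 1 TiB.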

With ZONE_MOVABLE you can confine the unmovable objects to a defined
pool; then higher order success rates become reasonable.

> If, by special software layer, you mean the vmap/vunmap support in
> fsblock, let's see... that's probably all of a hundred or two lines.
> Contrast that with anti-fragmentation, lumpy reclaim, higher order
> pagecache and its new special mmap layer... Hmm, seems like a no
> brainer to me. You really still want to pursue the "extra layer"
> argument as a point against fsblock here?

Yes sure. Your code could not live without these approaches. Without the
antifragmentation measures your fsblock code would not be very successful
in getting the larger contiguous segments you need to improve performance.

(There is no new mmap layer, the higher order pagecache is simply the old
API with set_blocksize expanded).


> Of course I wouldn't state that. On the contrary, I categorically state that
> I have already solved it :)

Well then I guess that you have not read the requirements...

> > Because it has already been rejected in another form and adds more
>
> You have rejected it. But they are bogus reasons, as I showed above.

That's not me. I am working on this because many of the filesystem people
have repeatedly asked me to do this. I am no expert on filesystems.

> You also describe some other real (if lesser) issues like number of page
> structs to manage in the pagecache. But this is hardly enough to reject
> my patch now... for every downside you can point out in my approach, I
> can point out one in yours.
>
> - fsblock doesn't require changes to virtual memory layer

Therefore it is not a generic change but special to the block layer. So
other subsystems still have to deal with the single page issues on
their own.

> > Maybe we could get to something like a hybrid that avoids some of these
> > issues? Add support so something like a virtual compound page can be
> > handled transparently in the filesystem layer with special casing if
> > such a beast reaches the block layer?
>
> That's conceptually much worse, IMO.

Why? It is the same approach that you use. If it is barely ever used and
satisfies your concern then I am fine with it.

> And practically worse as well: vmap space is limited on 32-bit; fsblock
> approach can avoid vmap completely in many cases; for two reasons.
>
> The fsblock data accessor APIs aren't _that_ bad changes. They change
> zero conceptually in the filesystem, are arguably cleaner, and can be
> essentially nooped if we wanted to stay with a b_data type approach
> (but they give you that flexibility to replace it with any implementation).

The largeblock changes are generic: they improve general handling of
compound pages, they make the existing APIs work for large units of
memory, and they do not add new API layers.
