Re: [PATCH RFC] extent mapped page cache

From: Chris Mason
Date: Wed Jul 25 2007 - 08:19:55 EST


On Wed, 25 Jul 2007 04:32:17 +0200
Nick Piggin <npiggin@xxxxxxx> wrote:

> On Tue, Jul 24, 2007 at 07:25:09PM -0400, Chris Mason wrote:
> > On Tue, 24 Jul 2007 23:25:43 +0200
> > Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> wrote:
> >
> > The tree is a critical part of the patch, but it is also the
> > easiest to rip out and replace. Basically the code stores a range
> > by inserting an object at an index corresponding to the end of the
> > range.
> >
> > Then it does searches by looking forward from the start of the
> > range. More or less any tree that can search and return the first
> > key >= than the requested key will work.
> >
> > So, I'd be happy to rip out the tree and replace with something
> > else. Going completely lockless will be tricky, its something that
> > will deep thought once the rest of the interface is sane.
>
> Just having the other tree and managing it is what makes me a little
> less positive of this approach, especially using it to store pagecache
> state when we already have the pagecache tree.
>
> Having another tree to store block state I think is a good idea as I
> said in the fsblock thread with Dave, but I haven't clicked as to why
> it is a big advantage to use it to manage pagecache state. (and I can
> see some possible disadvantages in locking and tree manipulation
> overhead).

Yes, there are definitely costs with the state tree, it will take some
careful benchmarking to convince me it is a feasible solution. But,
storing all the state in the pages themselves is impossible unless the
block size equals the page size. So, we end up with something like
fsblock/buffer heads or the state tree.

One advantage to the state tree is that it separates the state from
the memory being described, allowing a simple kmap style interface
that covers subpages, highmem and superpages.

It also more naturally matches the way we want to do IO, making for
easy clustering.

O_DIRECT becomes a special case of readpages and writepages....the
memory used for IO just comes from userland instead of the page cache.

The ability to put in additional tracking info like the process that
first dirtied a range is also significant. So, I think it is worth
trying.

-chris
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/