Re: userspace pagecache management tool

From: Peter Zijlstra
Date: Sun Mar 04 2007 - 09:36:05 EST


On Sun, 2007-03-04 at 04:07 -0800, Andrew Morton wrote:
> On Sat, 03 Mar 2007 20:56:27 -0500 Rik van Riel <riel@xxxxxxxxxx> wrote:
>
> > Andrew Morton wrote:
> >
> > >>> Doing a refault thing would help a bit, but stops working at a certain point.
> > >> At what point does it stop working?
> > >
> > > We need to store that this-page-got-reclaimed info somewhere. I don't know
> > > how space-efficient that is. Did anyone ever do an implementation?
> >
> > One 32 bit word per evicted page that we keep track of.
>
> ok...
>
> I wonder if we really need a new data structure to track that. I mean,
> once a file-backed (or indeed swapcache) page has been reclaimed, its
> radix-tree slot is just sitting there with zeroes in it, asking us to reuse
> that space for something interesting, no?
>
> Of course, if all 64 pages in a radix-tree node get removed, we'll
> currently free the node itself. We could stop doing that, but the effects
> of that might be pretty bad sometimes. Instead, it sounds sensible to
> populate the now-null slot in the parent radix-tree node with an
> average/max/min/per-child-bitmap/whatever of the metrics for the 64
> non-resident pages which that non-leaf slot represents. So as the period
> since a single page got evicted increases and increases, our information
> about its state becomes less and less accurate.
>
> If that inaccuracy is a problem then perhaps we could defer the collapsing
> of a now-empty node into its parent in some manner.

Getting the refault distance out of such a radix tree would be tricky.
One solution I can think of would entail keeping a global fault count,
storing the current fault count in the radix node and, on refault,
subtracting it from the global count. The downside, however, is the
global counter itself; perhaps some smart per-CPU count aggregation
could fix that.
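
A minimal userspace sketch of that bookkeeping, just to make the
arithmetic concrete. All names here (global_fault_count, nonres_slot,
record_eviction, refault_distance) are made up for illustration and are
not existing kernel interfaces; it also ignores locking and the per-CPU
question entirely.

```c
#include <stdint.h>

/* Hypothetical global counter, incremented once per page fault. */
static uint32_t global_fault_count;

/* Hypothetical per-slot record: counter value when the page was evicted. */
struct nonres_slot {
	uint32_t evict_stamp;
};

static void count_fault(void)
{
	global_fault_count++;
}

/* On eviction, remember the current global count in the slot. */
static void record_eviction(struct nonres_slot *slot)
{
	slot->evict_stamp = global_fault_count;
}

/*
 * On refault, the distance is the number of faults seen since eviction.
 * Unsigned subtraction keeps this correct across a single counter wrap.
 */
static uint32_t refault_distance(const struct nonres_slot *slot)
{
	return global_fault_count - slot->evict_stamp;
}
```

With one 32 bit stamp per slot this matches the "one word per evicted
page" cost mentioned above; the stamp only becomes ambiguous once more
than 2^32 faults pass between eviction and refault.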

The other point you mention is when we reap these radix tree nodes.
Normally nonresident information gets dropped once the distance exceeds
the size of memory, but these nodes don't have an explicit order.
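
To illustrate the reaping rule, a rough sketch assuming each tracked
entry carries an eviction stamp taken from a global fault count. The
helpers nonres_expired and reap_expired and the total_pages parameter
are hypothetical. Because the entries are unordered, this version has
to look at every one of them, which is exactly the cost an explicit
order would avoid.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* An entry is stale once its distance exceeds the size of memory in pages. */
static bool nonres_expired(uint32_t now, uint32_t evict_stamp,
			   uint32_t total_pages)
{
	return (uint32_t)(now - evict_stamp) > total_pages;
}

/*
 * Scan an unordered array of eviction stamps, keep the live ones packed
 * at the front, and return how many entries were reaped.
 */
static size_t reap_expired(uint32_t *stamps, size_t n, uint32_t now,
			   uint32_t total_pages)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++) {
		if (!nonres_expired(now, stamps[i], total_pages))
			stamps[kept++] = stamps[i];
	}
	return n - kept;
}
```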

The collapsing idea is interesting, esp. if we could delay the collapse
so that the avg refault distance would be in some relation to the error.
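
One possible reading of that, as a sketch only: summarize the stamps of
a node's children as min/max/avg, treat the spread (max - min) as the
error, and allow the collapse only once the spread is small relative to
the average refault distance. The names node_summary, summarize_node
and may_collapse, and the percentage threshold, are assumptions of this
example and not something proposed in the thread. It also assumes the
stamps have not wrapped relative to each other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary stored in the parent slot after a collapse. */
struct node_summary {
	uint32_t min_stamp;
	uint32_t max_stamp;
	uint32_t avg_stamp;
};

static struct node_summary summarize_node(const uint32_t *stamps, size_t n)
{
	struct node_summary s;
	uint64_t sum = 0;

	assert(n > 0);
	s.min_stamp = s.max_stamp = stamps[0];
	for (size_t i = 0; i < n; i++) {
		if (stamps[i] < s.min_stamp)
			s.min_stamp = stamps[i];
		if (stamps[i] > s.max_stamp)
			s.max_stamp = stamps[i];
		sum += stamps[i];
	}
	s.avg_stamp = (uint32_t)(sum / n);
	return s;
}

/*
 * Collapse only when the error (spread of the stamps) is at most
 * max_err_pct percent of the average distance from "now".
 */
static bool may_collapse(const struct node_summary *s, uint32_t now,
			 uint32_t max_err_pct)
{
	uint64_t spread = s->max_stamp - s->min_stamp;
	uint64_t avg_distance = now - s->avg_stamp;

	return spread * 100 <= avg_distance * max_err_pct;
}
```

The effect is that a freshly emptied node stays around while the error
would still matter, and gets folded into its parent once the pages are
old enough that the loss of precision is small in proportion.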
