Re: [PATCH 6/9] clockpro-clockpro.patch

From: Marcelo Tosatti
Date: Sat Dec 31 2005 - 21:58:20 EST


On Sat, Dec 31, 2005 at 11:48:37AM +0100, Peter Zijlstra wrote:
> On Fri, 2005-12-30 at 22:24 -0200, Marcelo Tosatti wrote:
> > Hi Peter,
> >
> > _Nice_ work!
>
> Thanks!
>
> > IMHO you're going into the right direction, abstracting away page
> > replacement policy from page reclaim.
> >
> > I think that final objective should be to abstract it away completly,
> > making it possible to select between different policies, allowing
> > further experimentation and implementations such as energy efficient
> > algorithms.
> >
> > How hard do you think would it be to enhance your patches to allow for
> > compile-time selectable policies?
>
> Not that much more work, it would need abstracing all the usage of the
> list counters (nr_active/nr_inactive vs. nr_resident/nr_cold).

That would be very interesting.

> > For instance, moving "page reclaim scanner" specific information into
> > its own container:
> >
> > @@ -140,12 +140,13 @@ struct zone {
> > /* Fields commonly accessed by the page reclaim scanner */
> > - spinlock_t lru_lock;
> > - struct list_head active_list;
> > - struct list_head inactive_list;
> > - unsigned long nr_scan_active;
> > - unsigned long nr_active;
> > - unsigned long nr_inactive;
> > + spinlock_t lru_lock;
> > + struct list_head list_hand[2];
> > + unsigned long nr_resident;
> > + unsigned long nr_cold;
> > + unsigned long nr_cold_target;
> > + unsigned long nr_nonresident_scale;
> > +
> >
> > Such as "struct reclaim_policy_data" or a better name.
>
> Yes, I have toyed with that idea, rik didn't like it and I didn't spend
> any effort on it, but it could very well be done.

I'll come up with a proposal for review on top of your work.

> > About CLOCK-Pro itself, I think that a small document with a short
> > introduction would be very useful... explaining that it uses inter
> > reference distance instead of recency for the page replacement criteria,
> > and why this criteria is fundamentally more appropriate for a large set
> > of common access patterns aka "a resume of the CLOCK-Pro paper".
>
> Ok, I shall give this Documentation/vm/clockpro.txt thing a try.
>
>
> > > Implementation wise I use something based on Rik van Riel's nonresident code
> > > which actually aproximates a clock with reduced order.
> >
> > I'm curious about hash collisions, would like to know more details about
> > the hash distribution under different loads.
> >
> > Would be nice to measure the rate of updates on each hash bucket and
> > confirm that they are approximate.
>
> I have/had a patch that prints stats on each bucket, I did some stats a
> few months back and the deviation in bucket usage was not very high,
> which would indicate a rather good distribution.
>
> Could revive that patch so you can have a go at it if you wish.

Please, will give it a try.

> > > The resident clock with two hands is implemented using two lists which are to
> > > be seen as laid head to tail to form the clock. When one hand laps the other
> > > the lists are swapped.
> >
> > How does that differ from the original CLOCK-Pro algorithm, and why, and what are
> > the expected outcomes? Please make it easier for others to understand why the hands
> > swap, and when, and why.
>
> The original clockpro algorithm has one clock with 3 hands. In order to
> make it work with multiple resident zones, the non-resident pages have
> to be isolated.
>
> I did that by having two clocks, one resident with two hands (per zone)
> and one non-resident with one hand (global), where the non-resident
> clock should be viewed as an overlay on the resident one (imagine the
> single zone case).
>
> This loses some page order information, ie. the exact position of the
> non-resident pages wrt. the resident pages, however it is a good
> approximation when the rotation speeds of the respective hands are tied
> together such that:
> when the resident hot hand has made a full revolution so too has the
> non-resident hand.

> > > Each page has 3 state bits:
> > >
> > > hot -> PageHot()
> > > test -> PageTest()
> > > ref -> page_referenced()
> > >
> > > (PG_active will be renamed to PG_hot in a following patch, since the semantics
> > > changed also change the name in order to avoid confusion))
> > >
> > > The HandCold rotation is driven by page reclaim needs. HandCold in turn
> > > drives HandHot, for every page HandCold promotes to hot HandHot needs to
> > > degrade one hot page to cold.
> >
> > Why do you use only two clock hands and not three (HandHot, HandCold and HandTest)
> > as in the original paper?
>
> As explanied above, the multi-zone thing requires the non-resident pages
> to be separated.
>
> > > + * res | h/c | tst | ref || Hcold | Hhot | Htst || Flt
> > > + * ----+-----+-----+-----++-------+------+------++-----
> > > + * 1 | 1 | 0 | 1 ||=1101 | 1100 |=1101 ||
> > > + * 1 | 1 | 0 | 0 ||=1100 | 1000 |=1100 ||
> > > + * ----+-----+-----+-----++-------+------+------++-----
> > > + * 1 | 0 | 1 | 1 || 1100 | 1001 | 1001 ||
> > > + * 1 | 0 | 1 | 0 ||X0010 | 1000 | 1000 ||
> > > + * 1 | 0 | 0 | 1 || 1010 |=1001 |=1001 ||
> > > + * 1 | 0 | 0 | 0 ||X0000 |=1000 |=1000 ||
> > > + * ----+-----+-----+-----++-------+------+------++-----
> > > + * ----+-----+-----+-----++-------+------+------++-----
> > > + * 0 | 0 | 1 | 1 || | | || 1100
> > > + * 0 | 0 | 1 | 0 ||=0010 |X0000 |X0000 ||
> > > + * 0 | 0 | 0 | 1 || | | || 1010
> >
> > What does this mean? Can you make it easier for ignorant people like
> > myself to understand?
>
> state table, it describes how (in the original paper) the three hands
> modify the page state. Given the state in the first four columns, the
> next three columns give a new state for each hand; hand cold, hot and
> test. The last column describes the action of a pagefault.
>
> Ex. given a resident cold page in its test period that is referenced
> (1011):
> - Hand cold will make it 1100, that is, a resident hot page;
> - Hand hot will make it 1001, that is, a resident cold page with a
> reference; and
> - Hand test will also make it 1001.
>
> (The prefixes '=' and 'X' are used to indicate: not changed, and remove
> from list - that can be either move from resident->non-resident or
> remove altogether).

I see - can you add this info to the patch?
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/