Re: [PATCH 17/33] readahead: context based method

From: Wu Fengguang
Date: Wed May 24 2006 - 21:25:25 EST


On Wed, May 24, 2006 at 05:53:36PM +0200, Peter Zijlstra wrote:
> On Wed, 2006-05-24 at 21:33 +0800, Wu Fengguang wrote:
> > On Wed, May 24, 2006 at 02:37:48PM +0200, Peter Zijlstra wrote:
> > > On Wed, 2006-05-24 at 19:13 +0800, Wu Fengguang wrote:
> > >
> > > > +#define PAGE_REFCNT_0 0
> > > > +#define PAGE_REFCNT_1 (1 << PG_referenced)
> > > > +#define PAGE_REFCNT_2 (1 << PG_active)
> > > > +#define PAGE_REFCNT_3 ((1 << PG_active) | (1 << PG_referenced))
> > > > +#define PAGE_REFCNT_MASK PAGE_REFCNT_3
> > > > +
> > > > +/*
> > > > + * STATUS REFERENCE COUNT
> > > > + * __ 0
> > > > + * _R PAGE_REFCNT_1
> > > > + * A_ PAGE_REFCNT_2
> > > > + * AR PAGE_REFCNT_3
> > > > + *
> > > > + * A/R: Active / Referenced
> > > > + */
> > > > +static inline unsigned long page_refcnt(struct page *page)
> > > > +{
> > > > + return page->flags & PAGE_REFCNT_MASK;
> > > > +}
> > > > +
> > > > +/*
> > > > + * STATUS REFERENCE COUNT TYPE
> > > > + * __ 0 fresh
> > > > + * _R PAGE_REFCNT_1 stale
> > > > + * A_ PAGE_REFCNT_2 disturbed once
> > > > + * AR PAGE_REFCNT_3 disturbed twice
> > > > + *
> > > > + * A/R: Active / Referenced
> > > > + */
> > > > +static inline unsigned long cold_page_refcnt(struct page *page)
> > > > +{
> > > > + if (!page || PageActive(page))
> > > > + return 0;
> > > > +
> > > > + return page_refcnt(page);
> > > > +}
> > > > +
> > >
> > > Why all of this if all you're ever going to use is cold_page_refcnt.
> >
> > Well, the two functions have a long history...
> >
> > There has been a PG_activate which makes the two functions quite
> > different. It was later removed for fear of the behavior changes it
> > introduced. However, there's still possibility that someone
> > reintroduce similar flags in the future :)
> >
> > > What about something like this:
> > >
> > > static inline int cold_page_referenced(struct page *page)
> > > {
> > > if (!page || PageActive(page))
> > > return 0;
> > > return !!PageReferenced(page);
> > > }
> >
> > Ah, here's another theory: the algorithm uses reference count
> > conceptually, so it may be better to retain the current form.
>
> Reference count of what exactly, if you were to say of the page, I'd
> have expected only the first function, page_refcnt().
>
> What I don't exactly understand is why you specialise to the inactive
> list. Why do you need that?
>
> The reason I'm asking is that when I merge this with my page replacement
> work, I need to find a generalised concept. cold_page_refcnt() would
> become to mean something like: number of references for those pages that
> are direct reclaim candidates. And honestly, that doesn't make a lot of
> sense.
>
> If you could explain the concept behind this, I'd be grateful.

Good question, and sorry for mentioning this...

There are some background info here:

[DISTURBS] section of
http://marc.theaimsgroup.com/?l=linux-kernel&m=112678976802381&w=2

[DELAYED ACTIVATION] section of
http://marc.theaimsgroup.com/?l=linux-kernel&m=112679176611006&w=2

It involves a tricky situation where there are two sequential readers
that come close enough, so that the follower retouched the pages
visited by the leader:

chunk 1 chunk 2 chunk 3
========== =============------- --------------------
follower ^ leader ^

It is all ok if the revisited pages still stay in the inactive list,
these pages will act as measurement of len(inactive list)/speed(leader).
But if the revisited pages(marked by '=') are sent to active list
immediately, the measurement will no longer be as accurate. The trace
is 'disturbed'. In this case, using page_refcnt() can be aggressive
and unsafe from thrashing, while cold_page_refcnt() can be conservative.

So either one of page_refcnt()/cold_page_refcnt() should be ok, as
long as we know the consequence of this situation. After all, it is
really uncommon to see much invocation of the context based method,
and even rare for this kind of situation to happen.

Regards,
Wu
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/