Re: [PATCH] vmscan: scan pages until it founds eligible pages
From: Minchan Kim
Date: Wed May 10 2017 - 03:03:18 EST
On Wed, May 10, 2017 at 08:13:12AM +0200, Michal Hocko wrote:
> On Wed 10-05-17 10:46:54, Minchan Kim wrote:
> > On Wed, May 03, 2017 at 08:00:44AM +0200, Michal Hocko wrote:
> [...]
> > > @@ -1486,6 +1486,12 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
> > > continue;
> > > }
> > >
> > > + /*
> > > + * Do not count skipped pages because we do want to isolate
> > > + * some pages even when the LRU mostly contains ineligible
> > > + * pages
> > > + */
> >
> > How about adding comment about "why"?
> >
> > /*
> > * Do not count skipped pages because that makes the function return with
> > * no isolated pages if the LRU mostly contains ineligible pages, so that
> > * the VM cannot reclaim any pages and triggers a premature OOM.
> > */
>
> I am not sure this is necessarily any better. Mentioning a pre-mature
> OOM would require a much better explanation because a first immediate
> question would be "why don't we scan those pages at priority 0". Also
> decision about the OOM is at a different layer and it might change in
> future when this doesn't apply any more. But it is not like I would
> insist...
>
> > > + scan++;
> > > switch (__isolate_lru_page(page, mode)) {
> > > case 0:
> > > nr_pages = hpage_nr_pages(page);
> >
> > Confirmed.
>
> Hmm. I can clearly see how we could skip over too many pages and hit
> small reclaim priorities too quickly but I am still scratching my head
> about how we could hit the OOM killer as a result. The amount of pages
> on the active anonymous list suggests that we are not able to rotate
> pages quickly enough. I have to keep thinking about that.
I explained it before, but it seems that was not enough. Let me try again.
The problem is that get_scan_count() computes nr_to_scan from the pages in
eligible zones only:

	size = lruvec_lru_size(lruvec, lru, sc->reclaim_idx);
	size = size >> sc->priority;
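Assume sc->priority is 0 and the LRU list looks like this (head on the left,
tail on the right; N is an eligible page, H an ineligible one):

	N-N-N-N-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H-H

That is, a few eligible pages sit at the head of the LRU, and almost all of
the others are ineligible.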
In that case, size becomes 4, so the VM wants to scan 4 pages, but the 4
pages at the tail of the LRU are all ineligible.
If isolate_lru_pages() counts skipped pages against nr_to_scan, it stops
after scanning those 4 ineligible pages and never reaches the eligible
pages at the head, so nothing is reclaimed.
If it helps in understanding the problem, I will add this to the patch
description.