Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

From: Mel Gorman
Date: Mon May 23 2011 - 13:35:47 EST


On Mon, May 23, 2011 at 06:42:25PM +0200, Andrea Arcangeli wrote:
> On Mon, May 23, 2011 at 08:12:50AM +0900, Minchan Kim wrote:
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 292582c..1663d24 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -231,8 +231,11 @@ unsigned long shrink_slab(struct shrink_control *shrink,
> >  	if (scanned == 0)
> >  		scanned = SWAP_CLUSTER_MAX;
> >
> > -	if (!down_read_trylock(&shrinker_rwsem))
> > -		return 1;	/* Assume we'll be able to shrink next time */
> > +	if (!down_read_trylock(&shrinker_rwsem)) {
> > +		/* Assume we'll be able to shrink next time */
> > +		ret = 1;
> > +		goto out;
> > +	}
>
> It looks cleaner to return -1 here to differentiate the failure in
> taking the lock from when we take the lock and just 1 object is
> freed. Callers seem to be ok with -1 already and it is more intuitive
> for the while (nr > 10) loops too (those loops could be changed to
> "while (nr > 0)" if all shrinkers are accurate and not doing something
> inaccurate like the above code did; I didn't check the shrinkers'
> retvals yet).
>

Only one caller reads the value of shrink_slab() and while it would
survive -1 being returned, it gains nothing. I don't see it as being
much clearer than the existing return value of 1.
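
To illustrate, here is a minimal userspace sketch, not kernel code. It
assumes a caller shaped like the "while (nr > 10)" loops mentioned above;
drain_model() and its canned result array are made up for illustration.
The loop stops the same way whether the lock failure is reported as 1 or
as -1.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical caller model: keep "shrinking" while the reported count
 * is above 10. results[] stands in for successive shrink_slab() return
 * values. Returns how many calls were made before the loop stopped.
 */
static int drain_model(const int *results, int n_results)
{
	int calls = 0;
	int nr;

	assert(results != NULL && n_results > 0);

	do {
		nr = results[calls];
		calls++;
	} while (nr > 10 && calls < n_results);

	return calls;
}
```

With either { 50, 1, 99 } or { 50, -1, 99 } as the canned results, the
loop stops after the second call.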

> >  	up_read(&shrinker_rwsem);
> > +out:
> > +	cond_resched();
> >  	return ret;
> >  }
>
> If we enter the loop some of the shrinkers will reschedule but it
> looks good for the last iteration that may have still run for some
> time before returning.

Yes.

> The actual failure of shrinker_rwsem seems only
> theoretical though (but ok to cover it too with the cond_resched, but
> in practice this should be more for the case where shrinker_rwsem
> doesn't fail).
>

Profiles from some users imply that this condition is being hit. I
can't 100% prove it as I can't reproduce the problem locally
(it seems to require a sandybridge laptop for some reason). Tests did
show that both kswapd CPU usage and the likelihood of hanging were
reduced when shrink_slab() used cond_resched() like this. See
https://lkml.org/lkml/2011/5/17/274 .
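
The control flow in the patch can be sketched in userspace as follows.
This is only a model: shrink_slab_model(), cond_resched_model() and the
yield_calls counter are hypothetical names, sched_yield() stands in for
the kernel's cond_resched(), and a bool stands in for the result of
down_read_trylock(). The point is that both the trylock-failure path and
the normal path pass through the reschedule point before returning.

```c
#include <stdbool.h>
#include <sched.h>

/* Hypothetical counter so the sketch can show the yield point was reached. */
static unsigned long yield_calls;

/* Userspace stand-in for cond_resched(): offer the CPU to someone else. */
static void cond_resched_model(void)
{
	yield_calls++;
	sched_yield();
}

/*
 * Model of the patched exit path. lock_available stands in for
 * down_read_trylock() succeeding; freed is what the shrinkers would
 * have reported.
 */
static unsigned long shrink_slab_model(bool lock_available,
				       unsigned long freed)
{
	unsigned long ret;

	if (!lock_available) {
		/* Assume we'll be able to shrink next time */
		ret = 1;
		goto out;
	}

	ret = freed;
out:
	cond_resched_model();
	return ret;
}
```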

> > @@ -2331,7 +2336,7 @@ static bool sleeping_prematurely(pg_data_t
> > *pgdat, int order, long remaining,
> >  	 * must be balanced
> >  	 */
> >  	if (order)
> > -		return pgdat_balanced(pgdat, balanced, classzone_idx);
> > +		return !pgdat_balanced(pgdat, balanced, classzone_idx);
> >  	else
> >  		return !all_zones_ok;
> >  }
>
> I now wonder if this is why compaction in kswapd didn't work out well
> and kswapd would spin at 100% load so much when compaction was added,

It's possible.

> plus with kswapd-compaction patch I think this code should be changed
> to:
>
> if (!COMPACTION_BUILD && order)
> return !pgdat_balanced();
> else
> return !all_zones_ok;
>
> (but only with kswapd-compaction)
>

Why? kswapd can enter lumpy reclaim when !COMPACTION_BUILD. While this
is hardly desirable, I don't see why kswapd should use different logic
for balancing depending on whether compaction is used or not.
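
For reference, the effect of the inverted return value can be seen in a
small userspace model. It is a sketch only: struct node_model and both
functions are hypothetical, and the real sleeping_prematurely() does more
than this tail check. A "true" result means kswapd should not go to sleep
yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state the real function computes. */
struct node_model {
	bool balanced;		/* what pgdat_balanced() would report */
	bool all_zones_ok;	/* every zone met its high watermark */
};

/* Before the fix: a balanced node is reported as "sleeping prematurely". */
static bool sleeping_prematurely_old(const struct node_model *n, int order)
{
	assert(n != NULL);
	if (order)
		return n->balanced;
	return !n->all_zones_ok;
}

/* After the fix: only an unbalanced node keeps kswapd awake. */
static bool sleeping_prematurely_new(const struct node_model *n, int order)
{
	assert(n != NULL);
	if (order)
		return !n->balanced;
	return !n->all_zones_ok;
}
```

For a balanced node and order > 0, the old version says "do not sleep"
and the new one lets kswapd sleep. The order-0 path is identical in both.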

> I should probably give kswapd-compaction another spin after fixing
> this, because with compaction kswapd should be super successful at
> satisfying zone_watermark_ok_safe(zone, _order_...) in the
> sleeping_prematurely high watermark check, leading to pgdat_balanced
> returning true most of the time (which would make kswapd go crazy spin
> instead of stopping as it was supposed to). Mel, do you also think
> it's worth another try with a fixed sleeping_prematurely like above?
>

It's worth a try anyway although I think it's more important to figure
out if all_unreclaimable is being improperly set or not.

> Another thing, I'm not excited of the schedule_timeout(HZ/10) in
> kswapd_try_to_sleep(), it seems all for the statistics.

It's there to catch the case where kswapd balances a zone but continual
allocations quickly push the zone back under the high watermark. It keeps
kswapd awake to reduce the likelihood that processes hit the min
watermark and stall.

--
Mel Gorman
SUSE Labs