Re: INFO: rcu_sched detected stalls on CPUs/tasks with `kswapd` and `mem_cgroup_shrink_node`

From: Paul E. McKenney
Date: Thu Dec 01 2016 - 14:39:21 EST


On Thu, Dec 01, 2016 at 09:10:01PM +0300, Boris Zhmurov wrote:
> Michal Hocko 30/11/16 21:25:
>
> >>> Do I get it right that s@cond_resched_rcu_qs@cond_resched@ didn't help?
> >>
> >> I didn't try that. I've tried 4 patches from Paul's linux-rcu tree.
> >> I can try another portion of patches, no problem :)
> >
> > Replacing cond_resched_rcu_qs in shrink_node_memcg by cond_resched would
> > be really helpful to tell whether we are missing a real scheduling point
> > or whether something more serious is going on here.
>
> Well, I can confirm, that replacing cond_resched_rcu_qs in
> shrink_node_memcg by cond_resched also makes dmesg clean from RCU CPU
> stall warnings.
>
> I've attached patch (just modification of Paul's patch), that fixes RCU
> stall messages in situations, when all memory is used by
> couchbase/memcached + fs cache and linux starts to use swap.
>
>
> --
> Boris Zhmurov
> System/Network Administrator
> mailto: bb@xxxxxxxxxxxxxx
> "wget http://kernelpanic.ru/bb_public_key.pgp -O - | gpg --import"

> --- a/mm/vmscan.c.orig 2016-11-30 21:52:58.314895320 +0300
> +++ b/mm/vmscan.c 2016-11-30 21:53:29.502895320 +0300
> @@ -2352,6 +2352,7 @@
> nr_reclaimed += shrink_list(lru, nr_to_scan,
> lruvec, sc);
> }
> + cond_resched();
> }
>
> if (nr_reclaimed < nr_to_reclaim || scan_adjusted)

Nice!

Just to double-check, could you please also test your patch above with
these two commits from -rcu?

d2db185bfee8 ("rcu: Remove short-term CPU kicking")
f8f127e738e3 ("rcu: Add long-term CPU kicking")

Thanx, Paul