Re: Is it safe for kthreadd to drain_all_pages?

From: Michal Hocko
Date: Fri Apr 07 2017 - 13:29:32 EST


On Fri 07-04-17 09:58:17, Hugh Dickins wrote:
> On Fri, 7 Apr 2017, Michal Hocko wrote:
> > On Fri 07-04-17 09:25:33, Hugh Dickins wrote:
> > [...]
> > > 24 hours so far, and with a clean /var/log/messages. Not conclusive
> > > yet, and of course I'll leave it running another couple of days, but
> > > I'm increasingly sure that it works as you intended: I agree that
> > >
> > > mm-move-pcp-and-lru-pcp-drainging-into-single-wq.patch
> > > mm-move-pcp-and-lru-pcp-drainging-into-single-wq-fix.patch
> > >
> > > should go to Linus as soon as convenient. Though I think the commit
> > > message needs something a bit stronger than "Quite annoying though".
> > > Maybe add a line:
> > >
> > > Fixes serious hang under load, observed repeatedly on 4.11-rc.
> >
> > Yeah, it is much less theoretical now. I will rephrase and ask Andrew to
> > update the changelog and send it to Linus once I've got your final go.
>
> I don't know akpm's timetable, but your fix being more than a two-liner,
> I think it would be better if it could get into rc6, than wait another
> week for rc7, just in case others then find problems with it. So I
> think it's safer *not* to wait for my final go, but proceed on the
> assumption that it will follow a day later.

Fair enough. Andrew, could you update the changelog of
mm-move-pcp-and-lru-pcp-drainging-into-single-wq.patch
and send it to Linus along with
mm-move-pcp-and-lru-pcp-drainging-into-single-wq-fix.patch before rc6?

I would add your Tested-by, Hugh, but I guess you want to give your
testing more time before feeling comfortable giving it.
---
mm: move pcp and lru-pcp draining into single wq

We currently have two dedicated WQ_MEM_RECLAIM workqueues in the mm code:
vmstat_wq for updating pcp stats and lru_add_drain_wq for draining per-cpu
lru caches. This is more than necessary because both can run on a single
workqueue: neither blocks on a lock that requires a memory allocation, and
neither performs allocations itself. Merging them saves one rescuer thread.
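
For illustration, the shared workqueue can be set up along these lines
(a sketch only, not the exact diff; the function and variable names here
are illustrative):

	/*
	 * One WQ_MEM_RECLAIM workqueue shared by the vmstat updates and
	 * the lru pcp draining.  WQ_MEM_RECLAIM guarantees a rescuer
	 * thread, so queued work makes progress even when new worker
	 * threads cannot be created.
	 */
	static struct workqueue_struct *mm_percpu_wq;

	static int __init mm_percpu_wq_init(void)
	{
		mm_percpu_wq = alloc_workqueue("mm_percpu_wq",
					       WQ_MEM_RECLAIM, 0);
		return mm_percpu_wq ? 0 : -ENOMEM;
	}
	early_initcall(mm_percpu_wq_init);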

On the other hand, drain_all_pages() queues work on the system wq, which
has no rescuer, so its forward progress depends on being able to allocate
memory (when all workers are stuck in allocation, no new ones can be
created). Initially we thought this was more of a theoretical problem, but
Hugh Dickins has reported:
: 4.11-rc has been giving me hangs after hours of swapping load. At
: first they looked like memory leaks ("fork: Cannot allocate memory");
: but for no good reason I happened to do "cat /proc/sys/vm/stat_refresh"
: before looking at /proc/meminfo one time, and the stat_refresh stuck
: in D state, waiting for completion of flush_work like many kworkers.
: kthreadd waiting for completion of flush_work in drain_all_pages().

This work should run on a WQ_MEM_RECLAIM workqueue as well in order to
guarantee forward progress. We can reuse the same workqueue as for the
lru draining and vmstat.
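
Concretely, drain_all_pages() then queues its per-cpu drain work on the
rescuer-backed workqueue instead of the system one, roughly as below (a
sketch, assuming the cpus_with_pcps mask and the per-cpu pcpu_drain work
items introduced by the change referenced in the Fixes: tag):

	/*
	 * Queue the per-cpu drains on the WQ_MEM_RECLAIM workqueue so
	 * that the flush_work() below cannot wait forever behind
	 * workers that are themselves stuck allocating memory.
	 */
	for_each_cpu(cpu, &cpus_with_pcps) {
		struct work_struct *work = per_cpu_ptr(&pcpu_drain, cpu);

		INIT_WORK(work, drain_local_pages_wq);
		queue_work_on(cpu, mm_percpu_wq, work);
	}
	for_each_cpu(cpu, &cpus_with_pcps)
		flush_work(per_cpu_ptr(&pcpu_drain, cpu));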

Link: http://lkml.kernel.org/r/20170307131751.24936-1-mhocko@xxxxxxxxxx
Fixes: 0ccce3b92421 ("mm, page_alloc: drain per-cpu pages from workqueue context")
Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
Suggested-by: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Acked-by: Vlastimil Babka <vbabka@xxxxxxx>
Acked-by: Mel Gorman <mgorman@xxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
--
Michal Hocko
SUSE Labs