Re: [PATCH] workqueue: warn if memory reclaim tries to flush !WQ_MEM_RECLAIM workqueue

From: Tejun Heo
Date: Fri Jan 29 2016 - 06:09:58 EST


Hello, Peter.

On Thu, Jan 28, 2016 at 11:12:10AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 26, 2016 at 06:38:43PM +0100, Thierry Reding wrote:
> > > Task or work item involved in memory reclaim trying to flush a
> > > non-WQ_MEM_RECLAIM workqueue or one of its work items can lead to
> > > deadlock. Trigger WARN_ONCE() if such conditions are detected.
> > I've started noticing the following during boot on some of the devices I
> > work with:
> >
> > [ 4.723705] WARNING: CPU: 0 PID: 6 at kernel/workqueue.c:2361 check_flush_dependency+0x138/0x144()
> > [ 4.736818] workqueue: WQ_MEM_RECLAIM deferwq:deferred_probe_work_func is flushing !WQ_MEM_RECLAIM events:lru_add_drain_per_cpu
...
> Right, also, I think it makes sense to do lru_add_drain_all() from a
> WQ_MEM_RECLAIM workqueue, it is, after all, aiding in getting memory
> freed.
>
> Does something like the below cure things?
>
> TJ does this make sense to you?

The problem here is that deferwq, which has nothing to do with memory
reclaim, is marked with WQ_MEM_RECLAIM because it was created with the
old create_singlethread_workqueue(), which doesn't distinguish
workqueues which may be used on the mem reclaim path and thus has to
mark all of them as needing a forward progress guarantee. I posted a
patch to disable flush dependency checks on those workqueues and
there's an Outreachy project to weed out the users of the old
interface, so hopefully this won't be an issue soon.

As for whether lru drain should have WQ_MEM_RECLAIM, I'm not sure.
Cc'ing linux-mm. The rule here is that any workqueue which is depended
upon during memory reclaim should have WQ_MEM_RECLAIM set. IOW, if a
work item failing to start execution under memory pressure can prevent
memory from being reclaimed, it should be scheduled on a
WQ_MEM_RECLAIM workqueue. Would this be the case for
lru_add_drain_per_cpu()?
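
To make the rule and the legacy exemption concrete, here is a rough
userspace sketch of the decision the flush dependency check is meant to
make. It is a model only, not the code in kernel/workqueue.c: the
MODEL_* flag bits, struct model_wq and model_flush_should_warn() are
hypothetical names made up for illustration, and the "legacy" bit
stands in for however workqueues created through the old interface end
up being identified.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this model; not the kernel's actual values. */
enum {
	MODEL_WQ_MEM_RECLAIM = 1 << 0,	/* has a forward progress guarantee */
	MODEL_WQ_LEGACY      = 1 << 1,	/* created via the old interface */
};

/* Minimal stand-in for a workqueue: just a name and its flags. */
struct model_wq {
	const char *name;
	unsigned int flags;
};

/*
 * Returns true if flushing @target should trigger the warning.
 *
 * @current_wq: workqueue the flushing work item runs on, or NULL if the
 *              flusher is not a work item.
 * @in_reclaim: whether the flushing task itself is doing memory reclaim.
 * @target:     workqueue being flushed.
 */
static bool model_flush_should_warn(const struct model_wq *current_wq,
				    bool in_reclaim,
				    const struct model_wq *target)
{
	/* Waiting on a workqueue with a progress guarantee is always fine. */
	if (target->flags & MODEL_WQ_MEM_RECLAIM)
		return false;

	/* A task in reclaim must not wait on an unguaranteed workqueue. */
	if (in_reclaim)
		return true;

	/* An ordinary task flushing an ordinary workqueue is fine. */
	if (!current_wq)
		return false;

	/*
	 * Warn only if the flusher is a genuine WQ_MEM_RECLAIM work item.
	 * Legacy workqueues carry the flag without meaning it, so they
	 * are exempt.
	 */
	return (current_wq->flags &
		(MODEL_WQ_MEM_RECLAIM | MODEL_WQ_LEGACY)) == MODEL_WQ_MEM_RECLAIM;
}
```

In this model, the reported deferwq case warns as long as deferwq looks
like a real WQ_MEM_RECLAIM workqueue flushing "events", and stops
warning once it is treated as legacy. Whether lru_add_drain_per_cpu()
should itself run on a WQ_MEM_RECLAIM workqueue is a separate question
that the model doesn't answer.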

Thanks.

--
tejun