Re: deadlock in lru_add_drain ? (3.14rc5)

From: Tejun Heo
Date: Mon Mar 10 2014 - 11:01:30 EST


Hello,

On Sat, Mar 08, 2014 at 05:18:34PM -0800, Linus Torvalds wrote:
> Adding more appropriate people to the cc.
>
> That semaphore was added by commit 5fbc461636c3 ("mm: make
> lru_add_drain_all() selective"), and acked by Tejun. But we've had

It's essentially a custom, static implementation of
schedule_on_each_cpu() which uses the mutex to protect the static
buffers. schedule_on_each_cpu() differs in that it uses dynamic
allocation and can be re-entered.
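
Roughly, the pattern looks like the sketch below - a simplified
rendition of what the commit does, not the exact upstream code, with
the per-cpu work and callback named as in mm/swap.c but the "is there
anything to drain on this cpu" checks dropped:

	#include <linux/workqueue.h>
	#include <linux/cpu.h>
	#include <linux/mutex.h>
	#include <linux/percpu.h>

	static DEFINE_PER_CPU(struct work_struct, lru_add_drain_work);

	void lru_add_drain_all(void)
	{
		static DEFINE_MUTEX(lock);
		static struct cpumask has_work;
		int cpu;

		mutex_lock(&lock);	/* serializes users of the static works */
		get_online_cpus();
		cpumask_clear(&has_work);

		for_each_online_cpu(cpu) {
			struct work_struct *work =
				&per_cpu(lru_add_drain_work, cpu);

			/* upstream only queues cpus which have pages to drain */
			INIT_WORK(work, lru_add_drain_per_cpu);
			schedule_work_on(cpu, work);
			cpumask_set_cpu(cpu, &has_work);
		}

		/* flush while still holding the mutex */
		for_each_cpu(cpu, &has_work)
			flush_work(&per_cpu(lru_add_drain_work, cpu));

		put_online_cpus();
		mutex_unlock(&lock);
	}

schedule_on_each_cpu() instead allocates its works per call with
alloc_percpu(), so concurrent callers don't need to be serialized.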

> problems before with holding locks and then calling flush_work(),
> since that has had a tendency of deadlocking. I think we have various
> lockdep hacks in place to make "flush_work()" trigger some of the
> problems, but I'm not convinced it necessarily works.

If this were caused by lru_add_drain_all() entering itself, the
offender would be clearly visible in its stack trace; this probably
involves a more elaborate dependency chain. No idea why the wq
lockdep annotation would trigger on it, though. The flush_work()
annotation is pretty straightforward.
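
For context, the generic shape of the hazard Linus mentions is the
one below: hold a lock, then flush a work item whose callback wants
the same lock. This is just an illustration, not this particular
code path:

	#include <linux/workqueue.h>
	#include <linux/mutex.h>

	static DEFINE_MUTEX(m);

	static void w_fn(struct work_struct *work)
	{
		mutex_lock(&m);		/* the work wants the mutex ... */
		/* ... do something under the lock ... */
		mutex_unlock(&m);
	}

	static DECLARE_WORK(w, w_fn);

	void trigger(void)
	{
		mutex_lock(&m);		/* ... which the flusher already holds */
		schedule_work(&w);
		flush_work(&w);		/* can never finish -> deadlock */
		mutex_unlock(&m);
	}

The flush_work() annotation exists to catch exactly this shape by
recording the work's lock dependencies against the locks held across
the flush.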

> On Sat, Mar 8, 2014 at 2:00 PM, Dave Jones <davej@xxxxxxxxxx> wrote:
> > I left my fuzzing box running for the weekend, and checked in on it this evening,
> > to find that none of the child processes were making any progress.
> > cat'ing /proc/n/stack shows them all stuck in the same place..
> > Some examples:

Dave, any chance you can post a full sysrq-t dump?

Thanks.

--
tejun