Re: net/sunrpc: v4.14-rc4 lockdep warning

From: tj@xxxxxxxxxx
Date: Tue Oct 10 2017 - 10:03:50 EST


Hello, Trond.

On Mon, Oct 09, 2017 at 06:32:13PM +0000, Trond Myklebust wrote:
> On Mon, 2017-10-09 at 19:17 +0100, Lorenzo Pieralisi wrote:
> > I have run into the lockdep warning below while running v4.14-rc3/rc4
> > on an ARM64 defconfig Juno dev board - reporting it to check whether
> > it is a known/genuine issue.
> >
> > Please let me know if you need further debug data or need some
> > specific tests.
> >
> > [ 6.209384] ======================================================
> > [ 6.215569] WARNING: possible circular locking dependency detected
> > [ 6.221755] 4.14.0-rc4 #54 Not tainted
> > [ 6.225503] ------------------------------------------------------
> > [ 6.231689] kworker/4:0H/32 is trying to acquire lock:
> > [ 6.236830] ((&task->u.tk_work)){+.+.}, at: [<ffff0000080e64cc>]
> > process_one_work+0x1cc/0x3f0
> > [ 6.245472]
> > but task is already holding lock:
> > [ 6.251309] ("xprtiod"){+.+.}, at: [<ffff0000080e64cc>]
> > process_one_work+0x1cc/0x3f0
> > [ 6.259158]
> > which lock already depends on the new lock.
> >
> > [ 6.267345]
> > the existing dependency chain (in reverse order) is:
..
> Adding Tejun and Lai, since this looks like a workqueue locking issue.

The report looks a bit cryptic, but lockdep is warning about the following case.

1. Memory pressure is high and rescuer kicks in for the xprtiod
workqueue. There are no other kworkers serving the workqueue.

2. The rescuer runs the xprt_destroy path and ends up calling
cancel_work_sync() on a work item which is queued on xprtiod.

3. The work item is pending on the same workqueue and, assuming that
memory pressure doesn't let up (let's say reclaim is trying to
kick off NFS pages), the only way it can get executed is by the
rescuer, which is itself waiting for the work item - an A-B-A deadlock.
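
To make the dependency in the three steps above a bit more concrete,
here is a rough user-space model of it. This is only a sketch: the
model_wq and model_work types and sync_wait_deadlocks() are made-up
names for illustration, not kernel APIs, and the code models just the
wait-for relation described above (a sole executor waiting
synchronously on an item only it could run), not how the workqueue
code is actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these exist in the kernel. */
struct model_wq {
	int free_workers;	/* regular kworkers able to run pending items */
};

struct model_work {
	const struct model_wq *wq;	/* queue the item sits on */
	bool pending;			/* queued but not yet executed */
};

/*
 * Assumption of this model: every workqueue has exactly one rescuer, and
 * the caller is a work item currently running on the rescuer of
 * @caller_wq. Returns true if a synchronous wait on @target, issued by
 * that caller, could never complete.
 */
bool sync_wait_deadlocks(const struct model_wq *caller_wq,
			 const struct model_work *target)
{
	assert(caller_wq != NULL && target != NULL && target->wq != NULL);

	/* Nothing is pending, so there is nothing to wait for. */
	if (!target->pending)
		return false;

	/* A regular kworker can still pick the item up. */
	if (target->wq->free_workers > 0)
		return false;

	/*
	 * Only the rescuer of the target's queue is left. If that is the
	 * caller's own queue, the rescuer is the one doing the waiting.
	 */
	return target->wq == caller_wq;
}
```

With free_workers at 0 (the memory pressure case in step 1) and the
target pending on the caller's own queue, the function reports a
deadlock. With a free kworker, or with the target on a different
queue, it does not.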

Thanks.

--
tejun