RE: nfsd oops on Linus' current tree.

From: Myklebust, Trond
Date: Fri Dec 21 2012 - 18:36:48 EST


Please reread what I said. There was no obvious circular dependency, because nfsiod and rpciod are separate workqueues, both created with WQ_MEM_RECLAIM. Dros' experience shows, however, that a call to rpc_shutdown_client in an nfsiod work item will deadlock with rpciod if the RPC task's work item has been assigned to the same CPU as the one running the rpc_shutdown_client work item.

I can't tell right now if that is intentional (in which case the WARN_ON in the rpc code is correct), or if it is a bug in the workqueue code. For now, we're assuming the former.

________________________________________
From: J. Bruce Fields [bfields@xxxxxxxxxxxx]
Sent: Friday, December 21, 2012 6:26 PM
To: Myklebust, Trond
Cc: Dave Jones; Linux Kernel; linux-nfs@xxxxxxxxxxxxxxx; Adamson, Dros
Subject: Re: nfsd oops on Linus' current tree.

On Fri, Dec 21, 2012 at 11:15:40PM +0000, Myklebust, Trond wrote:
> Apologies for top-posting. The SSD on my laptop died, and so I'm stuck using webmail for this account...

Fun! If that happens to me on this trip, I've got a week trying to hack
the kernel from my cell phone....

> Our experience with nfsiod is that the WQ_MEM_RECLAIM option still deadlocks despite the "rescuer thread". The CPU that is running the workqueue will deadlock with any rpciod task that is assigned to the same CPU. Interestingly enough, the WQ_UNBOUND option also appears able to deadlock in the same situation.
>
> Sorry, I have no explanation why...

As I said:

> there shouldn't be any deadlock as long as there's no circular
> dependency among the three.

There was a circular dependency (of rpciod on itself), so having a
dedicated rpciod rescuer thread wouldn't help--once the rescuer thread
is waiting for work queued to the same queue, you're asking for
trouble.

The last argument in

alloc_workqueue("rpciod", WQ_MEM_RECLAIM, 1);

ensures that it will never allow more than 1 piece of work to run per
CPU, so the deadlock should be pretty easy to hit.

And with UNBOUND that's only one piece of work globally, so yeah all you
need is an rpc at shutdown time and it should deadlock every time.
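
To make the argument above concrete, here is a minimal userspace sketch
of it. This is a toy model and not kernel code: it assumes a FIFO queue
whose only constraint is a max_active limit on concurrently running
items, and it ignores per-CPU pools and rescuer threads entirely. The
names queue_drains, shutdown_scenario_drains, struct work and NO_DEP
are all hypothetical, made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_DEP ((size_t)-1)   /* hypothetical marker: item waits on nothing */

enum state { PENDING, RUNNING, DONE };

struct work {
    size_t waits_for;   /* index of the item this one blocks on, or NO_DEP */
    enum state state;
};

/*
 * Toy model: start items in FIFO order while fewer than max_active are
 * running. A running item keeps its slot until the item it waits for
 * is DONE. Returns true if every item finishes, false if the queue
 * gets stuck with no further progress possible.
 */
bool queue_drains(struct work *items, size_t n, size_t max_active)
{
    size_t active = 0, done = 0;
    bool progress = true;

    assert(max_active >= 1);
    for (size_t i = 0; i < n; i++) {
        assert(items[i].waits_for == NO_DEP || items[i].waits_for < n);
        items[i].state = PENDING;
    }

    while (progress && done < n) {
        progress = false;

        /* Hand free slots to pending items, oldest first. */
        for (size_t i = 0; i < n && active < max_active; i++) {
            if (items[i].state == PENDING) {
                items[i].state = RUNNING;
                active++;
                progress = true;
            }
        }

        /* Finish every running item that is not blocked. */
        for (size_t i = 0; i < n; i++) {
            size_t dep = items[i].waits_for;

            if (items[i].state != RUNNING)
                continue;
            if (dep == NO_DEP || items[dep].state == DONE) {
                items[i].state = DONE;
                active--;
                done++;
                progress = true;
            }
        }
    }
    return done == n;
}

/*
 * The scenario from this thread, as modelled here: item 0 stands in
 * for the shutdown work, which waits for item 1, an RPC task queued
 * behind it on the same queue.
 */
bool shutdown_scenario_drains(size_t max_active)
{
    struct work items[2] = {
        { .waits_for = 1,      .state = PENDING },  /* shutdown work */
        { .waits_for = NO_DEP, .state = PENDING },  /* RPC task */
    };

    return queue_drains(items, 2, max_active);
}
```

In this model shutdown_scenario_drains(1) reports a stuck queue,
because the waiting item holds the only slot and the item it waits for
can never start, while shutdown_scenario_drains(2) drains. It says
nothing about how the real workqueue code behaves beyond that.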

--b.