Re: nfsd oops on Linus' current tree.

From: J. Bruce Fields
Date: Fri Dec 21 2012 - 18:26:13 EST


On Fri, Dec 21, 2012 at 11:15:40PM +0000, Myklebust, Trond wrote:
> Apologies for top-posting. The SSD on my laptop died, and so I'm stuck using webmail for this account...

Fun! If that happens to me on this trip, I've got a week trying to hack
the kernel from my cell phone....

> Our experience with nfsiod is that the WQ_MEM_RECLAIM option still deadlocks despite the "rescuer thread". The CPU that is running the workqueue will deadlock with any rpciod task that is assigned to the same CPU. Interestingly enough, the WQ_UNBOUND option also appears able to deadlock in the same situation.
>
> Sorry, I have no explanation why...

As I said:

> there shouldn't be any deadlock as long as there's no circular
> dependency among the three.

There was a circular dependency (of rpciod on itself), so having a
dedicated rpciod rescuer thread wouldn't help--once the rescuer thread
is waiting for work queued to the same queue, you're asking for
trouble.

The last argument in

alloc_workqueue("rpciod", WQ_MEM_RECLAIM, 1);

ensures that it will never allow more than 1 piece of work to run per
CPU, so the deadlock should be pretty easy to hit.

And with UNBOUND that's only one piece of work globally, so yeah, all
you need is an RPC at shutdown time and it should deadlock every time.
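
To make the argument concrete, here is a rough user-space model of it,
not kernel code. Everything in it (struct work_item, queue_deadlocks(),
MAX_ITEMS) is a made-up name for illustration, and it assumes a work
item simply holds its slot until the item it waits for has finished. It
treats max_active as a plain limit on concurrently running items and
ignores the rescuer thread entirely:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ITEMS 16	/* arbitrary cap for this sketch */

/* Hypothetical model of one queued work item. waits_for is the index of
 * another item on the same queue that must finish first, or -1 for none. */
struct work_item {
	int waits_for;
};

/*
 * Returns true if the queue gets stuck: every running item is blocked on
 * an item that cannot start because no slot is free. Items are started in
 * FIFO order, at most max_active at a time.
 */
bool queue_deadlocks(const struct work_item *items, size_t n, size_t max_active)
{
	enum { PENDING, RUNNING, DONE } state[MAX_ITEMS];
	size_t done = 0;
	size_t i;

	assert(n <= MAX_ITEMS);
	for (i = 0; i < n; i++)
		state[i] = PENDING;

	while (done < n) {
		size_t running = 0;
		bool progress = false;

		for (i = 0; i < n; i++)
			if (state[i] == RUNNING)
				running++;

		/* Start pending items while slots are free. */
		for (i = 0; i < n && running < max_active; i++) {
			if (state[i] == PENDING) {
				state[i] = RUNNING;
				running++;
				progress = true;
			}
		}

		/* A running item finishes once its dependency is done. */
		for (i = 0; i < n; i++) {
			int dep = items[i].waits_for;

			if (state[i] != RUNNING)
				continue;
			if (dep < 0 || ((size_t)dep < n && state[dep] == DONE)) {
				state[i] = DONE;
				done++;
				progress = true;
			}
		}

		/* Nothing started and nothing finished: stuck for good. */
		if (!progress)
			return true;
	}
	return false;
}
```

With two items where item 0 waits for item 1 (say, shutdown work waiting
on an rpc queued behind it), queue_deadlocks(items, 2, 1) reports a
deadlock, while the same items with a limit of 2 run to completion.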

--b.