Re: calling flush_scheduled_work()

From: Stefan Rompf
Date: Sat Mar 13 2004 - 06:49:52 EST


Andrew Morton wrote:

>> In short we have a case where mntput() is called from the keventd
>> workqueue. When that mntput() hit an NFS mount, we got a deadlock. It
>> turns out that deep in the RPC code, someone calls
>> flush_scheduled_work(). Deadlock.
>
> Seems simple enough to fix the workqueue code to handle this situation.

Fixing one corner case in the code won't help. Some time ago there was a
deadlock involving a network driver that called flush_scheduled_work() while
the kernel held the rtnl semaphore: the linkwatch code had scheduled work
that needs rtnl, so the flush could never complete.
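
To make the two cases concrete, here is a small userspace model of them as
a wait-for graph. It is only a sketch, not kernel code: the actor names,
has_deadlock() and both scenario helpers are made up for illustration. An
actor that waits (directly or indirectly) for itself is deadlocked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical actors for this model; these are not kernel symbols. */
enum actor { DRIVER, KEVENTD, N_ACTORS };
#define NOBODY (-1)

/* waits_for[a] is the actor that a is blocked on, or NOBODY. */
static bool has_deadlock(const int waits_for[N_ACTORS])
{
    for (int start = 0; start < N_ACTORS; start++) {
        int cur = waits_for[start];

        /* Follow the chain; coming back to start means a cycle. */
        for (int steps = 0; cur != NOBODY && steps < N_ACTORS; steps++) {
            assert(cur >= 0 && cur < N_ACTORS);
            if (cur == start)
                return true;
            cur = waits_for[cur];
        }
    }
    return false;
}

/*
 * rtnl case: the driver holds rtnl and flushes, so it waits for keventd.
 * If the linkwatch work blocks on rtnl, keventd waits for the driver.
 */
static bool rtnl_flush_scenario(bool linkwatch_blocks_on_rtnl)
{
    int waits_for[N_ACTORS];

    waits_for[DRIVER] = KEVENTD;
    waits_for[KEVENTD] = linkwatch_blocks_on_rtnl ? DRIVER : NOBODY;
    return has_deadlock(waits_for);
}

/*
 * mntput() case: a work item running on keventd ends up flushing the
 * queue it is running on, so keventd waits for itself.
 */
static bool flush_from_worker_scenario(void)
{
    int waits_for[N_ACTORS];

    waits_for[DRIVER] = NOBODY;
    waits_for[KEVENTD] = KEVENTD;
    return has_deadlock(waits_for);
}
```

In this model rtnl_flush_scenario(true) and flush_from_worker_scenario()
both report a cycle, while rtnl_flush_scenario(false) does not. Neither
cycle is visible from either call site alone.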

I posted a patch that changed linkwatch not to block waiting for rtnl, but
it was dropped in favor of fixing the driver. (I don't own that card, so I
can't tell you whether it works now.)
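
The general idea of a work function that does not block on rtnl can be
sketched in userspace as follows. This is not the patch itself: toy_lock,
toy_linkwatch and toy_linkwatch_work() are hypothetical names, and the
lock is a plain flag rather than a real semaphore. The work function tries
the lock and, if it is busy, asks to be run again later instead of sleeping.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the rtnl semaphore: just a held/free flag. */
struct toy_lock {
    bool held;
};

static bool toy_trylock(struct toy_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void toy_unlock(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Toy linkwatch state; all fields are assumptions of this sketch. */
struct toy_linkwatch {
    struct toy_lock *rtnl;
    int events_pending;   /* link events not yet processed */
    int handled;          /* link events processed so far */
    bool requeued;        /* work asked to be run again later */
};

/*
 * One run of the work function. It never blocks: it returns true when
 * the pending events were handled, false when the lock was busy and the
 * work has to be scheduled again.
 */
static bool toy_linkwatch_work(struct toy_linkwatch *lw)
{
    if (!toy_trylock(lw->rtnl)) {
        lw->requeued = true;
        return false;
    }
    lw->handled += lw->events_pending;
    lw->events_pending = 0;
    lw->requeued = false;
    toy_unlock(lw->rtnl);
    return true;
}
```

The price is that a flush no longer guarantees the events were handled:
with the lock held, the work item returns at once and the events stay
pending until a later run.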

However, this is another example of the underlying problem: any code can
call schedule_work(), and any other code can wait anywhere for that work to
complete. As long as we have no agreed rules on what functions that run
inside the keventd workqueue may (not) do, and on when it is OK to call
flush_scheduled_work(), we are always at risk that the workqueue mechanism
creates a deadlock by accident.

Stefan