Re: [PATCH] Add kerneldoc for flush_scheduled_work()

From: Alan Stern
Date: Wed Aug 12 2009 - 12:23:42 EST


On Wed, 12 Aug 2009, James Bottomley wrote:

> This all boils down to "the race window for a deadlock may be narrower".
>
> Instead of training programmers to narrow deadlock races, we should be
> training them to avoid them.
>
> The entangled deadlock problem occurs in all of our _sync() APIs as well
> as interrupt and other workqueue stuff.
>
> The rules are something like
>
> Never use synchronous operations if you can avoid them. If you must use
> operations that wait for another thread to complete (say because you're
> about to destroy data structures that queued elements may be using):
> 1. Never hold any locks or mutexes while waiting for the completion
> of synchronous operations
> 2. If you have to hold a lock while waiting:
> 1. If it's a global lock, make sure you're using a local
> queue and that nothing you submitted to the queue can
> take the lock
> 2. If it's a local lock, you may use a global queue but
> must still make sure that nothing you submitted to the
> queue can take the lock.

I haven't seen these rules written down anywhere in the kernel
documentation. Presumably people are supposed to be aware of them
already?

Anyway, 2.1 and 2.2 are wrong. They should read: "... make sure that
nothing submitted to the queue calls any of your routines that can take
the lock." The point being that even though _you_ don't submit
anything bad to the queue, somebody else might do so.

If you use cancel_work_sync() instead of flush_scheduled_work() then
the rules become less onerous. In place of your 2.1 and 2.2, we have:

Make sure the work item you are cancelling cannot take
the lock.

This is much easier to verify.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/