Re: cmpxchg() in kernel/workqueue.c breaks things

From: Andrew Morton
Date: Thu Dec 07 2006 - 06:29:50 EST


On Thu, 07 Dec 2006 11:03:49 +0000
David Howells <dhowells@xxxxxxxxxx> wrote:

> David Miller <davem@xxxxxxxxxxxxx> wrote:
>
> > David, you have to fix the locking scheme used in kernel/workqueue.c,
> > you absolutely cannot assume that cmpxchg() is available on all
> > platforms. This breaks the build on the platforms that don't
> > have such an instruction, and no it cannot emulated.
>
> Yeah, I've figured that one out. Also, having considered things last night, I
> think the use of cmpxchg() is unnecessary.
>
> I was trying to handle against two possibilities:
>
> (1) The pending flag may have been unset or may be cleared. However, given
> where it's called, the pending flag is _always_ set. I don't think it
> can be unset whilst we're in set_wq_data().
>
> Once the work is enqueued to be actually run, the only way off the queue
> is for it to be actually run.
>
> If it's a delayed work item, then the bit can't be cleared by the timer
> because we haven't started the timer yet. Also, the pending bit can't be
> cleared by cancelling the delayed work _until_ the work item has had its
> timer started.
>
> (2) The workqueue pointer might change. This can only happen in two cases:
>
> (a) The work item has just been queued to actually run, and so we're
> protected by the appropriate workqueue spinlock.
>
> (b) A delayed work item is being queued, and so the timer hasn't been
> started yet, and so no one else knows about the work item or can
> access it (the pending bit protects us).
>
> Besides, set_wq_data() _sets_ the workqueue pointer unconditionally, so
> it can be assigned instead.
>
> So, I think replacing the set_wq_data() with a straight assignment would be
> okay in most cases. The problem is where we end up tangling with
> test_and_set_bit() emulated using spinlocks, and even then it's not a problem
> _provided_ test_and_set_bit() doesn't attempt to modify the word if the bit
> was set.
>

I don't see why the 2.6.19 logic needed changing.

a) Nobody should be freeing the work_struct itself without running
flush_scheduled_work() and

b) even if the work_struct _did_ get freed, the callback function won't
care, because there's nothing in that work_struct which it's interested
in.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/