Re: [RFC/PATCH] debug workqueue deadlocks with lockdep
From: Oleg Nesterov
Date: Sat Jun 30 2007 - 07:46:24 EST
On 06/30, Ingo Molnar wrote:
>
> On Thu, 2007-06-28 at 19:33 +0200, Johannes Berg wrote:
> > No, that's not right either, but Arjan just helped me a bit with how
> > lockdep works and I think I have the right idea now. Ignore this for
> > now, I'll send a new patch in a few days.
>
> ok. But in general, this is a very nice idea!
>
> i've Cc:-ed Oleg. Oleg, what do you think? I think we should keep all
> the workqueue APIs specified in a form that makes them lockdep coverable
> like Johannes did. This debug mechanism could have helped with the
> recent DVB lockup that Thomas Sattler reported.
I think this idea is great!
Johannes, could you change wait_on_work() as well? Most users of
flush_workqueue() should be converted to use it.
> @@ -342,6 +351,9 @@ static int flush_cpu_workqueue(struct cp
> } else {
> struct wq_barrier barr;
>
> + lock_acquire(&cwq->wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> + lock_release(&cwq->wq->lockdep_map, 0, _THIS_IP_);
> +
> active = 0;
> spin_lock_irq(&cwq->lock);
I am not sure why you skip "if (cwq->thread == current)" case, it can
deadlock in the same way.
But, perhaps we should not change flush_cpu_workqueue(). If we detect the
deadlock, we will have num_online_cpus() reports, yes?
And,
> if (!list_empty(&cwq->worklist) || cwq->current_work != NULL) {
> @@ -376,6 +388,8 @@ void fastcall flush_workqueue(struct wor
> int cpu;
>
> might_sleep();
> + lock_acquire(&wq->lockdep_map, 0, 0, 0, 2, _THIS_IP_);
> + lock_release(&wq->lockdep_map, 0, _THIS_IP_);
one of the 2 callers was already modified. Perhaps it is better to add
lock_acquire() into the second caller, cleanup_workqueue_thread(), but
skip flush_cpu_workqueue() ?
Oleg.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/