Re: [PATCH 02/10 V4] workqueue: fix deadlock in rebind_workers()

From: Tejun Heo
Date: Tue Sep 04 2012 - 20:54:07 EST


How about something like the following? This is more consistent with
the existing code and as the fixes need to go separately through
for-3.6-fixes, it's best to stay consistent regardless of the end
result after all the restructuring. It's not tested yet. If you
don't object, I'll split it into two patches, test and route them
through for-3.6-fixes w/ your Original-patch-by.

Thanks.
---
kernel/workqueue.c | 51 ++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 38 insertions(+), 13 deletions(-)

Index: work/kernel/workqueue.c
===================================================================
--- work.orig/kernel/workqueue.c
+++ work/kernel/workqueue.c
@@ -1326,6 +1326,15 @@ static void idle_worker_rebind(struct wo

/* we did our part, wait for rebind_workers() to finish up */
wait_event(gcwq->rebind_hold, !(worker->flags & WORKER_REBIND));
+
+ /*
+ * rebind_workers() shouldn't finish until all workers passed the
+ * above WORKER_REBIND wait. Tell it when done.
+ */
+ spin_lock_irq(&worker->pool->gcwq->lock);
+ if (!--worker->idle_rebind->cnt)
+ complete(&worker->idle_rebind->done);
+ spin_unlock_irq(&worker->pool->gcwq->lock);
}

/*
@@ -1422,19 +1431,7 @@ retry:
goto retry;
}

- /*
- * All idle workers are rebound and waiting for %WORKER_REBIND to
- * be cleared inside idle_worker_rebind(). Clear and release.
- * Clearing %WORKER_REBIND from this foreign context is safe
- * because these workers are still guaranteed to be idle.
- */
- for_each_worker_pool(pool, gcwq)
- list_for_each_entry(worker, &pool->idle_list, entry)
- worker->flags &= ~WORKER_REBIND;
-
- wake_up_all(&gcwq->rebind_hold);
-
- /* rebind busy workers */
+ /* all idle workers are rebound, rebind busy workers */
for_each_busy_worker(worker, i, pos, gcwq) {
struct work_struct *rebind_work = &worker->rebind_work;
unsigned long worker_flags = worker->flags;
@@ -1454,6 +1451,34 @@ retry:
worker->scheduled.next,
work_color_to_flags(WORK_NO_COLOR));
}
+
+ /*
+ * All idle workers are rebound and waiting for %WORKER_REBIND to
+ * be cleared inside idle_worker_rebind(). Clear and release.
+ * Clearing %WORKER_REBIND from this foreign context is safe
+ * because these workers are still guaranteed to be idle.
+ *
+ * We need to make sure all idle workers passed WORKER_REBIND wait
+ * in idle_worker_rebind() before returning; otherwise, workers can
+ * get stuck at the wait if hotplug cycle repeats.
+ */
+ idle_rebind.cnt = 1;
+ INIT_COMPLETION(idle_rebind.done);
+
+ for_each_worker_pool(pool, gcwq) {
+ list_for_each_entry(worker, &pool->idle_list, entry) {
+ worker->flags &= ~WORKER_REBIND;
+ idle_rebind.cnt++;
+ }
+ }
+
+ wake_up_all(&gcwq->rebind_hold);
+
+ if (--idle_rebind.cnt) {
+ spin_unlock_irq(&gcwq->lock);
+ wait_for_completion(&idle_rebind.done);
+ spin_lock_irq(&gcwq->lock);
+ }
}

static struct worker *alloc_worker(void)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/