Re: [PATCH v6 4/4] workqueue: Unbind kworkers before sending them to exit()
From: Lai Jiangshan
Date: Wed Nov 30 2022 - 22:02:05 EST
On Tue, Nov 29, 2022 at 2:31 AM Valentin Schneider <vschneid@xxxxxxxxxx> wrote:
> @@ -3627,8 +3668,11 @@ static bool wq_manager_inactive(struct worker_pool *pool)
> static void put_unbound_pool(struct worker_pool *pool)
> {
> DECLARE_COMPLETION_ONSTACK(detach_completion);
> + struct list_head cull_list;
> struct worker *worker;
>
> + INIT_LIST_HEAD(&cull_list);
> +
> lockdep_assert_held(&wq_pool_mutex);
>
> if (--pool->refcnt)
> @@ -3651,17 +3695,19 @@ static void put_unbound_pool(struct worker_pool *pool)
> * Because of how wq_manager_inactive() works, we will hold the
> * spinlock after a successful wait.
> */
> + mutex_lock(&wq_pool_attach_mutex);
> rcuwait_wait_event(&manager_wait, wq_manager_inactive(pool),
> TASK_UNINTERRUPTIBLE);
> pool->flags |= POOL_MANAGER_ACTIVE;
Hello, Valentin
I'm afraid this can deadlock.
If put_unbound_pool() is called while manage_workers() is sleeping in a
memory allocation, put_unbound_pool() acquires wq_pool_attach_mutex before
the manager does. The manager then blocks forever waiting for that mutex
to attach the newly created worker, while put_unbound_pool() sits in
rcuwait_wait_event() waiting for the manager to become inactive, which
never happens: deadlock.
I think the mutex_lock(&wq_pool_attach_mutex) can be moved into
wq_manager_inactive() and handled the same way as pool->lock is: take it
there, and drop it again when the check fails.
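Concretely, something like the following untested sketch, assuming the
current wq_manager_inactive() body (which already takes pool->lock and
conditionally drops it when the manager is still active):

	static bool wq_manager_inactive(struct worker_pool *pool)
	{
		/* Only acquire wq_pool_attach_mutex once the manager is
		 * really inactive, so a sleeping manager can still take
		 * it to attach its newly created worker. */
		mutex_lock(&wq_pool_attach_mutex);
		raw_spin_lock_irq(&pool->lock);

		if (pool->flags & POOL_MANAGER_ACTIVE) {
			/* Back off fully, releasing both locks, so the
			 * manager can make progress before we retry. */
			raw_spin_unlock_irq(&pool->lock);
			mutex_unlock(&wq_pool_attach_mutex);
			return false;
		}
		/* On success both locks are held, matching the comment
		 * above rcuwait_wait_event() in put_unbound_pool(). */
		return true;
	}

With that, the bare mutex_lock() before rcuwait_wait_event() in
put_unbound_pool() would go away, and the caller would still end up
holding both locks after a successful wait.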
>
> while ((worker = first_idle_worker(pool)))
> - destroy_worker(worker);
> + set_worker_dying(worker, &cull_list);
> WARN_ON(pool->nr_workers || pool->nr_idle);
> raw_spin_unlock_irq(&pool->lock);
>
> - mutex_lock(&wq_pool_attach_mutex);
> - if (!list_empty(&pool->workers))
> + wake_dying_workers(&cull_list);
> +
> + if (!list_empty(&pool->workers) || !list_empty(&pool->dying_workers))
> pool->detach_completion = &detach_completion;
> mutex_unlock(&wq_pool_attach_mutex);
>
> --
> 2.31.1
>