Re: [PATCH] workqueue: Fix pool->nr_running type back to atomic

From: Tejun Heo
Date: Tue Feb 06 2024 - 11:52:51 EST


Hello,

On Tue, Feb 06, 2024 at 04:00:24PM +0800, Yunlong Xing wrote:
> In a CPU-hotplug test, when a core is plugged in, the set_cpus_allowed_ptr()
> call that restores the cpus_mask of a per-cpu worker may fail, so the
> worker's cpus_mask remains wq_unbound_cpumask until the core is hotplugged
> again. As a result, workers in the same per-cpu pool can run concurrently
> and change nr_running at the same time, causing an atomicity problem.

How would set_cpus_allowed_ptr() fail? That should trigger WARN_ON, right?
If set_cpus_allowed_ptr() fails, nr_running getting desynchronized is only
part of the problem. We would also end up running per-cpu work items, which
must execute on their own CPU, on foreign CPUs.
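
To illustrate the point about a failed affinity change, here is a rough
userspace sketch, not workqueue code. It assumes Linux with glibc and uses
sched_setaffinity() as a stand-in for the kernel's set_cpus_allowed_ptr().
The helper names pin_to_cpu_checked() and first_allowed_cpu() are made up
for this example. The idea is only that the return value has to be checked,
because a thread whose pinning failed keeps running wherever its old mask
allows.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

/*
 * Hypothetical helper: pin the calling thread to @cpu and verify the result.
 * Returns @cpu on success, -1 if the request is invalid or pinning failed.
 */
static int pin_to_cpu_checked(int cpu)
{
	cpu_set_t set;

	if (cpu < 0 || cpu >= CPU_SETSIZE)
		return -1;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);

	/* Userspace analogue of set_cpus_allowed_ptr(); pid 0 means "self". */
	if (sched_setaffinity(0, sizeof(set), &set) != 0) {
		perror("sched_setaffinity");
		return -1;	/* still running under the old, wider mask */
	}

	/* After a successful call the thread should be on the requested CPU. */
	if (sched_getcpu() != cpu)
		return -1;

	return cpu;
}

/* Hypothetical helper: lowest-numbered CPU the calling thread may run on. */
static int first_allowed_cpu(void)
{
	cpu_set_t set;

	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;

	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &set))
			return cpu;

	return -1;
}
```

A caller would do something like pin_to_cpu_checked(first_allowed_cpu())
and treat -1 as "do not run CPU-local work", rather than carrying on with
the old mask.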

Thanks.

--
tejun