Re: [RFC PATCH] workqueue: handle CPU hotplug without updating worker pool's attrs

From: Lai Jiangshan
Date: Fri Dec 11 2020 - 03:38:36 EST


On Fri, Dec 11, 2020 at 3:21 PM Hillf Danton <hdanton@xxxxxxxx> wrote:
>
> On Fri, 11 Dec 2020 10:29:55 +0800 Lai Jiangshan wrote:
> >On Wed, Dec 9, 2020 at 4:34 PM Hillf Danton <hdanton@xxxxxxxx> wrote:
> >>
> >> This keeps the cpumask intact for worker pools of unbound workqueues
> >> when CPUs go offline, because we need to rebind workers to the
> >> original cpumask (of the original pool) when they come back, as
> >> represented by the cpu check in restore_unbound_workers_cpumask().
> >>
> >> Note this is for comments only for now.
> >>
> >> Signed-off-by: Hillf Danton <hdanton@xxxxxxxx>
> >> ---
> >>
> >> --- a/kernel/workqueue.c
> >> +++ b/kernel/workqueue.c
> >> @@ -4117,6 +4117,32 @@ static void wq_update_unbound_numa(struc
> >> copy_workqueue_attrs(target_attrs, wq->unbound_attrs);
> >> pwq = unbound_pwq_by_node(wq, node);
> >>
> >> + cpumask = pwq->pool->attrs->cpumask;
> >> + if (!online || !cpumask_test_cpu(cpu, cpumask))
> >> + return;
> >> + do {
> >> + struct worker *worker;
> >> + int i;
> >> +
> >> + for_each_cpu_and(i, cpu_online_mask, cpumask) {
> >> + if (i != cpu)
> >> + return;
> >> + }
> >> +
> >> + /*
> >> + * rebind workers only when the first CPU in
> >> + * pool's attrs cpumask comes back because scheduler
> >> + * may have unbound them when the last CPU went offline
> >> + */
> >> + mutex_lock(&wq_pool_attach_mutex);
> >> +
> >> + for_each_pool_worker(worker, pwq->pool)
> >> + set_cpus_allowed_ptr(worker->task, cpumask);
> >> +
> >
> >There might be multiple pwqs that share the same pool, so this line of
> >code might update the same pool multiple times.
>
> Thanks for taking a look at it.
>
> It is the pool's (shared between multiple pwqs or not) attrs cpumask, in
> combination with the above i != cpu check, that ensures a single update.

When the pool is shared by multiple pwqs, it is visited once per pwq,
and your code touches the same pool multiple times.

wq_update_unbound_numa() updates the pwq table.
restore_unbound_workers_cpumask() updates the workers' scheduler cpumasks.
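
For reference, workqueue_online_cpu() currently looks roughly like the
sketch below (trimmed from my reading of kernel/workqueue.c, error paths
omitted). for_each_pool() visits every pool exactly once, while the
workqueue loop reaches a shared unbound pool once per wq:

int workqueue_online_cpu(unsigned int cpu)
{
	struct worker_pool *pool;
	struct workqueue_struct *wq;
	int pi;

	mutex_lock(&wq_pool_mutex);

	/* each pool is visited exactly once in this loop */
	for_each_pool(pool, pi) {
		mutex_lock(&wq_pool_attach_mutex);

		if (pool->cpu == cpu)
			rebind_workers(pool);
		else if (pool->cpu < 0)
			restore_unbound_workers_cpumask(pool, cpu);

		mutex_unlock(&wq_pool_attach_mutex);
	}

	/*
	 * ...while an unbound pool shared by several wqs (via pwq->pool)
	 * is reached once per wq here, so rebinding workers from inside
	 * wq_update_unbound_numa() would repeat set_cpus_allowed_ptr()
	 * on the same workers.
	 */
	list_for_each_entry(wq, &workqueues, list)
		wq_update_unbound_numa(wq, cpu, true);

	mutex_unlock(&wq_pool_mutex);
	return 0;
}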

>
> >
> >Please keep the logic in restore_unbound_workers_cpumask().
> >
> >> + mutex_unlock(&wq_pool_attach_mutex);
> >> + return;
> >> + } while (0);
> >> +
> >> /*
> >> * Let's determine what needs to be done. If the target cpumask is
> >> * different from the default pwq's, we need to compare it to @pwq's
> >> @@ -5004,34 +5030,6 @@ static void rebind_workers(struct worker
> >> raw_spin_unlock_irq(&pool->lock);
> >> }
> >>
> >> -/**
> >> - * restore_unbound_workers_cpumask - restore cpumask of unbound workers
> >> - * @pool: unbound pool of interest
> >> - * @cpu: the CPU which is coming up
> >> - *
> >> - * An unbound pool may end up with a cpumask which doesn't have any online
> >> - * CPUs. When a worker of such pool get scheduled, the scheduler resets
> >> - * its cpus_allowed. If @cpu is in @pool's cpumask which didn't have any
> >> - * online CPU before, cpus_allowed of all its workers should be restored.
> >> - */
> >> -static void restore_unbound_workers_cpumask(struct worker_pool *pool, int cpu)
> >> -{
> >> - static cpumask_t cpumask;
> >> - struct worker *worker;
> >> -
> >> - lockdep_assert_held(&wq_pool_attach_mutex);
> >> -
> >> - /* is @cpu allowed for @pool? */
> >> - if (!cpumask_test_cpu(cpu, pool->attrs->cpumask))
> >> - return;
> >> -
> >> - cpumask_and(&cpumask, pool->attrs->cpumask, cpu_online_mask);
> >
> >Good catch of the problem.
> >But please fix it where the problem is found (here!)
>
> The cause of the issue is the update of pool->attrs->cpumask in
> wq_update_unbound_numa() for offline CPUs, and it is too late to
> make the change only when they come back online.
>

pool->attrs->cpumask is read-only; it is not updated.

This function just updates the pwq table, using (allocating when needed)
the pwq for the node or falling back to the default pwq. It does not
change attrs->cpumask.

And
+ if (!online || !cpumask_test_cpu(cpu, cpumask))
+ return;

is totally incorrect for !online. In both cases (online and offline)
we may need to update the pwq table.
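
A trimmed sketch of the current wq_update_unbound_numa(), from my reading
of the code (the target_attrs buffer setup, dfl_pwq refcounting and
warnings are omitted), shows both points: cpu_off depends on @online, and
only the per-node pwq table is touched, never a pool's attrs:

static void wq_update_unbound_numa(struct workqueue_struct *wq, int cpu,
				   bool online)
{
	int node = cpu_to_node(cpu);
	int cpu_off = online ? -1 : cpu;	/* the offline case matters too */
	struct pool_workqueue *old_pwq = NULL, *pwq;

	pwq = unbound_pwq_by_node(wq, node);

	/*
	 * Compute the wanted cpumask of @node with @cpu_off excluded
	 * and compare it with the installed pwq's pool.
	 */
	if (wq_calc_node_cpumask(wq->dfl_pwq->pool->attrs, node, cpu_off,
				 cpumask)) {
		if (cpumask_equal(cpumask, pwq->pool->attrs->cpumask))
			return;
	} else {
		goto use_dfl_pwq;
	}

	/* a new pwq gets a pool whose attrs are a *copy* of the new mask */
	pwq = alloc_unbound_pwq(wq, target_attrs);
	if (!pwq)
		goto use_dfl_pwq;

	/* only the per-node pwq table is updated, never attrs->cpumask */
	mutex_lock(&wq->mutex);
	old_pwq = numa_pwq_tbl_install(wq, node, pwq);
	goto out_unlock;

use_dfl_pwq:
	mutex_lock(&wq->mutex);
	old_pwq = numa_pwq_tbl_install(wq, node, wq->dfl_pwq);
out_unlock:
	mutex_unlock(&wq->mutex);
	put_pwq_unlocked(old_pwq);
}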

> >Like this:
> >
> >+ /* only restore the cpumask of the pool's workers when @cpu is
> >+  * the first online cpu in @pool's cpumask */
> >+ if (cpumask_weight(&cpumask) > 1)
> >+ return;
> >
> > /* as we're called from CPU_ONLINE, the following shouldn't fail */
> > for_each_pool_worker(worker, pool)
> >- WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, &cpumask) < 0);
> >+ WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask) < 0);
>
> Got it. And we have the bonus of cutting the static cpumask_t cpumask if
> the fix goes in wq_update_unbound_numa(). And some tiny followup cleanup there.

I don't see any problem related to wq_update_unbound_numa() so far.
The only problem I see is that the code does not match the comments:

* An unbound pool may end up with a cpumask which doesn't have any online
* CPUs. When a worker of such pool get scheduled, the scheduler resets
* its cpus_allowed. If @cpu is in @pool's cpumask which didn't have any
* online CPU before, cpus_allowed of all its workers should be restored.

The cpumask of any given worker pool stays intact no matter what series
of CPU hot[un]plug events happens. We just need to RESTORE the workers'
scheduler cpumasks to the pool's cpumask when the pool's cpumask includes
the onlining CPU but didn't have any online CPU before.

> >
> >> -
> >> - /* as we're called from CPU_ONLINE, the following shouldn't fail */
> >> - for_each_pool_worker(worker, pool)
> >> - WARN_ON_ONCE(set_cpus_allowed_ptr(worker->task, &cpumask) < 0);
> >> -}
> >> -
> >> int workqueue_prepare_cpu(unsigned int cpu)
> >> {
> >> struct worker_pool *pool;
> >> @@ -5058,8 +5056,6 @@ int workqueue_online_cpu(unsigned int cp
> >>
> >> if (pool->cpu == cpu)
> >> rebind_workers(pool);
> >> - else if (pool->cpu < 0)
> >> - restore_unbound_workers_cpumask(pool, cpu);
> >>
> >> mutex_unlock(&wq_pool_attach_mutex);
> >> }
>