Re: [PATCH] workqueue: Ensure that cpumask set for pools created after boot

From: Tejun Heo
Date: Tue May 16 2017 - 11:55:42 EST


Hello, Michael.

On Mon, May 15, 2017 at 10:48:04AM -0500, Michael Bringmann wrote:
> >> --- a/kernel/workqueue.c
> >> +++ b/kernel/workqueue.c
> >> @@ -3366,6 +3366,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
> >> copy_workqueue_attrs(pool->attrs, attrs);
> >> pool->node = target_node;
> >>
> >> + cpumask_copy(pool->attrs->cpumask, cpumask_of(smp_processor_id()));
> >
> > What prevents a cpu getting added right here tho?
>
> PowerPC has only one control path to add/remove CPUs via DLPAR operations.
> Even so, the underlying code is protected through multiple locks.

The more I look at the patch, the less sense it seems to make. So,
whenever we create a new pool, we ignore the requested cpumask and
override it with the cpumask of the current thread?

> > Maybe the right thing to do is protecting the whole thing with hotplug
> > readlock?
>
> The operation is already within a hotplug readlock when performing DLPAR
> add/remove. Adding a CPU to the system, requires it to be brought online.
> Removing a CPU from the system, requires it to be taken offline. These
> involve calls to cpu_up / cpu_down, which go through _cpu_up / _cpu_down,
> which acquire the hotplug locks, among others along the path of execution.
>
> The locks are acquired before getting to the workqueue code, the pool
> creation/attachment code (which is where the cpu mask needs to be set),
> or trying to wakeup the initial created task in 'sched.c'.

A new unbound workqueue and thus unbound pool can also be created from
paths outside cpu hotplug, so get_unbound_pool() can race against
hotplug. Can you please explain the failures that you see in more
detail? I'm sure your patch works around the issue somehow but it
doesn't look like the right fix.

Thanks.

--
tejun