Re: [RFC PATCH] workqueue: Unbind workers before sending them to exit()

From: Valentin Schneider
Date: Thu Jul 21 2022 - 09:54:00 EST


On 21/07/22 11:35, Lai Jiangshan wrote:
>> @@ -1999,6 +2011,16 @@ static void destroy_worker(struct worker *worker)
>>
>>  	list_del_init(&worker->entry);
>>  	worker->flags |= WORKER_DIE;
>> +
>> +	/*
>> +	 * We're sending that thread off to die, so any CPU would do. This is
>> +	 * especially relevant for pcpu kworkers affined to an isolated CPU:
>> +	 * we'd rather not interrupt an isolated CPU just for a kworker to
>> +	 * do_exit().
>> +	 */
>> +	if (!(worker->flags & WORKER_UNBOUND))
>> +		unbind_worker(worker);
>> +
>>  	wake_up_process(worker->task);
>>  }
>
> destroy_worker() is called with raw_spin_lock_irq(pool->lock), so
> it cannot call the sleepable set_cpus_allowed_ptr().
>
> From __set_cpus_allowed_ptr:
>> * NOTE: the caller must have a valid reference to the task, the
>> * task must not exit() & deallocate itself prematurely. The
>> * call is not atomic; no spinlocks may be held.
>

Heh, I somehow forgot that set_cpus_allowed_ptr() can block. In this
particular case I think pcpu kworkers are "safe": they shouldn't be running
when destroy_worker() is invoked on them (though AFAICT that is not a hard
guarantee), and it doesn't make any sense for them to use
migrate_disable(), so the call shouldn't actually end up sleeping. Still,
yeah, not ideal.

> I think it needs something like task_set_cpumask_possible(), documented
> as being usable under (raw) spinlocks, which sets the task's cpumask to
> cpu_possible_mask and lets the later ttwu either migrate it to a proper
> non-isolated CPU or let it keep running.
>

I'll see what I can come up with, thanks for the suggestion.
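
For illustration only, here is a rough userspace sketch of one way to
respect the "no spinlocks held" rule: mark and detach the dying workers
while holding the lock, then do the potentially blocking unbind after the
lock is dropped. This is not the patch above and not the helper suggested
in the review; every name in it (struct pool, detach_dying_workers(),
reap_workers(), ...) is a hypothetical stand-in, and a pthread mutex merely
models pool->lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct worker {
	struct worker *next;	/* link on the idle list / cull list */
	bool die;		/* models WORKER_DIE */
	bool bound;		/* models !WORKER_UNBOUND */
};

struct pool {
	pthread_mutex_t lock;	/* models pool->lock */
	bool lock_held;		/* crude debug flag; only meaningful single-threaded */
	struct worker *idle;	/* idle workers, protected by lock */
};

/*
 * Models a call that may block, like set_cpus_allowed_ptr().
 * The assert plays the role of a might_sleep() style check.
 */
static void unbind_worker_blocking(struct pool *pool, struct worker *w)
{
	assert(!pool->lock_held);
	w->bound = false;
}

/* Phase 1, lock held: mark workers as dying and detach them. No blocking. */
static struct worker *detach_dying_workers(struct pool *pool)
{
	struct worker *cull, *w;

	pthread_mutex_lock(&pool->lock);
	pool->lock_held = true;

	cull = pool->idle;
	pool->idle = NULL;
	for (w = cull; w; w = w->next)
		w->die = true;

	pool->lock_held = false;
	pthread_mutex_unlock(&pool->lock);

	return cull;
}

/* Phase 2, lock dropped: blocking calls are fine. Returns the number reaped. */
static int reap_workers(struct pool *pool, struct worker *cull)
{
	int reaped = 0;

	for (; cull; cull = cull->next) {
		if (cull->bound)
			unbind_worker_blocking(pool, cull);
		/* A real implementation would wake the worker here. */
		reaped++;
	}
	return reaped;
}
```

The sketch leaves out everything that makes the real problem hard (worker
lifetime once the lock is dropped, racing wakeups, CPU hotplug); it only
shows where the blocking call would have to move relative to the lock.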