Re: [RFC PATCH] workqueue: Unbind workers before sending them to exit()

From: Tejun Heo
Date: Wed Jul 20 2022 - 14:04:06 EST


On Tue, Jul 19, 2022 at 05:57:43PM +0100, Valentin Schneider wrote:
> It has been reported that isolated CPUs can suffer from interference due to
> per-CPU kworkers waking up just to die.
>
> A surge of workqueue activity (sleeping workfn's exacerbate this) during
> initial setup can cause extra per-CPU kworkers to be spawned. Then, a
> latency-sensitive task can be running merrily on an isolated CPU only to be
> interrupted sometime later by a kworker marked for death (cf.
> IDLE_WORKER_TIMEOUT, 5 minutes after last kworker activity).
>
> Affine kworkers to the wq_unbound_cpumask (which doesn't contain isolated
> CPUs, cf. HK_TYPE_WQ) before waking them up after marking them with
> WORKER_DIE.
>
> This follows the logic of CPU hot-unplug, which has been packaged into
> helpers for the occasion.

Idea-wise, seems fine to me, but we have some other issues around twiddling
cpu affinities right now, so let's wait a bit till Lai chimes in.

Thanks.

--
tejun