Re: [PATCH] kthread_worker: re-set CPU affinities if CPU come online
From: Tejun Heo
Date: Mon Oct 26 2020 - 12:53:25 EST
Hello, Petr.
On Mon, Oct 26, 2020 at 05:45:55PM +0100, Petr Mladek wrote:
> > I don't think this works. The kthread may have changed its binding while
> > running using set_cpus_allowed_ptr() as you're doing above. Besides, when a
> > cpu goes offline, the bound kthread can fall back to other cpus but its cpu
> > mask isn't cleared, is it?
>
> If I get it correctly, select_fallback_rq() calls
> do_set_cpus_allowed() explicitly or in cpuset_cpus_allowed_fallback().
> It seems that the original mask gets lost.
Oh, I see.
> It would make sense to assume that kthread_worker API will take care of
> the affinity when it was set by kthread_create_worker_on_cpu().
I was for some reason thinking this was for all kthreads. Yeah, for
kthread_workers it does make sense.
> But is it safe to assume that the work can also safely proceed
> on another CPU? We should probably add a warning to
> kthread_worker_fn() when it detects the wrong CPU.
Per-cpu workqueues behave like that too. When the CPU goes down, per-cpu
workers on that CPU are unbound and may run anywhere. They get rebound when
the CPU comes back up.
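To make the unbind/rebind behavior concrete, here is a rough userspace model.
It is not kernel code: every name in it (model_worker, model_cpu_offline()
and so on) is made up for illustration, and the affinity mask is just a
64-bit integer with one bit per CPU. It only mirrors the sequence described
above: a worker bound to one CPU falls back to the remaining online CPUs when
that CPU goes away, and is bound again when the CPU returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model only; none of these names are kernel APIs. */
#define MODEL_NR_CPUS 8

struct model_worker {
	int home_cpu;      /* CPU requested when the worker was created */
	uint64_t allowed;  /* current affinity mask, one bit per CPU */
};

/* All modelled CPUs start online. */
static uint64_t model_online_mask = (1ULL << MODEL_NR_CPUS) - 1;

static void model_worker_init(struct model_worker *w, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	w->home_cpu = cpu;
	w->allowed = 1ULL << cpu;
}

/*
 * CPU goes down: if the worker has no allowed CPU left, widen its mask
 * to every online CPU. The original single-CPU mask is overwritten.
 */
static void model_cpu_offline(struct model_worker *w, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_online_mask &= ~(1ULL << cpu);
	if ((w->allowed & model_online_mask) == 0)
		w->allowed = model_online_mask;
}

/* CPU comes up: if it is the worker's home CPU, bind the worker again. */
static void model_cpu_online(struct model_worker *w, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_online_mask |= 1ULL << cpu;
	if (cpu == w->home_cpu)
		w->allowed = 1ULL << cpu;
}

/* True while the worker may run only on its home CPU. */
static bool model_worker_is_bound(const struct model_worker *w)
{
	return w->allowed == (1ULL << w->home_cpu);
}
```

The point of keeping home_cpu separate from the mask is that the mask alone
cannot be used to restore the binding once the fallback has replaced it.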
> BTW: kthread_create_worker_on_cpu() is currently used only by
> start_power_clamp_worker(). And it has its own CPU hotplug
> handling. The kthreads are stopped and started again
> in powerclamp_cpu_predown() and powerclamp_cpu_online().
And users which have a hard dependency on CPU binding are expected to
handle hotplug events so that e.g. per-cpu work items are flushed when
the CPU goes down and scheduled again when it comes back online.
There are pros and cons to the current workqueue behavior but it'd be a good
idea to keep kthread_worker's behavior in sync.
> I haven't checked all the details yet. But in principle, the patch looks
> sane to me.
Yeah, agreed.
Thanks.
--
tejun