Re: [RFC PATCH] workqueue: Unbind workers before sending them to exit()
From: Lai Jiangshan
Date: Wed Jul 27 2022 - 01:39:08 EST
On Wed, Jul 27, 2022 at 4:36 AM Valentin Schneider <vschneid@xxxxxxxxxx> wrote:
>
> On 26/07/22 07:30, Tejun Heo wrote:
> > Hello,
> >
> > On Mon, Jul 25, 2022 at 11:21:37AM +0100, Valentin Schneider wrote:
> >> Hm so my choice of words in the changelog wasn't great - "initial setup"
> >> can be kernel init, but *also* setup of whatever workload is being deployed
> >> onto the system.
> >>
> >> So you can be having "normal" background activity (I've seen some IRQs end
> >> up with schedule_work() on isolated CPUs, they're not moved away at boot
> >> time but rather shortly before launching the latency-sensitive app), some
> >> preliminary stats collection / setup to make sure the CPU will be quiet
> >> (e.g. refresh_vm_stats()), and *then* the application starts with
> >> fresh-but-no-longer-required extra pcpu kworkers assigned to its CPU.
> >
> > Ah, I see. I guess we'll need to figure out how to unbind the workers then.
> >
>
> I've been playing with different ways to unbind & wake the workers in a
> sleepable context, but so far I haven't been happy with any of my
> experiments.
I'm writing code to handle the problems of cpu affinity and prematurely
waking up of newly created worker.
This work of unbinding the dying worker is also on the list.
I haven't figured out a good solution.
I was planning to add set_cpus_allowed_ptr_off_rq() which only set
cpumasks to the task only if it is sleeping and returns -EBUSY otherwise.
And it is ensured and documented as being usable in an atomic context
and it is recommended to be used for dying tasks only.
I can't really ensure it would be implemented as I'm expecting since
it touches scheduler code.
I'd better back off.
>
> What hasn't changed much between my attempts is transferring to-be-destroyed
> kworkers from their pool->idle_list to a reaper_list which is walked by
> *something* that does unbind+wakeup. AFAIA as long as the kworker is off
> the pool->idle_list we can play with it (i.e. unbind+wake) off the
> pool->lock.
>
> It's the *something* that's annoying to get right, I don't want it to be
> overly complicated given most users are probably not impacted by what I'm
> trying to fix, but I'm getting the feeling it should still be a per-pool
> kthread. I toyed with a single reaper kthread but a central synchronization
> for all the pools feels like a stupid overhead.
I think fixing it in the workqueue.c is complicated.
Nevertheless, I will also try to fix it inside workqueue only to see
what will come up.
>
> If any of that sounds ludicrous please shout, otherwise I'm going to keep
> tinkering :)
>
> > Thanks.
> >
> > --
> > tejun
>