Re: [PATCH 00/10] workqueue: break affinity initiatively
From: Lai Jiangshan
Date: Tue Dec 15 2020 - 00:46:08 EST
On Tue, Dec 15, 2020 at 1:36 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Mon, Dec 14, 2020 at 11:54:47PM +0800, Lai Jiangshan wrote:
> > From: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
> >
> > 06249738a41a ("workqueue: Manually break affinity on hotplug")
> > said that scheduler will not force break affinity for us.
> >
> > But workqueue highly depends on the old behavior. Many parts of the codes
> > relies on it, 06249738a41a ("workqueue: Manually break affinity on hotplug")
> > is not enough to change it, and the commit has flaws in itself too.
> >
> > We need to thoroughly update the way workqueue handles affinity
> > in cpu hot[un]plug, what is this patchset intends to do and
> > replace the Valentin Schneider's patch [1].
>
> So the actual problem is with per-cpu kthreads, the new assumption is
> that hot-un-plug will make all per-cpu kthreads for the dying CPU go
> away.
Hello, Peter
We all need to be aligned on the "new assumption". I haven't read the
code, but I think I understand it well enough to know that workqueue
does violate it.
Workqueue fails to break affinity for some per-cpu kthreads in several
cases, such as hot-un-plug and workers detaching from a pool (those workers
can no longer be found through the pools and should be handled the same
way as hot-un-plug).
But workqueue has not only per-cpu kthreads but also per-node threads,
and a per-node thread may be bound to multiple CPUs or to a single CPU.
I don't know how the scheduler distinguishes all these different cases
under the "new assumption". But at least workqueue handles these
different cases in the same few places. Since workqueue has to
"break affinity" for per-cpu kthreads anyway, it can also "break affinity"
for the other cases. Making workqueue not rely at all on the scheduler
to "break affinity" is worth doing, since we have to do it ourselves for
the most part.
I haven't read the code behind the "new assumption". If possible, I'll
first try to find out how the scheduler will handle these cases:
If a per-node thread is bound only to CPU 4, and that CPU goes down,
does workqueue need to "break affinity" for it?
If a per-node thread is bound only to CPUs 41 and 42, and both go down,
does workqueue need to "break affinity" for it?
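
To make the two questions concrete, here is a toy user-space sketch, not
kernel code. It only states the condition I am asking about: a thread has
no usable CPU left once its allowed mask and the online mask no longer
intersect. The names toy_cpumask_t, toy_mask_of() and
toy_needs_break_affinity() are made up for illustration, and the sketch
says nothing about whether the scheduler or workqueue is the one expected
to act in that situation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy CPU mask (hypothetical): bit n set means CPU n is in the mask. */
typedef uint64_t toy_cpumask_t;

/* Build a mask containing a single CPU; this toy only covers CPUs 0..63. */
static toy_cpumask_t toy_mask_of(unsigned int cpu)
{
	assert(cpu < 64);
	return (toy_cpumask_t)1 << cpu;
}

/*
 * Hypothetical helper: true when none of the CPUs the thread is allowed
 * to run on is still online, i.e. somebody has to "break affinity".
 */
static bool toy_needs_break_affinity(toy_cpumask_t allowed,
				     toy_cpumask_t online)
{
	return (allowed & online) == 0;
}
```

With allowed = {41, 42}, the helper returns false after only CPU 41 goes
down and true once CPU 42 is gone as well. With allowed = {4}, it returns
true as soon as CPU 4 goes down.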
Thanks
Lai
>
> Workqueues violated that. I fixed the obvious site, and Valentin's patch
> avoids workqueues from quickly creating new ones while we're not
> looking.
>
> What other problems did you find?