Re: [PATCH] sched/schedutil: Fix deadlock between cpuset and cpu hotplug when using schedutil

From: Qais Yousef
Date: Tue Jul 12 2022 - 13:14:16 EST

On 07/12/22 06:13, Tejun Heo wrote:
> On Tue, Jul 12, 2022 at 01:57:02PM +0100, Qais Yousef wrote:
> > Is there a lot of subsystems beside cpuset that needs the cpus_read_lock()?
> > A quick grep tells me it's the only one.
> >
> > Can't we instead use cpus_read_trylock() in cpuset_can_attach() so that we
> > either hold the lock successfully then before we go ahead and call
> > cpuset_attach(), or bail out and cancel the whole attach operation which should
> > unlock the threadgroup_rwsem() lock?
> But now we're failing user-initiated operations randomly. I have a hard time

True. That might appear more random than necessary. It looked neat, and I
thought that since hotplug operations aren't that common and users must be
prepared for failures for other reasons, it might be okay.

> seeing that as an acceptable solution. The only thing we can do, I think, is
> establishing a locking order between the two locks by either nesting

That might be enough if no other path can take them in the reverse order. It
would be more robust to either hold them both or wait until we can. Then
ordering problems can't come back, at least not through this path.

> threadgroup_rwsem under cpus_read_lock or disallowing thread creation during
> hotplug operations.

I think that's what Xuewen tried to do in the proposed patch. But it only fixes
it for one specific user. If we go with that, we'll need some nuts and bolts to
warn when other users do the same.


Qais Yousef