Re: [PATCH v3 1/1] cgroup: fix deadlock caused by cgroup_mutex and cpu_hotplug_lock

From: Michal Koutný
Date: Thu Sep 26 2024 - 08:53:58 EST


Hello Hillf.

(sorry for the late reply)

On Wed, Sep 11, 2024 at 07:15:42PM GMT, Hillf Danton <hdanton@xxxxxxxx> wrote:
> > However, there is no ordering between (I) and (II) so they can also happen
> > in opposite
> >
> > thread T                                   system_wq worker
> >
> > down(cpu_hotplug_lock.read)
> > smp_call_on_cpu
> >   queue_work_on(cpu, system_wq, scss) (I)
> >                                            lock(cgroup_mutex) (II)
> >                                            ...
> >                                            unlock(cgroup_mutex)
> >                                            scss.func
> >   wait_for_completion(scss)
> > up(cpu_hotplug_lock.read)
> >
> > And here the thread T + system_wq worker effectively call
> > cpu_hotplug_lock and cgroup_mutex in the wrong order. (And since they're
> > two threads, it won't be caught by lockdep.)
> >
> Given no workqueue work executed without being dequeued, any queued work,
> regardless if they are more than 2048, that acquires cgroup_mutex could not
> prevent the work queued by thread-T from being executed, so thread-T can
> make safe forward progress, therefore with no chance left for the ABBA
> deadlock you spotted where lockdep fails to work.

Is there a forgotten negation and did you intend to write: "any queued
work ... that acquired cgroup_mutex could prevent"?

Or, if the negation is correct, why do you think that the work item being
processed is _not_ preventing thread T from running (in the case I left
quoted above)?
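
To make the two readings concrete, here is a toy wait-for graph. It is
only a sketch, not kernel code. The node names thread_T and scss mirror
the quoted diagram; "earlier_work" stands for the work item that takes
cgroup_mutex at (II), and "task_X" is a hypothetical third task, assumed
purely for illustration, that holds cgroup_mutex while waiting on
cpu_hotplug_lock in a way that T's read hold blocks. The flag
scss_blocked_behind_earlier_work encodes exactly the point in question.

```python
# Toy wait-for graph; not kernel code. "task_X" is a hypothetical task
# assumed for illustration: it holds cgroup_mutex and waits on
# cpu_hotplug_lock while thread T holds cpu_hotplug_lock.read.


def build_graph(scss_blocked_behind_earlier_work):
    """Map each waiter to the single thing it is waiting for."""
    graph = {
        # T holds cpu_hotplug_lock.read and does wait_for_completion(scss)
        "thread_T": "scss",
        # the earlier work item wants cgroup_mutex, held by task_X (assumed)
        "earlier_work": "task_X",
        # task_X wants cpu_hotplug_lock, which T holds (assumed)
        "task_X": "thread_T",
    }
    if scss_blocked_behind_earlier_work:
        # the disputed edge: scss.func cannot run until earlier_work is done
        graph["scss"] = "earlier_work"
    return graph


def find_cycle(waits_for):
    """Return the nodes of a wait cycle, or None if there is none."""
    for start in waits_for:
        path = [start]
        seen = {start}
        node = waits_for.get(start)
        while node is not None:
            if node in seen:
                return path[path.index(node):]
            path.append(node)
            seen.add(node)
            node = waits_for.get(node)
    return None


if __name__ == "__main__":
    print("scss stuck behind earlier work:", find_cycle(build_graph(True)))
    print("scss can run independently:    ", find_cycle(build_graph(False)))
```

With the disputed edge present the model reports a cycle through
thread_T, scss, earlier_work and task_X; without it there is none. The
model says nothing about which reading matches the actual workqueue
behaviour.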

Thanks,
Michal
