Re: [PATCH -next] cgroup: Fix AA deadlock caused by cgroup_bpf_release

From: Roman Gushchin
Date: Sun Jun 09 2024 - 22:48:24 EST


Hi Chen!

Was this problem found in the real life? Do you have a LOCKDEP splash available?

> On Jun 7, 2024, at 4:09 AM, Chen Ridong <chenridong@xxxxxxxxxx> wrote:
>
> We found an AA deadlock problem as shown belowed:
>
> cgroup_destroy_wq TaskB WatchDog system_wq
>
> ...
> css_killed_work_fn:
> P(cgroup_mutex)
> ...
> ...
> __lockup_detector_reconfigure:
> P(cpu_hotplug_lock.read)
> ...
> ...
> percpu_down_write:
> P(cpu_hotplug_lock.write)
> ...
> cgroup_bpf_release:
> P(cgroup_mutex)
> smp_call_on_cpu:
> Wait system_wq
>
> cpuset_css_offline:
> P(cpu_hotplug_lock.read)
>
> WatchDog is waiting for system_wq, who is waiting for cgroup_mutex, to
> finish the jobs, but the owner of the cgroup_mutex is waiting for
> cpu_hotplug_lock. This problem caused by commit 4bfc0bb2c60e ("bpf:
> decouple the lifetime of cgroup_bpf from cgroup itself")
> puts cgroup_bpf release work into system_wq. As cgroup_bpf is a member of
> cgroup, it is reasonable to put cgroup bpf release work into
> cgroup_destroy_wq, which is only used for cgroup's release work, and the
> preblem is solved.

I need to think more on this, but at first glance the fix looks a bit confusing. cgroup_bpf_release() looks quite innocent, it only takes a cgroup_mutex. It’s not obvious why it’s not ok and requires a dedicated work queue. What exactly is achieved by placing it back on the dedicated cgroup destroy queue?

I’m not trying to say your fix won’t work, but it looks like it might cover a more serious problem.