Re: [PATCH v2 -next] cgroup: remove offline draining in root destruction to avoid hung_tasks

From: Michal Koutný
Date: Thu Jul 24 2025 - 09:36:49 EST


Hi Ridong.

On Tue, Jul 22, 2025 at 11:27:33AM +0000, Chen Ridong <chenridong@xxxxxxxxxxxxxxx> wrote:
> CPU0                                    CPU1
> mount perf_event                        umount net_prio
> cgroup1_get_tree                        cgroup_kill_sb
> rebind_subsystems                       // root destruction enqueues
>                                         // cgroup_destroy_wq
> // kill all perf_event css
> // one perf_event css A is dying
> // css A offline enqueues cgroup_destroy_wq
> // root destruction will be executed first
>                                         css_free_rwork_fn
>                                         cgroup_destroy_root
>                                         cgroup_lock_and_drain_offline
>                                         // some perf descendants are dying
>                                         // cgroup_destroy_wq max_active = 1
>                                         // waiting for css A to die
>
> Problem scenario:
> 1. CPU0 mounts perf_event (rebind_subsystems)
> 2. CPU1 unmounts net_prio (cgroup_kill_sb), queuing root destruction work
> 3. A dying perf_event CSS gets queued for offline after root destruction
> 4. Root destruction waits for offline completion, but offline work is
> blocked behind root destruction in cgroup_destroy_wq (max_active=1)
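
Side note for anyone reading along: step 4 is the generic self-deadlock on
an ordered workqueue (cgroup_destroy_wq is allocated with max_active == 1).
A minimal sketch of the pattern, with made-up names and independent of the
actual cgroup code:

#include <linux/workqueue.h>
#include <linux/wait.h>

static struct workqueue_struct *demo_wq; /* alloc_workqueue("demo", 0, 1) */
static DECLARE_WAIT_QUEUE_HEAD(demo_waitq);
static bool b_ran;

static void b_fn(struct work_struct *work)
{
	b_ran = true;			/* would unblock A ... */
	wake_up(&demo_waitq);		/* ... but never gets to run */
}
static DECLARE_WORK(work_b, b_fn);

static void a_fn(struct work_struct *work)
{
	queue_work(demo_wq, &work_b);	/* B queues behind A */
	wait_event(demo_waitq, b_ran);	/* A sleeps holding the only slot */
}
static DECLARE_WORK(work_a, a_fn);

Once a_fn() runs, work_b can never start (A still counts as in flight on
the single-slot workqueue), so A sleeps forever and the hung task watchdog
eventually fires.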

What concerns me is why the umount of the net_prio hierarchy has to wait
for draining of the default hierarchy at all. (That is where you then run
into the conflict with perf_event, which is implicit_on_dfl.)

IOW why not this:
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1346,7 +1346,7 @@ static void cgroup_destroy_root(struct cgroup_root *root)
 
 	trace_cgroup_destroy_root(root);
 
-	cgroup_lock_and_drain_offline(&cgrp_dfl_root.cgrp);
+	cgroup_lock_and_drain_offline(cgrp);
 
 	BUG_ON(atomic_read(&root->nr_cgrps));
 	BUG_ON(!list_empty(&cgrp->self.children));
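
For clarity, cgrp in the replacement line is the root's own cgroup, which
is declared at the top of cgroup_destroy_root():

	struct cgroup *cgrp = &root->cgrp;

i.e. the drain would be scoped to the hierarchy actually being torn down
rather than to the whole default hierarchy.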

Does this fix the LTP scenario?

Thanks,
Michal
