Re: [PATCH v3 1/1] cgroup: fix deadlock caused by cgroup_mutex and cpu_hotplug_lock
From: Tetsuo Handa
Date: Sat Sep 28 2024 - 04:11:19 EST
On 2024/09/11 20:15, Hillf Danton wrote:
> On Mon, 9 Sep 2024 16:19:38 +0200 Michal Koutny <mkoutny@xxxxxxxx>
>> On Sat, Aug 17, 2024 at 09:33:34AM GMT, Chen Ridong <chenridong@xxxxxxxxxx> wrote:
>>> The reason for this issue is cgroup_mutex and cpu_hotplug_lock are
>>> acquired in different tasks, which may lead to deadlock.
>>> It can lead to a deadlock through the following steps:
>>> 1. A large number of cpusets are deleted asynchronously, which puts a
>>> large number of cgroup_bpf_release works into system_wq. The max_active
>>> of system_wq is WQ_DFL_ACTIVE(256). Consequently, all active works are
>>> cgroup_bpf_release works, and many cgroup_bpf_release works will be put
>>> into inactive queue. As illustrated in the diagram, there are 256 (in
>>> the active queue) + n (in the inactive queue) works.
> Given no workqueue work executed without being dequeued, any queued work,
> regardless if they are more than 2048, that acquires cgroup_mutex could not
> prevent the work queued by thread-T from being executed, so thread-T can
> make safe forward progress, therefore with no chance left for the ABBA
> deadlock you spotted where lockdep fails to work.
I wrote a simple test that queues many work items onto system_wq and
measures the time needed to flush the last work item.
As the number of work items increased, the time needed also increased.
Although nobody calls flush_workqueue() on system_wq, several users
call flush_work() on a work item in system_wq. Therefore, I think that
queuing thousands of work items onto system_wq should be avoided,
regardless of whether there is a possibility of deadlock.
----------------------------------------
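#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/sched.h>
#include <linux/jiffies.h>

/* Each work item just sleeps for one second. */
static void worker_func(struct work_struct *work)
{
	schedule_timeout_uninterruptible(HZ);
}

#define MAX_WORKS 8192
static struct work_struct works[MAX_WORKS];

static int __init test_init(void)
{
	int i;
	unsigned long start, end;

	/* Queue all work items onto system_wq. */
	for (i = 0; i < MAX_WORKS; i++) {
		INIT_WORK(&works[i], worker_func);
		schedule_work(&works[i]);
	}
	/* Measure how long flushing the last queued item takes. */
	start = jiffies;
	flush_work(&works[MAX_WORKS - 1]);
	end = jiffies;
	pr_info("%d: Took %lu jiffies. (HZ=%d)\n", MAX_WORKS, end - start, HZ);
	/* Wait for every item to finish before works[] can go away. */
	for (i = 0; i < MAX_WORKS; i++)
		flush_work(&works[i]);
	/* Returning an error makes the load fail, so no unload step is needed. */
	return -EINVAL;
}
module_init(test_init);
MODULE_LICENSE("GPL");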
----------------------------------------
12 CPUs
256: Took 1025 jiffies. (HZ=1000)
512: Took 2091 jiffies. (HZ=1000)
1024: Took 4105 jiffies. (HZ=1000)
2048: Took 8321 jiffies. (HZ=1000)
4096: Took 16382 jiffies. (HZ=1000)
8192: Took 32770 jiffies. (HZ=1000)
1 CPU
256: Took 1133 jiffies. (HZ=1000)
512: Took 2047 jiffies. (HZ=1000)
1024: Took 4117 jiffies. (HZ=1000)
2048: Took 8210 jiffies. (HZ=1000)
4096: Took 16424 jiffies. (HZ=1000)
8192: Took 32774 jiffies. (HZ=1000)
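
The numbers above look consistent with a simple batching model: at most
max_active work items (256, the WQ_DFL_ACTIVE value quoted earlier in this
thread) run concurrently, each one sleeps for about HZ jiffies, and the last
queued item completes in the last batch. Below is a minimal user-space sketch
of that arithmetic. The function names are hypothetical, and the model itself
is an assumption that ignores scheduling and timer overhead; it is not a
description of how the workqueue code is implemented.

```c
#include <stdio.h>

/*
 * Assumed model: work items complete in batches of max_active, and each
 * batch takes sleep_jiffies. The last queued item finishes with the last
 * batch, so the flush time is the number of batches times the sleep time.
 */
unsigned long expected_flush_jiffies(unsigned long nr_works,
				     unsigned long max_active,
				     unsigned long sleep_jiffies)
{
	/* Round up: a partially filled batch still costs a full sleep. */
	unsigned long batches = (nr_works + max_active - 1) / max_active;

	return batches * sleep_jiffies;
}

/* Print the model's estimate for the same item counts as the test runs. */
void print_expected_table(void)
{
	static const unsigned long counts[] = {
		256, 512, 1024, 2048, 4096, 8192
	};
	size_t i;

	for (i = 0; i < sizeof(counts) / sizeof(counts[0]); i++)
		printf("%lu: expect about %lu jiffies. (HZ=1000)\n",
		       counts[i],
		       expected_flush_jiffies(counts[i], 256, 1000));
}
```

Calling print_expected_table() from main() prints one estimate per item
count. For 8192 items the model gives 32000 jiffies, against the measured
32770 (12 CPUs) and 32774 (1 CPU), so the wait grows with the number of
queued items rather than with the number of CPUs.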