Re: [PATCH] cgroup/cpuset: remove circular dependency deadlock

From: Peter Zijlstra
Date: Thu Sep 07 2017 - 13:45:25 EST


On Thu, Sep 07, 2017 at 07:26:23PM +0530, Prateek Sood wrote:
> Remove circular dependency deadlock in a scenario where hotplug of CPU is
> being done while there is updation in cgroup and cpuset triggered from
> userspace.
>
> Process A => kthreadd => Process B => Process C => Process A

> Process A
> cpu_subsys_offline();
> cpu_down();
> _cpu_down();
> percpu_down_write(&cpu_hotplug_lock); //held
> cpuhp_invoke_callback();
> workqueue_offline_cpu();
> wq_update_unbound_numa();
> kthread_create_on_node();
> wake_up_process(); //wakeup kthreadd

TJ, I'm puzzled, why would we need to spawn new threads to update NUMA
affinity when taking a CPU out? That doesn't make sense to me, we can
either shrink the affinity of an existing thread or completely kill of a
thread if the mask becomes empty. But why spawn a new thread?

> flush_work();
> wait_for_completion();
>
> kthreadd
> kthreadd();
> kernel_thread();
> do_fork();
> copy_process();
> percpu_down_read(&cgroup_threadgroup_rwsem);
> __rwsem_down_read_failed_common(); //waiting

So this will eventually do our:

complete() to make A go.


> Process B
> kernfs_fop_write();
> cgroup_file_write();
> cgroup_procs_write();
> percpu_down_write(&cgroup_threadgroup_rwsem); //held
> cgroup_attach_task();
> cgroup_migrate();
> cgroup_migrate_execute();
> cpuset_can_attach();
> mutex_lock(&cpuset_mutex); //waiting
>
> Process C
> kernfs_fop_write();
> cgroup_file_write();
> cpuset_write_resmask();
> mutex_lock(&cpuset_mutex); //held
> update_cpumask();
> update_cpumasks_hier();
> rebuild_sched_domains_locked();
> get_online_cpus();
> percpu_down_read(&cpu_hotplug_lock); //waiting


So the whole thing looks like:


A B C D

L(hotplug)
L(threadgroup)
L(cpuset)
L(threadgroup)
WFC(c)
L(cpuset)
L(hotplug)
C(c)

Yes, inverting cpuset and hotplug would break that chain, but I'm still
wondering why workqueue needs to spawn threads on CPU down.