Re: [PATCH] cgroup/cpuset: remove circular dependency deadlock

From: Prateek Sood
Date: Thu Sep 07 2017 - 22:13:38 EST


On 09/07/2017 11:15 PM, Peter Zijlstra wrote:
> On Thu, Sep 07, 2017 at 07:26:23PM +0530, Prateek Sood wrote:
>> Remove circular dependency deadlock in a scenario where hotplug of CPU is
>> being done while there is updation in cgroup and cpuset triggered from
>> userspace.
>>
>> Process A => kthreadd => Process B => Process C => Process A
>
>> Process A
>> cpu_subsys_offline();
>> cpu_down();
>> _cpu_down();
>> percpu_down_write(&cpu_hotplug_lock); //held
>> cpuhp_invoke_callback();
>> workqueue_offline_cpu();
>> wq_update_unbound_numa();
>> kthread_create_on_node();
>> wake_up_process(); //wakeup kthreadd
>
> TJ, I'm puzzled, why would we need to spawn new threads to update NUMA
> affinity when taking a CPU out? That doesn't make sense to me, we can
> either shrink the affinity of an existing thread or completely kill of a
> thread if the mask becomes empty. But why spawn a new thread?
>
>> flush_work();
>> wait_for_completion();
>>

>
> Yes, inverting cpuset and hotplug would break that chain, but I'm still
> wondering why workqueue needs to spawn threads on CPU down.
>

Thanks for the comments Peter

You rightly mentioned that a new thread will not be spawn
while updating NUMA affinity when taking a CPU out.

While a CPU is made offline, attempt is made to unbind per-cpu
worker for CPU going down. This is done by queuing unbind work
to system_highpri_wq. It results in an attempt to create one
bounded worker thread as there is none.

wait_for_completion() in flush_work() waits for unbinding to
finish for CPU going down.

Process A
cpu_subsys_offline();
cpu_down();
_cpu_down();
percpu_down_write(&cpu_hotplug_lock); //held
cpuhp_invoke_callback();
workqueue_offline_cpu();
queue_work_on(system_highpri_wq);
__queue_work();
insert_work();
wake_up_worker(); //pool->nr_running = 0
flush_work();
wait_for_completion();


worker_thread();
need_more_worker(); // returns true
manage_workers();
maybe_create_worker();
create_worker();
kthread_create_on_node();
wake_up_process(kthreadd_task);



--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation
Center, Inc., is a member of Code Aurora Forum, a Linux Foundation
Collaborative Project