Re: [PATCH] cgroup/cpuset: fix circular locking dependency

From: Prateek Sood
Date: Thu Dec 28 2017 - 15:37:38 EST


On 12/13/2017 09:36 PM, Tejun Heo wrote:
> Hello, Prateek.
>
> On Wed, Dec 13, 2017 at 01:20:46PM +0530, Prateek Sood wrote:
>> This change makes the usage of cpuset_hotplug_workfn() from cpu
>> hotplug path synchronous. For memory hotplug it still remains
>> asynchronous.
>
> Ah, right.
>
>> Memory migration happening from cpuset_hotplug_workfn() is
>> already asynchronous by queuing cpuset_migrate_mm_workfn() in
>> cpuset_migrate_mm_wq.
>>
>> cpuset_hotplug_workfn()
>>   cpuset_migrate_mm()
>>     queue_work(cpuset_migrate_mm_wq)
>>
>> It seems that memory migration latency might not be
>> impacted by this change.
>>
>> Please let me know if you meant something else by cpuset
>> migration taking time when memory migration is turned on.
>
> No, I didn't. I was just confused about which part became
> synchronous. So, I don't have anything against making the cpu part
> synchronous, but let's not do that as the fix to the deadlocks cuz,
> while we can avoid them by changing cpuset, I don't think cpuset is
> the root cause for them. If there are benefits to making cpuset cpu
> migration synchronous, let's do that for those benefits.
>
> Thanks.
>
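
(For reference, the call chain quoted above corresponds roughly to the
following in kernel/cgroup/cpuset.c; simplified sketch, not literal
source. The migration itself is only queued here and runs later from
cpuset_migrate_mm_wq.)

    static void cpuset_migrate_mm(struct mm_struct *mm, const nodemask_t *from,
                                  const nodemask_t *to)
    {
        struct cpuset_migrate_mm_work *mwork;

        mwork = kzalloc(sizeof(*mwork), GFP_KERNEL);
        if (!mwork) {
            mmput(mm);
            return;
        }
        mwork->mm = mm;
        mwork->from = *from;
        mwork->to = *to;
        INIT_WORK(&mwork->work, cpuset_migrate_mm_workfn);
        queue_work(cpuset_migrate_mm_wq, &mwork->work);
    }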

TJ,

One more deadlock scenario:

Task: sh
[<ffffff874f917290>] wait_for_completion+0x14
[<ffffff874e8a82e0>] cpuhp_kick_ap_work+0x80 //waiting for cpuhp/2
[<ffffff874f913780>] _cpu_down+0xe0
[<ffffff874e8a9934>] cpu_down+0x38
[<ffffff874ef6e4ec>] cpu_subsys_offline+0x10

Task: cpuhp/2
[<ffffff874f91645c>] schedule+0x38
[<ffffff874e92c76c>] _synchronize_rcu_expedited+0x2ec
[<ffffff874e92c874>] synchronize_sched_expedited+0x60
[<ffffff874e92c9f8>] synchronize_sched+0xb0
[<ffffff874e9104e4>] sugov_stop+0x58
[<ffffff874f36967c>] cpufreq_stop_governor+0x48
[<ffffff874f36a89c>] cpufreq_offline+0x84
[<ffffff874f36aa30>] cpuhp_cpufreq_offline+0xc
[<ffffff874e8a797c>] cpuhp_invoke_callback+0xac
[<ffffff874e8a89b4>] cpuhp_down_callbacks+0x58
[<ffffff874e8a95e8>] cpuhp_thread_fun+0xa8

_synchronize_rcu_expedited() is waiting for the RCU expedited grace
period work item wait_rcu_exp_gp() to be executed.
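
Simplified, that dependency has the shape "queue a work item, then
sleep until it has run", which needs a free kworker in the pool. The
names below are illustrative only, not the actual RCU code:

    #include <linux/completion.h>
    #include <linux/workqueue.h>

    struct wait_work {
        struct work_struct  work;
        struct completion   done;
    };

    static void example_workfn(struct work_struct *w)
    {
        struct wait_work *ww = container_of(w, struct wait_work, work);

        /* ... drive the expedited grace period ... */
        complete(&ww->done);
    }

    static void queue_and_wait(void)
    {
        struct wait_work ww;

        init_completion(&ww.done);
        INIT_WORK_ONSTACK(&ww.work, example_workfn);
        schedule_work(&ww.work);        /* needs an idle kworker to run it */
        wait_for_completion(&ww.done);  /* sleeps until the item has run   */
        destroy_work_on_stack(&ww.work);
    }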

Task: kworker/2:1
[<ffffff874f91645c>] schedule+0x38
[<ffffff874f916870>] schedule_preempt_disabled+0x20
[<ffffff874f918df8>] __mutex_lock_slowpath+0x158
[<ffffff874f919004>] mutex_lock+0x14
[<ffffff874e8a77b0>] get_online_cpus+0x34 //waiting for cpu_hotplug_lock
[<ffffff874e96452c>] rebuild_sched_domains+0x30
[<ffffff874e964648>] cpuset_hotplug_workfn+0xb8
[<ffffff874e8c27b8>] process_one_work+0x168
[<ffffff874e8c2ff4>] worker_thread+0x140
[<ffffff874e8c95b8>] kthread+0xe0

cpu_hotplug_lock is held by task sh.
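
Pieced together from this trace and the cpuset_mutex owner noted
further below, the lock nesting on kworker/2:1 is roughly the
following (simplified sketch, not literal source):

    void rebuild_sched_domains(void)
    {
        mutex_lock(&cpuset_mutex);      /* taken first, held by kworker/2:1 */
        get_online_cpus();              /* blocks: sh holds cpu_hotplug_lock
                                           for write                        */
        rebuild_sched_domains_locked();
        put_online_cpus();
        mutex_unlock(&cpuset_mutex);
    }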


Task: kworker/2:3
[<ffffff874f91645c>] schedule+0x38
[<ffffff874f91a384>] schedule_timeout+0x1d8
[<ffffff874f9171d4>] wait_for_common+0xb4
[<ffffff874f917304>] wait_for_completion_killable+0x14 //waiting for kthreadd
[<ffffff874e8c931c>] __kthread_create_on_node+0xec
[<ffffff874e8c9448>] kthread_create_on_node+0x64
[<ffffff874e8c2d88>] create_worker+0xb4
[<ffffff874e8c3194>] worker_thread+0x2e0
[<ffffff874e8c95b8>] kthread+0xe0


Task: kthreadd
[<ffffff874e8858ec>] __switch_to+0x94
[<ffffff874f915ed8>] __schedule+0x2a8
[<ffffff874f91645c>] schedule+0x38
[<ffffff874f91a0ec>] rwsem_down_read_failed+0xe8
[<ffffff874e913dc4>] __percpu_down_read+0xfc
[<ffffff874e8a4ab0>] copy_process.isra.72.part.73+0xf60
[<ffffff874e8a53b8>] _do_fork+0xc4
[<ffffff874e8a5720>] kernel_thread+0x34
[<ffffff874e8ca83c>] kthreadd+0x144

kthreadd is waiting for cgroup_threadgroup_rwsem, which is held
by task T.
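
The __percpu_down_read() frame comes from the fork path: simplified,
every new task (including new kworkers) is created through roughly the
following steps in copy_process(), so thread creation stalls while
task T holds the rwsem:

    /* copy_process(), simplified sketch */
    cgroup_threadgroup_change_begin(current);
        /* -> percpu_down_read(&cgroup_threadgroup_rwsem) */

    /* ... copy mm/files, attach the child to its css_set ... */

    cgroup_threadgroup_change_end(current);
        /* -> percpu_up_read(&cgroup_threadgroup_rwsem) */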

Task: T
[<ffffff874f91645c>] schedule+0x38
[<ffffff874f916870>] schedule_preempt_disabled+0x20
[<ffffff874f918df8>] __mutex_lock_slowpath+0x158
[<ffffff874f919004>] mutex_lock+0x14
[<ffffff874e962ff4>] cpuset_can_attach+0x58
[<ffffff874e95d640>] cgroup_taskset_migrate+0x8c
[<ffffff874e95d9b4>] cgroup_migrate+0xa4
[<ffffff874e95daf0>] cgroup_attach_task+0x100
[<ffffff874e95df28>] __cgroup_procs_write.isra.35+0x228
[<ffffff874e95e00c>] cgroup_tasks_write+0x10
[<ffffff874e958294>] cgroup_file_write+0x44
[<ffffff874eaa4384>] kernfs_fop_write+0xc0

Task T is waiting for cpuset_mutex, which is held
by kworker/2:1.
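
That is, the attach path takes cpuset_mutex while already holding
cgroup_threadgroup_rwsem, roughly (simplified sketch of
cpuset_can_attach(), not literal source):

    static int cpuset_can_attach(struct cgroup_taskset *tset)
    {
        int ret = 0;

        /* cgroup_threadgroup_rwsem is already held by the
           __cgroup_procs_write() caller, see the trace above */
        mutex_lock(&cpuset_mutex);      /* blocks: held by kworker/2:1 */
        /* ... check whether the tasks may be moved to the cpuset ... */
        mutex_unlock(&cpuset_mutex);
        return ret;
    }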

sh ==> cpuhp/2 ==> kworker/2:1 ==> sh

kworker/2:3 ==> kthreadd ==> Task T ==> kworker/2:1

It seems that my earlier patch set should fix this scenario:
1) Inverting the locking order of cpuset_mutex and cpu_hotplug_lock
   (a rough sketch of this follows below).
2) Making the cpuset hotplug work synchronous.
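
As an illustration of 1), the idea is for this path to take
cpu_hotplug_lock before cpuset_mutex, matching the order used on the
hotplug side (sketch only, not the actual patch):

    void rebuild_sched_domains(void)
    {
        get_online_cpus();              /* cpu_hotplug_lock first ... */
        mutex_lock(&cpuset_mutex);      /* ... then cpuset_mutex      */
        rebuild_sched_domains_locked();
        mutex_unlock(&cpuset_mutex);
        put_online_cpus();
    }

With that order, kworker/2:1 would not be holding cpuset_mutex while
waiting for cpu_hotplug_lock, so task T, kthreadd and worker creation
could make progress.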


Could you please share your feedback?


Thanks

--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation
Center, Inc., is a member of Code Aurora Forum, a Linux Foundation
Collaborative Project