Re: [PATCH] cgroup/cpuset: fix circular locking dependency
From: Prateek Sood
Date: Fri Dec 08 2017 - 06:46:15 EST
On 12/08/2017 03:10 PM, Prateek Sood wrote:
> On 12/05/2017 04:31 AM, Peter Zijlstra wrote:
>> On Mon, Dec 04, 2017 at 02:58:25PM -0800, Tejun Heo wrote:
>>> Hello, again.
>>>
>>> On Mon, Dec 04, 2017 at 12:22:19PM -0800, Tejun Heo wrote:
>>>> Hello,
>>>>
>>>> On Mon, Dec 04, 2017 at 10:44:49AM +0530, Prateek Sood wrote:
>>>>> Any feedback/suggestion for this patch?
>>>>
>>>> Sorry about the delay. I'm a bit worried because it feels like we're
>>>> chasing a squirrel. I'll think through the recent changes and this
>>>> one and get back to you.
>>>
>>> Can you please take a look at the following pending commit?
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git/commit/?h=for-4.15-fixes&id=e8b3f8db7aad99fcc5234fc5b89984ff6620de3d
>>>
>>> AFAICS, this should remove the circular dependency you originally
>>> reported. I'll revert the two cpuset commits for now.
>>
>> So I liked his patches in that we would be able to go back to
>> synchronous sched_domain building.
>>
> https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git/commit/?h=for-4.15-fixes&id=e8b3f8db7aad99fcc5234fc5b89984ff6620de3d
>
> This will fix the original circular locking dependency issue.
> I will let you both (Peter & TJ) to decide on which one to
> pick.
>
>
>
TJ & Peter,
There is one deadlock issue during cgroup migration from cpu
hotplug path when a task T is being moved from source to
destination cgroup.
kworker/0:0
cpuset_hotplug_workfn()
cpuset_hotplug_update_tasks()
hotplug_update_tasks_legacy()
remove_tasks_in_empty_cpuset()
cgroup_transfer_tasks() // stuck in iterator loop
cgroup_migrate()
cgroup_migrate_add_task()
In cgroup_migrate_add_task() it checks for PF_EXITING flag of task T.
Task T will not migrate to destination cgroup. css_task_iter_start()
will keep pointing to task T in loop waiting for task T cg_list node
to be removed.
Task T
do_exit()
exit_signals() // sets PF_EXITING
exit_task_namespaces()
switch_task_namespaces()
free_nsproxy()
put_mnt_ns()
drop_collected_mounts()
namespace_unlock()
synchronize_rcu()
_synchronize_rcu_expedited()
schedule_work() // on cpu0 low priority worker pool
wait_event() // waiting for work item to execute
Task T inserted a work item in the worklist of cpu0 low priority
worker pool. It is waiting for expedited grace period work item
to execute. This work item will only be executed once kworker/0:0
complete execution of cpuset_hotplug_workfn().
kworker/0:0 ==> Task T ==>kworker/0:0
Following suggested patch might not be able to fix the above
mentioned case:
https://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git/commit/?h=for-4.15-fixes&id=e8b3f8db7aad99fcc5234fc5b89984ff6620de3d
Combination of following patches fixes above mentioned scenario
as well:
1) Inverting cpuset_mutex and cpu_hotplug_lock locking sequence
2) Making cpuset hotplug work synchronous
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation
Center, Inc., is a member of Code Aurora Forum, a Linux Foundation
Collaborative Project