[PATCH] sched/fair: adjust the depth of a sched_entity when its parent changes

From: Shayan Pooya
Date: Tue Sep 15 2015 - 00:25:09 EST


This fixes commit fed14d45f945 ("sched/fair: Track cgroup depth").
I hit the kernel panic described in https://lkml.org/lkml/2014/2/15/217
when running docker with kernel 3.16.

The issue has also been reported in other places, including:

https://github.com/docker/docker/issues/13940
https://gist.github.com/burke/c60dc5b8f0ba9bfd9275

The latter also has an analysis and a similar patch (which was never
submitted to lkml).

Looking into the panic (RIP: check_preempt_wakeup+255) and the code:
<check_preempt_wakeup+248>: mov 0x148(%rbx),%rbx
<check_preempt_wakeup+255>: mov 0x150(%r12),%rdi
<check_preempt_wakeup+263>: cmp 0x150(%rbx),%rdi

And:
crash> p &((struct sched_entity *)0)->cfs_rq
$10 = (struct cfs_rq **) 0x150

This points at the inlined function find_matching_se() and the while loop
in it. Looking at the task that was about to be scheduled in
check_preempt_wakeup():

crash> p ((struct task_struct *) 0xffff8808506fd180)->se.depth
$2 = 1
crash> p ((struct task_struct *) 0xffff8808506fd180)->se.parent
$3 = (struct sched_entity *) 0xffff8808533c0c00
crash> p ((struct task_struct *) 0xffff8808506fd180)->se.parent->depth
$4 = 1

This is inconsistent: an entity's depth should be its parent's depth plus
one, yet both are 1 here. That is the root cause of the panic.
The modified code is the only place where the depth was not adjusted after
potentially modifying the parent.
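
To illustrate why a stale depth is fatal, the walk can be modelled in user
space. This is a simplified sketch, not the kernel source: the struct
layouts, the find_matching_se_model() and move_entity() names, the
update_depth flag and the -1 return (used in place of the NULL dereference)
are all assumptions made for illustration.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (illustrative layout). */
struct cfs_rq { int id; };

struct sched_entity {
	int depth;                    /* 0 for entities on the root cfs_rq */
	struct sched_entity *parent;  /* NULL at the top level */
	struct cfs_rq *cfs_rq;        /* queue this entity is enqueued on */
};

/*
 * Model of the walk: level both entities by depth, then climb in lockstep
 * until they share a cfs_rq. Returns 0 on success, or -1 at the point where
 * an unguarded loop would dereference a NULL parent.
 */
static int find_matching_se_model(struct sched_entity **se,
				  struct sched_entity **pse)
{
	int se_depth = (*se)->depth;
	int pse_depth = (*pse)->depth;

	while (se_depth > pse_depth) {
		se_depth--;
		*se = (*se)->parent;
		if (!*se)
			return -1;
	}
	while (pse_depth > se_depth) {
		pse_depth--;
		*pse = (*pse)->parent;
		if (!*pse)
			return -1;
	}
	while ((*se)->cfs_rq != (*pse)->cfs_rq) {
		*se = (*se)->parent;
		*pse = (*pse)->parent;
		if (!*se || !*pse)
			return -1;
	}
	return 0;
}

/*
 * Hypothetical helper: reparent an entity. With update_depth set, it
 * recomputes depth together with parent, as the patch does in set_task_rq().
 */
static void move_entity(struct sched_entity *se, struct sched_entity *parent,
			struct cfs_rq *rq, int update_depth)
{
	se->cfs_rq = rq;
	se->parent = parent;
	if (update_depth)
		se->depth = parent ? parent->depth + 1 : 0;
}

/* Move a task from group /A to the nested group /A/B, then run the walk. */
int walk_after_move(int update_depth)
{
	struct cfs_rq root = {0}, rq_a = {1}, rq_b = {2}, rq_c = {3};
	struct sched_entity grp_a = {0, NULL, &root};    /* group /A */
	struct sched_entity grp_b = {1, &grp_a, &rq_a};  /* group /A/B */
	struct sched_entity grp_c = {0, NULL, &root};    /* group /C */
	struct sched_entity curr = {1, &grp_c, &rq_c};   /* task in /C */
	struct sched_entity woken = {1, &grp_a, &rq_a};  /* task in /A */
	struct sched_entity *se = &curr, *pse = &woken;

	move_entity(&woken, &grp_b, &rq_b, update_depth);
	return find_matching_se_model(&se, &pse);
}
```

In this model, walk_after_move(0) leaves the moved task with depth 1 under
a parent whose depth is also 1, the same state as in the crash output, and
the lockstep climb runs off the top of the shorter hierarchy.
walk_after_move(1) recomputes the depth and the walk ends at the two
top-level group entities.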

Signed-off-by: Shayan Pooya <shayan@xxxxxxxxxx>
---
kernel/sched/fair.c | 1 -
kernel/sched/sched.h | 1 +
2 files changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 6e2e348..ced5534 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8035,7 +8035,6 @@ static void task_move_group_fair(struct task_struct *p, int queued)
 	if (!queued)
 		se->vruntime -= cfs_rq_of(se)->min_vruntime;
 	set_task_rq(p, task_cpu(p));
-	se->depth = se->parent ? se->parent->depth + 1 : 0;
 	if (!queued) {
 		cfs_rq = cfs_rq_of(se);
 		se->vruntime += cfs_rq->min_vruntime;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 68cda11..507d30f 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -931,6 +931,7 @@ static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	p->se.cfs_rq = tg->cfs_rq[cpu];
 	p->se.parent = tg->se[cpu];
+	p->se.depth = p->se.parent ? p->se.parent->depth + 1 : 0;
 #endif

 #ifdef CONFIG_RT_GROUP_SCHED
--
2.1.0