Re: [PATCH] sched/fair: adjust the depth of a sched_entity when its parent changes
From: Peter Zijlstra
Date: Tue Sep 15 2015 - 05:20:16 EST
On Mon, Sep 14, 2015 at 10:32:42PM -0700, Shayan Pooya wrote:
> Fixes commit fed14d45f945 ("sched/fair: Track cgroup depth")
> Hit this kernel panic mentioned in https://lkml.org/lkml/2014/2/15/217
> when running docker with kernel 3.16.
v3.16 includes the fix from that thread (and I had to look in my own
archives, because lkml.org fancies showing blank pages today :/).
> The issue has been reported in other places, including:
>
> https://github.com/docker/docker/issues/13940
> https://gist.github.com/burke/c60dc5b8f0ba9bfd9275
>
> The latter also has an analysis and a similar patch (which was never
> submitted to lkml).
Pretty good write-up, that; sad you did not Cc the guy.
I got defeated by the github web shite (again!) and could not locate an
email address for him :( Ah.. Google to the rescue!
> Which suggests the inlined function find_matching_se and the while loop
> in it. Looking into the task that was about to get scheduled in the
> check_preempt_wakeup function:
>
> crash> p ((struct task_struct *) 0xffff8808506fd180)->se.depth
> $2 = 1
> crash> p ((struct task_struct *) 0xffff8808506fd180)->se.parent->depth
> $4 = 1
Yep, buggered.
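To see why a stale depth is fatal in that loop, the walk can be modelled in user space. What follows is a minimal sketch under stated assumptions, not the kernel code: struct entity, its queue field, find_matching() and demo() are hypothetical stand-ins for struct sched_entity, se->cfs_rq and find_matching_se(). The NULL checks exist only so the model reports the failure instead of crashing; the assumption is that the kernel walk has no such check and would dereference a NULL parent at that point.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct sched_entity; not the kernel's layout. */
struct entity {
	struct entity *parent;	/* NULL at the top level */
	int depth;		/* expected: parent ? parent->depth + 1 : 0 */
	int queue;		/* id of the queue this entity sits on */
};

/*
 * Walk both entities up until they sit on the same queue, trusting ->depth
 * to level them first. Returns 0 on success, -1 if the walk runs off the
 * top of the hierarchy (where an unchecked walk would hit a NULL parent).
 */
static int find_matching(struct entity **a, struct entity **b)
{
	int da = (*a)->depth;
	int db = (*b)->depth;

	/* Level the deeper entity, according to the recorded depths. */
	while (*a && da > db) {
		da--;
		*a = (*a)->parent;
	}
	while (*b && db > da) {
		db--;
		*b = (*b)->parent;
	}

	/* Climb in lockstep until both are on the same queue. */
	while (*a && *b && (*a)->queue != (*b)->queue) {
		*a = (*a)->parent;
		*b = (*b)->parent;
	}

	return (*a && *b) ? 0 : -1;
}

/* Build a two-level hierarchy and run the walk with the given task depth. */
int demo(int task_depth)
{
	struct entity g1 = { NULL, 0, 0 };	/* group entity on the root queue (0) */
	struct entity g2 = { &g1, 1, 1 };	/* nested group on g1's queue (1) */
	struct entity t = { &g2, task_depth, 2 };	/* task on g2's queue (2) */
	struct entity u = { &g1, 1, 1 };	/* other task on g1's queue (1) */
	struct entity *a = &t;
	struct entity *b = &u;

	return find_matching(&a, &b);
}
```

demo(2) models the correct depth and returns 0. demo(1) models the values from the crash output above (task depth 1 under a parent of depth 1): the levelling step is skipped, the lockstep climb never finds a common queue, and the call returns -1.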
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6e2e348..ced5534 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -8035,7 +8035,6 @@ static void task_move_group_fair(struct task_struct *p, int queued)
>  	if (!queued)
>  		se->vruntime -= cfs_rq_of(se)->min_vruntime;
>  	set_task_rq(p, task_cpu(p));
> -	se->depth = se->parent ? se->parent->depth + 1 : 0;
>  	if (!queued) {
>  		cfs_rq = cfs_rq_of(se);
>  		se->vruntime += cfs_rq->min_vruntime;
So at this point I'm left wondering about that depth update we have in
switched_to_fair().
Which leads me to suggest the following (note that some of this code has
_just_ changed a lot).
Does that work for you? (not been near a compiler).
---
 kernel/sched/fair.c  | 10 +---------
 kernel/sched/sched.h |  1 +
 2 files changed, 2 insertions(+), 9 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9176f7c588a8..fc3ef8fb6891 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8000,13 +8000,7 @@ static void attach_task_cfs_rq(struct task_struct *p)
 	struct sched_entity *se = &p->se;
 	struct cfs_rq *cfs_rq = cfs_rq_of(se);
-#ifdef CONFIG_FAIR_GROUP_SCHED
-	/*
-	 * Since the real-depth could have been changed (only FAIR
-	 * class maintain depth value), reset depth properly.
-	 */
-	se->depth = se->parent ? se->parent->depth + 1 : 0;
-#endif
+	set_task_rq(p, task_cpu(p));
 	/* Synchronize task with its cfs_rq */
 	attach_entity_load_avg(cfs_rq, se);
@@ -8072,8 +8066,6 @@ void init_cfs_rq(struct cfs_rq *cfs_rq)
 static void task_move_group_fair(struct task_struct *p)
 {
 	detach_task_cfs_rq(p);
-	set_task_rq(p, task_cpu(p));
-
 #ifdef CONFIG_SMP
 	/* Tell se's cfs_rq has been changed -- migrated */
 	p->se.avg.last_update_time = 0;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 167ab4844ee6..dde8881f16bc 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -931,6 +931,7 @@ static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	p->se.cfs_rq = tg->cfs_rq[cpu];
 	p->se.parent = tg->se[cpu];
+	p->se.depth = p->se.parent ? p->se.parent->depth + 1 : 0;
 #endif
#ifdef CONFIG_RT_GROUP_SCHED
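The idea behind the sched.h hunk, reduced to a user-space sketch: if the only code that writes ->parent also writes ->depth, the two cannot drift apart. struct entity, entity_set_parent() and depth_after_move() are hypothetical names used for illustration only; this is not kernel code.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct sched_entity. */
struct entity {
	struct entity *parent;	/* NULL at the top level */
	int depth;		/* invariant: parent ? parent->depth + 1 : 0 */
};

/* Single place that re-parents an entity; depth is refreshed along with it. */
static void entity_set_parent(struct entity *e, struct entity *parent)
{
	e->parent = parent;
	e->depth = parent ? parent->depth + 1 : 0;
}

/* Depth of a task after moving it from the top level into a nested group. */
int depth_after_move(void)
{
	struct entity g1, g2, task;

	entity_set_parent(&g1, NULL);	/* top-level group, depth 0 */
	entity_set_parent(&g2, &g1);	/* nested group, depth 1 */
	entity_set_parent(&task, NULL);	/* task starts at the top level */
	entity_set_parent(&task, &g2);	/* task moved under g2 */

	return task.depth;
}
```

In this model the moved task ends up at depth 2, one more than its new parent, with no separate fix-up call to forget. Re-parenting a group that already has children is not covered by the sketch.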