Re: [PATCH 1/2] sched: make fair sched class can handle the cgroup change by other class

From: Peter Zijlstra
Date: Tue Oct 13 2015 - 08:04:51 EST


On Tue, Oct 13, 2015 at 08:26:45PM +0900, Byungchul Park wrote:
> On Tue, Oct 13, 2015 at 11:06:54AM +0200, Peter Zijlstra wrote:
> > On Mon, Oct 05, 2015 at 06:16:23PM +0900, byungchul.park@xxxxxxx wrote:
> > > From: Byungchul Park <byungchul.park@xxxxxxx>
> > >
> > > The original fair sched class can handle a cgroup change that occurred
> > > within its own class via task_move_group_fair(), but it has no way to
> > > know about a change that happened outside it. This patch lets the fair
> > > sched class handle a cgroup change even when it happened in another
> > > sched class.
> > >
> > > Additionally, it makes sched_move_task() more flexible so that any other
> > > sched class can easily add a task_move_group_xx() callback in the future
> > > when it is needed.
> >
> > I don't get the problem... when !fair, set_task_rq() will do what needs
> > doing.
>
> set_task_rq() changes se's cfs_rq properly.
>
> >
> > The only reason we need task_move_group_fair() is the extra accounting
> > required when we actually _are_ of the fair class, it needs to
> > unaccount, move and reaccount.
>
> i agree with you mostly, but let's consider the following sequence.
>
> 1. switch se's class from fair to rt
> 2. change se's group within the rt class
> 3. switch se's class back to fair
>
> now, se->avg.last_update_time holds a stale value that is not yet synced
> with the current cfs_rq when attach_entity_load_avg() is called, so
> ATTACH_AGE_LOAD won't work as expected. to be honest, there is no problem
> if we disable ATTACH_AGE_LOAD, but i think it is a valuable feature, so i
> hope this patch will be added so that ATTACH_AGE_LOAD works properly.

Ah, see, details like that make or break a Changelog. Since you've
clearly thought about it, you might as well write it down and save me
the trouble of trying to puzzle it out on my own ;-)

OK, now that I understand the problem, let me consider it a bit.

One alternative solution would be to make set_task_rq() do the clear,
right?
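
To make the three-step sequence quoted above concrete, here is a minimal
user-space sketch in C. It is not kernel code: every toy_* name is
hypothetical, the clock values are made up, and the delta arithmetic is only
a rough stand-in for the aging that attach_entity_load_avg() performs under
ATTACH_AGE_LOAD. The clear_stamp flag models the idea of having
set_task_rq() reset last_update_time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins (assumptions for illustration, NOT the kernel structures). */
struct toy_cfs_rq {
	int64_t clock;			/* per-group load-tracking timestamp */
};

struct toy_se {
	struct toy_cfs_rq *cfs_rq;
	int64_t last_update_time;	/* 0 means "migrated, do not age" */
};

enum toy_class { TOY_FAIR, TOY_RT };

struct toy_task {
	struct toy_se se;
	enum toy_class class;
};

/*
 * Models set_task_rq(): repoint the entity at the new group's cfs_rq.
 * clear_stamp == true models the proposed change, where the timestamp is
 * reset no matter which class the task currently belongs to.
 */
static void toy_set_task_rq(struct toy_task *p, struct toy_cfs_rq *new_rq,
			    bool clear_stamp)
{
	p->se.cfs_rq = new_rq;
	if (clear_stamp)
		p->se.last_update_time = 0;
}

/*
 * Models the aging decision at attach time: a zero timestamp skips aging,
 * anything else ages by the distance to the cfs_rq's clock.
 */
static int64_t toy_attach_age_delta(const struct toy_task *p)
{
	assert(p->se.cfs_rq != NULL);
	if (p->se.last_update_time == 0)
		return 0;
	return p->se.cfs_rq->clock - p->se.last_update_time;
}

/* Runs the fair -> rt -> (group change) -> fair sequence. */
static int64_t toy_run_scenario(bool clear_stamp)
{
	struct toy_cfs_rq group_a = { .clock = 1000 };
	struct toy_cfs_rq group_b = { .clock = 400 };
	struct toy_task p = {
		.se = { .cfs_rq = &group_a, .last_update_time = 900 },
		.class = TOY_FAIR,
	};

	p.class = TOY_RT;				/* 1. fair -> rt */
	toy_set_task_rq(&p, &group_b, clear_stamp);	/* 2. group change as rt */
	p.class = TOY_FAIR;				/* 3. rt -> fair, attach */
	return toy_attach_age_delta(&p);
}
```

Under these assumptions, toy_run_scenario(false) returns -500: the entity
would be aged against a timestamp taken from a cfs_rq it no longer belongs
to. toy_run_scenario(true) returns 0, because the reset timestamp tells the
attach path to skip aging.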

---
 kernel/sched/fair.c  | 8 --------
 kernel/sched/sched.h | 4 ++++
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 700eb548315f..9469f023ed74 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5020,9 +5020,6 @@ static void migrate_task_rq_fair(struct task_struct *p)
 	 */
 	remove_entity_load_avg(&p->se);

-	/* Tell new CPU we are migrated */
-	p->se.avg.last_update_time = 0;
-
 	/* We have migrated, no longer consider this task hot */
 	p->se.exec_start = 0;
 }
@@ -8080,11 +8077,6 @@ static void task_move_group_fair(struct task_struct *p)
 {
 	detach_task_cfs_rq(p);
 	set_task_rq(p, task_cpu(p));
-
-#ifdef CONFIG_SMP
-	/* Tell se's cfs_rq has been changed -- migrated */
-	p->se.avg.last_update_time = 0;
-#endif
 	attach_task_cfs_rq(p);
 }

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index efd3bfc7e347..f5c39cb83ee5 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -935,6 +935,10 @@ static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
 #ifdef CONFIG_FAIR_GROUP_SCHED
 	p->se.cfs_rq = tg->cfs_rq[cpu];
 	p->se.parent = tg->se[cpu];
+#ifdef CONFIG_SMP
+	/* Tell se's cfs_rq has been changed -- migrated */
+	p->se.avg.last_update_time = 0;
+#endif
 #endif

 #ifdef CONFIG_RT_GROUP_SCHED