Re: [PATCH 1/2] sched: make fair sched class can handle the cgroup change by other class

From: Byungchul Park
Date: Tue Oct 13 2015 - 20:00:07 EST


On Tue, Oct 13, 2015 at 02:04:38PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 13, 2015 at 08:26:45PM +0900, Byungchul Park wrote:
> > On Tue, Oct 13, 2015 at 11:06:54AM +0200, Peter Zijlstra wrote:
> > > On Mon, Oct 05, 2015 at 06:16:23PM +0900, byungchul.park@xxxxxxx wrote:
> > > > From: Byungchul Park <byungchul.park@xxxxxxx>
> > > >
> > > > The fair sched class can already handle a cgroup change that occurs within
> > > > its own class with task_move_group_fair(), but it has no way to notice a
> > > > change that happened outside it. This patch lets the fair sched class handle
> > > > a cgroup change even when it happened in another sched class.
> > > >
> > > > Additionally, it makes sched_move_task() more flexible so that any other
> > > > sched class can easily add a task_move_group_xx() callback in the future
> > > > when it is needed.
> > >
> > > I don't get the problem... when !fair, set_task_rq() will do what needs
> > > doing.
> >
> > set_task_rq() changes se's cfs_rq properly.
> >
> > >
> > > The only reason we need task_move_group_fair() is the extra accounting
> > > required when we actually _are_ of the fair class, it needs to
> > > unaccount, move and reaccount.
> >
> > i mostly agree with you, but let's consider the following sequence.
> >
> > 1. switch se's class from fair to rt
> > 2. change se's group within the rt class
> > 3. switch se's class back to fair
> >
> > now se->avg.last_update_time holds a stale value that is not yet synced with
> > the current cfs_rq when attach_entity_load_avg() is called, so
> > ATTACH_AGE_LOAD won't work as expected. to be honest, there is no problem
> > if we disable ATTACH_AGE_LOAD, but i think ATTACH_AGE_LOAD is a valuable
> > feature, so i hope this patch will be applied so that ATTACH_AGE_LOAD
> > works properly.
>
> Ah, see details like that make or break a Changelog, since you've
> clearly thought about it, you might as well write it down and save me
> the trouble of trying to puzzle it out on me own ;-)

i am sorry about that. i will try to add more description to my patches
in the future.

>
> OK, now that I understand the problem, let me consider it a bit.
>
> One alternative solution would be to make set_task_rq() do the clear,
> right?

yes. i re-implemented it in my head last night, and fortunately it is
similar to what you suggest. but i am not sure whether it is good to
reset p->se.avg.last_update_time unconditionally in every case where
set_task_rq() is called. let me think about it more.
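
to make the concern concrete, here is a toy userspace model of the
three-step sequence quoted above (fair to rt, group change while rt, back
to fair). it is only a sketch, not kernel code: the toy_* names, the
stamp_src field and the clock values are all made up for illustration, and
it assumes that leaving the fair class keeps se->avg.last_update_time as it
is. toy_attach() reports whether aging would compare a timestamp taken from
one cfs_rq against the clock of a different one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for the kernel structures; NOT the real definitions. */
struct toy_cfs_rq {
	uint64_t clock;			/* per-group load-tracking clock */
};

struct toy_se {
	struct toy_cfs_rq *cfs_rq;	/* group runqueue the entity points at */
	struct toy_cfs_rq *stamp_src;	/* hypothetical: runqueue that produced the stamp */
	uint64_t last_update_time;	/* 0 means "not synced with any cfs_rq" */
};

/* Models set_task_rq(): repoint the entity, optionally clear the stamp. */
static void toy_set_task_rq(struct toy_se *se, struct toy_cfs_rq *rq,
			    bool clear_stamp)
{
	se->cfs_rq = rq;
	if (clear_stamp)
		se->last_update_time = 0;
}

/*
 * Models the attach path: a non-zero stamp is aged against the clock of
 * the current cfs_rq. Returns true if that stamp came from another cfs_rq.
 */
static bool toy_attach(struct toy_se *se)
{
	bool mismatched = false;

	assert(se->cfs_rq != NULL);

	if (se->last_update_time != 0)
		mismatched = (se->stamp_src != se->cfs_rq);

	/* After attaching, the entity is synced with its current cfs_rq. */
	se->last_update_time = se->cfs_rq->clock;
	se->stamp_src = se->cfs_rq;
	return mismatched;
}

/* Replays the sequence; returns true if aging used a stale stamp. */
static bool toy_replay(bool clear_in_set_task_rq)
{
	struct toy_cfs_rq group_a = { .clock = 1000 };
	struct toy_cfs_rq group_b = { .clock = 5 };
	struct toy_se se = {
		.cfs_rq = &group_a,
		.stamp_src = &group_a,
		.last_update_time = 1000,	/* synced with group_a */
	};

	/* 1. fair -> rt: the stamp is left untouched (model assumption) */
	/* 2. group change while in rt: only set_task_rq() runs */
	toy_set_task_rq(&se, &group_b, clear_in_set_task_rq);
	/* 3. rt -> fair: attach to whatever cfs_rq the entity points at now */
	return toy_attach(&se);
}
```

in this model toy_replay(false) reports the mismatch and toy_replay(true)
does not, which is the effect of clearing the stamp inside set_task_rq().
it says nothing about whether the unconditional reset is right for the
other callers of set_task_rq().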

thank you,
byungchul

>
> ---
> kernel/sched/fair.c | 8 --------
> kernel/sched/sched.h | 4 ++++
> 2 files changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 700eb548315f..9469f023ed74 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5020,9 +5020,6 @@ static void migrate_task_rq_fair(struct task_struct *p)
> */
> remove_entity_load_avg(&p->se);
>
> - /* Tell new CPU we are migrated */
> - p->se.avg.last_update_time = 0;
> -
> /* We have migrated, no longer consider this task hot */
> p->se.exec_start = 0;
> }
> @@ -8080,11 +8077,6 @@ static void task_move_group_fair(struct task_struct *p)
> {
> detach_task_cfs_rq(p);
> set_task_rq(p, task_cpu(p));
> -
> -#ifdef CONFIG_SMP
> - /* Tell se's cfs_rq has been changed -- migrated */
> - p->se.avg.last_update_time = 0;
> -#endif
> attach_task_cfs_rq(p);
> }
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index efd3bfc7e347..f5c39cb83ee5 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -935,6 +935,10 @@ static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
> #ifdef CONFIG_FAIR_GROUP_SCHED
> p->se.cfs_rq = tg->cfs_rq[cpu];
> p->se.parent = tg->se[cpu];
> +#ifdef CONFIG_SMP
> + /* Tell se's cfs_rq has been changed -- migrated */
> + p->se.avg.last_update_time = 0;
> +#endif
> #endif
>
> #ifdef CONFIG_RT_GROUP_SCHED
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/