Re: [PATCH RFC] sched/fair: let cpu's cfs_rq to reflect task migration
From: Leo Yan
Date: Tue Apr 05 2016 - 02:57:00 EST
On Mon, Apr 04, 2016 at 09:48:23AM +0100, Morten Rasmussen wrote:
> On Sat, Apr 02, 2016 at 03:11:54PM +0800, Leo Yan wrote:
> > On Fri, Apr 01, 2016 at 03:28:49PM -0700, Steve Muckle wrote:
> > > I think I follow - Leo please correct me if I mangle your intentions.
> > > It's an issue that Morten and Dietmar had mentioned to me as well.
>
> Yes. We have been working on this issue for a while without getting to a
> nice solution yet.
Good to know. This patch is mainly for discussion purposes.
[...]
> > > Leo I noticed you did not modify detach_entity_load_average(). I think
> > > this would be needed to avoid the task's stats being double counted for
> > > a while after switched_from_fair() or task_move_group_fair().
>
> I'm afraid that the solution to the problem is more complicated than that
> :-(
>
> You are adding/removing a contribution from the root cfs_rq.avg which
> isn't part of the signal in the first place. The root cfs_rq.avg only
> contains the sum of the load/util of the sched_entities on the cfs_rq.
> If you remove the contribution of the tasks from there you may end up
> double-accounting for the task migration. Once due to your patch and then
> again slowly over time as the group sched_entity starts reflecting that
> the task has migrated. Furthermore, for group scheduling to make sense
> it has to be the task_h_load() you add/remove otherwise the group
> weighting is completely lost. Or am I completely misreading your patch?
I'd like to confirm one thing first: although CFS maintains the task
group hierarchy, it updates a task group's cfs_rq.avg and the root
cfs_rq.avg independently rather than accounting them across the
hierarchy.

So currently CFS decreases the group's cfs_rq.avg when a task migrates,
but it doesn't iterate up the task group hierarchy to the root
cfs_rq.avg. I don't understand the second accounting you mentioned:
"then again slowly over time as the group sched_entity starts reflecting
that the task has migrated."

Another question: does cfs_rq.avg signal _ONLY_ historic behavior and
not present behavior? So even after the task has been migrated, do we
still need to decay its contribution slowly? Or is this different
between load and util?
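
To make the decay question concrete, here is a toy sketch in plain
user-space C (not kernel code). It assumes the usual PELT geometric
series, in which a contribution that has stopped accumulating halves
every 32 periods of roughly 1 ms each. PELT_Y, toy_decay() and
toy_periods_until_below() are names made up for this illustration, and
the floating-point arithmetic only approximates what the kernel does
with fixed-point tables.

```c
#include <assert.h>

/* Per-period decay factor, chosen so that PELT_Y^32 is about 0.5. */
#define PELT_Y 0.97857206

/*
 * Hypothetical helper, not a kernel function: value of a contribution
 * that has stopped accumulating, after `periods` further periods.
 */
static double toy_decay(double contrib, unsigned int periods)
{
	assert(contrib >= 0.0);

	while (periods--)
		contrib *= PELT_Y;

	return contrib;
}

/*
 * Hypothetical helper: number of periods until the leftover
 * contribution drops below `threshold`.
 */
static unsigned int toy_periods_until_below(double contrib, double threshold)
{
	unsigned int n = 0;

	assert(threshold > 0.0);

	while (contrib >= threshold) {
		contrib *= PELT_Y;
		n++;
	}

	return n;
}
```

With these assumptions, toy_decay(400.0, 32) gives about 200, and a
leftover of 400 needs a bit more than 200 periods to fall below 1% of
its starting value. That is the time scale I mean by "decay it slowly".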
> I don't think the slow response time for _load_ is necessarily a big
> problem. Otherwise we would have had people complaining already about
> group scheduling being broken. It is however a problem for all the
> initiatives that built on utilization.
Or maybe we need to separate utilization and load; these two signals
have different semantics and purposes.
Thanks,
Leo Yan