Re: [PATCH 8/8] sched,fair: flatten hierarchical runqueues
From: Dietmar Eggemann
Date: Fri Jun 28 2019 - 06:26:32 EST
On 6/12/19 9:32 PM, Rik van Riel wrote:
> Flatten the hierarchical runqueues into just the per-CPU rq.cfs runqueue.
>
> Iteration of the sched_entity hierarchy is rate limited to once per jiffy
> per sched_entity, which is a smaller change than it seems, because load
> average adjustments were already rate limited to once per jiffy before this
> patch series.
>
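(Aside: to check my reading of the once-per-jiffy rate limit, a minimal
sketch of how I understand it. The field and helper names below are
invented for illustration; they are not the ones the patch uses.)

	/*
	 * Hypothetical illustration only, not the patch's code.
	 * Assume each sched_entity carries a jiffies stamp of its
	 * last hierarchy walk, e.g. se->last_walk.
	 */
	static bool se_walk_ratelimited(struct sched_entity *se)
	{
		if (se->last_walk == jiffies)
			return true;	/* already walked this jiffy */

		se->last_walk = jiffies;
		return false;
	}

A caller would then bail out of the sched_entity hierarchy iteration
whenever this returns true, so each entity is walked at most once per
jiffy.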
> This patch breaks CONFIG_CFS_BANDWIDTH. The plan for that is to park tasks
> from throttled cgroups on their cgroup runqueues and, once the cgroup gets
> unthrottled, slowly wake them back up in vruntime order (using the
> GENTLE_FAIR_SLEEPERS logic) to prevent thundering herd issues.
>
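(For the record, my rough reading of that plan as a sketch; every name
below is hypothetical, nothing like this exists in the series yet.)

	/*
	 * Wake one parked task, lowest vruntime first; presumably
	 * called repeatedly, e.g. once per tick, until the cgroup
	 * runqueue is drained.
	 */
	static void unthrottle_one(struct cfs_rq *cgroup_rq)
	{
		struct sched_entity *se = __pick_first_entity(cgroup_rq);

		if (!se)
			return;

		dequeue_entity(cgroup_rq, se, 0);

		/*
		 * Re-enqueue as a wakeup so place_entity() applies the
		 * sleeper credit (halved under GENTLE_FAIR_SLEEPERS),
		 * spreading the wakeups out instead of letting the
		 * whole cgroup stampede the root cfs_rq at once.
		 */
		enqueue_task_fair(rq_of(cgroup_rq), task_of(se),
				  ENQUEUE_WAKEUP);
	}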
> Signed-off-by: Rik van Riel <riel@xxxxxxxxxxx>
> ---
> include/linux/sched.h | 2 +
> kernel/sched/fair.c | 478 +++++++++++++++++-------------------------
> kernel/sched/pelt.c | 6 +-
> kernel/sched/pelt.h | 2 +-
> kernel/sched/sched.h | 2 +-
> 5 files changed, 194 insertions(+), 296 deletions(-)
>
[...]
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
[...]
> @@ -3491,7 +3544,7 @@ static inline bool update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s
> * track group sched_entity load average for task_h_load calc in migration
> */
> if (se->avg.last_update_time && !(flags & SKIP_AGE_LOAD))
> - updated = __update_load_avg_se(now, cfs_rq, se);
> + updated = __update_load_avg_se(now, cfs_rq, se, curr, curr);
I wonder whether task migration still works correctly.

migrate_task_rq_fair(p, ...) -> remove_entity_load_avg(&p->se) would use
cfs_rq = se->cfs_rq, which is now the root cfs_rq, so the removed load
(and util) will not propagate through the task group hierarchy.
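Roughly the path I am looking at (abbreviated from kernel/sched/fair.c,
comments mine):

	static void migrate_task_rq_fair(struct task_struct *p, int new_cpu)
	{
		...
		remove_entity_load_avg(&p->se);
	}

	static void remove_entity_load_avg(struct sched_entity *se)
	{
		/* With the flat runqueue this is the root cfs_rq. */
		struct cfs_rq *cfs_rq = cfs_rq_of(se);
		...
		cfs_rq->removed.load_avg += se->avg.load_avg;
		cfs_rq->removed.util_avg += se->avg.util_avg;
		...
	}

	/*
	 * The removed load/util is queued on the root cfs_rq only,
	 * so the group cfs_rqs the task used to contribute to keep
	 * their stale contribution.
	 */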
[...]