[PATCH v2 07/10] sched/fair: use update_load_avg() to attach/detach entity load_avg

From: Chengming Zhou
Date: Wed Jul 13 2022 - 00:05:32 EST


Since update_load_avg() now supports the DO_ATTACH and DO_DETACH
flags, we can use it to implement attaching/detaching of the entity
load_avg.

Another advantage of using update_load_avg() is that it checks
last_update_time before attaching or detaching, instead of
attaching/detaching unconditionally as the current code does.
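
For reference, the attach/detach handling inside update_load_avg()
then looks roughly like the sketch below. This is only an illustration
of the behaviour described above, not a verbatim hunk from the earlier
patches of this series; in particular, the explicit last_update_time
test in the DO_DETACH branch is written out here just to show where
the guard sits:

  /*
   * Simplified sketch of the attach/detach handling inside
   * update_load_avg(); the real function also ages the entity and
   * cfs_rq averages and propagates group load changes first.
   */
  if (!se->avg.last_update_time && (flags & DO_ATTACH)) {
          /* not attached yet: freshly forked or just migrated in */
          attach_entity_load_avg(cfs_rq, se);
          update_tg_load_avg(cfs_rq);
  } else if (se->avg.last_update_time && (flags & DO_DETACH)) {
          /* only detach load that was actually attached before */
          detach_entity_load_avg(cfs_rq, se);
          update_tg_load_avg(cfs_rq);
  }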

This avoids some problematic corner cases in load tracking, such as
the double attach problem and the problem of detaching an unattached
NEW task, illustrated below.

1. Switching to the fair class (double attach problem)

p->sched_class = fair_class;  --> p.se->avg.last_update_time = 0
if (queued)
  enqueue_task(p);
    ...
      enqueue_entity()
        update_load_avg(UPDATE_TG | DO_ATTACH)
          if (!se->avg.last_update_time && (flags & DO_ATTACH))  --> true
            attach_entity_load_avg()  --> attached, will set last_update_time
check_class_changed()
  switched_from() (!fair)
  switched_to()   (fair)
    switched_to_fair()
      attach_entity_load_avg()  --> unconditional attach again!
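
With this patch applied, the same sequence ends up skipping the second
attach. The sketch below just follows the hunks further down;
attach_task_cfs_rq()/attach_entity_cfs_rq() is the existing path taken
from switched_to_fair():

check_class_changed()
  switched_to() (fair)
    switched_to_fair()
      attach_task_cfs_rq()
        attach_entity_cfs_rq()
          update_load_avg(UPDATE_TG | DO_ATTACH)
            if (!se->avg.last_update_time && (flags & DO_ATTACH))  --> false
              /* attach skipped, the load is not added twice */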

2. Changing the cgroup of a NEW task (detaching an unattached task)

sched_move_group(p)
  if (queued)
    dequeue_task()
  task_move_group_fair()
    detach_task_cfs_rq()
      detach_entity_load_avg()  --> detach unattached NEW task
    set_task_rq()
    attach_task_cfs_rq()
      attach_entity_load_avg()
  if (queued)
    enqueue_task()
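
The effect of the last_update_time check on both sequences above can
be modelled with a small userspace program. The struct and helper
names below are made up for illustration and are not the kernel's; the
only thing borrowed from the fair class is the convention that
last_update_time == 0 means "not attached":

  #include <assert.h>
  #include <stdio.h>

  /* Toy model, not kernel code: entity and cfs_rq reduced to plain loads. */
  struct toy_entity {
          unsigned long long last_update_time;    /* 0 == not attached */
          long load;
  };

  struct toy_cfs_rq {
          long load_sum;
  };

  static void toy_attach(struct toy_cfs_rq *cfs_rq, struct toy_entity *se)
  {
          if (se->last_update_time)       /* already attached: no-op */
                  return;
          cfs_rq->load_sum += se->load;
          se->last_update_time = 1;       /* pretend this is "now" */
  }

  static void toy_detach(struct toy_cfs_rq *cfs_rq, struct toy_entity *se)
  {
          if (!se->last_update_time)      /* never attached: no-op */
                  return;
          cfs_rq->load_sum -= se->load;
          se->last_update_time = 0;
  }

  int main(void)
  {
          struct toy_cfs_rq rq = { 0 };

          /* 1. switch to fair class: the attach from enqueue_entity() sets
           * last_update_time, so the later attach attempt from
           * switched_to_fair() is a no-op instead of adding the load twice.
           */
          struct toy_entity p1 = { .last_update_time = 0, .load = 100 };
          toy_attach(&rq, &p1);           /* enqueue_entity()   */
          toy_attach(&rq, &p1);           /* switched_to_fair() */
          assert(rq.load_sum == 100);

          /* 2. cgroup move of a NEW task: detaching a never-attached entity
           * is skipped instead of removing load that was never added.
           */
          struct toy_entity p2 = { .last_update_time = 0, .load = 50 };
          toy_detach(&rq, &p2);           /* detach_task_cfs_rq() */
          assert(rq.load_sum == 100);

          printf("load_sum = %ld\n", rq.load_sum);
          return 0;
  }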

These problems were fixed by commit 7dc603c9028e
("sched/fair: Fix PELT integrity for new tasks"), but that fix
brings problems of its own.

First, it adds a new task state, TASK_NEW, and an unnecessary
limitation: changing the cgroup of a TASK_NEW task fails.

Second, it attaches the entity load_avg in
post_init_entity_util_avg(), where we only set the sched_avg
last_update_time for !fair tasks; this causes a PELT integrity
problem in switched_to_fair().

This patch makes update_load_avg() the only place where attach/detach
happens, and handles corner cases like changing the cgroup of a NEW
task by checking last_update_time before attaching/detaching.

Signed-off-by: Chengming Zhou <zhouchengming@xxxxxxxxxxxxx>
---
kernel/sched/fair.c | 15 +++------------
1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 29811869c1fe..51fc20c161a3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4307,11 +4307,6 @@ static inline void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *s

static inline void remove_entity_load_avg(struct sched_entity *se) {}

-static inline void
-attach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
-static inline void
-detach_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
-
static inline int newidle_balance(struct rq *rq, struct rq_flags *rf)
{
return 0;
@@ -11527,9 +11522,7 @@ static void detach_entity_cfs_rq(struct sched_entity *se)
struct cfs_rq *cfs_rq = cfs_rq_of(se);

/* Catch up with the cfs_rq and remove our load when we leave */
- update_load_avg(cfs_rq, se, 0);
- detach_entity_load_avg(cfs_rq, se);
- update_tg_load_avg(cfs_rq);
+ update_load_avg(cfs_rq, se, UPDATE_TG | DO_DETACH);
propagate_entity_cfs_rq(se);
}

@@ -11537,10 +11530,8 @@ static void attach_entity_cfs_rq(struct sched_entity *se)
{
struct cfs_rq *cfs_rq = cfs_rq_of(se);

- /* Synchronize entity with its cfs_rq */
- update_load_avg(cfs_rq, se, 0);
- attach_entity_load_avg(cfs_rq, se);
- update_tg_load_avg(cfs_rq);
+ /* Synchronize entity with its cfs_rq and attach our load */
+ update_load_avg(cfs_rq, se, UPDATE_TG | DO_ATTACH);
propagate_entity_cfs_rq(se);
}

--
2.36.1