[PATCH v3 3/3] sched: make it possible to account fair class load avg consistently

From: byungchul.park
Date: Thu Oct 15 2015 - 05:02:30 EST


From: Byungchul Park <byungchul.park@xxxxxxx>

The current code can account for the fair-class load average over the
time a task was absent from the fair class, thanks to ATTACH_AGE_LOAD.
However, it does not work when a migration or group change happens
while the task is in another sched class.

This patch introduces a more general way to handle fair-class load
average accounting, which works consistently in every case, e.g. a
migration or cgroup change while in another sched class.
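The idea can be illustrated outside the kernel. The sketch below is
plain userspace C with a hypothetical helper name (not kernel code): it
rebases a PELT-style timestamp from the previous cfs_rq's clock domain
into the next one's, preserving the elapsed delta, and keeps the
special value 0 meaning "detached", just as the new
update_last_update_time() does.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/*
 * Hypothetical sketch: rebase a last-update timestamp from the clock
 * domain of the previous cfs_rq to that of the next one.  The elapsed
 * time (prev_clock - last) is the same in both domains afterwards.
 * A value of 0 means the entity is detached and must stay 0.
 */
static u64 rebase_last_update_time(u64 last, u64 prev_clock, u64 next_clock)
{
	if (!last)
		return 0;	/* detached entity: nothing to rebase */
	/* keep the elapsed delta intact in the new clock domain */
	return last - prev_clock + next_clock;
}
```

For example, an entity last updated 10ns before the previous cfs_rq's
clock ends up 10ns behind the next cfs_rq's clock as well, so the next
PELT update ages it by the correct amount.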

Signed-off-by: Byungchul Park <byungchul.park@xxxxxxx>
---
kernel/sched/fair.c | 23 ++++++++++++++++++-----
kernel/sched/sched.h | 4 ++++
2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 08589a0..ad5d34c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2827,6 +2827,24 @@ static inline u64 get_last_update_time(struct cfs_rq *cfs_rq)
return last_update_time;
}

+#ifdef CONFIG_FAIR_GROUP_SCHED
+/*
+ * Called from set_task_rq() right before setting a task's cpu. The
+ * caller only guarantees that p->pi_lock is held; no other assumption,
+ * including about the state of rq->lock, may be made. Thus the
+ * timestamps must be read via get_last_update_time().
+ */
+void update_last_update_time(struct sched_entity *se,
+ struct cfs_rq *prev,
+ struct cfs_rq *next)
+{
+ if (se->avg.last_update_time) {
+ se->avg.last_update_time -= get_last_update_time(prev);
+ se->avg.last_update_time += get_last_update_time(next);
+ }
+}
+#endif
+
/*
* Task first catches up with cfs_rq, and then subtract
* itself from the cfs_rq (task must be off the queue now).
@@ -8088,11 +8106,6 @@ static void task_move_group_fair(struct task_struct *p)
{
detach_task_cfs_rq(p);
set_task_rq(p, task_cpu(p));
-
-#ifdef CONFIG_SMP
- /* Tell se's cfs_rq has been changed */
- p->se.avg.last_update_time = 0;
-#endif
attach_task_cfs_rq(p);
}

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 66d0552..559f9c7 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -335,6 +335,9 @@ extern void sched_move_task(struct task_struct *tsk);

#ifdef CONFIG_FAIR_GROUP_SCHED
extern int sched_group_set_shares(struct task_group *tg, unsigned long shares);
+extern void update_last_update_time(struct sched_entity *se,
+ struct cfs_rq *prev,
+ struct cfs_rq *next);
#endif

#else /* CONFIG_CGROUP_SCHED */
@@ -933,6 +936,7 @@ static inline void set_task_rq(struct task_struct *p, unsigned int cpu)
#endif

#ifdef CONFIG_FAIR_GROUP_SCHED
+ update_last_update_time(&p->se, p->se.cfs_rq, tg->cfs_rq[cpu]);
p->se.cfs_rq = tg->cfs_rq[cpu];
p->se.parent = tg->se[cpu];
#endif
--
1.7.9.5
