[PATCH v3 1/2] sched: make __update_cpu_load() handle active tickless case

From: byungchul.park
Date: Fri Oct 02 2015 - 03:49:41 EST


From: Byungchul Park <byungchul.park@xxxxxxx>

There are some cases where distance between ticks is more then one tick,
while the cpu is not idle, e.g.

- full NOHZ
- tracing
- long lasting callbacks
- being scheduled away when running in a VM

However __update_cpu_load() assumes it is the idle tickless case if the
distance between ticks is more than 1, even though it can be the active
tickless case. Thus in the active tickless case, updating cpu load
cannot be performed correctly. To update cpu load properly, this patch
changes __update_cpu_load() so that it can handle active tickless case.

We can consider the new_load for each omitted tick during tickless though
the each new_load would be zero in idle tickless case, while it's not
true in non-idle tickless case e.g. full NOHZ. We can approximately
consider the new_load ~= the last this_rq->cpu_load[0] during tickless
in non-idle tickless case e.g. full NOHZ. Now, let's define a symbol L
which is representing the each new_load during tickless so that L = 0 in
idle tickless case, L = this_rq->cpu_load[0] in non-idle tickless case.
Then, we can get the cpu_load(n) during tickless since last update, by

cpu_load(n) = (1 - 1/s) * cpu_load(n-1) + (1/s) * L
where n = the current tick - 1, s = scale
= A * cpu_load(n-1) + B
where A = 1 - 1/s, B = (1/s) * L
= A * (A * cpu_load(n-2) + B) + B
= A * (A * (A * cpu_load(n-3) + B) + B) + B
= A^3 * cpu_load(n-3) + A^2 * B + A * B + B
= A^i * cpu_load(n-i) + (A^(i-1) + A^(i-2) + ... + 1) * B
where i = pending_updates - 1
= A^i * cpu_load(n-i) + B * (A^i - 1) / (A - 1)
by geometric series formula for sum
= (1 - 1/s)^i * (cpu_load(n-i) - L) + L
by extending A and B

Calculation is over!

1. (1 - 1/s)^i can be handled by decay_load_missed().
2. cpu_load(n-i) is the "old_load".
3. L is the this_rq->cpu_load[0], updated at the last time.
4. cpu_load(n) for current tick will be handled at this time.

Signed-off-by: Byungchul Park <byungchul.park@xxxxxxx>
---
kernel/sched/fair.c | 46 +++++++++++++++++++++++++++++++++++++++-------
1 file changed, 39 insertions(+), 7 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 4d5f97b..9e76871 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4249,13 +4249,44 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)

/*
* Update rq->cpu_load[] statistics. This function is usually called every
- * scheduler tick (TICK_NSEC). With tickless idle this will not be called
- * every tick. We fix it up based on jiffies.
+ * scheduler tick (TICK_NSEC). With tickless this will not be called every
+ * tick. We fix it up based on jiffies.
+ *
+ * We can consider the new_load for each omitted tick during tickless though
+ * the each new_load would be zero in idle tickless case, while it's not
+ * true in non-idle tickless case e.g. full NOHZ. We can approximately
+ * consider the new_load ~= the last this_rq->cpu_load[0] during tickless
+ * in non-idle tickless case e.g. full NOHZ. Now, let's define a symbol L
+ * which is representing the each new_load during tickless so that L = 0 in
+ * idle tickless case, L = this_rq->cpu_load[0] in non-idle tickless case.
+ * Then, we can get the cpu_load(n) during tickless since last update, by
+ *
+ * cpu_load(n) = (1 - 1/s) * cpu_load(n-1) + (1/s) * L
+ * where n = the current tick - 1, s = scale
+ * = A * cpu_load(n-1) + B
+ * where A = 1 - 1/s, B = (1/s) * L
+ * = A * (A * cpu_load(n-2) + B) + B
+ * = A * (A * (A * cpu_load(n-3) + B) + B) + B
+ * = A^3 * cpu_load(n-3) + A^2 * B + A * B + B
+ * = A^i * cpu_load(n-i) + (A^(i-1) + A^(i-2) + ... + 1) * B
+ * where i = pending_updates - 1
+ * = A^i * cpu_load(n-i) + B * (A^i - 1) / (A - 1)
+ * by geometric series formula for sum
+ * = (1 - 1/s)^i * (cpu_load(n-i) - L) + L
+ * by extending A and B
+ *
+ * Calculation is over!
+ *
+ * 1. (1 - 1/s)^i can be handled by decay_load_missed().
+ * 2. cpu_load(n-i) is the "old_load".
+ * 3. L is the this_rq->cpu_load[0], updated at the last time.
+ * 4. cpu_load(n) for current tick will be handled at this time.
*/
static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
- unsigned long pending_updates)
+ unsigned long pending_updates, int active)
{
int i, scale;
+ unsigned long tickless_load = active ? this_rq->cpu_load[0] : 0;

this_rq->nr_load_updates++;

@@ -4266,8 +4297,9 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,

/* scale is effectively 1 << i now, and >> i divides by scale */

- old_load = this_rq->cpu_load[i];
+ old_load = this_rq->cpu_load[i] - tickless_load;
old_load = decay_load_missed(old_load, pending_updates - 1, i);
+ old_load += tickless_load;
new_load = this_load;
/*
* Round up the averaging division if load is increasing. This
@@ -4322,7 +4354,7 @@ static void update_idle_cpu_load(struct rq *this_rq)
pending_updates = curr_jiffies - this_rq->last_load_update_tick;
this_rq->last_load_update_tick = curr_jiffies;

- __update_cpu_load(this_rq, load, pending_updates);
+ __update_cpu_load(this_rq, load, pending_updates, 0);
}

/*
@@ -4345,7 +4377,7 @@ void update_cpu_load_nohz(void)
* We were idle, this means load 0, the current load might be
* !0 due to remote wakeups and the sort.
*/
- __update_cpu_load(this_rq, 0, pending_updates);
+ __update_cpu_load(this_rq, 0, pending_updates, 0);
}
raw_spin_unlock(&this_rq->lock);
}
@@ -4361,7 +4393,7 @@ void update_cpu_load_active(struct rq *this_rq)
* See the mess around update_idle_cpu_load() / update_cpu_load_nohz().
*/
this_rq->last_load_update_tick = jiffies;
- __update_cpu_load(this_rq, load, 1);
+ __update_cpu_load(this_rq, load, 1, 1);
}

/*
--
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/