Re: [RFC PATCH 0/4] sched: Improve cpu load accounting with nohz
From: Dietmar Eggemann
Date: Thu Jan 14 2016 - 16:19:18 EST
On 13/01/16 16:01, Frederic Weisbecker wrote:
> Most of the remaining nohz full work is about making the scheduler
> resilient to full dynticks (such that we can remove the 1Hz one day).
> Byungchul Park started interesting work toward that with cpu load
> updates in nohz full. So here is one more step forward.
> Patches 1 and 2 concern both dyntick-idle and dyntick-full. The rest
> is rather about full dynticks. Note that this isn't complete support for
> cpu load in nohz full, we still need to think about a way to make
> target_load() able to return up to date values on full dynticks targets.
> HEAD: f1d7d0f5382be3819490a859313f692f142dfb74
> Frederic Weisbecker (4):
> sched: Don't account tickless CPU load on tick
> sched: Consolidate nohz CPU load update code
> sched: Move cpu load stats functions above fair queue callbacks
> sched: Upload nohz full CPU load on task enqueue/dequeue
> kernel/sched/fair.c | 306 ++++++++++++++++++++++++++++++----------------------
> 1 file changed, 175 insertions(+), 131 deletions(-)
I noticed during testing of these patches on a NO_HZ_FULL cpu that the
rq->cpu_load values can be abnormally high. This happens w/ and w/o your
patches, but it happens way more often w/ the patches applied.
It might be related to commit 59543275488d "sched/fair: Prepare
__update_cpu_load() to handle active tickless", at least the following
patch cures it for me.
-- >8 --
Subject: [PATCH] sched/fair: Avoid unsigned underflow in __update_cpu_load()
tickless_load, which is rq->cpu_load[0] in the active case, can be
greater than rq->cpu_load[i] (0 < i < CPU_LOAD_IDX_MAX), which then
leads to an unsigned underflow when calculating old_load.
In the NO_HZ_FULL case (tick_nohz_full_update_tick() ->
tick_nohz_restart_sched_tick() -> update_cpu_load_nohz() ->
__update_cpu_load()) with pending_updates > 1, rq->cpu_load[i]
(0 < i < CPU_LOAD_IDX_MAX) can end up with abnormally high values.
Fix this by assigning the difference between rq->cpu_load[i] and
tickless_load to old_load only if the former is greater; otherwise
set old_load to 0.
Also bail out of decay_load_missed() if it is called with load = 0.
E.g. running a pinned 50% task (w/ heavy tracing) on a cpu in
NO_HZ_FULL mode showed abnormally high max values for rq->cpu_load
w/o this patch. W/ this patch, the values are back to normal.
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>
kernel/sched/fair.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9deda2ac319f..4bf8f7c2c8b7 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4276,7 +4276,7 @@ decay_load_missed(unsigned long load, unsigned long missed_updates, int idx)
 {
 	int j = 0;
 
-	if (!missed_updates)
+	if (!load || !missed_updates)
 		return load;
 
 	if (missed_updates >= degrade_zero_ticks[idx])
@@ -4346,7 +4346,10 @@ static void __update_cpu_load(struct rq *this_rq, unsigned long this_load,
 
 		/* scale is effectively 1 << i now, and >> i divides by scale */
 
-		old_load = this_rq->cpu_load[i] - tickless_load;
+		if (this_rq->cpu_load[i] > tickless_load)
+			old_load = this_rq->cpu_load[i] - tickless_load;
+		else
+			old_load = 0;
 		old_load = decay_load_missed(old_load, pending_updates - 1, i);
 		old_load += tickless_load;
 		new_load = this_load;