[PATCH] sched: Reduce the rate of needless idle load balancing

From: Tim Chen
Date: Tue May 20 2014 - 16:17:44 EST


The current nohz idle load balancer does load balancing on behalf of *all* idle cpus,
even when the next balance time for a particular idle cpu is still a while
in the future. This results in a much higher load balancing rate than
necessary. This patch changes the behavior so that idle load balancing is
done on behalf of an idle cpu only when that cpu's balance time is due.
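The gist of the check (a minimal sketch of the hunk below; time_after(),
jiffies and rq->next_balance are the usual scheduler/jiffies facilities,
and the "+ 1" is, as far as I can tell, just the inclusive form of the
test, i.e. equivalent to time_after_eq(jiffies, rq->next_balance)):

	/*
	 * Skip this idle cpu unless its own balance interval has
	 * expired.  time_after(jiffies + 1, rq->next_balance) is
	 * true when rq->next_balance <= jiffies, so a balance that
	 * is due right at the current jiffy is not skipped.
	 */
	if (time_after(jiffies + 1, rq->next_balance))
		rebalance_domains(rq, CPU_IDLE);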

On SGI systems with over 3000 cores, the cpu responsible for idle balancing
got overwhelmed doing idle balancing on behalf of all the other idle cpus,
which introduced a lot of OS noise into workloads. This patch fixes that issue.

Thanks.

Tim

Acked-by: Russ Anderson <rja@xxxxxxx>
Signed-off-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
---
kernel/sched/fair.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9b4c4f3..97132db 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6764,12 +6764,17 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)

rq = cpu_rq(balance_cpu);

- raw_spin_lock_irq(&rq->lock);
- update_rq_clock(rq);
- update_idle_cpu_load(rq);
- raw_spin_unlock_irq(&rq->lock);
-
- rebalance_domains(rq, CPU_IDLE);
+ /*
+ * If time for next balance is due,
+ * do the balance.
+ */
+ if (time_after(jiffies + 1, rq->next_balance)) {
+ raw_spin_lock_irq(&rq->lock);
+ update_rq_clock(rq);
+ update_idle_cpu_load(rq);
+ raw_spin_unlock_irq(&rq->lock);
+ rebalance_domains(rq, CPU_IDLE);
+ }

if (time_after(this_rq->next_balance, rq->next_balance))
this_rq->next_balance = rq->next_balance;
--
1.7.11.7

