[PATCH] timer: Lazily wake up nohz CPU when adding new timer.

From: Yunhong Jiang
Date: Mon Sep 28 2015 - 14:56:42 EST


Currently, when a new timer is added to the timer wheel of a nohz_active
CPU, the target CPU is always woken up.

In fact, if the newly added timer expires after base->next_timer, we
don't need to wake up the target CPU, since the new timer does not change
its sleep time. A lazy wake-up is better in this scenario.
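
The decision described above can be modeled in user space. This is only a
rough sketch, not the kernel code: struct model_base, model_time_before(),
old_needs_wakeup() and lazy_needs_wakeup() are hypothetical names made up
for illustration, and locking and the timer wheel itself are left out.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few tvec_base fields the decision uses. */
struct model_base {
	unsigned long next_timer;    /* earliest non-deferrable expiry */
	unsigned long active_timers; /* number of non-deferrable timers */
	bool nohz_active;
	bool nohz_full_cpu;          /* stands in for tick_nohz_full_cpu() */
};

/* Wraparound-safe "a is before b", in the style of time_before(). */
bool model_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Old behavior: every non-deferrable timer wakes a nohz_active CPU. */
bool old_needs_wakeup(const struct model_base *base, bool deferrable)
{
	return base->nohz_active && (!deferrable || base->nohz_full_cpu);
}

/*
 * Lazy behavior: a non-deferrable timer wakes the CPU only when it
 * becomes the new earliest expiry. Deferrable timers still wake only
 * full dynticks CPUs.
 */
bool lazy_needs_wakeup(struct model_base *base, unsigned long expires,
		       bool deferrable)
{
	if (deferrable)
		return base->nohz_active && base->nohz_full_cpu;

	if (!base->active_timers++ ||
	    model_time_before(expires, base->next_timer)) {
		base->next_timer = expires;
		return base->nohz_active;
	}
	return false;
}
```

With next_timer = 1000 and one active timer, a non-deferrable timer
expiring at 2000 makes old_needs_wakeup() return true and
lazy_needs_wakeup() return false. One expiring at 500 returns true from
both and moves next_timer to 500.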

I cooked up a test scenario. On my 32-core system, a driver on CPU 15
continuously enqueues timers to CPUs 8/9/10/11 with random expiry times
and then checks the idle_calls difference after 10 seconds. The data
below shows that the lazy wake-up reduces the number of wakeups
considerably.

          w/o lazy    w/ lazy
CPU 8:      135          88
CPU 9:      238          43
CPU 10:     157          83
CPU 11:     172          70
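
To put the numbers above in relative terms, the reduction per CPU can be
computed as below. This is just a small sketch. struct idle_sample and
reduction_percent() are hypothetical helpers, and the values are copied
from the measurements above.

```c
#include <stddef.h>

/* One row of the measurement: idle_calls delta over 10 seconds. */
struct idle_sample {
	int cpu;
	unsigned int without_lazy;
	unsigned int with_lazy;
};

/* Values copied from the measurements in this mail. */
const struct idle_sample idle_samples[] = {
	{  8, 135, 88 },
	{  9, 238, 43 },
	{ 10, 157, 83 },
	{ 11, 172, 70 },
};

const size_t nr_idle_samples = sizeof(idle_samples) / sizeof(idle_samples[0]);

/* Relative reduction in percent; returns 0 if there is no baseline. */
double reduction_percent(unsigned int before, unsigned int after)
{
	if (before == 0)
		return 0.0;
	return 100.0 * ((double)before - (double)after) / (double)before;
}
```

For the four rows this gives roughly 35%, 82%, 47% and 59% fewer
idle_calls on CPUs 8, 9, 10 and 11 respectively.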

Signed-off-by: Yunhong Jiang <yunhong.jiang@xxxxxxxxxxxxxxx>
---
kernel/time/timer.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index d3f5e92f722a..a039d9e6b55a 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -414,6 +414,8 @@ __internal_add_timer(struct tvec_base *base, struct timer_list *timer)

 static void internal_add_timer(struct tvec_base *base, struct timer_list *timer)
 {
+	bool kick_nohz = false;
+
 	/* Advance base->jiffies, if the base is empty */
 	if (!base->all_timers++)
 		base->timer_jiffies = jiffies;
@@ -424,9 +426,17 @@ static void internal_add_timer(struct tvec_base *base, struct timer_list *timer)
 	 */
 	if (!(timer->flags & TIMER_DEFERRABLE)) {
 		if (!base->active_timers++ ||
-		    time_before(timer->expires, base->next_timer))
+		    time_before(timer->expires, base->next_timer)) {
 			base->next_timer = timer->expires;
-	}
+			/*
+			 * A CPU in dynticks mode needs to reevaluate the timer
+			 * wheel if the new timer updated next_timer.
+			 */
+			if (base->nohz_active)
+				kick_nohz = true;
+		}
+	} else if (base->nohz_active && tick_nohz_full_cpu(base->cpu))
+		kick_nohz = true;

 	/*
 	 * Check whether the other CPU is in dynticks mode and needs
@@ -441,11 +451,8 @@ static void internal_add_timer(struct tvec_base *base, struct timer_list *timer)
 	 * require special care against races with idle_cpu(), lets deal
 	 * with that later.
 	 */
-	if (base->nohz_active) {
-		if (!(timer->flags & TIMER_DEFERRABLE) ||
-		    tick_nohz_full_cpu(base->cpu))
-			wake_up_nohz_cpu(base->cpu);
-	}
+	if (kick_nohz)
+		wake_up_nohz_cpu(base->cpu);
 }

#ifdef CONFIG_TIMER_STATS
--
1.8.3.1