Re: [patch 0/4] timer/nohz: Fix timer/nohz woes

From: Thomas Gleixner
Date: Wed Dec 27 2017 - 15:58:39 EST


On Sat, 23 Dec 2017, Paul E. McKenney wrote:
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index ffebcf878fba..94cce780c574 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -1875,6 +1875,7 @@ int timers_dead_cpu(unsigned int cpu)
>
> BUG_ON(old_base->running_timer);
>
> + old_base->must_forward_clk = true;
> for (i = 0; i < WHEEL_SIZE; i++)
> migrate_timer_list(new_base, old_base->vectors + i);
>

That's daft because base->next_expiry is also stale and that might cause
interesting issues in forward_timer_base(). But you are right, leaving the
base stale on cpu hotplug is not really a brilliant thing to do.

Instead of that workaround, I think we want a proper callback in the
prepare stage which brings the bases of the to be plugged CPU up to date.

Thanks,

tglx

8<---------------------
Subject: timers: Reinitialize per cpu bases on hotplug
From: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Date: Wed, 27 Dec 2017 21:37:25 +0100

The timer wheel bases are not (re)initialized on CPU hotplug. That leaves
them with a potentially stale clk and next_expiry valuem, which can cause
trouble then the CPU is plugged.

Add a prepare callback which forwards the clock, sets next_expiry to far in
the future and reset the control flags to a known state.

Set base->must_forward_clk so the first timer which is queued will try to
forward the clock to current jiffies.

Fixes: 500462a9de65 ("timers: Switch to a non-cascading wheel")
Reported-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
---
include/linux/cpuhotplug.h | 2 +-
include/linux/timer.h | 4 +++-
kernel/cpu.c | 4 ++--
kernel/time/timer.c | 14 ++++++++++++++
4 files changed, 20 insertions(+), 4 deletions(-)

--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -86,7 +86,7 @@ enum cpuhp_state {
CPUHP_MM_ZSWP_POOL_PREPARE,
CPUHP_KVM_PPC_BOOK3S_PREPARE,
CPUHP_ZCOMP_PREPARE,
- CPUHP_TIMERS_DEAD,
+ CPUHP_TIMERS_PREPARE,
CPUHP_MIPS_SOC_PREPARE,
CPUHP_BP_PREPARE_DYN,
CPUHP_BP_PREPARE_DYN_END = CPUHP_BP_PREPARE_DYN + 20,
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -207,9 +207,11 @@ unsigned long round_jiffies_up(unsigned
unsigned long round_jiffies_up_relative(unsigned long j);

#ifdef CONFIG_HOTPLUG_CPU
+int timers_prepare_cpu(unsigned int cpu);
int timers_dead_cpu(unsigned int cpu);
#else
-#define timers_dead_cpu NULL
+#define timers_prepare_cpu NULL
+#define timers_dead_cpu NULL
#endif

#endif
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1277,9 +1277,9 @@ static struct cpuhp_step cpuhp_bp_states
* before blk_mq_queue_reinit_notify() from notify_dead(),
* otherwise a RCU stall occurs.
*/
- [CPUHP_TIMERS_DEAD] = {
+ [CPUHP_TIMERS_PREPARE] = {
.name = "timers:dead",
- .startup.single = NULL,
+ .startup.single = timers_prepare_cpu,
.teardown.single = timers_dead_cpu,
},
/* Kicks the plugged cpu into life */
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1853,6 +1853,20 @@ static void migrate_timer_list(struct ti
}
}

+int timers_prepare_cpu(unsigned int cpu)
+{
+ struct timer_base *base;
+ int b;
+
+ for (b = 0; b < NR_BASES; b++) {
+ base = per_cpu_ptr(&timer_bases[b], cpu);
+ base->clk = jiffies;
+ base->next_expiry = base->clk + NEXT_TIMER_MAX_DELTA;
+ base->is_idle = false;
+ base->must_forward_clk = true;
+ }
+}
+
int timers_dead_cpu(unsigned int cpu)
{
struct timer_base *old_base;