Re: Timer refuses to expire
From: Paul E. McKenney
Date: Thu Dec 07 2017 - 16:45:23 EST
On Thu, Dec 07, 2017 at 06:56:17AM -0800, Paul E. McKenney wrote:
> On Thu, Dec 07, 2017 at 03:03:50PM +0800, Boqun Feng wrote:
[ . . . ]
> > > What I did instead was to dump out the state of the task that
> > > __cpuhp_kick_ap() waits on, please see the patch at the very end of this
> > > email. This triggered as shown below, and you guessed it, that task is
> > > waiting on a grace period. Which I am guessing won't happen until the
> > > outgoing CPU reaches CPUHP_TIMERS_DEAD state and calls timers_dead_cpu().
> > > Which will prevent RCU's grace-period kthread from ever awakening, which
> > > will prevent the task that __cpuhp_kick_ap() waits on from ever awakening,
> > > which will prevent the outgoing CPU from reaching CPUHP_TIMERS_DEAD state.
> > >
> > > Deadlock.
> >
> > There is one thing I'm confused here. Sure, this is a deadlock, but the
> > timer should still work in such a deadlock, right? I mean, the timer of
> > schedule_timeout() should be able to wake up rcu_gp_kthread() even in
> > this case? And yes, the gp kthread will continue to wait due to the
> > deadlock, but the deadlock can not explain the "Waylayed timer", right?
>
> My belief is that the timer cannot fire because it is still on the
> offlined CPU, and that CPU has not yet reached timers_dead_cpu().
> But I might be missing something subtle in either the timers code or the
> CPU-hotplug code, so please do check my reasoning here. (I am relying on
> the "timer->flags: 0x40000007" and the "cpuhp/7" below, which I believe
> means that the timer is on CPU 7 and that it is CPU 7 that is in the
> process of going offline.)
>
> The "Waylayed timer" happens because the RCU CPU stall warning code
> wakes up the grace-period kthread. This is driven out of the
> scheduling-clock tick, so is unaffected by timers, though it does
> rely on the jiffies counter continuing to be incremented.
>
> So what am I missing here?
Well, last night's runs had situations where the ->flags CPU didn't
match the CPU going offline, so I am clearly missing something or another.
One thing I might have been missing was the CPU-online processing.
What happens if a CPU goes offline, comes back online, but before ->clk
gets adjusted there is a schedule_timeout()? Now, schedule_timeout()
does compute the absolute ->expires time using jiffies, so the wakeup time
should not be too far off of the desired time. Except that the timers
now have something like 8% slop, and that slop will be calculated on the
difference between the desired expiration time and the (way outdated)
->clk value. So the 8% might be a rather large number. For example,
if the CPU was offline for 12 minutes (unlikely but entirely possible
with rcutorture testing's random onlining and offlining), the slop on
a 3-millisecond timer might be a full minute.
To my timer-naive eyes, it looks like a simple fix is to set
old_base->must_forward_clk to true in timers_dead_cpu() for each
timer_base, as shown below. The other possibility that I considered
was to instead set ->is_idle, but that looked like an engraved
invitation to send IPIs to offline CPUs.
I am giving it a spin. I still believe that the offline deadlock
scenario can happen, but one thing at a time...
Thoughts?
Thanx, Paul
------------------------------------------------------------------------
commit e7e66e48125cba05b52242deb5b3fcb787aafe0e
Author: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
Date: Thu Dec 7 13:18:44 2017 -0800
timers: Ensure that timer_base ->clk accounts for time offline
The timer_base ->must_forward_clk is set to indicate that the next timer
operation on that timer_base must check for passage of time. One instance
of time passage is when the timer wheel goes idle, and another is when
the corresponding CPU is offline. Note that it is not appropriate to set
->is_idle because that could result in IPIing an offline CPU. Therefore,
this commit instead sets ->must_forward_clk at CPU-offline time.
Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
diff --git a/kernel/time/timer.c b/kernel/time/timer.c
index fdc086129682..0ab767c2bff2 100644
--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -1921,6 +1921,7 @@ int timers_dead_cpu(unsigned int cpu)
BUG_ON(old_base->running_timer);
+ old_base->must_forward_clk = true;
for (i = 0; i < WHEEL_SIZE; i++)
migrate_timer_list(new_base, old_base->vectors + i);