Re: Crashes with 874bbfe600a6 in 3.18.25

From: Thomas Gleixner
Date: Thu Feb 04 2016 - 05:48:26 EST


On Thu, 4 Feb 2016, Mike Galbraith wrote:
> On Wed, 2016-02-03 at 12:06 -0500, Tejun Heo wrote:
> > On Wed, Feb 03, 2016 at 06:01:53PM +0100, Mike Galbraith wrote:
> > > Hm, so it's ok to queue work to an offline CPU? What happens if it
> > > doesn't come back for an eternity or two?
> >
> > Right now, it just loses affinity....
>
> WRT affinity...
>
> Somebody somewhere queues a delayed work, a timer is started on CPUX,
> work is targeted at CPUX. Now wash/rinse/repeat mod_delayed_work()
> along with migrations. Should __queue_delayed_work() not refrain from
> altering dwork->cpu once set?
>
> I'm also wondering why 22b886dd only applies to kernels >= 4.2.
>
> <quote>
> Regardless of the previous CPU a timer was on, add_timer_on()
> currently simply sets timer->flags to the new CPU. As the caller must
> be seeing the timer as idle, this is locally fine, but the timer
> leaving the old base while unlocked can lead to race conditions as
> follows.
>
> Let's say timer was on cpu 0.
>
> cpu 0 cpu 1
> -----------------------------------------------------------------------------
> del_timer(timer) succeeds
> del_timer(timer)
> lock_timer_base(timer) locks cpu_0_base
> add_timer_on(timer, 1)
> spin_lock(&cpu_1_base->lock)
> timer->flags set to cpu_1_base
> operates on @timer operates on @timer
> </quote>
>
> What's the difference between...
> timer->flags = (timer->flags & ~TIMER_BASEMASK) | cpu;
> and...
> timer_set_base(timer, base);
>
> ...that makes that fix unneeded prior to 4.2? We take the same locks
> in < 4.2 kernels, so seemingly both will diddle concurrently above.

Indeed, you are right.

The same can happen on pre 4.2, just the fix does not apply as we changed the
internals how the base is managed in the timer itself. Backport below.

Thanks,

tglx

8<----------------------------

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -956,13 +956,26 @@ EXPORT_SYMBOL(add_timer);
*/
void add_timer_on(struct timer_list *timer, int cpu)
{
- struct tvec_base *base = per_cpu(tvec_bases, cpu);
+ struct tvec_base *new_base = per_cpu(tvec_bases, cpu);
+ struct tvec_base *base;
unsigned long flags;

timer_stats_timer_set_start_info(timer);
BUG_ON(timer_pending(timer) || !timer->function);
- spin_lock_irqsave(&base->lock, flags);
- timer_set_base(timer, base);
+
+ /*
+ * If @timer was on a different CPU, it must be migrated with the
+ * old base locked to prevent other operations proceeding with the
+ * wrong base locked. See lock_timer_base().
+ */
+ base = lock_timer_base(timer, &flags);
+ if (base != new_base) {
+ timer_set_base(timer, NULL);
+ spin_unlock(&base->lock);
+ base = new_base;
+ spin_lock(&base->lock);
+ timer_set_base(timer, base);
+ }
debug_activate(timer, timer->expires);
internal_add_timer(base, timer);
spin_unlock_irqrestore(&base->lock, flags);