Re: [GIT pull] timer updates for 4.9

From: Thomas Gleixner
Date: Mon Oct 24 2016 - 10:54:14 EST


On Mon, 24 Oct 2016, Thomas Gleixner wrote:
> On Sun, 23 Oct 2016, Linus Torvalds wrote:
> > So I found what looks like a bug in lock_timer_base() wrt migration.
> >
> > This code:
> >
> > for (;;) {
> > struct timer_base *base;
> > u32 tf = timer->flags;
> >
> > if (!(tf & TIMER_MIGRATING)) {
> > base = get_timer_base(tf);
> > spin_lock_irqsave(&base->lock, *flags);
> > if (timer->flags == tf)
> > return base;
> > spin_unlock_irqrestore(&base->lock, *flags);
> > }
> > cpu_relax();
> > }
> >
> > looks subtly buggy. I think that load of "tf" needs a READ_ONCE() to
> > make sure that gcc doesn't simply reload the valid of "timer->flags"
> > at random points.
>
> You are right, that needs a READ_ONCE(). Stupid me.
>
> > Yes, the spin_lock_irqsave() is a barrier, but that's the only one.
> > Afaik, gcc could decide that "I need to spill tf, so I'll just reload
> > it" after looking up get_timer_base().
> >
> > And no, I don't think this is the cause of my problem, but I suspect
> > that something _like_ fragility in lock_timer_base() could cause this.
>
> It might explain it, when this really ends up with the wrong base.

Can you please check in the disassembly whether gcc really reloads
timer->flags? Mine does not...

Another thing you might try is to enable debugobjects. As this happens only
at shutdown time the problem might be caused by something else which
wreckages timers in some subtle way.

Thanks,

tglx