Re: Race condition in HR timers that cause double insertion and hard lockup -- all latest versions

From: Thomas Gleixner
Date: Tue Sep 02 2014 - 15:08:38 EST


On Tue, 2 Sep 2014, Linus Torvalds wrote:

> On Tue, Sep 2, 2014 at 8:45 AM, Itzcak Pechtalt
> <itzcak@xxxxxxxxxxxxxxxxx> wrote:
> >
> > I opened a bug in https://bugzilla.kernel.org/show_bug.cgi?id=83601 for this subject with full description.
> > There is also a short fix patch for kernel/hrtimer.c file.
> > Even though this bug occurs rarely, the patch removes a possible system hard lockup.
>
> The patch is whitespace-damaged, but with a small oneliner like this
> that doesn't much matter (the timer files moved to kernel/time/ during
> this merge window, so the patch wouldn't apply as-is anyway).
>
> It needs a sign-off (see Documentation/SubmittingPatches), but even
> more importantly it needs to go to the right people for
> double-checking.
>
> But the patch is more broken than whitespace and even lack of
> sign-off. It cannot even have compiled. I'm assuming "timer_state" was
> intended to be "timer->state". Also, every caller but one already has
> "HRTIMER_STATE_CALLBACK" set unconditionally or to the old state in
> "newstate", so I suspect if this patch is the real fix (which I'll
> leave for Thomas to comment more on), afaik the actual problem can
> only happen through migrate_hrtimer_list() which unconditionally sets
> the whole state to HRTIMER_STATE_MIGRATE.
>
> Thomas? Leaving damaged patch quoted below.

Right. It's been fixed long ago and the migrate path cannot suffer
from this problem because at this point a callback running on the dead
cpu would cause the

BUG_ON(hrtimer_callback_running(timer));

a few lines above the remove_hrtimer() call to trigger and send the
machine into lala land.
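
As a rough illustration of that argument, here is a small user-space model, not kernel code. The state bit values are assumptions based on the include/linux/hrtimer.h of that era, and struct model_timer, model_callback_running() and model_migrate() are hypothetical names made up for this sketch. Where the kernel would hit the BUG_ON(), the model just returns false.

```c
#include <assert.h>   /* only needed for the usage example below */
#include <stdbool.h>

/* State bits; values assumed from the 2014-era hrtimer.h. */
#define HRTIMER_STATE_INACTIVE  0x00
#define HRTIMER_STATE_ENQUEUED  0x01
#define HRTIMER_STATE_CALLBACK  0x02
#define HRTIMER_STATE_MIGRATE   0x04

/* Hypothetical stand-in for struct hrtimer; only the state word is modeled. */
struct model_timer {
	unsigned long state;
};

/* Models hrtimer_callback_running(): true while the callback bit is set. */
static bool model_callback_running(const struct model_timer *timer)
{
	return (timer->state & HRTIMER_STATE_CALLBACK) != 0;
}

/*
 * Models one step of the migrate path: the check comes first, and only
 * then is the whole state overwritten with HRTIMER_STATE_MIGRATE.
 * Returns false (state untouched) where the kernel would BUG.
 */
static bool model_migrate(struct model_timer *timer)
{
	if (model_callback_running(timer))
		return false;	/* kernel: BUG_ON() fires here */

	timer->state = HRTIMER_STATE_MIGRATE;
	return true;
}
```

With this model, a timer in state HRTIMER_STATE_ENQUEUED migrates and ends up in HRTIMER_STATE_MIGRATE, while a timer that also has HRTIMER_STATE_CALLBACK set never reaches the assignment, so the unconditional overwrite cannot clobber the callback bit.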

Thanks,

tglx