Re: frequent lockups in 3.18rc4

From: Thomas Gleixner
Date: Tue Nov 18 2014 - 14:28:13 EST


On Tue, 18 Nov 2014, Linus Torvalds wrote:
> On Tue, Nov 18, 2014 at 6:52 AM, Dave Jones <davej@xxxxxxxxxx> wrote:
> >
> > Here's the first hit. Curiously, one cpu is missing.
>
> That might be the CPU3 that isn't responding to IPIs due to some bug..
>
> > NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [trinity-c180:17837]
> > RIP: 0010:[<ffffffffa91a0db0>] [<ffffffffa91a0db0>] bad_range+0x0/0x90
>
> Hmm. Something looping in the page allocator? Not waiting for a lock,
> but livelocked? I'm not seeing anything here that should trigger the
> NMI watchdog at all.
>
> Can the NMI watchdog get confused somehow?

That's the soft lockup detector, which runs from the timer interrupt,
not from NMI.
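
To illustrate the distinction, here is a rough user-space sketch of how a
timer-driven soft lockup check can work. It is not the kernel's code. The
struct, the function names and the 20 second threshold are all assumptions
made up for this example. The idea is that a per-CPU thread records a
timestamp whenever it gets to run, and the periodic timer interrupt
complains when that timestamp has gone stale.

```c
#include <stdint.h>

/* Hypothetical model: these names are illustrative, not kernel symbols. */
struct cpu_watchdog {
	uint64_t touch_ts;	/* last time the watchdog thread ran, in seconds */
};

/* Assumed threshold in seconds, chosen for this sketch. */
#define SOFTLOCKUP_THRESH 20

/* Called by the per-CPU watchdog thread each time it gets scheduled. */
static void watchdog_touch(struct cpu_watchdog *wd, uint64_t now)
{
	wd->touch_ts = now;
}

/*
 * Called from the periodic timer interrupt on the same CPU.
 * Returns how long the CPU has been stuck, in seconds, once the
 * threshold is exceeded, and 0 otherwise.
 */
static uint64_t timer_check(const struct cpu_watchdog *wd, uint64_t now)
{
	uint64_t stalled = now - wd->touch_ts;

	return stalled > SOFTLOCKUP_THRESH ? stalled : 0;
}
```

In this model the check only needs the timer interrupt to keep firing on
the stuck CPU, so no NMI is involved in producing a "stuck for 23s" style
report.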

> So it does look like CPU3 is the problem, but sadly, CPU3 is
> apparently not listening, and doesn't even react to the NMI, much less

As I said in the other mail, it gets the NMI and reacts to it. It's
just mangled into the CPU0 backtrace.

Thanks,

tglx