Re: Yet more softlockups.

From: Dave Jones
Date: Fri Jul 12 2013 - 11:46:14 EST


On Fri, Jul 12, 2013 at 08:38:52AM -0700, Dave Hansen wrote:

> The warning comes from calling perf_sample_event_took(), which is only
> called from one place: perf_event_nmi_handler().
>
> So we can be pretty sure that the perf NMI is firing, or at least that
> this handler code is running.
>
> nmi_handle() says:
>     /*
>      * NMIs are edge-triggered, which means if you have enough
>      * of them concurrently, you can lose some because only one
>      * can be latched at any given time. Walk the whole list
>      * to handle those situations.
>      */
>
> perf_event_nmi_handler() probably gets _called_ when the watchdog NMI
> goes off. But, it should hit this check:
>
>     if (!atomic_read(&active_events))
>         return NMI_DONE;
>
> and return quickly. This is before it has a chance to call
> perf_sample_event_took().
>
> Dave, for your case, my suspicion would be that it got turned on
> inadvertently, or that we somehow have a bug which bumped up
> perf_event.c's 'active_events' and we're running some perf code that we
> don't have to.

What do you mean by 'inadvertently'? I see this during bootup every time.
That would only make sense if systemd or something had started playing with
perf, which afaik it hasn't.

> But, I'm suspicious. I was having all kinds of issues with perf and
> NMIs taking hundreds of milliseconds. I never isolated it to having a
> real, single, cause. I attributed it to my large NUMA system just being
> slow. Your description makes me wonder what I missed, though.

Here's a fun trick:

trinity -c perf_event_open -C4 -q -l off

Within about a minute, that brings any of my boxes to its knees.
The softlockup detector starts going nuts, and then the box wedges solid.

(You may need to bump -C depending on your CPU count. I've never seen it happen
with a single process, but -C2 seems to be a minimum)

That *is* using perf though, so I kind of expect bad shit to happen when there are bugs.
The "during bootup" case is still a head-scratcher.
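
For anyone following along, here is a rough userspace model of the path Dave
Hansen describes above: the NMI code walks every registered handler, and the
perf handler is supposed to bail out before the timing code when nothing is
active. This is only a sketch, not the kernel source. The two NMI_* values,
the function names, and the took_calls counter are stand-ins I made up for
illustration; only the active_events check mirrors the quoted snippet.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative return codes; the values are assumptions for this model. */
#define NMI_DONE    0
#define NMI_HANDLED 1

typedef int (*nmi_handler_t)(void);

/* Stand-in for perf_event.c's 'active_events' counter. */
static atomic_int active_events;

/* Counts how often the (hypothetical) timing path was reached. */
static int took_calls;

/* Stand-in for perf_sample_event_took(); here it only counts calls. */
static void sample_event_took(void)
{
    took_calls++;
}

/* Models the perf handler: return early when no events are active. */
static int perf_nmi_handler(void)
{
    if (!atomic_load(&active_events))
        return NMI_DONE;

    sample_event_took();
    return NMI_HANDLED;
}

/* Models some other NMI user (e.g. a watchdog) that always claims the NMI. */
static int watchdog_nmi_handler(void)
{
    return NMI_HANDLED;
}

/*
 * Walk the whole list rather than stopping at the first handler that
 * claims the NMI, since several sources may have fired at once.
 * Returns how many handlers reported NMI_HANDLED.
 */
static int nmi_handle(const nmi_handler_t *handlers, size_t n)
{
    int handled = 0;

    for (size_t i = 0; i < n; i++)
        handled += handlers[i]();

    return handled;
}

/* Small self-check: with no active events the timing path must not run. */
static void nmi_model_selfcheck(void)
{
    const nmi_handler_t chain[] = { perf_nmi_handler, watchdog_nmi_handler };

    atomic_store(&active_events, 0);
    took_calls = 0;
    nmi_handle(chain, 2);
    assert(took_calls == 0);
}
```

In this model the timing path can only be reached when active_events is
nonzero, which is why seeing the warning at boot, with nothing knowingly
using perf, is the odd part.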

Dave
