Re: [PATCH 13/14] x86/UV: Update UV support for external NMI signals

From: Mike Travis
Date: Wed Mar 20 2013 - 02:13:49 EST

On 3/14/2013 12:20 AM, Ingo Molnar wrote:
> * Mike Travis <travis@xxxxxxx> wrote:
>> There is an exception where the NMI_LOCAL notifier chain is used. When
>> the perf tools are in use, it's possible that our NMI was captured by
>> some other NMI handler and then ignored. We set a per_cpu flag for
>> those CPUs that ignored the initial NMI, and then send them an IPI NMI
>> signal.
> "Other" NMI handlers should never lose NMIs - if they do then they should
> be fixed I think.
> Thanks,
> Ingo

Hi Ingo,

I suspect that the other NMI handlers would not grab ours if we were
on the NMI_LOCAL chain to claim them. The problem though is the UV
Hub is not designed to have that amount of traffic reading the MMRs.
This was handled in previous kernel versions by a) putting us at the
bottom of the chain; and b) stopping the search as soon as a handler
claimed an NMI as its own.

Neither of these is true any more, as all handlers are called for
all NMIs. (I measured anywhere from 0.5M to 4M NMIs per second on a
64-socket, 1024-CPU-thread system [not sure why the rate changes].)
This was the primary motivation for placing the UV NMI handler on the
NMI_UNKNOWN chain, so it would be called only if all other handlers
"gave up", and thus not incur the overhead of the MMR reads on every
NMI event.
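
To illustrate the cost argument above, here is a small user-space
model (not kernel code) that counts how often the UV handler would
have to read the hub MMR under each dispatch policy. All names in it
(deliver_nmi, simulate, the POLICY_* values) are hypothetical and
exist only for this sketch; it assumes a single perf-style handler
that claims only its own NMIs, so it does not model the lost-NMI
case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: where an NMI really came from. */
enum nmi_source { SRC_PERF, SRC_UV };

/* Hypothetical dispatch policies, following the description above. */
enum policy {
	POLICY_STOP_AT_FIRST_CLAIM,	/* old: UV last, search stops on claim */
	POLICY_CALL_ALL_LOCAL,		/* UV on NMI_LOCAL, all handlers run */
	POLICY_UV_ON_UNKNOWN		/* UV on NMI_UNKNOWN, runs if unclaimed */
};

struct sim_counts {
	unsigned long mmr_reads;	/* times the UV handler read the MMR */
	unsigned long uv_handled;	/* UV NMIs actually serviced */
};

/* Assumption: the perf handler claims only NMIs it really caused. */
static int perf_handler(enum nmi_source src)
{
	return src == SRC_PERF;
}

/* The UV handler must read the hub MMR to decide if the NMI is ours. */
static int uv_handler(enum nmi_source src, struct sim_counts *c)
{
	c->mmr_reads++;
	if (src == SRC_UV) {
		c->uv_handled++;
		return 1;
	}
	return 0;
}

static void deliver_nmi(enum policy p, enum nmi_source src,
			struct sim_counts *c)
{
	int claimed;

	assert(c != NULL);
	claimed = perf_handler(src);

	switch (p) {
	case POLICY_CALL_ALL_LOCAL:
		/* Every handler runs, so every NMI costs an MMR read. */
		uv_handler(src, c);
		break;
	case POLICY_STOP_AT_FIRST_CLAIM:
	case POLICY_UV_ON_UNKNOWN:
		/* UV handler runs only when nobody else claimed the NMI. */
		if (!claimed)
			uv_handler(src, c);
		break;
	}
}

struct sim_counts simulate(enum policy p, unsigned long perf_nmis,
			   unsigned long uv_nmis)
{
	struct sim_counts c = { 0, 0 };
	unsigned long i;

	for (i = 0; i < perf_nmis; i++)
		deliver_nmi(p, SRC_PERF, &c);
	for (i = 0; i < uv_nmis; i++)
		deliver_nmi(p, SRC_UV, &c);
	return c;
}
```

In this model, POLICY_CALL_ALL_LOCAL performs one MMR read per NMI of
any kind, while POLICY_UV_ON_UNKNOWN performs one only per NMI that
the other handler left unclaimed, which matches the cost profile of
the old stop-at-first-claim behavior.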

The good news is that I haven't yet encountered a case where the
"missing" cpus were not called into the NMI loop. Even better news
is that on the previous (3.0 vintage) kernels, running two perf tops
would almost always either cause tons of the infamous "dazed and
confused" messages or lock up the system. Now it results in
quite a few messages like:

[ 961.119417] perf_event_intel: clearing PMU state on CPU#652

followed by a dump of a number of cpu PMC registers. But the system
remains responsive. (This was experienced in our Customer Training
Lab where multiple system admins were in the class.)

The bad news is I'm not sure why the errant NMI interrupts are lost.
I have noticed that restricting the 'perf tops' to separate and
distinct cpusets seems to lessen this "stomping on each other's perf
event handlers" effect, which might be more representative of actual
customer usage.

So in total the situation is vastly improved... :)

