Re: [PATCH 1/2] x86/platform: Add a low priority low frequency NMI call chain

From: Mike Travis
Date: Tue Mar 07 2017 - 10:39:57 EST




On 3/6/2017 11:42 PM, Ingo Molnar wrote:
>
> * Mike Travis <mike.travis@xxxxxxx> wrote:
>
>> Add a new NMI call chain that is called last after all other NMI handlers
>> have been checked and did not "handle" the NMI. This mimics the current
>> NMI_UNKNOWN call chain except it eliminates the WARNING message about
>> multiple NMI handlers registering on this call chain.
>>
>> This call chain dramatically lowers the NMI call frequency when high
>> frequency NMI tools are in use, notably the perf tools. It is required
>> for NMI handlers that cannot sustain a high NMI call rate without
>> ramifications to the system operability.
>
> So how about we just turn off that warning instead? I don't remember the last time
> it actually _helped_ us find any kernel or hardware bug - and it has caused tons
> of problems...

I can do that, with an even simpler patch...

>
> It's not like we warn about excess regular IRQs either - we either handle them or
> at most increase a counter somewhere. We could do the same for NMIs: introduce a
> counter somewhere that counts the number of seemingly unhandled NMIs.

Truly "unknown" NMIs are reported either by the "dazed and confused"
message or, if panic on unknown NMI is set, by a system panic. So
unknown NMI occurrences are already being dealt with.

Plus the following stats are already being collected, though I'm not
aware of any facility that reports them:

struct nmi_stats {
	unsigned int normal;
	unsigned int unknown;
	unsigned int external;
	unsigned int swallow;
};

static DEFINE_PER_CPU(struct nmi_stats, nmi_stats);
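
To illustrate the counter idea, here is a rough userspace sketch of
counting unhandled NMIs per CPU and summing them for a report. It is
not kernel code: a plain array stands in for the per-CPU storage, and
NR_CPUS_SKETCH, nmi_stats_sketch, count_unknown_nmi() and
total_unknown_nmis() are hypothetical names made up for this example.

```c
/*
 * Userspace sketch only. A fixed-size array stands in for the
 * kernel's per-CPU variable; all names except struct nmi_stats
 * are hypothetical.
 */
#include <assert.h>

#define NR_CPUS_SKETCH 4

struct nmi_stats {
	unsigned int normal;
	unsigned int unknown;
	unsigned int external;
	unsigned int swallow;
};

/* Zero-initialized stand-in for DEFINE_PER_CPU(struct nmi_stats, ...). */
static struct nmi_stats nmi_stats_sketch[NR_CPUS_SKETCH];

/* Count an unhandled NMI on the given CPU instead of logging it. */
static void count_unknown_nmi(unsigned int cpu)
{
	assert(cpu < NR_CPUS_SKETCH);
	nmi_stats_sketch[cpu].unknown++;
}

/* Sum the per-CPU "unknown" counters, e.g. for a reporting file. */
static unsigned long total_unknown_nmis(void)
{
	unsigned long total = 0;
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		total += nmi_stats_sketch[cpu].unknown;
	return total;
}
```

A real reporting path would have to iterate over the actual per-CPU
data and pick an interface for exposing the totals; the sketch only
shows the count-then-sum shape.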


>
> But in any case, we should not spam the kernel log, neither with high, nor with
> low frequency.
>
> Thanks,
>
> Ingo
>