RE: [PATCH RFC] NMI Re-introduce un[set]_nmi_callback
From: Mingarelli, Thomas
Date: Thu Sep 04 2008 - 17:07:18 EST
OK, regarding question #1: the die_notifier works as you mentioned; however, the fact that the watchdog timer ticks also come through as NMIs is a hindrance. When the watchdog timer is configured through the LOCAL_APIC, the issue isn't so bad. I think the hpwdt driver can handle the incoming NMI in that case because there isn't a flood of timer ticks coming through, as there is in the IOAPIC case.
As for kdump, perhaps I am missing something. If I handle the incoming NMI and source it via our BIOS, I then stop the watchdog timer and the kdump will take place.
Tom
-----Original Message-----
From: Vivek Goyal [mailto:vgoyal@xxxxxxxxxx]
Sent: Thursday, September 04, 2008 3:57 PM
To: Mingarelli, Thomas
Cc: Andi Kleen; Don Zickus; Ingo Molnar; Prarit Bhargava; Peter Zijlstra; linux-kernel@xxxxxxxxxxxxxxx; arozansk@xxxxxxxxxx; ak@xxxxxxxxxxxxxxx; Alan Cox; H. Peter Anvin; Thomas Gleixner; Maciej W. Rozycki
Subject: Re: [PATCH RFC] NMI Re-introduce un[set]_nmi_callback
On Thu, Sep 04, 2008 at 08:01:31PM +0000, Mingarelli, Thomas wrote:
> Exactly.
>
> The hpwdt driver is meant to be a catch-all for any NMI coming through on ProLiant HW only. Moreover, for newer ProLiant HW at that.
>
> Once the NMI comes in, we call into our BIOS for the true reason of the NMI. That message gets logged to the IML in NVRAM for the user to view. We then panic the system.
>
> Yes, kdump will work under this scenario because we stop the watchdog timer. This is a user configurable setting.
>
>
Sorry, I did not get it. A few questions.
- So you want to capture every NMI and then do something. So what's the
harm in registering on the die chain, looking for both DIE_NMI_IPI and
DIE_NMI events, and taking appropriate action? Depending on the reason
code, one or the other will be called. If I read the code correctly, you
will get to see every NMI on that cpu irrespective of the reason, and then
you can take action accordingly.
- How would kdump continue to work if the above driver hijacks the nmi
callback? You will disable the watchdog, log a message and call panic().
panic() will lead to kdump, and kdump will send an NMI IPI to the rest of
the cpus in the system to save their state and halt them. The moment the
other cpus get the NMI IPI, the above driver will hijack that NMI too and
nobody else gets a chance to run? So kdump will not work?
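
To make the second question concrete, here is a minimal user-space sketch of
a handler chain. It is not the kernel's notifier API: the chain structure,
the EV_* events (stand-ins for DIE_NMI and DIE_NMI_IPI), the simplified
NOTIFY_* values and all handler names are assumptions made for illustration.
It only shows that a catch-all handler which claims every event keeps a
later kdump-style handler from running, while one that logs and passes the
event on does not.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real kernel constants have different values. */
enum { NOTIFY_DONE = 0, NOTIFY_STOP = 1 };
enum nmi_event { EV_NMI, EV_NMI_IPI };

typedef int (*nmi_handler_fn)(enum nmi_event ev, void *data);

#define MAX_HANDLERS 8

struct nmi_chain {
	nmi_handler_fn fn[MAX_HANDLERS];
	void *data[MAX_HANDLERS];
	size_t count;
};

static void chain_register(struct nmi_chain *c, nmi_handler_fn fn, void *data)
{
	assert(c->count < MAX_HANDLERS);
	c->fn[c->count] = fn;
	c->data[c->count] = data;
	c->count++;
}

/* Walk handlers in registration order; stop at the first NOTIFY_STOP. */
static int chain_call(const struct nmi_chain *c, enum nmi_event ev)
{
	for (size_t i = 0; i < c->count; i++)
		if (c->fn[i](ev, c->data[i]) == NOTIFY_STOP)
			return NOTIFY_STOP;
	return NOTIFY_DONE;
}

/* Hypothetical catch-all watchdog handler: claims every NMI it sees. */
static int wdt_hijack(enum nmi_event ev, void *data)
{
	(void)ev;
	++*(int *)data;		/* count the event as "logged" */
	return NOTIFY_STOP;
}

/* Hypothetical cooperative variant: logs, then lets the chain continue. */
static int wdt_cooperative(enum nmi_event ev, void *data)
{
	(void)ev;
	++*(int *)data;
	return NOTIFY_DONE;
}

/* Hypothetical kdump-side handler: acts only on the crash IPI. */
static int kdump_save_state(enum nmi_event ev, void *data)
{
	if (ev != EV_NMI_IPI)
		return NOTIFY_DONE;
	++*(int *)data;		/* count one "state saved" */
	return NOTIFY_STOP;
}

/* Deliver one NMI IPI with the given watchdog handler registered first;
 * return how many times the kdump handler ran. */
static int kdump_runs_after_ipi(nmi_handler_fn wdt)
{
	struct nmi_chain chain = {0};
	int wdt_seen = 0, kdump_saved = 0;

	chain_register(&chain, wdt, &wdt_seen);
	chain_register(&chain, kdump_save_state, &kdump_saved);
	chain_call(&chain, EV_NMI_IPI);
	return kdump_saved;
}
```

Calling kdump_runs_after_ipi(wdt_hijack) returns 0 in this model, while
kdump_runs_after_ipi(wdt_cooperative) returns 1. Whether the real driver
behaves like the first or the second variant is exactly what I am asking.
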
Am I missing something?
Thanks
Vivek
> Tom
>
> -----Original Message-----
> From: Andi Kleen [mailto:andi@xxxxxxxxxxxxxx]
> Sent: Thursday, September 04, 2008 3:01 PM
> To: Vivek Goyal
> Cc: Don Zickus; Andi Kleen; Ingo Molnar; Prarit Bhargava; Peter Zijlstra; linux-kernel@xxxxxxxxxxxxxxx; arozansk@xxxxxxxxxx; Mingarelli, Thomas; ak@xxxxxxxxxxxxxxx; Alan Cox; H. Peter Anvin; Thomas Gleixner; Maciej W. Rozycki
> Subject: Re: [PATCH RFC] NMI Re-introduce un[set]_nmi_callback
>
> > Add "kdump" to the list. It will also be broken if we decide to let one
> > driver hijack the NMI handler.
>
> kdump is a special case, similar to the NMI button panic mode. It should
> be always only active when the user configured it. When the user configured
> it should be always the fallback and override any other drivers.
>
> But watchdog is a special case. I assume the watchdog will just log
> (and do the work that a SMI should be doing) but then continue
> the chain so that kdump can dump on a watchdog timeout.
>
> -Andi