Re: [PATCH RFC] NMI Re-introduce un[set]_nmi_callback

From: Andi Kleen
Date: Thu Sep 04 2008 - 13:49:56 EST


On Thu, Sep 04, 2008 at 01:20:52PM -0400, Don Zickus wrote:
> On Thu, Sep 04, 2008 at 05:52:17PM +0200, Andi Kleen wrote:
> > Then if there's a chipset specific NMI driver it could
> > also check if the chipset raised it. That would be a possible
> > solution for HP -- they would need to implement such a driver
> > for their systems with the special watchdog.
>
> The thing with HP's special watchdog timer is that it does _not_ have a
> chipset specific NMI it is trying to catch. HP is going on the assumption
> that _all_ NMIs are /bad/ and they want to catch _every_ NMI, log it, and
> reboot the system.

That's my point. If you have drivers which can identify all other
NMIs, then any leftover NMIs must come from that watchdog.
So they just need drivers which can do that for their chipsets.
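
A rough user-space sketch of that elimination scheme, in case it helps.
All names here (nmi_source, register_nmi_source, dispatch_nmi,
status_bit_set) are made up for illustration and are not the kernel's
NMI interface; the bool flag stands in for whatever status register a
real driver would read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the kernel's NMI API. */
enum nmi_owner { NMI_OWNER_DRIVER, NMI_OWNER_WATCHDOG };

/* Returns true if this source's hardware reports that it raised the NMI. */
typedef bool (*nmi_identify_fn)(void *dev);

struct nmi_source {
	const char *name;
	nmi_identify_fn identify;
	void *dev;
};

#define MAX_NMI_SOURCES 8
static struct nmi_source nmi_sources[MAX_NMI_SOURCES];
static size_t nmi_source_count;

/* Add a source that knows how to recognise its own NMIs. */
static void register_nmi_source(const char *name, nmi_identify_fn fn, void *dev)
{
	assert(nmi_source_count < MAX_NMI_SOURCES);
	nmi_sources[nmi_source_count].name = name;
	nmi_sources[nmi_source_count].identify = fn;
	nmi_sources[nmi_source_count].dev = dev;
	nmi_source_count++;
}

/*
 * Ask every registered source in turn. The first one that claims the
 * NMI owns it. If nobody claims it, attribute it to the watchdog by
 * elimination. That attribution is an inference, not a hardware fact.
 */
static enum nmi_owner dispatch_nmi(const char **claimed_by)
{
	for (size_t i = 0; i < nmi_source_count; i++) {
		if (nmi_sources[i].identify(nmi_sources[i].dev)) {
			*claimed_by = nmi_sources[i].name;
			return NMI_OWNER_DRIVER;
		}
	}
	*claimed_by = "watchdog";
	return NMI_OWNER_WATCHDOG;
}

/* Stand-in for reading a device status register: dev points to a bool. */
static bool status_bit_set(void *dev)
{
	return *(bool *)dev;
}
```

With, say, oprofile and kgdb registered through status_bit_set,
dispatch_nmi() returns NMI_OWNER_WATCHDOG only while neither flag is
set. The scheme is only as good as its coverage: any NMI source without
an identify hook ends up blamed on the watchdog.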

It's not race-free, but being race-free is simply not possible with
the x86 NMI architecture.

It would probably be better to just configure the watchdog
to reboot the system directly on its own. Most other watchdogs
I'm aware of do that. That's more reliable anyway, because the system
might be wedged badly enough that it can't process NMIs anymore.

>
> Now obviously NMIs from kgdb and oprofile are not the ones a system should
> panic on but this breaks HP's assumptions.
>
> So that is part of the problem. How do you become a catch-all for NMIs in
> a system, to process as you wish, but ignore all the 'safe' NMIs?

To be fully reliable: you need a new NMI architecture, or you need to
move the event somewhere else.
To be reasonably reliable (assuming NMIs are not very frequent): you
need drivers for all NMI sources that can identify them.

-Andi
--
ak@xxxxxxxxxxxxxxx