Re: [PATCH RFC] NMI Re-introduce un[set]_nmi_callback

From: Andi Kleen
Date: Thu Sep 04 2008 - 11:52:32 EST


Ingo Molnar <mingo@xxxxxxx> writes:
>
> i'd much rather attack this general problem from this angle:
>
> static inline unsigned char get_nmi_reason(void)
> {
> return inb(0x61);
> }
>
> that port 61H read is both arcane (on modern chipsets) and broken on
> multiple levels.

Yes it is. I did some datasheet reading recently and unfortunately
there is no really standardized better way. So the only replacement
would be to have chipset specific NMI drivers that know
the particular registers of the chipset.

> It's racy and SMP unsafe to begin with, if there's any
> mixture of intentional cross-CPU or CPU self-generated NMIs mixed with
> chipset generated NMIs.
>
> One possible approach would be to get rid of it, and to perhaps register

Removing the I/O port accesses by default would be a good idea,
I agree. They are hardly useful for anything on modern systems.

But you still need some way to catch the chipset NMIs
and give some indication of the problem.

The approach so far has been to ask all the other sources first
(software NMIs via flags in memory, perfmon IPIs by checking the
perf counters, etc.) and, if it's none of them, to assume it's a
chipset NMI (or an NMI-button NMI if the sysctl is set).

Then if there's a chipset specific NMI driver it could
also check if the chipset raised it. That would be a possible
solution for HP -- they would need to implement such a driver
for their systems with the special watchdog.
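
To make the ordering above concrete, here is a rough user-space
sketch, not kernel code. Every name in it (classify_nmi,
register_nmi_handler_sketch, the claim_* functions and the simulated
flags) is made up for illustration; a real driver would read
chipset-specific status registers instead of a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical source labels; NMI_UNKNOWN_CHIPSET is the fallback guess. */
enum nmi_source {
    NMI_SOFTWARE,
    NMI_PERFCTR,
    NMI_CHIPSET_DRIVER,
    NMI_UNKNOWN_CHIPSET
};

/* A handler returns true if its source raised the NMI. */
typedef bool (*nmi_claim_fn)(void);

struct nmi_handler {
    enum nmi_source source;
    nmi_claim_fn claim;
};

#define MAX_NMI_HANDLERS 8
static struct nmi_handler handlers[MAX_NMI_HANDLERS];
static size_t n_handlers;

/* Handlers are asked in registration order. */
static void register_nmi_handler_sketch(enum nmi_source source,
                                        nmi_claim_fn claim)
{
    assert(n_handlers < MAX_NMI_HANDLERS);
    handlers[n_handlers].source = source;
    handlers[n_handlers].claim = claim;
    n_handlers++;
}

/* Ask every known source first; if nobody claims it, assume chipset. */
static enum nmi_source classify_nmi(void)
{
    for (size_t i = 0; i < n_handlers; i++)
        if (handlers[i].claim())
            return handlers[i].source;
    return NMI_UNKNOWN_CHIPSET;
}

/* Simulated state standing in for memory flags and hardware registers. */
static bool sw_nmi_flag;
static bool perfctr_overflowed;
static bool chipset_status_bit;

/* Each claim function tests its flag and clears it. */
static bool claim_software(void)
{
    bool hit = sw_nmi_flag;
    sw_nmi_flag = false;
    return hit;
}

static bool claim_perfctr(void)
{
    bool hit = perfctr_overflowed;
    perfctr_overflowed = false;
    return hit;
}

static bool claim_chipset(void)
{
    bool hit = chipset_status_bit;
    chipset_status_bit = false;
    return hit;
}
```

Note that the sketch inherits the race: if two sources fire at about
the same time, only the first one in the list is reported for that
NMI, and the other is seen only if another NMI arrives.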

Yes, that's racy, but the poor hardware support unfortunately doesn't
leave much wiggle room to do better.

> a low-priority die notifier on systems where we know port 61
> reads+writes to be safe and desired. Modern systems will emit MCEs in
> most cases anyway, not NMIs.

The chipsets will still trigger NMIs (depending on their
configuration), e.g. on some PCI or internal errors; they cannot
trigger MCEs directly. Fortunately this is being replaced with PCI-AER
on PCI-Express, but PCI-X, which doesn't do that, is still very common
and shipping.

BTW, the NMI handlers are also racy: it's not safe
to call printk in an NMI handler. They really should be taught
to start using mce_log().
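
The general idea of logging from an NMI handler without printk can be
sketched in user space like this. It is not the actual mce_log()
implementation; the names (nmi_log_record, nmi_log_drain,
struct nmi_record) and the buffer size are invented, and C11 atomics
stand in for the kernel's primitives. The handler only reserves a slot
in a fixed buffer and fills it in; the printing happens later, outside
NMI context.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NMI_LOG_LEN 4
static_assert(NMI_LOG_LEN > 0, "log needs at least one slot");

/* Hypothetical record layout: raw reason byte plus the reporting CPU. */
struct nmi_record {
    unsigned char reason;
    int cpu;
};

static struct nmi_record nmi_log[NMI_LOG_LEN];
static atomic_bool nmi_log_ready[NMI_LOG_LEN];
static atomic_size_t nmi_log_next;
static atomic_size_t nmi_log_dropped;

/*
 * Meant for the handler: takes no locks and never blocks.
 * Returns false (and counts a drop) when the buffer is full.
 */
static bool nmi_log_record(unsigned char reason, int cpu)
{
    size_t slot = atomic_load(&nmi_log_next);

    do {
        if (slot >= NMI_LOG_LEN) {
            atomic_fetch_add(&nmi_log_dropped, 1);
            return false;
        }
        /* On failure the CAS reloads slot and the bound is rechecked. */
    } while (!atomic_compare_exchange_weak(&nmi_log_next, &slot, slot + 1));

    nmi_log[slot].reason = reason;
    nmi_log[slot].cpu = cpu;
    /* Publish only after the record is completely written. */
    atomic_store(&nmi_log_ready[slot], true);
    return true;
}

/*
 * Meant for ordinary context, where printing is safe.
 * out must have room for NMI_LOG_LEN records.
 * Simplification: assumes no writer is active while the index is reset.
 */
static size_t nmi_log_drain(struct nmi_record *out)
{
    size_t n = atomic_load(&nmi_log_next);
    size_t copied = 0;

    for (size_t i = 0; i < n; i++) {
        if (!atomic_load(&nmi_log_ready[i]))
            continue;   /* slot reserved but not yet filled in */
        out[copied++] = nmi_log[i];
        atomic_store(&nmi_log_ready[i], false);
    }
    atomic_store(&nmi_log_next, 0);
    return copied;
}
```

A consumer would call nmi_log_drain() from a timer or on a read from
user space and print the records there.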

-Andi
--
ak@xxxxxxxxxxxxxxx