Re: [RFC] x86, NMI, Treat unknown NMI as hardware error

From: Cyrill Gorcunov
Date: Fri May 13 2011 - 11:17:36 EST


On 05/13/2011 12:23 PM, Huang Ying wrote:
> In general, unknown NMIs are used by hardware and firmware to notify
> the OS of fatal hardware errors. So Linux should treat an unknown NMI
> as a hardware error and panic upon it for better error
> containment.
>
> But there are some legacy machines that randomly send unknown
> NMIs for no good reason. To support these machines, a whitelist
> mechanism is provided so that unknown NMIs are treated as hardware
> errors only on some known working systems.
>
> These systems are identified by the presence of APEI HEST or by
> the PCI ID of the host bridge. The PCI ID of the host bridge is used
> instead of the DMI ID, so that the check can be based on the platform
> type instead of the motherboard. This should be simpler and sufficient.
>
> The method to identify the platforms was designed by Andi Kleen.
>
> Signed-off-by: Huang Ying <ying.huang@xxxxxxxxx>
> Cc: Andi Kleen <ak@xxxxxxxxxxxxxxx>
> Cc: Don Zickus <dzickus@xxxxxxxxxx>
> ---
...

Hi Ying,

just curious (regardless of the concerns Don and Ingo have): if there is still a need
for such semi-unknown NMI handling, maybe it's worth registering a *notifier* for it,
so that we panic only when the user *explicitly* specifies how to treat this class of NMIs
(via, say, a "hest-nmi-panic" boot option or something like that). Maybe such a partially
modular scheme would be better? Unless I'm missing something.

--
Cyrill