Re: [PATCH -v3 5/6] x86, NMI, treat unknown NMI as hardware error
From: Andi Kleen
Date: Fri Oct 22 2010 - 05:24:15 EST
On Thu, Oct 21, 2010 at 09:49:55PM -0400, Don Zickus wrote:
> After re-reading Huang's patch, I am starting to understand what you mean
> by broken hardware. Basically you are trying to distinguish between
> legacy systems that were 'broken' in the sense they would randomly send
> unknown NMIs for no good reason, hence the 'Dazed and confused' messages
> and hardware errors on more modern systems that say, 'Hardware error,
> panicking, check your BIOS for more info' (or whatever).
Yes that's it.
Unfortunately there are also some cases where the BIOS loses the error too,
so the fallback has to be a panic (at least on the modern boxes).
>
> So Huang's patch was sort of acting like a switch. On legacy systems use
> 'Dazed and confused' for unknown NMIs. Whereas on whitelisted modern
> systems use a more relevant 'Check BIOS for error' message. Is that
> right?
Yes.
> > I don't think you need to worry about a lot more hardware NMI sources.
>
> Well, until those machines dominate the marketplace, I'm stuck supporting
> those pre-Nehalem boxes with customers that committed to 10 years with
> last year's technology. ;-)
I should clarify that the NMI model I described long predates Nehalem.
If you assume 3-5 year depreciation cycles for servers, it should be pretty
much universal in this space.
The HEST (ACPI Hardware Error Source Table) detection was a proposed way to
detect that, because most of these systems should have a HEST.
The older machines still need to be supported, but it's OK for them to
just behave the same as today; there is no need for big improvements
there.
-Andi
--
ak@xxxxxxxxxxxxxxx -- Speaking for myself only.