Re: [PATCH -v2 6/7] x86, NMI, Add support to notify hardware error with unknown NMI

From: Don Zickus
Date: Thu Sep 30 2010 - 00:36:51 EST


On Wed, Sep 29, 2010 at 04:17:30PM +0800, huang ying wrote:
> On Tue, Sep 28, 2010 at 11:32 PM, Don Zickus <dzickus@xxxxxxxxxx> wrote:
>
> > But the problem is you have to export all this platform specific stuff to
> > traps.c and surround the code with #ifdef's, which start to look ugly.
>
> There is no #ifdef in my final default_do_nmi(), so I think the code
> can be cleaned up without converting everything into notifier block. I
> think the rule can be: architecture specific thing should go direct
> call, while device driver should be turned into notifier block.

That sounds like a good rule, but then my definition of architecture
specific is whatever is written in the Intel/AMD x86_64 architecture
manual (the one on my desk is dated 2002), which wouldn't include any
of the error handling you propose, nor MCE, nor perf.

I guess I look at all that stuff as CPU features, because not all the CPUs
on the market have them. Shouldn't traps.c contain just the core
architecture stuff, with all those hardware error features going under
arch/x86/kernel/cpu alongside the rest of the features?

>
> > Is there any reason why traps.c should know about MCA/HEST/<other hardware
> > errors>?  Shouldn't it be abstracted away?
>
> Yes. The device drivers should be abstracted away, leaving
> architectural logic, such as port 0x61 as direct call. We need
> notifier chain, but I just suggest reduce its usage if possible.
>
> > Honestly, I would be interested in creating a southbridge driver and
> > moving the port 0x61 code there and keeping the default_do_nmi() function
> > stupidly simple (just a call to the die_chain and the
> > unknown_nmi_error()).
>
> I think the southbridge drivers should go notifier block, but the port
> 0x61 code is architectural and should be kept in default_do_nmi().

Is port 0x61 architectural? I thought it was a southbridge thing. In fact, I
thought that with modern chipsets you can access the same thing through port
0x70 or 0x71 (I can't seem to figure out which Intel doc I saw that in).
(Not that this conversation has any bearing on your patchset; it's just an
idea I had.)
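
To make the southbridge-driver idea a bit more concrete, here is a rough
userspace sketch, not kernel code. Every name in it (struct nmi_notifier,
nmi_chain_register(), southbridge_nmi_handler(), the SKETCH_* return
values) is made up for illustration, and the reason byte is passed in as a
parameter instead of being read with inb(0x61) so that it runs outside the
kernel. The only thing taken from the existing code is the pair of masks,
0x80 and 0x40, that traps.c tests on the port 0x61 value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return values, not the kernel's NOTIFY_* constants. */
#define SKETCH_NMI_PASS    0   /* handler did not claim the NMI */
#define SKETCH_NMI_CLAIMED 1   /* handler claimed the NMI, stop walking */

/* Masks traps.c applies to the byte read from port 0x61. */
#define NMI_REASON_SERR  0x80  /* memory parity / SERR# */
#define NMI_REASON_IOCHK 0x40  /* I/O check */

/* Hypothetical chain entry; the real die_chain uses struct notifier_block. */
struct nmi_notifier {
	int (*handler)(unsigned char reason);
	int priority;              /* higher priority runs first */
	struct nmi_notifier *next;
};

static struct nmi_notifier *nmi_chain;
static int unknown_nmi_count;  /* stands in for the unknown-NMI message */

/* Insert nb into the chain, keeping it sorted by descending priority. */
static void nmi_chain_register(struct nmi_notifier *nb)
{
	struct nmi_notifier **pp = &nmi_chain;

	assert(nb != NULL && nb->handler != NULL);
	while (*pp && (*pp)->priority >= nb->priority)
		pp = &(*pp)->next;
	nb->next = *pp;
	*pp = nb;
}

/* What a southbridge driver might own: the port 0x61 decoding. */
static int southbridge_nmi_handler(unsigned char reason)
{
	if (reason & (NMI_REASON_SERR | NMI_REASON_IOCHK))
		return SKETCH_NMI_CLAIMED;
	return SKETCH_NMI_PASS;
}

static void unknown_nmi_error(unsigned char reason)
{
	(void)reason;
	unknown_nmi_count++;
}

/* "Stupidly simple": walk the chain, fall through to unknown_nmi_error(). */
static int default_do_nmi(unsigned char reason)
{
	struct nmi_notifier *nb;

	for (nb = nmi_chain; nb; nb = nb->next)
		if (nb->handler(reason) == SKETCH_NMI_CLAIMED)
			return SKETCH_NMI_CLAIMED;

	unknown_nmi_error(reason);
	return SKETCH_NMI_PASS;
}
```

With only the southbridge handler registered, a reason byte with 0x80 or
0x40 set is claimed by the chain, and anything else falls through to
unknown_nmi_error(). Whether the port 0x61 check belongs in such a handler
or stays as a direct call in default_do_nmi() is exactly the open question
above.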

Cheers,
Don