Re: [RFC 5/6] x86, NMI, Add support to notify hardware error with unknown NMI

From: Don Zickus
Date: Tue Sep 14 2010 - 09:45:43 EST


On Tue, Sep 14, 2010 at 02:21:31PM +0200, Ingo Molnar wrote:
>
> * Don Zickus <dzickus@xxxxxxxxxx> wrote:
>
> > > > > At least on PCI-E it may be enough to simply dump all recent AER
> > > > > data.
> > > >
> > > > This assumes AER is supported on the bridge? Which for newer
> > > > chips is probably true, but I wasn't sure about older ones.
> > >
> > > Today's servers should usually have AER at least.
> > >
> > > For old systems you can only get the few bits in PCI space.
> > >
> > > > How would I dump AER data from within the kernel?
> > >
> > > Would need a buffer that is dumped for past events and reading the
> > > registers for not yet reported. Right now some infrastructure is
> > > needed.
> >
> > Oh ok.
>
> The proper approach would be not to add hacks to the NMI code but to
> implement southbridge drivers - which would also have NMI callbacks.
> These are uncharted waters, but variance in that space is reducing
> systematically so it would be worth a shot.

Interesting. I think the only southbridges I see regularly are from
Intel, AMD and Nvidia (with Nvidia being more problematic than the
others). Unfortunately, getting specs for Nvidia is very difficult.

But southbridge drivers might help narrow down where the NMI problem is.

Cheers,
Don

>
> Thanks,
>
> Ingo