Re: [PATCH/RFC] I/O-check interface for driver's error handling

From: Matthew Wilcox
Date: Tue Mar 01 2005 - 12:00:50 EST


On Tue, Mar 01, 2005 at 08:49:45AM -0800, Linus Torvalds wrote:
> On Tue, 1 Mar 2005, Jeff Garzik wrote:
> > A new API handles none of this.
>
> Ehh?

I think what Jeff meant was "this new API handles none of this".
And that's true, it doesn't handle DMA errors. But I think that's just
something that hasn't been written/designed yet.

So how should we handle it? Obviously the driver may not be executing
when a PCI parity error occurs, so we probably get to find out about
this through some architecture-specific whole-system error, let's call
it an MCA.

The MCA handler has to go and figure out what the hell just happened
(was it a DIMM error, PCI bus error, etc). OK, fine, it finds that it
was an error on PCI bus 73. At this point, I think the architecture
error handler needs to call into the PCI subsystem and say "Hey, there
was an error, you deal with it".

If we're lucky, we get all the information that allows us to figure
out which device it was (eg a destination address that matches a BAR),
then we could have a ->error method in the pci_driver that handles it.
If there's no ->error method, at leat call ->remove so one device only
takes itself down.

Does this make sense?

--
"Next the statesmen will invent cheap lies, putting the blame upon
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince
himself that the war is just, and will thank God for the better sleep
he enjoys after this process of grotesque self-deception." -- Mark Twain
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/