Re: [PATCH/RFC] I/O-check interface for driver's error handling

From: Jesse Barnes
Date: Tue Mar 01 2005 - 12:13:56 EST


On Tuesday, March 1, 2005 8:59 am, Matthew Wilcox wrote:
> The MCA handler has to go and figure out what the hell just happened
> (was it a DIMM error, PCI bus error, etc). OK, fine, it finds that it
> was an error on PCI bus 73. At this point, I think the architecture
> error handler needs to call into the PCI subsystem and say "Hey, there
> was an error, you deal with it".
>
> If we're lucky, we get all the information that allows us to figure
> out which device it was (eg a destination address that matches a BAR),
> then we could have a ->error method in the pci_driver that handles it.
> If there's no ->error method, at leat call ->remove so one device only
> takes itself down.
>
> Does this make sense?

This was my thought too last time we had this discussion. A completely
asynchronous call is probably needed in addition to Hidetoshi's proposed API,
since as you point out, the driver may not be running when an error occurs
(e.g. in the case of a DMA error or more general bus problem). The async
->error callback could do a total reset of the card, or something along those
lines as Jeff suggests, while the inline ioerr_clear/ioerr_check API could
potentially deal with errors as they happen (probably in the case of PIO
related errors), when the additional context may allow us to be smarter about
recovery.

Jesse
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/