Re: [PATCH/RFC] I/O-check interface for driver's error handling

From: Benjamin Herrenschmidt
Date: Tue Mar 01 2005 - 17:33:41 EST


On Tue, 2005-03-01 at 12:33 -0600, Linas Vepstas wrote:

> The current proposal (and prototype) has a "master recovery thread"
> to handle the coordinated reset of the pci controller. This master
> recovery thyread makes three calls in struct pci_driver:
>
> void (*frozen) (struct pci_dev *); /* called when dev is first frozen */
> void (*thawed) (struct pci_dev *); /* called after card is reset */
> void (*perm_failure) (struct pci_dev *); /* called if card is dead */

See my other emails. I think only one callback is enough, and I think we
need more parameters.

> The master recovery thread runs in the kernel. Earlier suggestions said
> "run it in user space, use pci hotplug, use udev, etc." However, if
> you get a pci error on a scsi card, you can't shell script
> "umount /dev/sdX; rmmod scsi; clear_pci_error; insmod scsi; mount /dev/sdX"
> beacuse you can't umount an open filesystem, and you can't really close
> it (I fiddled with prototyping some of this, but its ugly and painful
> and bizarre and outside my area of expertise :)
>
> FWIW, the current prototype tries to do a pci hotplug if the above
> routines aren't implemented in struct pci_driver. It can recover
> from pci errors on ethernet cards, and I have one scsi driver that
> successfully recovers with above API, and am working on adding recovery
> to the symbios driver.
>
> --linas
--
Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/