[PATCH 0/5] s390/pci: automatic error recovery
From: Niklas Schnelle
Date: Mon Sep 06 2021 - 05:49:45 EST
Hello,
This series implements automatic error recovery for PCI devices on s390,
following the scheme outlined in Documentation/PCI/pci-error-recovery.rst.
It applies on top of current master.
The patches are almost completely s390 specific, except for two patches
exporting existing functionality for use by arch/s390/pci/ code. Nevertheless
I would also appreciate any feedback, especially on the last patch, concerning
the implementation of the error recovery flow. I believe we might be the first
implementation of PCI device recovery in a virtualized setting. This requires
us to coordinate the device reset with the hypervisor platform by issuing
a disable and re-enable to the platform, and to start the recovery following
a platform event.
The outline of the patches is as follows:
Patches 1 and 2 add s390 specific code implementing a reset mechanism that
takes the PCI function out of the platform specific error state.
Patches 3 and 4 export existing common code functions for use by the s390
specific recovery code.
I already sent Patch 3 separately, resulting in the discussion below, but
without a final conclusion:
https://lore.kernel.org/lkml/20210720150145.640727-1-schnelle@xxxxxxxxxxxxx/
I believe that, even though there were some doubts about the use of
pci_dev_is_added() by arch code, the existing uses as well as the use in the
final patch of this series warrant this export.
Patch 4 "PCI: Export pci_dev_lock()" is basically an extension to commit
e3a9b1212b9d ("PCI: Export pci_dev_trylock() and pci_dev_unlock()") which
already exported pci_dev_trylock(). In the final patch we make use of
pci_dev_lock() to wait for any other exclusive uses of the pdev to be finished
before starting recovery.
Finally, Patch 5 implements the recovery flow as part of the existing s390
specific PCI availability and error event mechanism. Previously the error
case only set an error indication, requiring manual intervention to make the
device usable again. Now we handle the case where firmware has already reset
a PCI function after an error was encountered, informing the OS that it should
be ready to be used again. Note that the same event is also issued by the
hypervisor if the function was manually taken into a service mode, for example
for a firmware upgrade via the hypervisor, and is now ready to be used again.
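To illustrate the general shape of such a flow, here is a minimal userspace
sketch of the callback sequence described in
Documentation/PCI/pci-error-recovery.rst. It is not the code from Patch 5.
The enum and struct are only modeled on pci_ers_result_t and struct
pci_error_handlers, and recover_device(), platform_reset_fn and the toy_*
helpers are hypothetical names made up for this example. The platform reset
callback stands in for the disable and re-enable issued to the hypervisor.

```c
#include <stdbool.h>
#include <stddef.h>

/* Result codes modeled on pci_ers_result_t (simplified, not the kernel type). */
enum ers_result {
	ERS_RESULT_NONE,
	ERS_RESULT_CAN_RECOVER,
	ERS_RESULT_NEED_RESET,
	ERS_RESULT_DISCONNECT,
	ERS_RESULT_RECOVERED,
};

/* Driver callbacks modeled on struct pci_error_handlers (subset only). */
struct err_handlers {
	enum ers_result (*error_detected)(void *dev);
	enum ers_result (*mmio_enabled)(void *dev);
	enum ers_result (*slot_reset)(void *dev);
	void (*resume)(void *dev);
};

/* Hypothetical stand-in for the platform level disable and re-enable. */
typedef bool (*platform_reset_fn)(void *dev);

/* Walk the recovery steps in order; return true if the device recovered. */
bool recover_device(const struct err_handlers *h, void *dev,
		    platform_reset_fn reset)
{
	enum ers_result res;

	if (!h || !h->error_detected)
		return false;	/* driver has no recovery support */

	res = h->error_detected(dev);
	if (res == ERS_RESULT_DISCONNECT)
		return false;

	/* Simplification: CAN_RECOVER without mmio_enabled counts as failure. */
	if (res == ERS_RESULT_CAN_RECOVER && h->mmio_enabled)
		res = h->mmio_enabled(dev);

	if (res == ERS_RESULT_NEED_RESET) {
		if (!reset || !reset(dev))
			return false;	/* platform reset failed */
		res = h->slot_reset ? h->slot_reset(dev) : ERS_RESULT_DISCONNECT;
	}

	if (res != ERS_RESULT_RECOVERED)
		return false;

	if (h->resume)
		h->resume(dev);
	return true;
}

/* Toy device and driver used only to exercise the sketch. */
struct toy_dev {
	int resets;
	bool resumed;
};

enum ers_result toy_error_detected(void *dev)
{
	(void)dev;
	return ERS_RESULT_NEED_RESET;	/* always ask for a reset */
}

enum ers_result toy_slot_reset(void *dev)
{
	(void)dev;
	return ERS_RESULT_RECOVERED;
}

void toy_resume(void *dev)
{
	((struct toy_dev *)dev)->resumed = true;
}

bool toy_platform_reset(void *dev)
{
	((struct toy_dev *)dev)->resets++;
	return true;
}
```

With the toy driver above, recover_device() calls error_detected, performs one
platform reset, calls slot_reset and then resume. A driver without an
error_detected callback is treated as not recoverable in this sketch.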
Thanks,
Niklas Schnelle
Niklas Schnelle (5):
  s390/pci: refresh function handle in iomap
  s390/pci: implement reset_slot for hotplug slot
  PCI: Move pci_dev_is/assign_added() to pci.h
  PCI: Export pci_dev_lock()
  s390/pci: implement minimal PCI error recovery

 arch/powerpc/platforms/powernv/pci-sriov.c |   3 -
 arch/powerpc/platforms/pseries/setup.c     |   1 -
 arch/s390/include/asm/pci.h                |   6 +-
 arch/s390/pci/pci.c                        | 143 ++++++++++++++-
 arch/s390/pci/pci_event.c                  | 196 ++++++++++++++++++++-
 arch/s390/pci/pci_insn.c                   |   4 +-
 arch/s390/pci/pci_irq.c                    |   9 +
 arch/s390/pci/pci_sysfs.c                  |   2 -
 drivers/pci/hotplug/acpiphp_glue.c         |   1 -
 drivers/pci/hotplug/s390_pci_hpc.c         |  24 +++
 drivers/pci/pci.c                          |   3 +-
 drivers/pci/pci.h                          |  15 --
 include/linux/pci.h                        |  16 ++
 13 files changed, 389 insertions(+), 34 deletions(-)
--
2.25.1