[PATCH 3.2 25/74] PCI/PM: Restore the status of PCI devices across hibernation

From: Ben Hutchings
Date: Mon Oct 09 2017 - 08:58:27 EST


3.2.94-rc1 review patch. If anyone has any objections, please let me know.

------------------

From: Chen Yu <yu.c.chen@xxxxxxxxx>

commit e60514bd4485c0c7c5a7cf779b200ce0b95c70d6 upstream.

We currently see many "No irq handler" errors during hibernation, which
eventually hang the system:

ata4.00: qc timeout (cmd 0xec)
ata4.00: failed to IDENTIFY (I/O error, err_mask=0x4)
ata4.00: revalidation failed (errno=-5)
ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
do_IRQ: 31.151 No irq handler for vector

According to the logs above, an interrupt is triggered and dispatched to
CPU31 with vector number 151, but there is no handler for it. The IRQ is
therefore never acked, which causes an IRQ flood that kills the system.
More specifically, 31.151 is an interrupt from the AHCI host controller.
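
For readers triaging similar reports, the "CPU.vector" pair can be pulled
out of the log line mechanically. The following is a minimal user-space
sketch, assuming the line has exactly the shape shown in the log above;
parse_no_irq_handler() is a hypothetical helper and not a kernel function.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of the kernel): extract the CPU number and
 * the vector number from a line of the form
 *   "do_IRQ: <cpu>.<vector> No irq handler for vector"
 * Returns 1 if both numbers were parsed, 0 if the line does not match.
 */
static int parse_no_irq_handler(const char *line, int *cpu, int *vector)
{
	assert(line && cpu && vector);
	/* sscanf() returns the number of successful conversions. */
	return sscanf(line, "do_IRQ: %d.%d No irq handler", cpu, vector) == 2;
}
```

Applied to the last line of the log above, this helper yields CPU 31 and
vector 151; the ata4 lines do not match and are rejected.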

Investigation shows that the issue is triggered because the thaw_noirq()
function does not restore the MSI/MSI-X settings across hibernation.

The scenario is illustrated below:

1. Before hibernation, IRQ 34 is assigned to the AHCI device and is
bound to CPU31.

2. Hibernation starts and the AHCI device is put into a low power state.

3. All the nonboot CPUs are put offline, so IRQ 34 has to be migrated to
the last alive one - CPU0.

4. After the snapshot has been created, all the nonboot CPUs are brought
up again; IRQ 34 remains bound to CPU0.

5. AHCI devices are put into D0.

6. The snapshot is written to the disk.

The issue is triggered in step 6. The AHCI interrupt should be delivered
to CPU0, but it is delivered to the original CPU31 instead, which causes
the "No irq handler" issue.

Ying Huang pointed out that in step 3 a write to the register might not
take effect, because the PCI devices have already been suspended.

In step 3, the IRQ 34 affinity should be modified from CPU31 to CPU0, but
in fact it is not. In __pci_write_msi_msg(), if the device is already in a
low power state, the new MSI message is only cached and is not written to
the hardware. During the device restore process after a normal
suspend/resume, pci_restore_msi_state() writes the cached MSI message back
to the hardware.

But this is not the case for hibernation. pci_restore_msi_state() is not
currently called in pci_pm_thaw_noirq(), although pci_save_state() has
saved the necessary PCI cached information in pci_pm_freeze_noirq().
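
The scenario above can be replayed with a small user-space toy model. This
is only a sketch under stated assumptions: the toy_* types and functions
are invented for illustration and are not kernel APIs. They mimic just the
behavior described in this changelog, namely that an MSI message written
while the device is in a low power state is cached rather than written to
the hardware, and that a restore step copies the cached message back.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; none of these names exist in the kernel. */
enum toy_power_state { TOY_D0, TOY_D3 };

struct toy_msi_dev {
	enum toy_power_state state;
	int hw_target_cpu;	/* CPU programmed into the device's MSI registers */
	int cached_target_cpu;	/* CPU that software last requested */
};

/*
 * Mimics the behavior described for __pci_write_msi_msg(): the cached copy
 * is always updated, the hardware only if the device is in D0.
 */
static void toy_write_msi_msg(struct toy_msi_dev *dev, int cpu)
{
	assert(dev);
	dev->cached_target_cpu = cpu;
	if (dev->state == TOY_D0)
		dev->hw_target_cpu = cpu;
}

/* Mimics the MSI part of a restore: push the cached message to hardware. */
static void toy_restore_msi_state(struct toy_msi_dev *dev)
{
	assert(dev);
	dev->hw_target_cpu = dev->cached_target_cpu;
}

/*
 * Replays steps 1-5 and returns the CPU the device actually targets when
 * the snapshot is written in step 6.
 */
static int toy_hibernate_target_cpu(bool restore_in_thaw)
{
	struct toy_msi_dev ahci = { TOY_D0, 31, 31 };	/* step 1: bound to CPU31 */

	ahci.state = TOY_D3;		/* step 2: device enters low power state */
	toy_write_msi_msg(&ahci, 0);	/* step 3: migrate to CPU0, hardware write skipped */
	/* step 4: nonboot CPUs come back, affinity stays on CPU0 */
	ahci.state = TOY_D0;		/* step 5: device back in D0 */
	if (restore_in_thaw)
		toy_restore_msi_state(&ahci);	/* the step missing on the thaw path */
	return ahci.hw_target_cpu;
}
```

In this model, toy_hibernate_target_cpu(false) leaves the device targeting
CPU31, which corresponds to the stale delivery seen in the log, while
toy_hibernate_target_cpu(true) targets CPU0 as intended.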

Restore the PCI status for the device during hibernation. Otherwise the
status might be lost across hibernation (for example, settings for MSI,
MSI-X, ATS, ACS, IOV, etc.), which might cause problems during hibernation.

Suggested-by: Ying Huang <ying.huang@xxxxxxxxx>
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
Signed-off-by: Chen Yu <yu.c.chen@xxxxxxxxx>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
Cc: Len Brown <len.brown@xxxxxxxxx>
Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
Cc: Rui Zhang <rui.zhang@xxxxxxxxx>
Cc: Ying Huang <ying.huang@xxxxxxxxx>
Signed-off-by: Ben Hutchings <ben@xxxxxxxxxxxxxxx>
---
drivers/pci/pci-driver.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -873,6 +873,7 @@ static int pci_pm_thaw_noirq(struct devi
 		return pci_legacy_resume_early(dev);
 
 	pci_update_current_state(pci_dev, PCI_D0);
+	pci_restore_state(pci_dev);
 
 	if (drv && drv->pm && drv->pm->thaw_noirq)
 		error = drv->pm->thaw_noirq(dev);