RE: HW power fault defect cause system hang on kernel 5.4.y
From: Bao, Joseph
Date: Mon Nov 15 2021 - 21:48:37 EST
I've tested this tentative patch, and it fixes the issue.
And current kernel behavior of only changing the LEDs may come from spec sec 6.7.1.1, which indicates that the adapter should not be removed when encountering a power fault event.
Based on the description of sec 6.7.1.8, I think it's the power controller's responsibility that power off the slot -> wait until the power fault status cleared (latched by HW)-> power on the slot again, not the kernel's responsibility.
If the kernel reports a power fault log msg, there must be something wrong with the HW, #1 (current case) the power controller circuit has some defect, falsely trigger power fault even the slot has correct voltage. #2 (seen in our factory) The slot has some defect, pulling down the correct voltage, so trigger power fault.
In my opinion, only the infinite loop bug fix is necessary for the kernel, neither the cases I listed above should cause a system hang, then we could get power fault log msg/indicators and know something wrong with our HW.
Otherwise, if exists the case that power fault is not HW-related, then maybe the kernel should interpret power fault as a surprising removal or do some power cycle retry to recovery, but I do not know such a case yet...
Regards
Joseph
-----Original Message-----
From: Lukas Wunner <lukas@xxxxxxxxx>
Sent: Tuesday, November 16, 2021 3:27 AM
To: Bao, Joseph <joseph.bao@xxxxxxxxx>
Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>; linux-pci@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Stuart Hayes <stuart.w.hayes@xxxxxxxxx>; kw@xxxxxxxxx
Subject: Re: HW power fault defect cause system hang on kernel 5.4.y
On Tue, Nov 02, 2021 at 03:45:00AM +0000, Bao, Joseph wrote:
> Recently we encounter system hang (dead spinlock) when move to kernel
> linux-5.4.y.
>
> Finally, we use bisect to locate the suspicious commit https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.4.y&id=4667358dab9cc07da044d5bc087065545b1000df.
>
> Our system has some HW defect, which will wrongly set
> PCI_EXP_SLTSTA_PFD high, and this commit will lead to infinite loop
> jumping to read_status (no chance to clear status PCI_EXP_SLTSTA_PFD
> bit since ctrl is not updated), I know this is our HW defect, but this
> commit makes kernel trapped in this isr function and leads to kernel
> hang (then the user could not get useful information to show what's
> wrong), which I think is not expected behavior, so I would like to report to you for discussion.
Thanks a lot for the report and apologies for the breakage and the delay.
Below please find a tentative fix. Could you test whether it fixes the issue?
I don't think this is a hardware defect. If I'm reading the spec right (PCIe r5.0, sec. 6.7.1.8), the PFD bit is meant to remain set and cannot be cleared until the kernel disables slot power.
When a power fault happens, we currently only change the LEDs (Power Indicator Off, Attention Indicator On) and emit a log message.
We otherwise leave the slot as is, even though I'd assume that the PCI device in the slot is no longer accessible.
I'm wondering whether we should interpret a power fault as surprise removal. Alternatively, we could attempt recovery, i.e. turn slot power off and back on. Similar to what we're doing when an Uncorrectable Error occurs. Do you have an opinion on that? What would be the desired behavior for your users?
Thanks,
Lukas
-- >8 --
Subject: [PATCH] PCI: pciehp: Fix infinite loop in IRQ handler upon power fault
The Power Fault Detected bit in the Slot Status register differs from all other hotplug events in that it is sticky: It can only be cleared after turning off slot power. Per PCIe r5.0, sec. 6.7.1.8:
If a power controller detects a main power fault on the hot-plug slot,
it must automatically set its internal main power fault latch [...].
The main power fault latch is cleared when software turns off power to
the hot-plug slot.
The stickiness used to cause interrupt storms and infinite loops which were fixed in 2009 by commits 5651c48cfafe ("PCI pciehp: fix power fault interrupt storm problem") and 99f0169c17f3 ("PCI: pciehp: enable software notification on empty slots").
Unfortunately in 2020 the infinite loop issue was inadvertently reintroduced by commit 8edf5332c393 ("PCI: pciehp: Fix MSI interrupt
race"): The hardirq handler pciehp_isr() clears the PFD bit until pciehp's power_fault_detected flag is set. That happens in the IRQ thread pciehp_ist(), which never learns of the event because the hardirq handler is stuck in an infinite loop. Fix by setting the power_fault_detected flag already in the hardirq handler.
Fixes: 8edf5332c393 ("PCI: pciehp: Fix MSI interrupt race")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=214989
Link: https://lore.kernel.org/linux-pci/DM8PR11MB5702255A6A92F735D90A4446868B9@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Reported-by: Joseph Bao <joseph.bao@xxxxxxxxx>
Signed-off-by: Lukas Wunner <lukas@xxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx # v4.19+
Cc: Stuart Hayes <stuart.w.hayes@xxxxxxxxx>
---
drivers/pci/hotplug/pciehp_hpc.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/pci/hotplug/pciehp_hpc.c b/drivers/pci/hotplug/pciehp_hpc.c
index 6ac5ea5..fac6b8e 100644
--- a/drivers/pci/hotplug/pciehp_hpc.c
+++ b/drivers/pci/hotplug/pciehp_hpc.c
@@ -640,6 +640,8 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
*/
if (ctrl->power_fault_detected)
status &= ~PCI_EXP_SLTSTA_PFD;
+ else if (status & PCI_EXP_SLTSTA_PFD)
+ ctrl->power_fault_detected = true;
events |= status;
if (!events) {
@@ -649,7 +651,7 @@ static irqreturn_t pciehp_isr(int irq, void *dev_id)
}
if (status) {
- pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, events);
+ pcie_capability_write_word(pdev, PCI_EXP_SLTSTA, status);
/*
* In MSI mode, all event bits must be zero before the port @@ -723,8 +725,7 @@ static irqreturn_t pciehp_ist(int irq, void *dev_id)
}
/* Check Power Fault Detected */
- if ((events & PCI_EXP_SLTSTA_PFD) && !ctrl->power_fault_detected) {
- ctrl->power_fault_detected = 1;
+ if (events & PCI_EXP_SLTSTA_PFD) {
ctrl_err(ctrl, "Slot(%s): Power fault\n", slot_name(ctrl));
pciehp_set_indicators(ctrl, PCI_EXP_SLTCTL_PWR_IND_OFF,
PCI_EXP_SLTCTL_ATTN_IND_ON);
--
2.33.0