Re: [PATCH] igc: Add PCIe link recovery for I225/I226
From: Bjorn Helgaas
Date: Wed Feb 11 2026 - 13:29:35 EST
On Tue, Feb 10, 2026 at 08:34:02PM +0000, Harshank Matkar wrote:
> From: Harshank Matkar <harshankmatkar1304@xxxxxxxxxxx>
>
> When ASPM L0s transitions occur on Intel I225/I226 controllers,
> transient PCIe link instability can cause register read failures
> (0xFFFFFFFF responses).
At the PCIe level, the failure is some uncorrectable PCIe error like a
Completion Timeout or Unsupported Request. The 0xFFFFFFFF response is
implementation-specific behavior determined by the Root Complex
design.
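To illustrate why an all-ones read is only a hint, here is a minimal user-space sketch. It is not code from the patch: the names read_reg_checked, fake_dev and fake_read are hypothetical, and the "sanity" register is assumed to be one that never legitimately reads as all ones. In the kernel, PCI_POSSIBLE_ERROR() in <linux/pci.h> performs the basic all-ones comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register accessor: returns the 32-bit value at 'offset'. */
typedef uint32_t (*reg_read_fn)(void *ctx, uint32_t offset);

enum read_status { READ_OK, READ_SUSPECT };

/*
 * Read a register and flag the result as suspect only if a second
 * register, assumed never to be all ones on a healthy device, also
 * reads back as 0xFFFFFFFF. A lone all-ones value may be real data.
 */
static enum read_status read_reg_checked(reg_read_fn rd, void *ctx,
					 uint32_t offset,
					 uint32_t sanity_offset,
					 uint32_t *out)
{
	uint32_t val = rd(ctx, offset);

	*out = val;
	if (val != UINT32_MAX)
		return READ_OK;

	if (rd(ctx, sanity_offset) == UINT32_MAX)
		return READ_SUSPECT;

	return READ_OK;
}

/* Mock device for the sketch: every read returns all ones when the link is down. */
struct fake_dev {
	bool link_up;
	uint32_t regs[4];
};

static uint32_t fake_read(void *ctx, uint32_t offset)
{
	struct fake_dev *dev = ctx;

	if (!dev->link_up)
		return UINT32_MAX;	/* what many Root Complexes synthesize */
	return dev->regs[offset % 4];
}
```

With the link up, a register that really holds 0xFFFFFFFF is reported as READ_OK. With the link down, every read is all ones and the result is READ_SUSPECT.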
> Implement a multi-layer recovery strategy:
> 1. Immediate retries: 3 attempts with 100-200μs delays
> 2. Link retraining: Trigger PCIe link retraining via capabilities
> 3. Device detachment: Only as last resort after max attempts
>
> The recovery mechanism includes rate limiting, maximum attempt
> tracking, and device presence validation to prevent false detaches
> on transient ASPM glitches while maintaining safety through
> bounded retry limits.
I assume the glitch is a hardware erratum and should be documented as
such by Intel, although it's possible ASPM L0s isn't configured
correctly.

If it's a hardware erratum, I think you should use a quirk to disable
L0s on these devices, e.g., pci_disable_link_state(pdev,
PCIE_LINK_STATE_L0S). Even if this patch allows recovery, the PCIe
errors will be logged and reported via AER, which will be confusing to
users.
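
As a rough sketch of the shape such a quirk could take, here is a user-space mock-up. Everything in it is a stand-in: struct pci_dev, PCIE_LINK_STATE_L0S and pci_disable_link_state() are stubbed so it builds outside a kernel tree, igc_quirk_disable_l0s is a hypothetical name, and the device IDs are assumptions to be checked against the igc driver's ID table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * User-space stand-ins so the sketch builds outside a kernel tree.
 * In the kernel, struct pci_dev, PCIE_LINK_STATE_L0S and
 * pci_disable_link_state() come from <linux/pci.h>.
 */
struct pci_dev {
	uint16_t vendor;
	uint16_t device;
	int disabled_link_states;	/* mock bookkeeping only */
};

#define PCI_VENDOR_ID_INTEL	0x8086
#define PCIE_LINK_STATE_L0S	0x1	/* placeholder value for the mock */

static int pci_disable_link_state(struct pci_dev *pdev, int state)
{
	pdev->disabled_link_states |= state;
	return 0;
}

/* Assumed I225/I226 device IDs; verify against the igc driver's ID table. */
static const uint16_t igc_l0s_quirk_ids[] = { 0x15f2, 0x15f3, 0x125b, 0x125c };

/* Hypothetical quirk: disable L0s on affected devices, leave others alone. */
static bool igc_quirk_disable_l0s(struct pci_dev *pdev)
{
	size_t i;
	size_t n = sizeof(igc_l0s_quirk_ids) / sizeof(igc_l0s_quirk_ids[0]);

	if (pdev->vendor != PCI_VENDOR_ID_INTEL)
		return false;

	for (i = 0; i < n; i++) {
		if (pdev->device == igc_l0s_quirk_ids[i]) {
			pci_disable_link_state(pdev, PCIE_LINK_STATE_L0S);
			return true;
		}
	}
	return false;
}
```

In-tree, the ID matching would normally come from a DECLARE_PCI_FIXUP_FINAL() entry per device ID, or the call could be made from the driver's probe path. Either way L0s is never entered, so the errors don't occur in the first place.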
Bjorn