Re: [BUG] Thunderbolt runtime resume during PCIe removal causes IRQ warning and shutdown failure.

From: Georg Klima

Date: Fri Apr 10 2026 - 01:20:25 EST


The issue disappears after a BIOS update that changes the PCIe root port SlotCap from HotPlug+ to HotPlug-.
This strongly suggests that the bug is triggered by PCIe hotplug handling (pciehp) interacting with runtime PM and Thunderbolt.

Version: N4FET48W (1.29)
Firmware Revision: 1.13
Release Date: 01/26/2026
This BIOS version was not, and still is not, available via fwupdmgr, sorry.


SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
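
To check the same flag on another machine, here is a minimal Python sketch. It assumes the SltCap line has been copied from `lspci -vv` output as plain text; `parse_flags` is a hypothetical helper name, not part of any tool.

```python
import re

def parse_flags(line):
    """Return {name: bool} for every 'Name+' / 'Name-' token in a flag line."""
    return {
        m.group(1): m.group(2) == "+"
        for m in re.finditer(r"\b([A-Za-z][A-Za-z0-9]*)([+-])(?=\s|$)", line)
    }

# SltCap line as reported by lspci -vv after the BIOS update
sltcap = "SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-"
flags = parse_flags(sltcap)

# False means the root port no longer advertises hotplug capability
print("hotplug capable:", flags["HotPlug"])
```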

80:1b.4 PCI bridge: Intel Corporation 800 Series PCH PCIe Root Port #21 (rev 10) (prog-if 00 [Normal decode])
Subsystem: Lenovo Device 2347
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 128
IOMMU group: 20
Bus: primary=80, secondary=88, subordinate=d8, sec-latency=0
I/O behind bridge: [disabled] [16-bit]
Memory behind bridge: b0000000-b7ffffff [size=128M] [32-bit]
Prefetchable memory behind bridge: 4000000000-4fffffffff [size=64G] [32-bit]
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Express (v2) Root Port (Slot+), IntMsgNum 0
DevCap: MaxPayload 128 bytes, PhantFunc 0
ExtTag- RBE+ TEE-IO-
DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
LnkCap: Port #21, Speed 16GT/s, Width x4, ASPM not supported
ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, LnkDisable- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt+ AutBWInt+ FltModeDis-
LnkSta: Speed 16GT/s, Width x4
TrErr- Train- SlotClk+ DLActive+ BWMgmt- ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
Slot #25, PowerLimit 25W; Interlock- NoCompl+
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
Changed: MRL- PresDet+ LinkState+
RootCap: CRSVisible-
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- LTR+
10BitTagComp+ 10BitTagReq- OBFF Via WAKE#, ExtFmt+ EETLPPrefix+, MaxEETLPPrefixes 2
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- LN System CLS Not Supported, TPHComp- ExtTPHComp- ARIFwd+
AtomicOpsCap: Routing- 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- ARIFwd-
AtomicOpsCtl: ReqEn- EgressBlck-
IDOReq- IDOCompl- LTR+ EmergencyPowerReductionReq-
10BitTagReq- OBFF Disabled, EETLPPrefixBlk-
LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported, FltMode-
Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee002b8 Data: 0000
Capabilities: [98] Subsystem: Lenovo Device 2347
Capabilities: [a0] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP-
ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
UESvrt: DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+
ECRC- UnsupReq- ACSViol- UncorrIntErr- BlockedTLP- AtomicOpBlocked- TLPBlockedErr-
PoisonTLPBlocked- DMWrReqBlocked- IDECheck- MisIDETLP- PCRC_CHECK- TLPXlatBlocked-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr- CorrIntErr- HeaderOF-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+ CorrIntErr- HeaderOF-
AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
RootCmd: CERptEn+ NFERptEn+ FERptEn+
RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
FirstFatal- NonFatalMsg- FatalMsg- IntMsgNum 0
ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000
Capabilities: [220 v1] Access Control Services
ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
ACSCtl: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
Capabilities: [a30 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Capabilities: [a90 v1] Data Link Feature <?>
Capabilities: [a9c v1] Physical Layer 16.0 GT/s
Phy16Sta: EquComplete+ EquPhase1+ EquPhase2+ EquPhase3+ LinkEquRequest-
Capabilities: [edc v1] Lane Margining at the Receiver
PortCap: Uses Driver-
PortSta: MargReady+ MargSoftReady-
Kernel driver in use: pcieport
Kernel modules: shpchp

________________________________________
From: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>
Sent: Tuesday, April 7, 2026 07:41
To: Lukas Wunner <lukas@xxxxxxxxx>
Cc: Georg Klima <Georg.Klima@xxxxxxxxxxxxxxx>; linux-pci@xxxxxxxxxxxxxxx <linux-pci@xxxxxxxxxxxxxxx>; thunderbolt@xxxxxxxxxxxxxxx <thunderbolt@xxxxxxxxxxxxxxx>; linux-kernel@xxxxxxxxxxxxxxx <linux-kernel@xxxxxxxxxxxxxxx>; georg_klima@xxxxxx <georg_klima@xxxxxx>; Rene Sapiens <rene.sapiens@xxxxxxxxxxxxxxx>; Alan Borzeszkowski <alan.borzeszkowski@xxxxxxxxxxxxxxx>
Subject: Re: [BUG] Thunderbolt runtime resume during PCIe removal causes IRQ warning and shutdown failure.

Hi,

On Sun, Apr 05, 2026 at 10:59:20AM +0200, Lukas Wunner wrote:
> [cc += Mika, Rene, Alan; start of thread is here:
> https://lore.kernel.org/all/AM9PR10MB42316BF3E59B29E1EA3E5600B756A@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> ]
>
> On Thu, Mar 26, 2026 at 04:09:05PM +0000, Georg Klima wrote:
> > I am reporting a reproducible shutdown issue involving Thunderbolt,
> > PCIe hotplug, and runtime PM on a Lenovo ThinkPad P16.
> > System fails to power off cleanly when PCIe ASPM is enabled.
> > After the kernel prints "Power off", it emits warnings and does not
> > complete shutdown.
>
> The dmesg output shows that the problems start much earlier than
> on shutdown: The discrete "Barlow Ridge" Thunderbolt controller
> is hot-removed at the 08:44:29 timestamp in a noisy fashion:
>
> > Mar 26 08:44:28 fedora kernel: usb 3-3: reset full-speed USB device number 2 using xhci_hcd
> > Mar 26 08:44:29 fedora kernel: pcieport 0000:80:1b.4: Data Link Layer Link Active not set in 100 msec
> > Mar 26 08:44:29 fedora kernel: pcieport 0000:80:1b.4: pciehp: Slot(25): Card not present
> > Mar 26 08:44:29 fedora kernel: ------------[ cut here ]------------
> > Mar 26 08:44:29 fedora kernel: thunderbolt 0000:8a:00.0: interrupt for TX ring 0 is already enabled
> > Mar 26 08:44:29 fedora kernel: xhci_hcd 0000:b1:00.0: Controller not ready at resume -19
> > Mar 26 08:44:29 fedora kernel: xhci_hcd 0000:b1:00.0: PCI post-resume error -19!
> > Mar 26 08:44:29 fedora kernel: xhci_hcd 0000:b1:00.0: HC died; cleaning up
> > Mar 26 08:44:29 fedora kernel: WARNING: drivers/thunderbolt/nhi.c:147 at ring_interrupt_active+0x246/0x2f0 [thunderbolt], CPU#3: kworker/u96:5/1092
>
> The controller is then re-discovered after the link goes back up.
> The actual shutdown doesn't seem to start until the 08:45:26 timestamp.
>
> Going forward please use "dmesg" to collect kernel output, not journalctl,
> so that we get timestamps with usec granularity.
>
> > * Hardware: Lenovo ThinkPad P16 (21RQ003BGE)
> > * BIOS: N4FET30W (1.11) 10/03/2025
> > * Kernel: 6.19.10-200.fc43.x86_64
> > * Distribution: Fedora 43
> > * Platform: Intel (Meteor Lake)
> > * Thunderbolt controller: 0000:8a:00.0
>
> It looks like this isn't Meteor Lake but Arrow Lake-S:
>
> 0000:80:1b.4 - Arrow Lake-S (800 Series) PCH Root Port #21
> 0000:88:00.0 - Barlow Ridge Upstream Port
> 0000:89:00.0 - Barlow Ridge Downstream Port to NHI
> 0000:8a:00.0 - Barlow Ridge NHI
>

Looking at the dmesg, hotplug is enabled for the PCIe root port:

Mar 26 09:44:00 fedora kernel: pcieport 0000:80:1b.4: pciehp: Slot #25 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ IbPresDis- LLActRep+

For Barlow Ridge it should be disabled. Lenovo may already have a BIOS fix;
please check. They have done that for other models too.
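
The difference between the pciehp line above and the SltCap line that lspci reports after the BIOS update can be listed mechanically. This is a minimal Python sketch under the assumption that both lines are available as plain text; `parse_flags` and `changed_flags` are hypothetical helper names.

```python
import re

# Matches tokens such as "HotPlug+" or "MRL-" that end at whitespace or end of line
FLAG = re.compile(r"\b([A-Za-z][A-Za-z0-9]*)([+-])(?=\s|$)")

def parse_flags(line):
    """Map each flag name in the line to True ('+') or False ('-')."""
    return {name: sign == "+" for name, sign in FLAG.findall(line)}

def changed_flags(before, after):
    """Return {name: (old, new)} for flags present in both lines that differ."""
    b, a = parse_flags(before), parse_flags(after)
    return {k: (b[k], a[k]) for k in b if k in a and b[k] != a[k]}

# pciehp line from dmesg with the old BIOS
old = ("pcieport 0000:80:1b.4: pciehp: Slot #25 AttnBtn- PwrCtrl- MRL- "
       "AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ IbPresDis- LLActRep+")
# SltCap line from lspci -vv with the new BIOS
new = "SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-"

changed = changed_flags(old, new)
print(changed)
```

Only flags that appear in both lines are compared, so Interlock, NoCompl, IbPresDis and LLActRep from the dmesg line are ignored here.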