Re: [PATCH] pci/pciehp: bail on bogus pcie reads from removed devices

From: Jarod Wilson
Date: Tue Aug 04 2015 - 13:25:26 EST

On 8/4/2015 12:56 PM, Bjorn Helgaas wrote:
On Tue, Aug 04, 2015 at 10:05:18AM -0400, Jarod Wilson wrote:
On Mon, Aug 3, 2015 at 12:14 AM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote:

On Tue, Jul 21, 2015 at 12:25:30PM -0400, Jarod Wilson wrote:

Seems like a read of all 1's from a register of a device that has gone
away should be taken as a sign that the device has gone away.
Section 6.2.10 of the PCIE spec (v4.0, rev 0.3, Feb 19, 2014) suggests as
much with this snippet:

|Data Value of All 1âs
|Many platforms, including those supporting RP Extensions for DPC, can
|return a data value of all 1âs to software when an error is associated
|with a PCI Express Configuration, I/O, or Memory Read Request. During
|the Downstream Port discards Requests destined for the Link and
|them with an error (i.e., either with an Unsupported Request (UR) or
|Completer Abort (CA) Completion Status). By ending a series of MMIO or
|configuration space operations with a read to an address with a known
|data value not equal to all 1âs, software may determine if a Completer
|has been removed or DPC has been triggered.

I'm not sure the above is directly relevant to this case, but the same
principle (reading all 1's means the device is probably gone) seems to

This is based on part of a debugging patch Bjorn posted in the referenced
bugzilla, and its required to make the HP ZBook G2 I've got here not barf
when disconnecting a thunderbolt ethernet adapter and corrupt memory.

Suggested-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
CC: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
CC: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
CC: linux-pci@xxxxxxxxxxxxxxx
Signed-off-by: Jarod Wilson <jarod@xxxxxxxxxx>

Hi Jarod,

I think there are two issues here:

1) pciehp doesn't handle all 1's correctly
2) use-after-free of hotplug_slot

This patch is for the first issue. I think it's correct, but I still
have a question or two. I attached an updated version of the patch
and changelog.

Here's the path I think we're taking: 03:03.0 receives pciehp
interrupt for removal (link down and card not present), and we call
pciehp_unconfigure_device() for 05:00.0 and everything downstream from
it. Part of this is removing 06:00.0. I expected this would use this

pcie_port_remove_service # .remove method for 06:00.0
dev_printk("unloading service driver ...")
pciehp_remove # hpdriver_portdrv.remove
pcie_do_write_cmd(..., true) # wait
read PCI_EXP_SLTSTA # would get 0xffff
read PCI_EXPT_SLTCTL # would get 0xffff

so I added checks for ~0 data in pcie_poll_cmd() and

But the dmesg log shows that we were in pcie_isr(), and I don't
understand yet how we got there. Can you help figure that out? Maybe
put a dump_stack() in pcie_isr() or something?

[ 1949.102247] pciehp 0000:03:03.0:pcie24: pcie_isr: intr_loc 108
[ 1949.102252] pciehp 0000:03:03.0:pcie24: Presence/Notify input change
[ 1949.102256] pciehp 0000:03:03.0:pcie24: Card not present on Slot(3)
[ 1949.102262] pciehp 0000:03:03.0:pcie24: Data Link Layer State change
[ 1949.102266] pciehp 0000:03:03.0:pcie24: slot(3): Link Down event
[ 1949.102281] pciehp 0000:03:03.0:pcie24: Surprise Removal
[ 1949.102286] pciehp 0000:03:03.0:pcie24: Link Down event ignored on
slot(3): already powering off
[ 1949.102288] pciehp 0000:03:03.0:pcie24: Disabling
[ 1949.102290] pciehp 0000:03:03.0:pcie24: pciehp_unconfigure_device:
domain:bus:dev = 0000:05:00
[ 1950.321907] tg3 0000:07:00.0: tg3_abort_hw timed out, TX_MODE_ENABLE
will not clear MAC_TX_MODE=ffffffff
[ 1950.525986] [sched_delayed] sched: RT throttling activated
[ 1950.544164] pciehp 0000:06:00.0:pcie24: unloading service driver pciehp
[ 1950.544170] pciehp 0000:06:00.0:pcie24: release_slot: physical_slot = 9
[ 1950.545016] pciehp 0000:06:00.0:pcie24: Timeout on hotplug command
0x1038 (issued 19092 msec ago)
[ 1950.545020] pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no response
from device
[ 1950.545021] pciehp 0000:06:00.0:pcie24: pcie_disable_notification:
SLOTCTRL d8 write cmd 0
[ 1950.545025] pciehp 0000:06:00.0:pcie24: Device has gone away
[ 1950.545027] CPU: 0 PID: 12361 Comm: kworker/0:2 Not tainted
3.10.0-302.el7.hp.x86_64 #1
[ 1950.545028] Hardware name: Hewlett-Packard HP ZBook 15 G2/2253, BIOS M70
Ver. 01.07 02/26/2015
[ 1950.545033] Workqueue: pciehp-3 pciehp_power_thread
[ 1950.545034] 0000000000000000 00000000f721dd13 ffff8804822ffa78
[ 1950.545036] ffff8804822ffac0 ffffffff8133bf64 ffff00000000002e
[ 1950.545038] ffff8804818fab00 ffff880468f70cc0 000000000000002e
[ 1950.545039] Call Trace:
[ 1950.545044] [<ffffffff81632729>] dump_stack+0x19/0x1b
[ 1950.545046] [<ffffffff8133bf64>] pcie_isr+0x264/0x280
[ 1950.545048] [<ffffffff8111b6b9>] __free_irq+0x189/0x220
[ 1950.545049] [<ffffffff8111b7e9>] free_irq+0x49/0xb0
[ 1950.545051] [<ffffffff8133d3b9>] pciehp_release_ctrl+0xb9/0xe0
[ 1950.545053] [<ffffffff81339db3>] pciehp_remove+0x23/0x30
[ 1950.545055] [<ffffffff8133442e>] pcie_port_remove_service+0x4e/0x60

__free_irq() calls the ISR one last time. It does make sense that the
driver must be prepared for the ISR to be called at any time before
free_irq() returns. I just didn't see a path for the already-removed
device to generate an actual interrupt.

Yup, I've got CONFIG_DEBUG_SHIRQ=y in my config. So I take it we're hitting the action->handler(irq, dev_id) bit in __free_irq(), after we've already done a bunch of teardown/removal?

Can you try the version I posted, with the additional tests in
pcie_poll_cmd() and pcie_do_write_cmd()? We should try to read from
the device there, even before we free the IRQ, so we might see several
messages. Maybe there's a way we can be smarter about bailing out

The above was with your additions munged in with the older patch, I actually do see "pcie_do_write_cmd: no response from device" just two lines ahead of each "Device has gone away" message from pcie_isr().

pciehp 0000:06:00.0:pcie24: pcie_do_write_cmd: no response from device
pciehp 0000:06:00.0:pcie24: pcie_disable_notification: SLOTCTRL d8 write cmd 0
pciehp 0000:06:00.0:pcie24: Device has gone away <- from pcie_isr()

Jarod Wilson
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at