"
On Thu, Sep 8, 2022 at 12:30 AM Limonciello, Mario
<Mario.Limonciello@xxxxxxx> wrote:
[Public]
Hi,
-----Original Message-----
From: Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx>
Sent: Monday, September 5, 2022 02:30
To: Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx>
Cc: mika.westerberg@xxxxxxxxxxxxxxx; andreas.noever@xxxxxxxxx;
michael.jamet@xxxxxxxxx; YehezkelShB@xxxxxxxxx; Mehta, Sanju
<Sanju.Mehta@xxxxxxx>; Limonciello, Mario
<Mario.Limonciello@xxxxxxx>; linux-usb@xxxxxxxxxxxxxxx; linux-
kernel@xxxxxxxxxxxxxxx
Subject: Re: [PATCH] thunderbolt: Resume PCIe bridges after switch is found
on AMD USB4 controller
On Mon, Sep 05, 2022 at 02:56:22PM +0800, Kai-Heng Feng wrote:
AMD USB4 can not detect external PCIe devices like external NVMe when
it's hotplugged, because card/link are not up:
pcieport 0000:00:04.1: pciehp: pciehp_check_link_active: lnk_status = 1101
That sounds like a hardware bug, how does this work in other operating
systems for this hardware?
We happen to have this HP system in our lab. My colleague Anson (now on CC) flashed
the same BIOS to it (01.02.01) using dediprog and loaded a 6.0-rc3 mainline kernel built
from the Canonical mainline kernel PPA.
He then tried to hotplug a TBT3 SSD a number of times but couldn't hit this issue.
I attached his log to the kernel Bugzilla.
Nice to hear. Hopefully this can be fixed at firmware/hardware side.
Use `lspci` to resume pciehp bridges can find external devices.
That's not good :(
A long delay before checking card/link presence doesn't help, either.
The only way to make the hotplug work is to enable pciehp interrupt and
check card presence after the TB switch is added.
Since the topology of USB4 and its PCIe bridges are siblings, hardcode
the bridge ID so TBT driver can wake them up to check presence.
As I mention below, this is not an acceptable solution.
AMD developers, any ideas on how to get this fixed in the TB controller
firware instead?
Anson also double checked on the AMD reference hardware that the HP system is built
against and couldn't reproduce it there either.
KH, I've got a few questions/comments to try to better explain why we're here.
1) How did you flash the 01.02.01 firmware? In Anson's check, he used dediprog.
Is it possible there was some stateful stuff used by HP's BIOS still on the SPI from the
upgrade that didn't get set/cleared properly from an earlier pre-release BIOS?
We used UEFI capsule to update the firmware, via fwupd.
2) Did you change any BIOS settings? Particularly anything to do with Pre-OS CM?
No, nothing in BIOS was changed. >
3) If you explicitly reset to HP's "default BIOS settings" does it resolve?
Doesn't help. I put the device to ACPI G3 and it doesn't help, either.
4) Can you double check ADP_CS_5 bit 31? I attached is a patch to kernel Bugzilla to
add dyndbg output for it. If it was for some reason set by Pre-OS CM in your BIOS/settings
combination, we might need to undo it by the Linux CM.
All ports say "Hotplug disabled: 0".
dmesg attached to the bugzilla.
5) Are you changing any of the default runtime PM policies for any of the USB4 routers or
root ports used for tunneling using software like TLP?
No. And they should be suspended by default.
Kai-Heng
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugz
Bugzilla:
illa.kernel.org%2Fshow_bug.cgi%3Fid%3D216448&data=05%7C01%7Cm
ario.limonciello%40amd.com%7C1e27b1d6f69e42796c7b08da8f107121%7C3d
d8961fe4884e608e11a82d994e183d%7C0%7C0%7C637979598042186185%7CU
nknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI
6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=0lhcaKfUyoK
0FXT9uDZ8a%2Fpxs9tHd8aoQcyPFdB%2F0eY%3D&reserved=0
Signed-off-by: Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx>struct pci_device_id *id)
---
drivers/thunderbolt/nhi.c | 29 +++++++++++++++++++++++++++++
drivers/thunderbolt/switch.c | 6 ++++++
drivers/thunderbolt/tb.c | 1 +
drivers/thunderbolt/tb.h | 5 +++++
include/linux/thunderbolt.h | 1 +
5 files changed, 42 insertions(+)
diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
index cb8c9c4ae93a2..75f5ce5e22978 100644
--- a/drivers/thunderbolt/nhi.c
+++ b/drivers/thunderbolt/nhi.c
@@ -1225,6 +1225,8 @@ static int nhi_probe(struct pci_dev *pdev, const
{struct pci_device_id *id)
struct tb_nhi *nhi;
struct tb *tb;
+ struct pci_dev *p = NULL;
+ struct tb_pci_bridge *pci_bridge, *n;
int res;
if (!nhi_imr_valid(pdev)) {
@@ -1306,6 +1308,19 @@ static int nhi_probe(struct pci_dev *pdev, const
nhi_shutdown(nhi);p))) {
return res;
}
+
+ if (pdev->vendor == PCI_VENDOR_ID_AMD) {
+ while ((p = pci_get_device(PCI_VENDOR_ID_AMD, 0x14cd,
+ pci_bridge = kmalloc(sizeof(struct tb_pci_bridge),GFP_KERNEL);
+ if (!pci_bridge)
+ goto cleanup;
+
+ pci_bridge->bridge = p;
+ INIT_LIST_HEAD(&pci_bridge->list);
+ list_add(&pci_bridge->list, &tb->bridge_list);
+ }
+ }
You can't walk the device tree and create a "shadow" list of devices
like this and expect any lifetime rules to work properly with them at
all.
Please do not do this.
greg k-h