Re: [PATCH v2 0/2] thunderbolt: Fix S4 resume incongruities
From: Mario Limonciello
Date: Fri Jan 09 2026 - 10:43:09 EST
On 1/9/26 1:23 AM, Mika Westerberg wrote:
On Thu, Jan 08, 2026 at 01:18:58PM -0600, Mario Limonciello wrote:
On 1/8/26 5:42 AM, Mika Westerberg wrote:
On Wed, Jan 07, 2026 at 02:50:54PM -0600, Mario Limonciello wrote:
On 1/7/26 3:33 AM, Mika Westerberg wrote:
Hi,
On Mon, Jan 05, 2026 at 11:37:47PM -0600, Mario Limonciello (AMD) wrote:
When a machine is restored from S4 and the firmware CM has created
tunnels, the kernel's expectations can differ from those it has when
booting from S5. This series addresses those incongruities.
I suspect there is no Firmware CM in AMD platforms so this actually means
the BIOS CM, correct?
That's correct.
However, on S4 we actually do reset host router when the "boot kernel" is
started before loading and jumping to the hibernation image.
That's only if thunderbolt.ko is built into the kernel or is included in the
initramfs before it does the pivot to the hibernation image.
Ah good point.
At least in the tests we were doing it's not part of the boot kernel.
It might be
that this boot kernel tunnel configuration is causing the issues you are
seeing (can you elaborate on those?)
The issues manifest "downstream" in the GPU driver. There are a bunch of
aux failures and a non-functional display. Tracing it back, the GPU driver
isn't alive yet at the time the tunnels are reconstructed, so the CM tears
the DP tunnel down, and then when the GPU driver does come up it is not
functional.
DP tunnel constructed at:
[ 486.007194] thunderbolt 0000:c6:00.6: AUX RX path activation complete
First DPRx timeout at:
[ 486.135483] thunderbolt 0000:c6:00.6: 0:6 <-> 2:13 (DP): DPRX read timeout
DP tunnel deactivating at:
[ 486.331856] thunderbolt 0000:c6:00.6: 0:6 <-> 2:13 (DP): deactivating
Hmm, dprx_timeout is 12 seconds by default. How come it tears down the
tunnel already?
*I believe* it's because of a hot unplug event that occurs from it not
working.
First DPRx DPCD reading starts at:
[ 486.351765] amdgpu 0000:c4:00.0: amdgpu: [drm] DPIA AUX failed on 0xf0000(10), error 7
This would have made it within the 12s if I read the timestamps right.
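The timestamp arithmetic behind that remark can be checked with a small sketch. The constants are copied from the dmesg lines quoted above and converted to microseconds; the helper names (elapsed_us, within_dprx_timeout) are made up for illustration and are not part of the thunderbolt driver.

```c
#include <assert.h>
#include <stdint.h>

/* Timestamps from the quoted dmesg lines, in microseconds since boot. */
#define TS_TUNNEL_ACTIVE_US   486007194LL /* AUX RX path activation complete */
#define TS_DPRX_TIMEOUT_US    486135483LL /* first "DPRX read timeout" */
#define TS_DEACTIVATING_US    486331856LL /* DP tunnel "deactivating" */
#define TS_AMDGPU_AUX_FAIL_US 486351765LL /* first amdgpu DPIA AUX failure */

/* Default dprx_timeout mentioned in the thread: 12 seconds. */
#define DPRX_TIMEOUT_US (12LL * 1000 * 1000)

/* Hypothetical helper: time between two log entries. */
static int64_t elapsed_us(int64_t start_us, int64_t end_us)
{
	assert(end_us >= start_us);
	return end_us - start_us;
}

/* Hypothetical helper: did the event fall inside the DPRX timeout window? */
static int within_dprx_timeout(int64_t start_us, int64_t event_us)
{
	return elapsed_us(start_us, event_us) < DPRX_TIMEOUT_US;
}
```

With these values, the tunnel is deactivated 324662 us after activation and the first amdgpu AUX read comes 344571 us after activation, so both fall far inside the 12 s window.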
Let me just share the whole log so you can see the full context.
https://gist.github.com/superm1/6798fff44d0875b4ed0fe43d0794f81e
Thanks!
[Side note, you seem to have the link trained at Gen2 (20G) instead of Gen3
(40G).]
Looking at the dmesg I recalled that there is an internal report about a
similar issue by Pooja and Rene (Cc'd) and it all boils down to this log
entry:
[ 489.339148] thunderbolt 0000:c6:00.6: 2:13: could not allocate DP tunnel
They made a hack patch that works around it, see below. I wonder if you
could try that too? If that's the issue (not releasing HopIDs) then we need
to figure out a way to fix it properly.
Thanks! I shared it with our internal team that reproduced it, will come back next week with their results.
One suggestion is to release DP
resources earlier, and of course doing a full reset as done here. I would
prefer the "smallest" possible change.
Well FWIW the v1 of my patch for the reset was a lot smaller :P
https://lore.kernel.org/linux-usb/20251023050354.115015-1-superm1@xxxxxxxxxx/#t
@Pooja, any updates on your side to this?
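For readers following along, the guard added by the hack patch below can be modelled outside the kernel. This is only a toy sketch: struct toy_tunnel and the toy_* functions are hypothetical stand-ins, and where the real driver sets or clears dprx_started is an assumption here. The point it illustrates is that, without the early return, a second start call would take a second reference that a single stop never drops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct tb_tunnel; not the kernel structure. */
struct toy_tunnel {
	int refcount;       /* models the tunnel reference count */
	bool dprx_started;  /* models tunnel->dprx_started */
};

/* Idempotent start: only the first call takes a reference. */
static int toy_dprx_start(struct toy_tunnel *t)
{
	assert(t != NULL);
	if (t->dprx_started)
		return 0;   /* same early return as in the hack patch */

	t->refcount++;      /* "bump up the reference" */
	t->dprx_started = true;
	return 0;
}

/* Stop drops the single reference taken by start (assumed pairing). */
static void toy_dprx_stop(struct toy_tunnel *t)
{
	assert(t != NULL);
	if (!t->dprx_started)
		return;

	t->dprx_started = false;
	t->refcount--;
}
```

In this model, calling toy_dprx_start() twice and toy_dprx_stop() once leaves the count where it began. Whether an unbalanced reference is really what keeps the HopIDs from being released is the open question in the thread, not something this sketch shows.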
diff --git a/drivers/thunderbolt/tunnel.c b/drivers/thunderbolt/tunnel.c
index 28c1e5c062f3..45f7ee940f10 100644
--- a/drivers/thunderbolt/tunnel.c
+++ b/drivers/thunderbolt/tunnel.c
@@ -1084,6 +1084,9 @@ static void tb_dp_dprx_work(struct work_struct *work)
 static int tb_dp_dprx_start(struct tb_tunnel *tunnel)
 {
+	if (tunnel->dprx_started)
+		return 0;
+
 	/*
 	 * Bump up the reference to keep the tunnel around. It will be
 	 * dropped in tb_dp_dprx_stop() once the tunnel is deactivated.