Re: [PATCH v2 0/2] thunderbolt: Fix S4 resume incongruities

From: Mario Limonciello (AMD) (kernel.org)

Date: Tue Jan 13 2026 - 13:44:20 EST

On 1/9/2026 6:42 PM, Katiyar, Pooja wrote:
Hi,

On Thu, Jan 8, 2026 at 11:23:18PM -0800, Mika Westerberg wrote:
On Thu, Jan 08, 2026 at 01:18:58PM -0600, Mario Limonciello wrote:
On 1/8/26 5:42 AM, Mika Westerberg wrote:

Let me just share the whole log so you can see the full context.

https://gist.github.com/superm1/6798fff44d0875b4ed0fe43d0794f81e

Thanks!

[Side note, you seem to have the link trained at Gen2 (20G) instead of Gen3
(40G).]

Looking at the dmesg I recalled that there is an internal report about
similar issue by Pooja and Rene (Cc'd) and it all boils down to this log
entry:

[ 489.339148] thunderbolt 0000:c6:00.6: 2:13: could not allocate DP tunnel

They made a hack patch that works around it; see below. I wonder if you
could try that too? If that's the issue (HopIDs not being released) then
we need to figure out a way to fix it properly. One suggestion is to
release the DP resources earlier, and of course doing a full reset as
done here. I would prefer the smallest possible change.

@Pooja, any updates on your side to this?

Looking at the log "could not allocate DP tunnel", this appears to be
similar to the kref synchronization issue we are facing during S4
resume. The problem we have identified is that during S4 entry the
hibernation image is created first, and only then are the DP tunnels
freed. This means the hibernation image still contains the tunnels in
their active state. However, when resuming from S4, the tunnels are
restored from the hibernation image (as active) and then the resume flow
reactivates them again, causing a kref count mismatch. This leads to
HopID allocation conflicts and the "could not allocate DP tunnel" error
on the next connect/tunnel activation.

The hack patch works around this by preventing double activation via the
dprx_started flag. You could try this hack to confirm whether it's the
same issue we're dealing with.

For a proper fix, we are working on a patch that releases the DP
resources before the hibernation image is saved and creates them again
during resume, managing the resources properly. The patch is currently
under review and testing, and we will send it shortly.



I have confirmation that the hack patch helps with the issue for us too.

If your patch doesn't work, another logical solution could be to destroy all the tunnels as part of the PM freeze callback (not just the DP resources). Maybe even unify the suspend and freeze code paths for more opportunities for code reuse?


diff --git a/drivers/thunderbolt/tunnel.c b/drivers/thunderbolt/tunnel.c
index 28c1e5c062f3..45f7ee940f10 100644
--- a/drivers/thunderbolt/tunnel.c
+++ b/drivers/thunderbolt/tunnel.c
@@ -1084,6 +1084,9 @@ static void tb_dp_dprx_work(struct work_struct *work)
 
 static int tb_dp_dprx_start(struct tb_tunnel *tunnel)
 {
+	if (tunnel->dprx_started)
+		return 0;
+
 	/*
 	 * Bump up the reference to keep the tunnel around. It will be
 	 * dropped in tb_dp_dprx_stop() once the tunnel is deactivated.

Thanks,
Pooja