Re: USB-C DisplayPort display failing to stay active with Intel Barlow Ridge USB4 controller, power-management related issue?

From: Mika Westerberg
Date: Fri Oct 11 2024 - 12:38:25 EST


Hi,

On Thu, Oct 10, 2024 at 11:26:56PM -0500, Aaron Rainbolt wrote:
> > Can you share full dmesg with the repro and "thunderbolt.dyndbg=+p" in
> > the kernel command line?
>
> The full log is very long, so I've included it as an email attachment.
> The exact steps taken after booting with the requested kernel parameter
> were:
>
> 1. boot with thunderbolt.dyndbg=+p kernel param, no USB-C plugged in.
> 2. After login, hot-plug two USB-C cables. This time, the displays came
> up and stayed resident (this happens sometimes)
> 3. Unplugged both cables.
> 4. Replugged both. This time, the displays did not show anything.
> 5. lspci -k "jiggled" the displays and they came back on.
> 6. After ~15s, the displays blacked out again.
> 7. Save to the dmesg file after about 30s.
>
> The laptop's firmware is fully up-to-date. One of the fixes we tried
> was installing Windows 11, updating the firmware, and then
> re-installing Kubuntu 24.04. This had no effect on the issue.
>
> Notes:
>
> * Kernel 6.1 does not exhibit this time out. 6.5 and later do.
> * Windows 11 had very similar behavior before installing Windows
> updates. After update, it was fixed.
> * All distros and W11 were tested on the same hardware with the latest
> firmware, so we know this is not a hardware failure.

Thanks for the logs and steps!

I now realize that

a75e0684efe5 ("thunderbolt: Keep the domain powered when USB4 port is in redrive mode")

was half-baked. Yes, it handles the case where a monitor is plugged in
while the domain is powered. However, it completely misses these
cases:

* Plug in a monitor to the Type-C port while the controller is runtime
suspended.
* Boot with a monitor plugged in to the Type-C port.
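
To make the two missed cases concrete, here is a small userspace model of
the reference counting involved. It is only a sketch: the model_* names
and structs are hypothetical stand-ins, not the driver's types, and the
guard against taking a second reference for the same port is an
assumption of the model rather than something the driver is known to do.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for struct tb_port / struct tb_switch. */
struct model_port {
	bool is_dpin;      /* DP IN adapter? */
	bool dp_available; /* false while a monitor is redriven via Type-C */
	bool redrive;      /* plays the role of port->redrive */
};

struct model_router {
	int pm_usage;      /* stand-in for the runtime PM usage count */
	int nports;
	struct model_port ports[4];
};

/* Take one PM reference per DP IN port whose resource is in use (redrive). */
static void model_enter_redrive_all(struct model_router *r)
{
	for (int i = 0; i < r->nports; i++) {
		struct model_port *p = &r->ports[i];

		/* The !p->redrive guard is an assumption of this model. */
		if (p->is_dpin && !p->dp_available && !p->redrive) {
			p->redrive = true;
			r->pm_usage++;
		}
	}
}

/* Drop those references unconditionally, as on the suspend paths. */
static void model_exit_redrive_all(struct model_router *r)
{
	for (int i = 0; i < r->nports; i++) {
		struct model_port *p = &r->ports[i];

		if (p->is_dpin && p->redrive) {
			assert(r->pm_usage > 0);
			p->redrive = false;
			r->pm_usage--;
		}
	}
}

/* The router may runtime suspend only when nothing holds a reference. */
static bool model_can_runtime_suspend(const struct model_router *r)
{
	return r->pm_usage == 0;
}
```

With a monitor already attached at boot or resume, no hot-plug event
arrives, so unless something like model_enter_redrive_all() runs on
those paths the count stays at zero and the router is free to suspend
under the display.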

At the end of this email there is a hack patch that tries to solve this.
Can you try it out? I will be on vacation next week but I'm copying my
colleague Gil who is familiar with this too. He should be able to help
you out during my absence.

A couple of notes about the dmesg you shared. They don't affect this
issue but may cause other issues:

> [ 1.382718] thunderbolt 0000:06:00.0: device links to tunneled native ports are missing!

This means the BIOS does not implement the USB4 power contract, so
USB 3.x and PCIe tunnels will not work as expected after a power
transition.

> [ 1.416488] thunderbolt 0000:06:00.0: 0: NVM version 14.86

This is a really old firmware version. My development system, for
example, has 56.x, so yours might have a bunch of issues that are solved
in later versions.
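
If you want to check this on several machines, something along these
lines could compare the version string from the log. It is a sketch
only: the helper names are made up, it assumes both fields are printed
in hexadecimal, and 56 is just the number from my development system,
not an official minimum.

```c
#include <stdio.h>

/*
 * Parse "major.minor" as printed in the "NVM version" log line.
 * Assumption: both fields are hexadecimal.
 * Returns 0 on success, -1 if the string does not match.
 */
static int nvm_version_parse(const char *s, unsigned int *major,
			     unsigned int *minor)
{
	if (sscanf(s, "%x.%x", major, minor) != 2)
		return -1;
	return 0;
}

/*
 * Returns 1 if the major number in s is lower than min_major,
 * 0 if it is not, and -1 if s cannot be parsed.
 */
static int nvm_major_older_than(const char *s, unsigned int min_major)
{
	unsigned int major, minor;

	if (nvm_version_parse(s, &major, &minor))
		return -1;
	return major < min_major ? 1 : 0;
}
```

For the log above, nvm_major_older_than("14.86", 0x56) reports the
firmware as older.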

The hack patch below:

diff --git a/drivers/thunderbolt/tb.c b/drivers/thunderbolt/tb.c
index 07a66594e904..0e424b7661be 100644
--- a/drivers/thunderbolt/tb.c
+++ b/drivers/thunderbolt/tb.c
@@ -2113,6 +2113,37 @@ static void tb_exit_redrive(struct tb_port *port)
 	}
 }

+static void tb_switch_enter_redrive(struct tb_switch *sw)
+{
+	struct tb_port *port;
+
+	tb_switch_for_each_port(sw, port)
+		tb_enter_redrive(port);
+}
+
+/*
+ * Called during system and runtime suspend to forcefully exit redrive
+ * mode without querying whether the resource is available.
+ */
+static void tb_switch_exit_redrive(struct tb_switch *sw)
+{
+	struct tb_port *port;
+
+	if (!(sw->quirks & QUIRK_KEEP_POWER_IN_DP_REDRIVE))
+		return;
+
+	tb_switch_for_each_port(sw, port) {
+		if (!tb_port_is_dpin(port))
+			continue;
+
+		if (port->redrive) {
+			port->redrive = false;
+			pm_runtime_put(&sw->dev);
+			tb_port_dbg(port, "exit redrive mode\n");
+		}
+	}
+}
+
 static void tb_dp_resource_unavailable(struct tb *tb, struct tb_port *port,
 				       const char *reason)
 {
@@ -2987,6 +3018,7 @@ static int tb_start(struct tb *tb, bool reset)
 	tb_create_usb3_tunnels(tb->root_switch);
 	/* Add DP IN resources for the root switch */
 	tb_add_dp_resources(tb->root_switch);
+	tb_switch_enter_redrive(tb->root_switch);
 	/* Make the discovered switches available to the userspace */
 	device_for_each_child(&tb->root_switch->dev, NULL,
 			      tb_scan_finalize_switch);
@@ -3002,6 +3034,7 @@ static int tb_suspend_noirq(struct tb *tb)

 	tb_dbg(tb, "suspending...\n");
 	tb_disconnect_and_release_dp(tb);
+	tb_switch_exit_redrive(tb->root_switch);
 	tb_switch_suspend(tb->root_switch, false);
 	tcm->hotplug_active = false; /* signal tb_handle_hotplug to quit */
 	tb_dbg(tb, "suspend finished\n");
@@ -3094,6 +3127,7 @@ static int tb_resume_noirq(struct tb *tb)
 		tb_dbg(tb, "tunnels restarted, sleeping for 100ms\n");
 		msleep(100);
 	}
+	tb_switch_enter_redrive(tb->root_switch);
 	/* Allow tb_handle_hotplug to progress events */
 	tcm->hotplug_active = true;
 	tb_dbg(tb, "resume finished\n");
@@ -3157,6 +3191,8 @@ static int tb_runtime_suspend(struct tb *tb)
 	struct tb_cm *tcm = tb_priv(tb);

 	mutex_lock(&tb->lock);
+	tb_disconnect_and_release_dp(tb);
+	tb_switch_exit_redrive(tb->root_switch);
 	tb_switch_suspend(tb->root_switch, true);
 	tcm->hotplug_active = false;
 	mutex_unlock(&tb->lock);
@@ -3188,6 +3224,7 @@ static int tb_runtime_resume(struct tb *tb)
 	tb_restore_children(tb->root_switch);
 	list_for_each_entry_safe(tunnel, n, &tcm->tunnel_list, list)
 		tb_tunnel_activate(tunnel);
+	tb_switch_enter_redrive(tb->root_switch);
 	tcm->hotplug_active = true;
 	mutex_unlock(&tb->lock);