Re: [PATCH] thunderbolt: Defer DP tunnel teardown until display driver is ready

From: An Wu

Date: Wed May 27 2026 - 21:04:03 EST


Hi Mika,

Thank you for the feedback.

Sorry about that. I understand the concern that the Thunderbolt
CM core should not call PCI-specific functions, especially as the
driver moves toward supporting non-PCIe hosts.

Putting the graphics drivers into the initramfs does not look
practical for us, because we would need to include every graphics
driver the system might use, plus their dependencies, which increases
the initramfs size and complexity. Moving Thunderbolt out of the
initramfs may also cause regressions for users who rely on Thunderbolt
docks early in boot, for example keyboards in the recovery or LUKS
passphrase shell, or network devices needed early in boot or for
mounting the root filesystem.

The problem I am trying to solve is that Thunderbolt DP tunneling
depends on the graphics driver being ready, yet the graphics and
Thunderbolt drivers run independently with no coordination. As a
result, Thunderbolt may treat a temporary graphics-side readiness
issue (the graphics driver simply has not loaded yet when the DPRX
timeout fires) as a permanent DP tunnel failure.

So the goal is not to make Thunderbolt depend on PCI, but to find an
acceptable way for these components to coordinate, or for Thunderbolt
to retry or re-check readiness in a more generic way without adding
PCI-specific logic to the CM core.

Could you please give us guidance on what direction would be
acceptable upstream?

BR
An

On Wed, May 27, 2026 at 3:14 PM Mika Westerberg
<mika.westerberg@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> On Wed, May 27, 2026 at 02:41:21PM +0800, ChunAn Wu wrote:
> > When the Thunderbolt driver loads early (e.g., from initramfs)
> > and discovers a BIOS-established DisplayPort tunnel, it starts
> > asynchronous DPRX polling which checks if the GPU driver has
> > read DPCD from the connected monitor within a 12-second timeout
> > (TB_DPRX_TIMEOUT).
> >
> > On systems with Full Disk Encryption (FDE/LUKS), the GPU driver
> > (i915, xe, amdgpu, etc.) resides on the encrypted root filesystem
> > and cannot load until the user enters the passphrase. This creates
> > a driver load ordering issue where the DPRX timeout fires before
> > the GPU driver has had a chance to initialize, causing the
> > Thunderbolt driver to permanently tear down the DP tunnel and
> > remove the DP IN adapter from available resources. Recovery
> > requires a physical re-plug of the dock.
> >
> > Fix this by deferring the DP tunnel teardown when no PCI display
> > driver has bound yet. Register a PCI bus notifier that watches
> > for display class (PCI_BASE_CLASS_DISPLAY) driver bind events.
> > When the DPRX timeout fires:
> >
> > - If no display driver is bound: tear down the tunnel but keep
> >   the DP IN adapter in the available resources list, allowing
> >   a retry.
> > - If a display driver is already bound: proceed with the
> >   existing behavior of permanently removing the DP IN resource.
> >
> > When a display driver eventually binds, the notifier triggers a
> > DP tunnel retry via a scheduled work item, re-establishing the
> > connection.
> >
> > This approach requires no changes to GPU drivers and handles all
> > GPU vendors (Intel, AMD, NVIDIA) through the generic PCI base
> > class check (0x03xx covers VGA, XGA, 3D, and other display
> > controllers). It also handles the FDE case gracefully since the
> > defer and retry can span an unbounded passphrase wait.
> >
> > Tested on Dell Pro Max 14 MC14250 with Dell SD25TB5 Thunderbolt
> > 5 Dock and LUKS full disk encryption. Simulated a 58-second
> > delay between TB and GPU driver loading -- display came up
> > successfully after display driver bound.
> >
> > Signed-off-by: ChunAn Wu <an.wu@xxxxxxxxxxxxx>
> > ---
> > drivers/thunderbolt/tb.c | 96 ++++++++++++++++++++++++++++++++++++----
> > 1 file changed, 88 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/thunderbolt/tb.c b/drivers/thunderbolt/tb.c
> > index 95d84612e06e..48e0b540fbec 100644
> > --- a/drivers/thunderbolt/tb.c
> > +++ b/drivers/thunderbolt/tb.c
> > @@ -62,6 +62,9 @@ MODULE_PARM_DESC(asym_threshold,
> > * @remove_work: Work used to remove any unplugged routers after
> > * runtime resume
> > * @groups: Bandwidth groups used in this domain.
> > + * @pci_nb: PCI bus notifier to detect when a display driver binds
> > + * @display_bound: Set when a PCI display driver has bound
> > + * @display_retry_work: Work to retry DP tunneling after display driver binds
> > */
> > struct tb_cm {
> > struct list_head tunnel_list;
> > @@ -69,6 +72,9 @@ struct tb_cm {
> > bool hotplug_active;
> > struct delayed_work remove_work;
> > struct tb_bandwidth_group groups[MAX_GROUPS];
> > + struct notifier_block pci_nb;
> > + bool display_bound;
> > + struct work_struct display_retry_work;
> > };
> >
> > static inline struct tb *tcm_to_tb(struct tb_cm *tcm)
> > @@ -1914,6 +1920,58 @@ static struct tb_port *tb_find_dp_out(struct tb *tb, struct tb_port *in)
> > return NULL;
> > }
> >
> > +static void tb_tunnel_dp(struct tb *tb);
> > +
> > +/*
> > + * Check if any PCI display class (0x03xx) device has a driver bound.
> > + * Used to decide whether to defer DPRX polling at boot.
> > + */
> > +static bool tb_is_display_driver_bound(void)
> > +{
> > + struct pci_dev *pdev = NULL;
> > +
> > + while ((pdev = pci_get_base_class(PCI_BASE_CLASS_DISPLAY, pdev))) {
>
> There is no way we are going to call PCI functions from the core of the CM.
> We are actually going to the opposite direction to be able to support
> non-PCIe hosts.
>
> Why not put the TB driver as part of the encrypted volume as well if the
> graphics driver is there? Or put the graphics drivers part of the
> initramfs?