Re: Apple Thunderbolt Display chaining
From: Mika Westerberg
Date: Thu Aug 11 2022 - 05:50:19 EST
Hi,
On Wed, Aug 10, 2022 at 03:40:08PM +0800, Brad Campbell wrote:
> G'day Mika,
>
> On 9/8/22 23:50, Mika Westerberg wrote:
> > Hi,
> >
> > On Tue, Aug 09, 2022 at 11:16:27PM +0800, Brad Campbell wrote:
> > > If I then reboot and load the driver it fails.
> > >
> > > The only thing I could think of doing was an lspci -vvv after the boot and module load
> > > and an lspci -vvv after a warm reboot and diff them, because there are changes around the
> > > thunderbolt bridge devices. I've done a diff -u50 to try and keep as much context as possible.
> > >
> > > On the first boot I can unload/reload the thunderbolt module repeatedly and there's no issue
> > > but loading it after a reboot locks up. There are no lspci changes on the first boot after the
> > > initial module load unless I rescan the PCI bus, but they're minor and it doesn't cause an issue
> > > with loading the thunderbolt module.
> > >
> > > The firmware *must* be doing something on reboot I suppose or the PCIe configs wouldn't change.
> >
> > Okay, let's try a bigger hammer and reset all the ports upon load. That
> > should hopefully clear out the "bad state" too. This is completely
> > untested but it should trigger reset and then re-initialize the TBT
> > links.
> >
> > diff --git a/drivers/thunderbolt/lc.c b/drivers/thunderbolt/lc.c
> > index 633970fbe9b0..c419c2568de4 100644
> > --- a/drivers/thunderbolt/lc.c
> > +++ b/drivers/thunderbolt/lc.c
> > @@ -6,6 +6,8 @@
> > * Author: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>
> > */
> > +#include <linux/delay.h>
> > +
> > #include "tb.h"
> > /**
> > @@ -327,6 +329,34 @@ void tb_lc_xhci_disconnect(struct tb_port *port)
> > tb_port_dbg(port, "xHCI disconnected\n");
> > }
> > +int tb_lc_reset_port(struct tb_port *port)
> > +{
> > + struct tb_switch *sw = port->sw;
> > + int cap, ret;
> > + u32 val;
> > +
> > + if (sw->generation != 3)
> > + return -EINVAL;
> > +
> > + cap = find_port_lc_cap(port);
> > + if (cap < 0)
> > + return cap;
> > +
> > + ret = tb_sw_read(sw, &val, TB_CFG_SWITCH, cap + TB_LC_PORT_MODE, 1);
> > + if (ret)
> > + return ret;
> > +
> > + val |= TB_LC_PORT_MODE_DPR;
> > + ret = tb_sw_write(sw, &val, TB_CFG_SWITCH, cap + TB_LC_PORT_MODE, 1);
> > + if (ret)
> > + return ret;
> > +
> > + msleep(20);
> > +
> > + val &= ~TB_LC_PORT_MODE_DPR;
> > + return tb_sw_write(sw, &val, TB_CFG_SWITCH, cap + TB_LC_PORT_MODE, 1);
> > +}
> > +
> > static int tb_lc_set_wake_one(struct tb_switch *sw, unsigned int offset,
> > unsigned int flags)
> > {
> > diff --git a/drivers/thunderbolt/switch.c b/drivers/thunderbolt/switch.c
> > index 0ae8a7ec7c9c..21ac3ccf1cf9 100644
> > --- a/drivers/thunderbolt/switch.c
> > +++ b/drivers/thunderbolt/switch.c
> > @@ -740,6 +740,11 @@ int tb_port_disable(struct tb_port *port)
> > return __tb_port_enable(port, false);
> > }
> > +int tb_port_reset(struct tb_port *port)
> > +{
> > + return tb_lc_reset_port(port);
> > +}
> > +
> > /*
> > * tb_init_port() - initialize a port
> > *
> > diff --git a/drivers/thunderbolt/tb.c b/drivers/thunderbolt/tb.c
> > index 8030fc544c5e..48a7396994ef 100644
> > --- a/drivers/thunderbolt/tb.c
> > +++ b/drivers/thunderbolt/tb.c
> > @@ -1875,6 +1875,7 @@ static int tb_scan_finalize_switch(struct device *dev, void *data)
> > static int tb_start(struct tb *tb)
> > {
> > struct tb_cm *tcm = tb_priv(tb);
> > + struct tb_port *p;
> > int ret;
> > tb->root_switch = tb_switch_alloc(tb, &tb->dev, 0);
> > @@ -1911,6 +1912,12 @@ static int tb_start(struct tb *tb)
> > false);
> > /* Enable TMU if it is off */
> > tb_switch_tmu_enable(tb->root_switch);
> > +
> > + tb_switch_for_each_port(tb->root_switch, p) {
> > + if (tb_port_is_null(p))
> > + tb_port_reset(p);
> > + }
> > +
> > /* Full scan to discover devices added before the driver was loaded. */
> > tb_scan_switch(tb->root_switch);
> > /* Find out tunnels created by the boot firmware */
> > diff --git a/drivers/thunderbolt/tb.h b/drivers/thunderbolt/tb.h
> > index 28bb80d967d6..fe5edefec712 100644
> > --- a/drivers/thunderbolt/tb.h
> > +++ b/drivers/thunderbolt/tb.h
> > @@ -1028,6 +1028,7 @@ int tb_port_clear_counter(struct tb_port *port, int counter);
> > int tb_port_unlock(struct tb_port *port);
> > int tb_port_enable(struct tb_port *port);
> > int tb_port_disable(struct tb_port *port);
> > +int tb_port_reset(struct tb_port *port);
> > int tb_port_alloc_in_hopid(struct tb_port *port, int hopid, int max_hopid);
> > void tb_port_release_in_hopid(struct tb_port *port, int hopid);
> > int tb_port_alloc_out_hopid(struct tb_port *port, int hopid, int max_hopid);
> > @@ -1121,6 +1122,7 @@ bool tb_lc_is_usb_plugged(struct tb_port *port);
> > bool tb_lc_is_xhci_connected(struct tb_port *port);
> > int tb_lc_xhci_connect(struct tb_port *port);
> > void tb_lc_xhci_disconnect(struct tb_port *port);
> > +int tb_lc_reset_port(struct tb_port *port);
> > int tb_lc_set_wake(struct tb_switch *sw, unsigned int flags);
> > int tb_lc_set_sleep(struct tb_switch *sw);
> > bool tb_lc_lane_bonding_possible(struct tb_switch *sw);
> > diff --git a/drivers/thunderbolt/tb_regs.h b/drivers/thunderbolt/tb_regs.h
> > index f8c1ca3464d9..8fd12bc2d500 100644
> > --- a/drivers/thunderbolt/tb_regs.h
> > +++ b/drivers/thunderbolt/tb_regs.h
> > @@ -555,6 +555,9 @@ struct tb_regs_hop {
> > #define TB_LC_POWER 0x740
> > /* Link controller registers */
> > +#define TB_LC_PORT_MODE 0x26
> > +#define TB_LC_PORT_MODE_DPR BIT(6)
> > +
> > #define TB_LC_CS_42 0x2a
> > #define TB_LC_CS_42_USB_PLUGGED BIT(31)
> >
>
> Yep, that certainly solves the lockup/reboot issues and all PCIe devices are
> discovered and appear to work. I can reboot repeatedly and that seems to be ok.
>
> It causes some peculiarity in the DP tunnel however where one or both will fail to come up
> leaving this in dmesg (in this instance both failed) :
>
> [ 10.550439] [drm] Adding stream 00000000a5b9bb95 to context failed with err 28!
> [ 10.551032] [drm:handle_hpd_irq_helper [amdgpu]] *ERROR* Restoring old state failed with -22
> [ 11.180398] [drm] Adding stream 00000000a5b9bb95 to context failed with err 28!
> [ 11.180830] [drm:handle_hpd_irq_helper [amdgpu]] *ERROR* Restoring old state failed with -22
>
> Oddly enough X thinks the displays are there and is pretending to display on them, but they
> remain black. This can be one, the other or both depending on the boot.
>
> I have probably cold/warm booted 50 times now with varying combinations of activation to attempt
> to pin some form of determinism on this behaviour, but I've got nothing at this point.
Okay, do you see in the dmesg whether the DP tunnels are actually
created when you see the issue?