RE: thunderbolt: Warning and 20 second delay in S4

From: Mani, Rajmohan
Date: Tue Jul 13 2021 - 18:00:54 EST


Hi Evan,

> Subject: Re: thunderbolt: Warning and 20 second delay in S4
>
> Hi Raj,
> Sure. I've got a TGL chromebook with my own kernel. The chromebook has
> nothing but a servo v4 plugged into it via type-C.
>
> I built the kernel by checking out next-20210709 from linux-next in the v5.4
> ChromeOS chroot directory, then doing "git checkout m/main -- chromeos", in
> order to get the configs. My chromeos-5.4 (where I pulled the configs from)
> happened to be on 04686c32716158 UPSTREAM:
> ASoC: rt5682-sdw: use first_hw_init flag on resume, though I don't think it
> matters.
>
> From there, my build line is:
> USE="kgdb pcserial vtconsole " emerge-volteer chromeos-kernel-5_4
>
> My commandline has "earlyprintk=ttyS0,115200n8 console=ttyS0,115200n8"
> so I get spew out of the serial port, but otherwise it should be standard. I'm
> also tracking this in b/192575702.
>

The tb_cfg_read_raw() and tb_cfg_write_raw() implementations retry each
control read/write operation up to TB_CTL_RETRIES (4) times, each attempt
with a timeout of TB_CFG_DEFAULT_TIMEOUT (5 seconds). With nothing
responding, that would add up to the 20 second delay you are seeing.

Per the latest USB4 spec, the recommended timeout is 10 +/- 1 ms for
control packets within a domain and 1 second for inter-domain packets.

You can try changing TB_CFG_DEFAULT_TIMEOUT to 100 ms and see whether the
resulting worst case of 400 ms (4 retries x 100 ms) is manageable.
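
To make the arithmetic concrete, here is a minimal user-space sketch of
the retry pattern. It is not the driver code: the SKETCH_* names, the
callback type and the helper functions are hypothetical, and only the
retry count and the two timeout values are taken from this mail. The
model adds up the time spent waiting instead of actually sleeping.

```c
#include <stdbool.h>
#include <stddef.h>

/* Values quoted in this mail; the names are local to this sketch. */
#define SKETCH_RETRIES            4
#define SKETCH_TIMEOUT_CURRENT_MS 5000
#define SKETCH_TIMEOUT_TRIAL_MS   100

/* Hypothetical stand-in for one control transfer: true means a reply arrived. */
typedef bool (*sketch_xfer_fn)(void *ctx, unsigned int timeout_ms);

/* Models a router that never answers, as in the reported trace. */
static bool sketch_never_replies(void *ctx, unsigned int timeout_ms)
{
	(void)ctx;
	(void)timeout_ms;
	return false;
}

/*
 * Try the transfer up to 'retries' times and return the total time (in ms)
 * spent waiting on attempts that timed out. No real sleeping is done.
 */
static unsigned int sketch_stall_ms(sketch_xfer_fn xfer, void *ctx,
				    unsigned int retries,
				    unsigned int timeout_ms)
{
	unsigned int waited = 0;
	unsigned int i;

	for (i = 0; i < retries; i++) {
		if (xfer(ctx, timeout_ms))
			break;
		waited += timeout_ms;
	}

	return waited;
}
```

Calling sketch_stall_ms() with sketch_never_replies, SKETCH_RETRIES and
SKETCH_TIMEOUT_CURRENT_MS models the current worst case; passing
SKETCH_TIMEOUT_TRIAL_MS instead models the suggested experiment.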

Mika will be back by the end of next week. I will check with Mika and the
rest of the team to arrive at the ideal values and post a patch.

> -Evan
>
> On Mon, Jul 12, 2021 at 5:16 PM Mani, Rajmohan
> <rajmohan.mani@xxxxxxxxx> wrote:
> >
> > Hi Evan,
> >
> > > -----Original Message-----
> > > From: Evan Green <evgreen@xxxxxxxxxxxx>
> > > Sent: Monday, July 12, 2021 4:46 PM
> > > To: Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx>
> > > Cc: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>; Fine, Gil
> > > <gil.fine@xxxxxxxxx>; Mani, Rajmohan <rajmohan.mani@xxxxxxxxx>;
> > > > linux-usb@xxxxxxxxxxxxxxx; Prashant Malani <pmalani@xxxxxxxxxx>;
> > > > LKML <linux-kernel@xxxxxxxxxxxxxxx>
> > > Subject: Re: thunderbolt: Warning and 20 second delay in S4
> > >
> > > On Fri, Jul 9, 2021 at 11:34 PM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx>
> > > > wrote:
> > > >
> > > > On Fri, Jul 09, 2021 at 02:31:35PM -0700, Evan Green wrote:
> > > > > Hi Mika et al,
> > > > >
> > > > > I'm experimenting with suspending to disk (hibernate) on a
> > > > > Tigerlake Chromebook running the chromeos-5.4 kernel. I don't
> > > > > have any USB4 peripherals plugged in. I'm getting this warning,
> > > > > along with a 20 second stall, both when going down for hibernate and
> > > > > > > coming back up.
> > > >
> > > > 5.4 is pretty old, especially for thunderbolt issues, can you try
> > > > 5.13 please?
> > >
> > > Good idea. On 5.13.0-next-20210709, I see the warning and delay even
> > > at boot when runtime pm kicks in. This should make for an easier repro at
> > > > least:
> > >
> > > [ 18.832016] thunderbolt 0000:00:0d.2: 0: timeout reading config
> > > space 2 from 0x6
> > > [ 18.840309] ------------[ cut here ]------------
> > > [ 18.845466] thunderbolt 0000:00:0d.2: interrupt for RX ring 0 is
> > > already disabled
> > > [ 18.853836] WARNING: CPU: 0 PID: 5 at drivers/thunderbolt/nhi.c:103
> > > ring_interrupt_active+0x1b7/0x1da
> > > ...
> > > [ 18.977736] CPU: 0 PID: 5 Comm: kworker/0:0 Tainted: G U
> > > 5.13.0-next-20210709 #18
> > > [ 18.996804] Workqueue: pm pm_runtime_work
> > > [ 19.001285] RIP: 0010:ring_interrupt_active+0x1b7/0x1da
> > > ...
> > > [ 19.100302] Call Trace:
> > > [ 19.103031] tb_ring_stop+0x9d/0x17d
> > > [ 19.107022] tb_ctl_stop+0x33/0xa0
> > > [ 19.110822] tb_domain_runtime_suspend+0x35/0x3a
> > > [ 19.115979] nhi_runtime_suspend+0x1f/0x4c
> > > [ 19.120557] pci_pm_runtime_suspend+0x5a/0x173
> > > [ 19.125533] ? pci_pm_restore_noirq+0x73/0x73
> > > [ 19.130411] __rpm_callback+0x8a/0x10d
> > > [ 19.134595] rpm_callback+0x22/0x74
> > > [ 19.138489] ? pci_pm_restore_noirq+0x73/0x73
> > > [ 19.143355] rpm_suspend+0x21e/0x514
> > > [ 19.147355] pm_runtime_work+0x8a/0xa5
> > > [ 19.151554] process_one_work+0x1b7/0x368
> > > [ 19.156027] worker_thread+0x213/0x372
> > > [ 19.160217] kthread+0x147/0x15f
> > > [ 19.163827] ? pr_cont_work+0x58/0x58
> > > [ 19.167928] ? kthread_blkcg+0x31/0x31
> > > [ 19.172113] ret_from_fork+0x1f/0x30
> > > [ 19.176105] ---[ end trace 438b7f20f6b4049d ]---
> >
> > I used to see these timeout errors, when there was a control
> > read/write issued to the thunderbolt/usb4 device, after the
> > thunderbolt driver is suspended.
> > Can you share the steps to reproduce this S4 issue in a Chrome device?
> >
> > Thanks
> > Raj