Re: [PATCH v5 2/5] PCI: Check PCI_PM_CTRL instead of PCI_COMMAND in pci_dev_wait()
From: Bjorn Helgaas
Date: Tue Sep 03 2024 - 13:12:10 EST
On Tue, Sep 03, 2024 at 11:29:23AM -0500, Mario Limonciello wrote:
> On 8/29/2024 19:01, Bjorn Helgaas wrote:
> > On Fri, Aug 23, 2024 at 10:40:20AM -0500, Mario Limonciello wrote:
> > > From: Mario Limonciello <mario.limonciello@xxxxxxx>
> > >
> > > If a dock is plugged in at the same time as autosuspend delay then
> > > this can cause malfunctions in the USB4 stack. This happens because
> > > the device is still in D3cold at the time that the PCI core handed
> > > control back to the USB4 stack.
> > >
> > > A device that has gone through a reset may return a value in
> > > PCI_COMMAND but that doesn't mean it's finished transitioning to D0.
> > > For devices that support power management explicitly check
> > > PCI_PM_CTRL on everything but system resume to ensure the transition
> > > happened.
> >
> > Still trying to understand what's going on here.
> >
> > I posted a change to pci_dev_wait() to read Vendor ID, look for Config
> > RRS status, and wait for a successful completion (when RRS Software
> > Visibility is enabled) [1].
> >
> > You tested that and found that it didn't help with *this* issue [2].
> > I assume you tested something like v6.11-rc plus the patches from [1],
> > i.e., without the PCI_PM_CTRL changes in this series.
> >
> > 1) Initially the device is in D0
> >
> > 2) We put it in D3cold (using some ACPI method) because the
> > autosuspend delay expired (?)
> >
> > 3) Plugging in the dock wakes up the device, so we power up the
> > device (via pci_power_up(), which again uses some ACPI method), and
> > it should transition to D0uninitialized
> >
> > 4) The USB4 stack sees the device but thinks it's in D3cold (?)
> ...
> > If you *did* include both [1] and patch 3/5, the implication would be
> > that pci_dev_wait() successfully read the Vendor ID, meaning the
> > device is not in D3cold when pci_power_up() returns.
>
> The testing here was from the LTS 6.6.y kernel with both [1] and
> patch 3/5 from this series.
>
> > Can you clarify what you see and possibly expand/correct my
> > timeline above?
>
> The timeline you shared is nearly correct. The USB4 stack *thinks*
> the device is in D0 because of the return of pci_power_up().
>
> As by polling PCI_PM_CTRL we can see it's still in D3, so it hasn't
> made it to D0uninitialized yet.
>
> I guess I reading between the lines you have an assumption that you
> can't read the vendor ID from D3; which doesn't appear to be the
> case from our testing.
A Vendor ID read of a device in D3hot should definitely work.
Obviously if the device were in D3cold, we'd get no response at all,
so the requester should log a UR error and fabricate ~0 data.
But if the device starts out in D3cold and we power it up, it should
not go through D3hot. The only legal transition from D3cold is to
D0uninitialized (PCIe r6.0, sec 5.8).
OK, so with [1] and patch 3/5:
1) Initially the device is in D0
2) We put it in D3cold (using some ACPI method) because the
autosuspend delay expired (?)
3) Plugging in the dock wakes up the device, so we power up the
device (via pci_power_up(), which again uses some ACPI method), and
it should transition to D0uninitialized
4) With patch 3/5, pci_power_up() calls pci_dev_wait() because
dev->current_state == PCI_D3cold
5) I *assume* RRS SV is enabled (lspci -vv of Root Port would
confirm this; maybe we should add a pci_dbg message about which
register we're polling). If so, patch [1] means we should poll
Vendor ID until successful completion.
6) pci_dbg log should confirm the device is ready with a "ready %dms
after D3cold->D0" message, which would mean we got a successful
completion when reading Vendor ID
7) For debugging purposes, it would be interesting to read and log
the PCI_PM_CTRL value here. Per sec 2.3.1, the device is not
allowed to return RRS at this point since we already got a
successful completion.
8) The USB4 stack sees the device and assumes it is in D0, but it
seems to still be in D3cold. What is this based on? Is there a
config read that returns ~0 data when it shouldn't?
> > [1] https://lore.kernel.org/linux-pci/20240827234848.4429-1-helgaas@xxxxxxxxxx/
> > [2] https://lore.kernel.org/linux-pci/30d9589a-8050-421b-a9a5-ad3422feadad@xxxxxxx/
> >
> > > Devices that don't support power management and system resume will
> > > continue to use PCI_COMMAND.
> > >
> > > Signed-off-by: Mario Limonciello <mario.limonciello@xxxxxxx>
> > > ---
> > > v4->v5:
> > > * Fix misleading indentation
> > > * Amend commit message
> > > ---
> > > drivers/pci/pci.c | 28 ++++++++++++++++++++--------
> > > 1 file changed, 20 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > > index 1e219057a5069..f032a4aaec268 100644
> > > --- a/drivers/pci/pci.c
> > > +++ b/drivers/pci/pci.c
> > > @@ -1309,21 +1309,33 @@ static int pci_dev_wait(struct pci_dev *dev, enum pci_reset_type reset_type, int
> > > * the read (except when CRS SV is enabled and the read was for the
> > > * Vendor ID; in that case it synthesizes 0x0001 data).
> > > *
> > > - * Wait for the device to return a non-CRS completion. Read the
> > > - * Command register instead of Vendor ID so we don't have to
> > > - * contend with the CRS SV value.
> > > + * Wait for the device to return a non-CRS completion. On devices
> > > + * that support PM control and on waits that aren't part of system
> > > + * resume read the PM control register to ensure the device has
> > > + * transitioned to D0. On devices that don't support PM control,
> > > + * or during system resume read the command register to instead of
> > > + * Vendor ID so we don't have to contend with the CRS SV value.
> > > */
> > > for (;;) {
> > > - u32 id;
> > > -
> > > if (pci_dev_is_disconnected(dev)) {
> > > pci_dbg(dev, "disconnected; not waiting\n");
> > > return -ENOTTY;
> > > }
> > > - pci_read_config_dword(dev, PCI_COMMAND, &id);
> > > - if (!PCI_POSSIBLE_ERROR(id))
> > > - break;
> > > + if (dev->pm_cap && reset_type != PCI_DEV_WAIT_RESUME) {
> > > + u16 pmcsr;
> > > +
> > > + pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
> > > + if (!PCI_POSSIBLE_ERROR(pmcsr) &&
> > > + (pmcsr & PCI_PM_CTRL_STATE_MASK) == PCI_D0)
> > > + break;
> > > + } else {
> > > + u32 id;
> > > +
> > > + pci_read_config_dword(dev, PCI_COMMAND, &id);
> > > + if (!PCI_POSSIBLE_ERROR(id))
> > > + break;
> > > + }
> > > if (delay > timeout) {
> > > pci_warn(dev, "not ready %dms after %s; giving up\n",
> > > --
> > > 2.43.0
> > >
>