Re: [PATCH v2 2/2] PCI: Add missing link delays required by the PCIe spec

From: Mika Westerberg
Date: Wed Oct 30 2019 - 07:34:00 EST


On Tue, Oct 29, 2019 at 03:54:56PM -0500, Bjorn Helgaas wrote:
> On Fri, Oct 04, 2019 at 03:39:47PM +0300, Mika Westerberg wrote:
> > Currently Linux does not follow PCIe spec regarding the required delays
> > after reset. A concrete example is a Thunderbolt add-in-card that
> > consists of a PCIe switch and two PCIe endpoints:
> > ...
>
> > @@ -1025,15 +1025,11 @@ static void __pci_start_power_transition(struct pci_dev *dev, pci_power_t state)
> > if (state == PCI_D0) {
> > pci_platform_power_transition(dev, PCI_D0);
> > /*
> > - * Mandatory power management transition delays, see
> > - * PCI Express Base Specification Revision 2.0 Section
> > - * 6.6.1: Conventional Reset. Do not delay for
> > - * devices powered on/off by corresponding bridge,
> > - * because have already delayed for the bridge.
> > + * Mandatory power management transition delays are handled
> > + * in pci_pm_runtime_resume() of the corresponding
> > + * downstream/root port.
> > */
> > if (dev->runtime_d3cold) {
> > - if (dev->d3cold_delay && !dev->imm_ready)
> > - msleep(dev->d3cold_delay);
>
> This removes the only use of d3cold_delay. I assume that's
> intentional? If we no longer need it, we might as well remove it from
> the pci_dev and remove the places that set it. It'd be nice if that
> could be a separate patch, even if we waited a little longer than
> necessary at that one bisection point.

Yes, it is intentional. In the previous version I had function
pcie_get_downstream_delay() that used both d3cold_delay and imm_ready to
calculate the downstream device delay but you said:

I'm not sold on the idea that this delay depends on what's *below* the
bridge. We're using sec 6.6.1 to justify the delay, and that section
doesn't say anything about downstream devices.

So I dropped it and use 100 ms instead.

Now that you mention, I think if we want to continue support that _DSM,
we should still take d3cold_delay into account in this patch. There is
also one driver (drivers/mfd/intel-lpss-pci.c) that sets it to 0.

> It also removes one of the three uses of imm_ready, leaving only the
> two in FLR. I suspect there are other places we should use imm_ready,
> e.g., transitions to/from D1 and D2, but that would be beyond the
> scope of this patch.

Right, I think imm_ready does not apply here. If I understand correctly
it is exactly for D1, D2 and D3hot transitions which we should take into
account in pci_dev_d3_sleep() (which we don't do right now).

> > + /*
> > + * For PCIe downstream and root ports that do not support speeds
> > + * greater than 5 GT/s need to wait minimum 100 ms. For higher
> > + * speeds (gen3) we need to wait first for the data link layer to
> > + * become active.
> > + *
> > + * However, 100 ms is the minimum and the PCIe spec says the
> > + * software must allow at least 1s before it can determine that the
> > + * device that did not respond is a broken device. There is
> > + * evidence that 100 ms is not always enough, for example certain
> > + * Titan Ridge xHCI controller does not always respond to
> > + * configuration requests if we only wait for 100 ms (see
> > + * https://bugzilla.kernel.org/show_bug.cgi?id=203885).
> > + *
> > + * Therefore we wait for 100 ms and check for the device presence.
> > + * If it is still not present give it an additional 100 ms.
> > + */
> > + if (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT &&
> > + pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM)
> > + return;
>
> Shouldn't this be:
>
> if (!pcie_downstream_port(dev))
> return
>
> so we include PCI/PCI-X to PCIe bridges?

Yes, I'll change it in v3.