Re: [PATCH 3/4] PCI: qcom: Indicate broken L1ss exit during resume from system suspend
From: Manivannan Sadhasivam
Date: Fri Apr 17 2026 - 08:06:56 EST
On Thu, Apr 16, 2026 at 02:20:00PM -0500, Bjorn Helgaas wrote:
> [+cc Rafael]
>
> On Tue, Apr 14, 2026 at 09:29:41PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> > From: Manivannan Sadhasivam <manivannan.sadhasivam@xxxxxxxxxxxxxxxx>
> >
> > Qcom PCIe RCs can successfully exit from L1ss during OS runtime. However,
> > during system suspend, the Qcom PCIe RC driver may remove all resource
> > votes and turn off the PHY to maximize power savings.
> >
> > Consequently, when the host is in system suspend with the link in L1ss and
> > the endpoint asserts CLKREQ#, the OS must first wake up and the RC driver
> > must restore the PHY and enable the refclk. This recovery process causes
> > the strict L1ss exit latency time to be exceeded. (If the RC driver were to
> > retain all votes during suspend, L1ss exit would succeed without issue, but
> > at the expense of higher power consumption).
>
> I don't think the link can be in L1.x if the PHY is turned off, can
> it? I assume if the PHY is off, the link would be in L2 (if aux power
> is available) or L3.
>
As per the spec, if the link is in L1.2, the entire analog circuitry of the PHY
can be powered off, and that's what I meant here. The LTSSM state would be
preserved by the MAC layer, whose context is always retained.

The only problem is that CLKREQ# is routed to an Always-on-Domain (AON) inside
the SoC. So when the endpoint asserts CLKREQ#, AON wakes up the SoC and only
later does the PCIe controller driver turn on the PHY. By that time, the L1ss
exit latency would've elapsed, causing LDn.
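
To make the sequence concrete, here is a rough userspace model of the timing
argument. It is not driver code: the struct, the function names, and the split
of the recovery path into three delays are assumptions for illustration only,
and no real latency values are implied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the resume path: CLKREQ# assertion -> AON wakes the
 * SoC -> controller driver restores the PHY -> refclk is enabled.
 * All durations are supplied by the caller.
 */
struct l12_exit_path {
	uint32_t soc_wake_us;		/* AON wakes the SoC */
	uint32_t phy_restore_us;	/* RC driver powers the PHY back on */
	uint32_t refclk_enable_us;	/* refclk becomes valid again */
};

/* Total time from CLKREQ# assertion until the RC can drive the link. */
static uint64_t l12_exit_recovery_us(const struct l12_exit_path *p)
{
	return (uint64_t)p->soc_wake_us + p->phy_restore_us +
	       p->refclk_enable_us;
}

/* True if recovery overruns the negotiated exit budget, i.e. LDn expected. */
static bool l12_exit_causes_link_down(const struct l12_exit_path *p,
				      uint32_t exit_budget_us)
{
	/* In this model a zero budget means L1.2 was never negotiated. */
	assert(exit_budget_us > 0);
	return l12_exit_recovery_us(p) > exit_budget_us;
}
```

If the RC driver retained its votes during suspend, the first two terms would
effectively drop out of the sum, which is why L1ss exit works in that case.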
> L2 and L3 both correspond to the downstream device being in D3cold
> (PCIe r7.0, sec 5.3.2), so I assume this is a reset as far as the
> device is concerned, and we need all the delays associated with reset
> and the D3cold -> D0 transition.
>
> > This latency violation leads to an L1ss exit timeout, followed by a Link
> > Down (LDn) condition during resume. This LDn can crash the OS if the
> > endpoint hosts the RootFS, and for other types of devices, it may result in
> > a full device reset/recovery.
>
> What does "L1SS exit timeout" mean in PCIe terms? Is there some event
> (Message, interrupt, etc) that is triggered by the timeout?
>
By 'L1ss exit timeout' I meant the failure to move to L0 state post L1.2 exit.
During L1.2 exit, the endpoint expects the refclk and common mode voltage to be
restored within the negotiated time. Per spec, r7.0, sec 5.5.3.3.1, Exit from
L1.2:
```
Next state is L1.0 after waiting for TPOWER_ON
* Common mode is permitted to be established passively during L1.0, and actively
during Recovery. In order to ensure common mode has been established, the
Downstream Port must maintain a timer, and the Downstream Port must continue to
send TS1 training sequences until a minimum of TCOMMONMODE has elapsed since the
Downstream Port has started transmitting TS1 training sequences and has detected
electrical idle exit on any Lane of the configured Link.
```
So if this condition is not satisfied, the link moves to the LDn state, and
that is the only event reported to the OS.
> > So to ensure that the client drivers can properly handle this scenario, let
> > them know about this platform limitation by setting the
> > 'pci_host_bridge::broken_l1ss_resume' flag.
>
> I don't see how this means L1SS is broken. If the device is
> effectively reset, of course we can't go from L1.x to L0 because we
> didn't start from L1.x.
>
From the OS perspective, the link would still be in L1ss and is not expected to
move to L2/L3 during suspend/resume, since that transition is controlled by the
OS itself. But when the OS resumes, the link would go to the LDn state, and it
can only be brought back to L0 after a complete reset.
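
As a sketch of how a client driver could consume the flag, assuming it lands
as 'pci_host_bridge::broken_l1ss_resume': the types and the policy enum below
are stand-ins made up for this example, and what a real client driver does on
suspend is outside the scope of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the single field of struct pci_host_bridge used here. */
struct host_bridge_model {
	bool broken_l1ss_resume;
};

/* Hypothetical suspend choices for a client (endpoint) driver. */
enum suspend_action {
	KEEP_LINK_IN_L1SS,	/* keep device state, rely on L1ss exit */
	SHUT_DOWN_DEVICE,	/* quiesce now, expect full re-init on resume */
};

static enum suspend_action
client_suspend_policy(const struct host_bridge_model *bridge)
{
	/* This model assumes every device sits below some host bridge. */
	assert(bridge != NULL);

	/* L1ss exit cannot be trusted across system suspend on this host. */
	if (bridge->broken_l1ss_resume)
		return SHUT_DOWN_DEVICE;

	return KEEP_LINK_IN_L1SS;
}
```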
- Mani
> > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@xxxxxxxxxxxxxxxx>
> > ---
> > drivers/pci/controller/dwc/pcie-qcom.c | 11 +++++++++++
> > 1 file changed, 11 insertions(+)
> >
> > diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
> > index 67a16af69ddc..01afffd384f2 100644
> > --- a/drivers/pci/controller/dwc/pcie-qcom.c
> > +++ b/drivers/pci/controller/dwc/pcie-qcom.c
> > @@ -1363,6 +1363,17 @@ static void qcom_pcie_host_post_init(struct dw_pcie_rp *pp)
> >  	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
> >  	struct qcom_pcie *pcie = to_qcom_pcie(pci);
> >
> > +	/*
> > +	 * During system suspend, the Qcom RC driver may turn off the PHY and
> > +	 * remove votes to save power. If the endpoint asserts CLKREQ# to
> > +	 * exit L1ss, the time required to wake the system and restore the
> > +	 * PHY/refclk exceeds the strict L1ss exit timing, resulting in Link
> > +	 * Down (LDn). Set this flag to indicate this limitation to client
> > +	 * drivers so that they will avoid relying on L1ss during system
> > +	 * suspend.
> > +	 */
> > +	pp->bridge->broken_l1ss_resume = true;
> > +
> >  	if (pcie->cfg->ops->host_post_init)
> >  		pcie->cfg->ops->host_post_init(pcie);
> >  }
> >
> > --
> > 2.51.0
> >
> >
--
மணிவண்ணன் சதாசிவம்