Re: [PATCH V2] PCI/ASPM: Save/restore L1SS Capability for suspend/resume
From: Rajat Jain
Date: Mon Jul 25 2022 - 18:51:26 EST
Hello,
On Sat, Jul 23, 2022 at 10:03 AM Vidya Sagar <vidyas@xxxxxxxxxx> wrote:
>
> Agree with Bjorn's observations.
> The fact that the L1SS capability registers themselves disappeared in
> the root port post resume indicates that there seems to be something
> wrong with the BIOS itself.
> Could you please check from that perspective?
ChromeOS Intel platforms use S0ix (suspend-to-idle) for suspend. This
is a shallower sleep state that preserves more state than, e.g., S3
(suspend-to-RAM). When we use S0ix, the BIOS does not come into the
picture at all, i.e. after the kernel runs its suspend routines, it
just puts the CPU into the S0ix state. So I do not think there is a
BIOS angle to this.
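For anyone who wants to confirm which suspend variant a given machine
selects, here is a minimal userspace sketch. It assumes the standard
/sys/power/mem_sleep interface, where the active variant is shown in
square brackets and suspend-to-idle is listed as "s2idle". The helper
names are hypothetical, and this only tells you which variant is
selected, not whether S0ix residency is actually reached.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy the bracketed (currently selected) entry of a
 * /sys/power/mem_sleep style line into out, e.g.
 * "s2idle [deep]" yields "deep".
 * Returns 0 on success, -1 if there is no bracketed entry or it
 * does not fit into out.
 */
int selected_mem_sleep(const char *line, char *out, size_t outlen)
{
	const char *open, *close;
	size_t len;

	assert(line && out);

	open = strchr(line, '[');
	if (!open)
		return -1;
	close = strchr(open, ']');
	if (!close)
		return -1;

	len = (size_t)(close - open - 1);
	if (len + 1 > outlen)
		return -1;

	memcpy(out, open + 1, len);
	out[len] = '\0';
	return 0;
}

/*
 * Read /sys/power/mem_sleep and report whether suspend-to-idle is
 * the selected variant. Returns 1 if "s2idle" is selected, 0 if
 * another variant is selected, -1 if the file cannot be read or
 * parsed.
 */
int system_uses_s2idle(void)
{
	char line[128], mode[32];
	FILE *f = fopen("/sys/power/mem_sleep", "r");

	if (!f)
		return -1;
	if (!fgets(line, sizeof(line), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	if (selected_mem_sleep(line, mode, sizeof(mode)))
		return -1;
	return strcmp(mode, "s2idle") == 0;
}
```

Calling system_uses_s2idle() from a small main() and printing the
result is enough to check a test machine before a suspend/resume run.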
>
> Thanks,
> Vidya Sagar
>
>
> On 7/22/2022 11:12 PM, Bjorn Helgaas wrote:
> >
> > On Fri, Jul 22, 2022 at 11:41:14AM +0200, Lukasz Majczak wrote:
> >> On Fri, Jul 22, 2022 at 09:31 Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx> wrote:
> >>> On Fri, Jul 15, 2022 at 6:38 PM Ben Chuang <benchuanggli@xxxxxxxxx> wrote:
> >>>> On Tue, Jul 5, 2022 at 2:00 PM Vidya Sagar <vidyas@xxxxxxxxxx> wrote:
> >>>>>
> >>>>> Previously ASPM L1 Substates control registers (CTL1 and CTL2) weren't
> >>>>> saved and restored during suspend/resume leading to L1 Substates
> >>>>> configuration being lost post-resume.
> >>>>>
> >>>>> Save the L1 Substates control registers so that the configuration is
> >>>>> retained post-resume.
> >>>>>
> >>>>> Signed-off-by: Vidya Sagar <vidyas@xxxxxxxxxx>
> >>>>> Tested-by: Abhishek Sahu <abhsahu@xxxxxxxxxx>
> >>>>
> >>>> Hi Vidya,
> >>>>
> >>>> I tested this patch on kernel v5.19-rc6.
> >>>> The test device is GL9755 card reader controller on Intel i5-10210U RVP.
> >>>> This patch can restore L1SS after suspend/resume.
> >>>>
> >>>> The test results are as follows:
> >>>>
> >>>> After Boot:
> >>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>> Capabilities: [110 v1] L1 PM Substates
> >>>> L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> >>>> ASPM_L1.1+ L1_PM_Substates+
> >>>> PortCommonModeRestoreTime=255us
> >>>> PortTPowerOnTime=3100us
> >>>> L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>> T_CommonMode=0us LTR1.2_Threshold=3145728ns
> >>>> L1SubCtl2: T_PwrOn=3100us
> >>>>
> >>>>
> >>>> After suspend/resume without this patch.
> >>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>> Capabilities: [110 v1] L1 PM Substates
> >>>> L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> >>>> ASPM_L1.1+ L1_PM_Substates+
> >>>> PortCommonModeRestoreTime=255us
> >>>> PortTPowerOnTime=3100us
> >>>> L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> >>>> T_CommonMode=0us LTR1.2_Threshold=0ns
> >>>> L1SubCtl2: T_PwrOn=10us
> >>>>
> >>>>
> >>>> After suspend/resume with this patch.
> >>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>> Capabilities: [110 v1] L1 PM Substates
> >>>> L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> >>>> ASPM_L1.1+ L1_PM_Substates+
> >>>> PortCommonModeRestoreTime=255us
> >>>> PortTPowerOnTime=3100us
> >>>> L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>> T_CommonMode=0us LTR1.2_Threshold=3145728ns
> >>>> L1SubCtl2: T_PwrOn=3100us
> >>>>
> >>>>
> >>>> Tested-by: Ben Chuang <benchuanggli@xxxxxxxxx>
> >>>
> >>> Forgot to add mine:
> >>> Tested-by: Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx>
> >>>
> >>>>
> >>>> Best regards,
> >>>> Ben Chuang
> >>>>
> >>>>
> >>>>> ---
> >>>>> Hi,
> >>>>> Kenneth R. Crudup <kenny@xxxxxxxxx>, Could you please verify this patch
> >>>>> on your laptop (Dell XPS 13) one last time?
> >>>>> IMHO, the regression observed on your laptop with an old version of the patch
> >>>>> could be due to a buggy old version BIOS in the laptop.
> >>>>>
> >>>>> Thanks,
> >>>>> Vidya Sagar
> >>>>>
> >>>>> drivers/pci/pci.c | 7 +++++++
> >>>>> drivers/pci/pci.h | 4 ++++
> >>>>> drivers/pci/pcie/aspm.c | 44 +++++++++++++++++++++++++++++++++++++++++
> >>>>> 3 files changed, 55 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >>>>> index cfaf40a540a8..aca05880aaa3 100644
> >>>>> --- a/drivers/pci/pci.c
> >>>>> +++ b/drivers/pci/pci.c
> >>>>> @@ -1667,6 +1667,7 @@ int pci_save_state(struct pci_dev *dev)
> >>>>> return i;
> >>>>>
> >>>>> pci_save_ltr_state(dev);
> >>>>> + pci_save_aspm_l1ss_state(dev);
> >>>>> pci_save_dpc_state(dev);
> >>>>> pci_save_aer_state(dev);
> >>>>> pci_save_ptm_state(dev);
> >>>>> @@ -1773,6 +1774,7 @@ void pci_restore_state(struct pci_dev *dev)
> >>>>> * LTR itself (in the PCIe capability).
> >>>>> */
> >>>>> pci_restore_ltr_state(dev);
> >>>>> + pci_restore_aspm_l1ss_state(dev);
> >>>>>
> >>>>> pci_restore_pcie_state(dev);
> >>>>> pci_restore_pasid_state(dev);
> >>>>> @@ -3489,6 +3491,11 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
> >>>>> if (error)
> >>>>> pci_err(dev, "unable to allocate suspend buffer for LTR\n");
> >>>>>
> >>>>> + error = pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_L1SS,
> >>>>> + 2 * sizeof(u32));
> >>>>> + if (error)
> >>>>> + pci_err(dev, "unable to allocate suspend buffer for ASPM-L1SS\n");
> >>>>> +
> >>>>> pci_allocate_vc_save_buffers(dev);
> >>>>> }
> >>>>>
> >>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> >>>>> index e10cdec6c56e..92d8c92662a4 100644
> >>>>> --- a/drivers/pci/pci.h
> >>>>> +++ b/drivers/pci/pci.h
> >>>>> @@ -562,11 +562,15 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev);
> >>>>> void pcie_aspm_exit_link_state(struct pci_dev *pdev);
> >>>>> void pcie_aspm_pm_state_change(struct pci_dev *pdev);
> >>>>> void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
> >>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev);
> >>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev);
> >>>>> #else
> >>>>> static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
> >>>>> static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
> >>>>> static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev) { }
> >>>>> static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
> >>>>> +static inline void pci_save_aspm_l1ss_state(struct pci_dev *dev) { }
> >>>>> +static inline void pci_restore_aspm_l1ss_state(struct pci_dev *dev) { }
> >>>>> #endif
> >>>>>
> >>>>> #ifdef CONFIG_PCIE_ECRC
> >>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> >>>>> index a96b7424c9bc..2c29fdd20059 100644
> >>>>> --- a/drivers/pci/pcie/aspm.c
> >>>>> +++ b/drivers/pci/pcie/aspm.c
> >>>>> @@ -726,6 +726,50 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
> >>>>> PCI_L1SS_CTL1_L1SS_MASK, val);
> >>>>> }
> >>>>>
> >>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev)
> >>>>> +{
> >>>>> + int aspm_l1ss;
> >>>>> + struct pci_cap_saved_state *save_state;
> >>>>> + u32 *cap;
> >>>>> +
> >>>>> + if (!pci_is_pcie(dev))
> >>>>> + return;
> >>>>> +
> >>>>> + aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>> + if (!aspm_l1ss)
> >>>>> + return;
> >>>>> +
> >>>>> + save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>> + if (!save_state)
> >>>>> + return;
> >>>>> +
> >>>>> + cap = (u32 *)&save_state->cap.data[0];
> >>>>> + pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, cap++);
> >>>>> + pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, cap++);
> >>>>> +}
> >>>>> +
> >>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev)
> >>>>> +{
> >>>>> + int aspm_l1ss;
> >>>>> + struct pci_cap_saved_state *save_state;
> >>>>> + u32 *cap;
> >>>>> +
> >>>>> + if (!pci_is_pcie(dev))
> >>>>> + return;
> >>>>> +
> >>>>> + aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>> + if (!aspm_l1ss)
> >>>>> + return;
> >>>>> +
> >>>>> + save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>> + if (!save_state)
> >>>>> + return;
> >>>>> +
> >>>>> + cap = (u32 *)&save_state->cap.data[0];
> >>>>> + pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
> >>>>> + pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
> >>>>> +}
> >>>>> +
> >>>>> static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val)
> >>>>> {
> >>>>> pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL,
> >>>>> --
> >>>>> 2.17.1
> >>>>>
> >>
> >> Hi,
> >>
> >> With this patch (and also mentioned
> >> https://lore.kernel.org/all/20220509073639.2048236-1-kai.heng.feng@xxxxxxxxxxxxx/)
> >> applied on 5.10 (chromeos-5.10) I am observing problems after
> >> suspend/resume with my WiFi card - it looks like whole communication
> >> via PCI fails. Attaching logs (dmesg, lspci -vvv before suspend/resume
> >> and after) https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3
> >>
> >> I played a little bit with this code and it looks like the
> >> pci_write_config_dword() to the PCI_L1SS_CTL1 breaks it (don't know
> >> why, not a PCI expert).
> >
> > Thanks a lot for testing this! I'm not quite sure what to make of the
> > results since v5.10 is fairly old (Dec 2020) and I don't know what
> > other changes are in chromeos-5.10.
Lukasz: I assume you are running this on Atlas and are seeing this bug
when uprevving it to the 5.10 kernel. Can you please try it on a newer
Intel platform that already runs the latest upstream kernel and see
whether this can be reproduced there too?

Note that the wifi PCI device is different on newer Intel platforms,
but the platform design is similar enough that I suspect we should see
a similar bug on those too. The other option is to try the latest
upstream kernel on Atlas. Perhaps if we just care about wifi (and
ignore bringing up the graphics stack and GUI), it may come up
far enough to try this patch?
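When comparing before/after values on either platform, it may help to
decode the raw L1 PM Substates Control 1 dword rather than rely only on
the lspci text. Below is a rough userspace sketch, assuming the field
layout used by the PCI_L1SS_CTL1_* masks in
include/uapi/linux/pci_regs.h and the LTR threshold scale of 32^scale
ns from the PCIe spec. The macro, struct and function names are local
and hypothetical, and an all-ones value is treated as a failed config
read.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the L1SS Control 1 field masks (names hypothetical). */
#define L1SS_CTL1_ENABLE_MASK	0x0000000fu	/* PCI-PM/ASPM L1.1/L1.2 enables */
#define L1SS_CTL1_CM_RESTORE	0x0000ff00u	/* Common_Mode_Restore_Time, us */
#define L1SS_CTL1_LTR_TH_VALUE	0x03ff0000u	/* LTR_L1.2_THRESHOLD_Value */
#define L1SS_CTL1_LTR_TH_SCALE	0xe0000000u	/* LTR_L1.2_THRESHOLD_Scale */

struct l1ss_ctl1 {
	unsigned int enables;		/* low four enable bits */
	unsigned int t_common_mode_us;
	uint64_t ltr_threshold_ns;
};

/*
 * Decode a raw L1SS Control 1 dword.
 * Returns 0 on success, -1 if the value is all ones (typical of a
 * failed config read) or uses a reserved threshold scale (> 5).
 */
int l1ss_decode_ctl1(uint32_t reg, struct l1ss_ctl1 *out)
{
	uint32_t value, scale;
	uint64_t mult = 1;

	assert(out);

	if (reg == 0xffffffffu)
		return -1;

	scale = (reg & L1SS_CTL1_LTR_TH_SCALE) >> 29;
	if (scale > 5)
		return -1;
	value = (reg & L1SS_CTL1_LTR_TH_VALUE) >> 16;

	/* Each scale step multiplies the threshold unit by 32 ns. */
	while (scale--)
		mult *= 32;

	out->enables = reg & L1SS_CTL1_ENABLE_MASK;
	out->t_common_mode_us = (reg & L1SS_CTL1_CM_RESTORE) >> 8;
	out->ltr_threshold_ns = (uint64_t)value * mult;
	return 0;
}
```

As a sanity check, a constructed value with threshold value 3 and
scale 4 decodes to 3 * 32^4 = 3145728 ns, which matches the
LTR1.2_Threshold that lspci printed for the GL9755 earlier in this
thread.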
Thanks,
Rajat
> >
> > Random observations, no analysis below. This from your dmesg
> > certainly looks like PCI reads failing and returning ~0:
> >
> > Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
> > iwlwifi 0000:01:00.0: 00000000: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
> > iwlwifi 0000:01:00.0: Device gone - attempting removal
> > Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
> >
> > And then we re-enumerate 01:00.0 and it looks like it may have been
> > reset (BAR is 0):
> >
> > pci 0000:01:00.0: [8086:095a] type 00 class 0x028000
> > pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
> >
> > lspci diffs from before/after suspend:
> >
> > 00:14.0 PCI bridge: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port B #1 (rev fb) (prog-if 00 [Normal decode])
> > Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
> > - DevSta: CorrErr- NonFatalErr+ FatalErr- UnsupReq+ AuxPwr+ TransPend-
> > + DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> > - LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> > + LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
> > - LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
> > + LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
> > - Capabilities: [150 v0] Null
> > - Capabilities: [200 v1] L1 PM Substates
> > - L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> > - PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
> > - L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > - T_CommonMode=40us LTR1.2_Threshold=98304ns
> > - L1SubCtl2: T_PwrOn=60us
> >
> > The DevSta differences might be BIOS bugs, probably not relevant.
> > Interesting that ASPM is disabled, maybe didn't get enabled after
> > re-enumerating 01:00.0? Strange that the L1 PM Substates capability
> > disappeared.
> >
> > 01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)
> > LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> > - ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> > + ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> > Capabilities: [154 v1] L1 PM Substates
> > L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> > PortCommonModeRestoreTime=30us PortTPowerOnTime=60us
> > - L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> > - T_CommonMode=0us LTR1.2_Threshold=98304ns
> > + L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> > + T_CommonMode=0us LTR1.2_Threshold=0ns
> >
> > Dmesg claimed we reconfigured common clock config. Maybe ASPM didn't
> > get reinitialized after re-enumeration? Looks like we didn't restore
> > L1SubCtl1.
> >
> > Bjorn
> >