Re: [PATCH 2/2] PCI: vmd: Enable ASPM for mobile platforms
From: Ian Kumlien
Date: Wed Oct 07 2020 - 05:29:10 EST
On Wed, Oct 7, 2020 at 6:26 AM Kai-Heng Feng
<kai.heng.feng@xxxxxxxxxxxxx> wrote:
>
>
>
> > On Oct 6, 2020, at 03:19, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> >
> > [+cc Ian, who's also working on an ASPM issue]
> >
> > On Tue, Oct 06, 2020 at 02:40:32AM +0800, Kai-Heng Feng wrote:
> >>> On Oct 3, 2020, at 06:18, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> >>> On Wed, Sep 30, 2020 at 04:24:54PM +0800, Kai-Heng Feng wrote:
> >>>> BIOS may not be able to program ASPM for links behind VMD, preventing
> >>>> the Intel SoC from entering deeper power-saving states.
> >>>
> >>> It's not a question of BIOS not being *able* to configure ASPM. I
> >>> think BIOS could do it, at least in principle, if it had a driver for
> >>> VMD. Actually, it probably *does* include some sort of VMD code
> >>> because it sounds like BIOS can assign some Root Ports to appear
> >>> either as regular Root Ports or behind the VMD.
> >>>
> >>> Since this issue is directly related to the unusual VMD topology, I
> >>> think it would be worth a quick recap here. Maybe something like:
> >>>
> >>> VMD is a Root Complex Integrated Endpoint that acts as a host bridge
> >>> to a secondary PCIe domain. BIOS can reassign one or more Root
> >>> Ports to appear within a VMD domain instead of the primary domain.
> >>>
> >>> However, BIOS may not enable ASPM for the hierarchies behind a VMD,
> >>> ...
> >>>
> >>> (This is based on the commit log from 185a383ada2e ("x86/PCI: Add
> >>> driver for Intel Volume Management Device (VMD)")).
> >>
> >> Ok, I'll just copy that portion as-is if there's a patch v2 :)
> >>
> >>> But we still have the problem that CONFIG_PCIEASPM_DEFAULT=y means
> >>> "use the BIOS defaults", and this patch would make it so we use the
> >>> BIOS defaults *except* for things behind VMD.
> >>>
> >>> - Why should VMD be a special case?
> >>
> >> Because the BIOS doesn't handle ASPM for it, so it's up to software to
> >> do the job. In the meantime we want other devices to keep using the
> >> BIOS defaults so we don't introduce any regressions.
> >>
> >>> - How would we document such a special case?
> >>
> >> I wonder whether other devices that add a PCIe domain have the same
> >> behavior? Maybe it's not a special case at all...
> >
> > What other devices are these?
>
> Controllers which add a PCIe domain.
>
> >
> >> I understand the end goal is to keep the entire ASPM logic consistent.
> >> However, I can't think of a possible solution right now.
> >>
> >>> - If we built with CONFIG_PCIEASPM_POWERSAVE=y, would that solve the
> >>> SoC power state problem?
> >>
> >> Yes.
> >>
> >>> - What issues would CONFIG_PCIEASPM_POWERSAVE=y introduce?
> >>
> >> This will break many systems, at least 1st-gen Ryzen desktops and
> >> laptops.
> >>
> >> The BIOS doesn't enable ASPM on any of their PCIe links, and those
> >> systems freeze immediately once ASPM is enabled.
> >
> > That indicates a defect in the Linux ASPM code. We should fix that.
> > It should be safe to use CONFIG_PCIEASPM_POWERSAVE=y on every system.
>
> On those systems ASPM is also not enabled on Windows, so I think ASPM is disabled for a reason.
>
> >
> > Are there bug reports for these? The info we would need to start with
> > includes "lspci -vv" and dmesg log (with CONFIG_PCIEASPM_DEFAULT=y).
> > If a console log with CONFIG_PCIEASPM_POWERSAVE=y is available, that
> > might be interesting, too. We'll likely need to add some
> > instrumentation and do some experimentation, but in principle, this
> > should be fixable.
>
> Doing this asks users to run with hardware settings that the ODM/OEM never tested, and I think the risk is really high.
They have to test it to comply with the PCIe specs, don't they? And what
we're currently doing is wrong...

This fixes the L1 behaviour in the kernel; could you test it?
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 253c30cc1967..893b37669087 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -434,7 +434,7 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
 static void pcie_aspm_check_latency(struct pci_dev *endpoint)
 {
-        u32 latency, l1_switch_latency = 0;
+        u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
         struct aspm_latency *acceptable;
         struct pcie_link_state *link;
@@ -456,10 +456,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
                 if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
                     (link->latency_dw.l0s > acceptable->l0s))
                         link->aspm_capable &= ~ASPM_STATE_L0S_DW;
+
                 /*
                  * Check L1 latency.
-                 * Every switch on the path to root complex need 1
-                 * more microsecond for L1. Spec doesn't mention L0s.
+                 *
+                 * PCIe r5.0, sec 5.4.1.2.2 states:
+                 * A Switch is required to initiate an L1 exit transition on its
+                 * Upstream Port Link after no more than 1 μs from the beginning of an
+                 * L1 exit transition on any of its Downstream Port Links.
                  *
                  * The exit latencies for L1 substates are not advertised
                  * by a device. Since the spec also doesn't mention a way
@@ -469,11 +473,14 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
                  * L1 exit latencies advertised by a device include L1
                  * substate latencies (and hence do not do any check).
                  */
-                latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
-                if ((link->aspm_capable & ASPM_STATE_L1) &&
-                    (latency + l1_switch_latency > acceptable->l1))
-                        link->aspm_capable &= ~ASPM_STATE_L1;
-                l1_switch_latency += 1000;
+                if (link->aspm_capable & ASPM_STATE_L1) {
+                        latency = max_t(u32, link->latency_up.l1, link->latency_dw.l1);
+                        l1_max_latency = max_t(u32, latency, l1_max_latency);
+                        if (l1_max_latency + l1_switch_latency > acceptable->l1)
+                                link->aspm_capable &= ~ASPM_STATE_L1;
+
+                        l1_switch_latency += 1000;
+                }

                 link = link->parent;
         }
---
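To make the difference concrete, here's a rough userspace sketch of the
walk from the endpoint's link up to the root, old check vs. new check.
This is not kernel code; the three-link topology and latency numbers are
made up, and I'm using microseconds where the kernel uses nanoseconds. It
just shows how a slow link in the middle of the path slips past the old
per-link check but trips the new path-max check:

#include <stdio.h>

int main(void)
{
        /* Exit latency (us) for each link on the endpoint-to-root path;
         * each entry stands in for max(latency_up.l1, latency_dw.l1). */
        unsigned l1_exit[] = { 8, 32, 8 };
        unsigned acceptable_l1 = 33;    /* what the endpoint tolerates, us */
        unsigned l1_max = 0, l1_switch = 0;
        int i;

        for (i = 0; i < 3; i++) {
                /* Old check: this link's latency alone, plus 1 us per
                 * switch hop already crossed. */
                printf("link %d old: %2u + %u vs %u -> %s\n", i,
                       l1_exit[i], l1_switch, acceptable_l1,
                       l1_exit[i] + l1_switch > acceptable_l1 ?
                       "cut L1" : "ok");

                /* New check: the worst exit latency seen so far on the
                 * path, since the endpoint only resumes once the slowest
                 * link on the path is back in L0. */
                if (l1_exit[i] > l1_max)
                        l1_max = l1_exit[i];
                printf("link %d new: %2u + %u vs %u -> %s\n", i,
                       l1_max, l1_switch, acceptable_l1,
                       l1_max + l1_switch > acceptable_l1 ?
                       "cut L1" : "ok");

                l1_switch += 1;         /* kernel adds 1000 ns per hop */
        }
        return 0;
}

With these numbers the old check passes every link (8, 33, 10 against a
33 us budget), while the new one disables L1 on the link nearest the root
(32 + 2 = 34 > 33), because the endpoint's wakeup has to wait for the
slowest link anywhere on the path.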
If it doesn't, you could also look at the following L0s patch:
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 893b37669087..15d64832a988 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -434,7 +434,8 @@ static void pcie_get_aspm_reg(struct pci_dev *pdev,
 static void pcie_aspm_check_latency(struct pci_dev *endpoint)
 {
-        u32 latency, l1_max_latency = 0, l1_switch_latency = 0;
+        u32 latency, l1_max_latency = 0, l1_switch_latency = 0,
+            l0s_latency_up = 0, l0s_latency_dw = 0;
         struct aspm_latency *acceptable;
         struct pcie_link_state *link;
@@ -448,14 +449,18 @@ static void pcie_aspm_check_latency(struct pci_dev *endpoint)
         while (link) {
                 /* Check upstream direction L0s latency */
-                if ((link->aspm_capable & ASPM_STATE_L0S_UP) &&
-                    (link->latency_up.l0s > acceptable->l0s))
-                        link->aspm_capable &= ~ASPM_STATE_L0S_UP;
+                if (link->aspm_capable & ASPM_STATE_L0S_UP) {
+                        l0s_latency_up += link->latency_up.l0s;
+                        if (l0s_latency_up > acceptable->l0s)
+                                link->aspm_capable &= ~ASPM_STATE_L0S_UP;
+                }

                 /* Check downstream direction L0s latency */
-                if ((link->aspm_capable & ASPM_STATE_L0S_DW) &&
-                    (link->latency_dw.l0s > acceptable->l0s))
-                        link->aspm_capable &= ~ASPM_STATE_L0S_DW;
+                if (link->aspm_capable & ASPM_STATE_L0S_DW) {
+                        l0s_latency_dw += link->latency_dw.l0s;
+                        if (l0s_latency_dw > acceptable->l0s)
+                                link->aspm_capable &= ~ASPM_STATE_L0S_DW;
+                }

                 /*
                  * Check L1 latency.
---
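Same idea for the L0s change, as a sketch (again not kernel code, with
made-up numbers): each individual link can fit the endpoint's L0s budget
while the summed exit latency along the path does not:

#include <stdio.h>

int main(void)
{
        /* Per-link upstream-direction L0s exit latencies (us) on the
         * endpoint-to-root path */
        unsigned l0s_up[] = { 2, 2, 3 };
        unsigned acceptable_l0s = 4;    /* endpoint's L0s budget, us */
        unsigned sum = 0;
        int i;

        for (i = 0; i < 3; i++) {
                /* Old check: each link against the budget in isolation */
                printf("link %d old: %u vs %u -> %s\n", i, l0s_up[i],
                       acceptable_l0s,
                       l0s_up[i] > acceptable_l0s ? "cut L0s" : "ok");

                /* New check: running sum over the path walked so far */
                sum += l0s_up[i];
                printf("link %d new: %u vs %u -> %s\n", i, sum,
                       acceptable_l0s,
                       sum > acceptable_l0s ? "cut L0s" : "ok");
        }
        return 0;
}

Old check: 2, 2 and 3 each fit the 4 us budget, so nothing is disabled;
new check: the running sum reaches 7 on the last link and L0s gets cut
there.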
I can send them directly as well if you prefer (I hope the client
doesn't mangle them).
> Kai-Heng
>
> >
> > Bjorn
>