Re: [PATCH 2/2] PCI: vmd: Enable ASPM for mobile platforms
From: Bjorn Helgaas
Date: Mon Oct 05 2020 - 15:19:34 EST
[+cc Ian, who's also working on an ASPM issue]
On Tue, Oct 06, 2020 at 02:40:32AM +0800, Kai-Heng Feng wrote:
> > On Oct 3, 2020, at 06:18, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > On Wed, Sep 30, 2020 at 04:24:54PM +0800, Kai-Heng Feng wrote:
> >> BIOS may not be able to program ASPM for links behind VMD, prevent Intel
> >> SoC from entering deeper power saving state.
> >
> > It's not a question of BIOS not being *able* to configure ASPM. I
> > think BIOS could do it, at least in principle, if it had a driver for
> > VMD. Actually, it probably *does* include some sort of VMD code
> > because it sounds like BIOS can assign some Root Ports to appear
> > either as regular Root Ports or behind the VMD.
> >
> > Since this issue is directly related to the unusual VMD topology, I
> > think it would be worth a quick recap here. Maybe something like:
> >
> > VMD is a Root Complex Integrated Endpoint that acts as a host bridge
> > to a secondary PCIe domain. BIOS can reassign one or more Root
> > Ports to appear within a VMD domain instead of the primary domain.
> >
> > However, BIOS may not enable ASPM for the hierarchies behind a VMD,
> > ...
> >
> > (This is based on the commit log from 185a383ada2e ("x86/PCI: Add
> > driver for Intel Volume Management Device (VMD)")).
>
> Ok, will just copy the portion as-is if there's patch v2 :)
>
> > But we still have the problem that CONFIG_PCIEASPM_DEFAULT=y means
> > "use the BIOS defaults", and this patch would make it so we use the
> > BIOS defaults *except* for things behind VMD.
> >
> > - Why should VMD be a special case?
>
> Because BIOS doesn't handle ASPM for it so it's up to software to do
> the job. In the meantime we want other devices still use the BIOS
> defaults to not introduce any regression.
>
> > - How would we document such a special case?
>
> I wonder whether other devices that add PCIe domain have the same
> behavior? Maybe it's not a special case at all...
What other devices are these?
> I understand the end goal is to keep consistency for the entire ASPM
> logic. However I can't think of any possible solution right now.
>
> > - If we built with CONFIG_PCIEASPM_POWERSAVE=y, would that solve the
> > SoC power state problem?
>
> Yes.
>
> > - What issues would CONFIG_PCIEASPM_POWERSAVE=y introduce?
>
> This will break many systems, at least for the 1st Gen Ryzen
> desktops and laptops.
>
> All PCIe ASPM are not enabled by BIOS, and those systems immediately
> freeze once ASPM is enabled.
That indicates a defect in the Linux ASPM code. We should fix that.
It should be safe to use CONFIG_PCIEASPM_POWERSAVE=y on every system.
Are there bug reports for these? The info we would need to start with
includes "lspci -vv" and dmesg log (with CONFIG_PCIEASPM_DEFAULT=y).
If a console log with CONFIG_PCIEASPM_POWERSAVE=y is available, that
might be interesting, too. We'll likely need to add some
instrumentation and do some experimentation, but in principle, this
should be fixable.
Bjorn