Re: [PATCH v2 3/3] PCI/ASPM: Add LTR sysfs attributes

From: Bjorn Helgaas
Date: Mon Oct 25 2021 - 13:31:26 EST


On Mon, Oct 25, 2021 at 06:33:50PM +0800, Kai-Heng Feng wrote:
> On Thu, Oct 21, 2021 at 11:09 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > On Thu, Oct 21, 2021 at 11:51:59AM +0800, Kai-Heng Feng wrote:
> > > Sometimes BIOS may not be able to program ASPM and LTR settings, for
> > > instance, the NVMe devices behind Intel VMD bridges. For this case, both
> > > ASPM and LTR have to be enabled to have significant power saving.
> > >
> > > Since we still want POLICY_DEFAULT honor the default BIOS ASPM settings,
> > > introduce LTR sysfs knobs so users can set max snoop and max nosnoop
> > > latency manually or via udev rules.
> >
> > How is the user supposed to figure out what "max snoop" and "max
> > nosnoop" values to program?
>
> Actually, the only way I know is to get the value from other OS.

I don't see how this can be a workable solution. IIUC this is
specifically related to ASPM L1.2. L1.2 depends on LTR (the max
snoop/nosnoop values) and the TPOWER_ON and Common_Mode_Restore_Time
values. PCIe r5.0, sec 5.5.4, says:

Prior to setting either or both of the enable bits for L1.2, the
values for TPOWER_ON, Common_Mode_Restore_Time, and, if the ASPM
L1.2 Enable bit is to be Set, the LTR_L1.2_THRESHOLD (both Value and
Scale fields) must be programmed.

The TPOWER_ON and Common_Mode_Restore_Time fields must be programmed
to the appropriate values based on the components and AC coupling
capacitors used in the connection linking the two components. The
determination of these values is design implementation specific.

I do not think collecting values from some other OS is a reasonable
way to set these. The values would apparently depend on the
electrical design of the particular machine.

> > If we add this, I'm afraid we'll have people programming random things
> > that seem to work but are not actually reliable.
>
> IMO users need to take full responsibility for own doings.
> Also, it's already doable by using setpci...

I don't think it currently does, but setpci should taint the kernel.

If users want to write setpci scripts to fiddle with stuff, that's
great, but that moves it outside the supportable realm. If we provide
a sysfs interface to do this, then it becomes more of *our* problem to
make it work correctly, and I don't think that's practical in this
case.

> If we don't want to add LTR sysfs, what other options do we have to
> enable VMD ASPM and LTR by default since BIOS doesn't do it for us?
> 1) Enable it in the PCI or VMD driver, however this approach violates
> POLICY_DEFAULT.
> 2) Use `setpci` directly in udev rules to enable VMD ASPM and LTR.
>
> I think 2) is bad, and since 1) isn't so good either, the approach in
> this patch may be the best compromise.

I do not know how to safely enable L1.2. It's likely that I'm just
missing something, but I don't see enough information in PCI config
space and the PCI Firmware interface (_DSM for Latency Tolerance
Reporting) to enable L1.2. It's possible that a new firmware
interface is required.

I don't think it's wise to enable L1.2 unless we have good confidence
that we know how to do it correctly. It's hard enough to debug ASPM
issues as it is.

Bjorn