Re: [PATCH v4 3/3] pci: intel: Add sysfs attributes to configure pcie link

From: Bjorn Helgaas
Date: Thu Oct 31 2019 - 09:22:32 EST


On Thu, Oct 31, 2019 at 06:47:10PM +0800, Dilip Kota wrote:
> On 10/31/2019 6:14 AM, Bjorn Helgaas wrote:
> > On Tue, Oct 29, 2019 at 05:31:18PM +0800, Dilip Kota wrote:
> > > On 10/22/2019 8:59 PM, Bjorn Helgaas wrote:
> > > > [+cc Rafael, linux-pm, beginning of discussion at
> > > > https://lore.kernel.org/r/d8574605f8e70f41ce1e88ccfb56b63c8f85e4df.1571638827.git.eswara.kota@xxxxxxxxxxxxxxx]
> > > >
> > > > On Tue, Oct 22, 2019 at 05:27:38PM +0800, Dilip Kota wrote:
> > > > > On 10/22/2019 1:18 AM, Bjorn Helgaas wrote:
> > > > > > On Mon, Oct 21, 2019 at 02:38:50PM +0100, Andrew Murray wrote:
> > > > > > > On Mon, Oct 21, 2019 at 02:39:20PM +0800, Dilip Kota wrote:
> > > > > > > > PCIe RC driver on Intel Gateway SoCs have a requirement
> > > > > > > > of changing link width and speed on the fly.
> > > > > > Please add more details about why this is needed. Since
> > > > > > you're adding sysfs files, it sounds like it's not
> > > > > > actually the *driver* that needs this; it's something in
> > > > > > userspace?
> > > > > We have use cases to change the link speed and width on the fly.
> > > > > One is EMI check and other is power saving. Some battery backed
> > > > > applications have to switch PCIe link from higher GEN to GEN1 and
> > > > > width to x1. During the cases like external power supply got
> > > > > disconnected or broken. Once external power supply is connected then
> > > > > switch PCIe link to higher GEN and width.
> > > > That sounds plausible, but of course nothing there is specific to the
> > > > Intel Gateway, so we should implement this generically so it would
> > > > work on all hardware.
> > > Agree.
> > > > I'm not sure what the interface should look like -- should it be a
> > > > low-level interface as you propose where userspace would have to
> > > > identify each link of interest, or is there some system-wide
> > > > power/performance knob that could tune all links? Cc'd Rafael and
> > > > linux-pm in case they have ideas.
> > > To my knowledge sysfs is the appropriate way to go.
> > > If there are any other best possible knobs, will be helpful.
> > I agree sysfs is the right place for it; my question was whether we
> > should have files like:
> >
> > /sys/.../0000:00:1f.3/pcie_speed
> > /sys/.../0000:00:1f.3/pcie_width
> >
> > as I think this patch would add (BTW, please include sample paths like
> > the above in the commit log), or whether there should be a more global
> > thing that would affect all the links in the system.
> Sure, i will add them.
> >
> > I think the low-level files like you propose would be better because
> > one might want to tune link performance differently for different
> > types of devices and workloads.
> >
> > We also have to decide if these files should be associated with the
> > device at the upstream or downstream end of the link. For ASPM, the
> > current proposal [1] has the files at the downstream end on the theory
> > that the GPU, NIC, NVMe device, etc is the user-recognizable one.
> > Also, neither ASPM nor link speed/width make any sense unless there
> > *is* a device at the downstream end, so putting them there
> > automatically makes them visible only when they're useful.
>
> This patch places the speed and width in the host controller directory.
> /sys/.../xxx.pcie/pcie_speed
> /sys/.../xxx.pcie/pcie_width
>
> I agree with you partially,  because i am having couple of points
> making me to keep speed and width change entries in controller
> directory:
>
> -- For changing the speed/width with device node, software ends up
> traversing to the controller from the device and do the
> operations.
> -- Change speed and width are performed at controller level,

The controller is effectively a Root Complex, which may contain
several Root Ports. I have the impression that the Synopsys
controller only supports a single Root Port, but that's just a detail
of the Synopsys implementation. I think it should be possible to
configure the width/speed of each Root Port individually.

> -- Keeping speed and width in controller gives a perspective (to the
> user) of changing them only once irrespective of no. of devices.

What if there's a switch? If we change the width/speed of the link
between the Root Port and the Switch Upstream Port, that doesn't do
anything about the links from the Switch Downstream Ports.

> -- For speed and link change in Synopsys PCIe controller, specific
> registers need to be configured. This prevents or complicates
> adding the speed and width change functionality in pci-sysfs or
> pci framework.

Don't the Link Control and related registers in PCIe spec give us
enough control to manage the link width/speed of *all* links,
including those from Root Ports and Switch Downstream Ports?

If the Synopsys controller requires controller-specific registers,
that sounds to me like it doesn't quite conform to the spec. Maybe
that means we would need some sort of quirk or controller callback?

Bjorn