Re: [PATCH v2 7/8] cxl/port: Introduce cxl_port objects
From: Dan Williams
Date: Tue Apr 13 2021 - 13:19:52 EST
On Thu, Apr 8, 2021 at 7:13 PM Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>
> Hi Bjorn, thanks for taking a look.
>
>
> On Thu, Apr 8, 2021 at 3:42 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> >
> > [+cc Greg, Rafael, Matthew: device model questions]
> >
> > Hi Dan,
> >
> > On Thu, Apr 01, 2021 at 07:31:20AM -0700, Dan Williams wrote:
> > > Once the cxl_root is established then other ports in the hierarchy can
> > > be attached. The cxl_port object, unlike cxl_root that is associated
> > > with host bridges, is associated with PCIE Root Ports or PCIE Switch
> > > Ports. Add cxl_port instances for all PCIE Root Ports in an ACPI0016
> > > host bridge.
> >
> > I'm not a device model expert, but I'm not sure about adding a new
> > /sys/bus/cxl/devices hierarchy. I'm under the impression that CXL
> > devices will be enumerated by the PCI core as PCIe devices.
>
> Yes, PCIe is involved, but mostly only for the CXL.io slow path
> (configuration and provisioning via mailbox) when we're talking about
> memory expander devices (CXL calls these Type-3). So-called "Type-3"
> support is the primary driver of this infrastructure.
>
> You might be thinking of CXL accelerator devices that will look like
> plain PCIe devices that happen to participate in the CPU cache
> hierarchy (CXL calls these Type-1). There will also be accelerator
> devices that want to share coherent memory with the system (CXL calls
> these Type-2).
>
> The infrastructure being proposed here is primarily for the memory
> expander (Type-3) device case where the PCI sysfs hierarchy is wholly
> unsuited for modeling it. A single CXL memory region device may span
> multiple endpoints, switches, and host bridges. It stresses an OS
> device model much as RAID does: there is a driver for each component
> contributor, plus an upper-level device / driver that exposes the
> RAID Volume (CXL memory region interleave set). The CXL memory
> decode space (HDM: Host Managed Device Memory) is independent of the
> PCIe MMIO BAR space.
>
> That's where the /sys/bus/cxl hierarchy is needed, to manage the HDM
> space across the CXL topology in a way that is foreign to PCIe (HDM
> Decoder hierarchy).
>
> > Doesn't
> > that mean we will have one struct device in the pci_dev, and another
> > one in the cxl_port?
>
> Yes, that is the proposal.
>
> > That seems like an issue to me. More below.
>
> hmm...
>
> >
> > > The cxl_port instances for PCIE Switch Ports are not
> > > included here as those are to be modeled as another service device
> > > registered on the pcie_port_bus_type.
> >
> > I'm hesitant about the idea of adding more uses of pcie_port_bus_type.
> > I really dislike portdrv because it makes a parallel hierarchy:
> >
> > /sys/bus/pci
> > /sys/bus/pci_express
> >
> > for things that really should not be different. There's a struct
> > device in pci_dev, and potentially several pcie_devices, each with
> > another struct device. We make these pcie_device things for AER, DPC,
> > hotplug, etc. E.g.,
> >
> > /sys/bus/pci/devices/0000:00:1c.0
> > /sys/bus/pci_express/devices/0000:00:1c.0:pcie002 # AER
> > /sys/bus/pci_express/devices/0000:00:1c.0:pcie010 # BW notification
> >
> > These are all the same PCI device. AER is a PCI capability.
> > Bandwidth notification is just a feature of all Downstream Ports. I
> > think it makes zero sense to have extra struct devices for them. From
> > a device point of view (enumeration, power management, VM assignment),
> > we can't manage them separately from the underlying PCI device. For
> > example, we have three separate "power/" directories, but obviously
> > there's only one point of control (00:1c.0):
> >
> > /sys/devices/pci0000:00/0000:00:1c.0/power/
> > /sys/devices/pci0000:00/0000:00:1c.0/0000:00:1c.0:pcie002/power/
> > /sys/devices/pci0000:00/0000:00:1c.0/0000:00:1c.0:pcie010/power/
>
> The superfluous power/ issue can be cleaned up with
> device_set_pm_not_required().
>
> What are the other problems this poses, because in other areas this
> ability to subdivide a device's functionality into sub-drivers is a
> useful organization principle? So much so that several device writer
> teams came together to create the auxiliary-bus for the purpose of
> allowing sub-drivers to be carved off for independent functionality
> similar to the portdrv organization.
>
Bjorn, any further thoughts on this?
This port architecture question is in the critical path for the next
phase of CXL development (targeting v5.14, not v5.13).
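
For readers following the RAID analogy quoted above, here is a rough
userspace sketch (not kernel code, and not taken from the patch set) of
why one memory region does not map onto a single PCI device. It assumes
plain round-robin interleaving; struct region_model and both helpers
are hypothetical names made up for illustration, and real HDM decoders
are programmed quite differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an interleaved memory region (illustration only). */
struct region_model {
	uint64_t base;        /* first host physical address of the region */
	uint64_t size;        /* total bytes contributed by all targets */
	unsigned int ways;    /* number of endpoints backing the region */
	uint64_t granularity; /* bytes routed to one target before rotating */
};

/* Index of the target backing addr, or -1 if addr is outside the region. */
static int region_target(const struct region_model *r, uint64_t addr)
{
	assert(r->ways > 0 && r->granularity > 0);

	if (addr < r->base || addr - r->base >= r->size)
		return -1;

	/* Consecutive granularity-sized chunks rotate across the targets. */
	return (int)(((addr - r->base) / r->granularity) % r->ways);
}

/* Byte offset of addr within the selected target's own contribution. */
static uint64_t region_target_offset(const struct region_model *r,
				     uint64_t addr)
{
	uint64_t off = addr - r->base;
	uint64_t chunk = off / r->granularity;

	/* Each target holds every ways-th chunk, packed back to back. */
	return (chunk / r->ways) * r->granularity + off % r->granularity;
}
```

With ways = 4 and granularity = 256, adjacent 256-byte chunks of one
region land on four different endpoints, so no single pci_dev owns the
range. That is the part the PCI sysfs hierarchy has no natural place
for.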