Re: [PATCH 1/5] cxl/core: Add cxl-bus driver infrastructure

From: Dan Williams
Date: Fri Jun 11 2021 - 19:26:20 EST


On Fri, Jun 11, 2021 at 12:28 PM Ben Widawsky <ben.widawsky@xxxxxxxxx> wrote:
>
> On 21-06-11 11:55:39, Dan Williams wrote:
> > On Fri, Jun 11, 2021 at 10:47 AM Ben Widawsky <ben.widawsky@xxxxxxxxx> wrote:
> > >
> > > On 21-06-10 15:26:03, Dan Williams wrote:
> > > > Enable devices on the 'cxl' bus to be attached to drivers. The initial
> > > > user of this functionality is a driver for an 'nvdimm-bridge' device
> > > > that anchors a libnvdimm hierarchy attached to CXL persistent memory
> > > > resources. Other device types that will leverage this include:
> > > >
> > > > cxl_port: map and use component register functionality (HDM Decoders)
> > >
> > > Since I'm looking at this now, perhaps I can open the discussion here. Have you
> > > thought about how this works yet? Right now I'm thinking there are two "drivers":
> > > cxl_port: Switches (and ACPI0016)
> > > cxl_mem: The memory device's HDM decoders
> > >
> > > For port, probe() will figure out that the thing is an upstream port, call
> > > cxl_probe_component_regs and then call devm_cxl_add_port(). I think that's
> > > straight forward.
> >
> > I was expecting cxl_port_driver.probe() comes *after* port discovery.
> > Think of it like PCI discovery. Some agent does the hardware topology
> > scan to add devices, in this case devm_cxl_add_port(), and that
> > triggers cxl_port_driver to load. So the initial enumeration done by
> > the cxl_acpi driver will populate the first two levels of the port
> > hierarchy with port objects and populate their component register
> > physical base addresses. For any other port deeper in the hierarchy I
> > was expecting that to be scanned after the discovery of a cxl_memdev
> > that is not attached to the current hierarchy. So, for example imagine
> > a config like:
> >
> > Platform --> Host Bridge --> Switch --> Endpoint
> >
> > ...where in sysfs that's modeled as:
> >
> > root0 --> port1 --> port2 --> port3
> >
> > Where port3 is assuming that the CXL core models the device's
> > connection to the topology as yet another cxl_port. At the beginning
> > of time after cxl_acpi has loaded but before cxl_pci has discovered
> > the endpoint the topology is:
> >
> > root0 --> port1
> >
> > Upon the detection of the endpoint the CXL core can assume that all
> > intermediary switches between the root and this device have been
> > registered as PCI devices. So, it follows that endpoint device arrival
> > triggers "cxl_bus_rescan()" that goes and enumerates all the CXL
> > resources in the topology to produce:
> >
> > root0 --> port1 --> port2 --> port3
> >
>
> Ah, I had written about scan/rescan in an earlier version of my email but
> dropped it. I was actually going to suggest it being a sysfs attr, but I'm fine
> with it being implicit so long as...
>
> How do we assert that cxl_pci doesn't run before cxl_acpi has done anything?

I don't think we need to, or it's broken if the driver load order
matters. The nvdimm enabling code is an example of how to handle this.
The cxl_nvdimm object can be registered before the cxl_nvdimm_bridge,
or after, does not matter. If the cxl_nvdimm comes first it will
trigger the cxl_nvdimm_driver to load. The cxl_nvdimm_driver.probe()
routine finds no bridge present and probe() returns with a failure.
When the bridge arrives it does a rescan of the cxl_bus_type device
list and if it finds a cxl_nvdimm it re-triggers
cxl_nvdimm_driver.probe(). This time through cxl_nvdimm_driver.probe()
finds the bridge and registers the real nvdimm on the nvdimm_bus.

> I
> like the idea that the endpoint device can simply ask cxl_acpi to rescan, I just
> don't see how it works. I suppose we can queue up the requests to rescan in
> cxl_acpi if the ordering can't be guaranteed.

I think this means that the devm_cxl_add_port() would be triggered by
cxl_memdev_driver.probe() if and only if the parent pci_device of the
CXL endpoint is listed as a dport. If the cxl_memdev is registered
first the search it will search for the CXL root port on the
cxl_bus_type device list. If that fails then cxl_memdev_driver.probe()
fails. If that succeeds it asks the root to scan to the CXL endpoint
parent pci_device and return the confirmation that it is registered as
a dport. If that fails then the device is plugged into a pure PCIe
slot.

When cxl_acpi loads it retriggers all cxl_memdev_driver.probe() to
reconsider all cxl_memdev instances that failed to probe previously.

>
> > > For the memory device we've already probed the thing via class code so there is
> > > no need to use this driver registration, however, I think it would be nice to do
> > > so. Is there a clean way to do that?
> >
> > The PCI device associated with the endpoint is already probed, but the
> > cxl_memdev itself can have a driver on the CXL bus. So I think the
> > cxl_memdev driver should try to register a cxl_port after telling
> > cxl_acpi to rescan. If a check like "is_cxl_dport(pdev->dev.parent)"
> > for the endpoint returns false it means that the cxl_bus_rescan()
> > failed to enumerate the CXL topology to this endpoint and this
> > endpoint is limited to only CXL.io operation.
>
> What is going to invoke the memdev driver's probe? That is where we're talking
> about putting that is_cxl_dport(...) right? That is the part that tripped me up
> and inspired the original email FWIW.

I *think* I worked that out above, but yes please do poke at it to see
if it holds up.