Re: [PATCH 1/5] cxl/core: Add cxl-bus driver infrastructure
From: Ben Widawsky
Date: Mon Jun 14 2021 - 17:40:36 EST
On 21-06-11 16:25:05, Dan Williams wrote:
> On Fri, Jun 11, 2021 at 12:28 PM Ben Widawsky <ben.widawsky@xxxxxxxxx> wrote:
> >
> > On 21-06-11 11:55:39, Dan Williams wrote:
> > > On Fri, Jun 11, 2021 at 10:47 AM Ben Widawsky <ben.widawsky@xxxxxxxxx> wrote:
> > > >
> > > > On 21-06-10 15:26:03, Dan Williams wrote:
> > > > > Enable devices on the 'cxl' bus to be attached to drivers. The initial
> > > > > user of this functionality is a driver for an 'nvdimm-bridge' device
> > > > > that anchors a libnvdimm hierarchy attached to CXL persistent memory
> > > > > resources. Other device types that will leverage this include:
> > > > >
> > > > > cxl_port: map and use component register functionality (HDM Decoders)
> > > >
> > > > Since I'm looking at this now, perhaps I can open the discussion here. Have you
> > > > thought about how this works yet? Right now I'm thinking there are two "drivers":
> > > > cxl_port: Switches (and ACPI0016)
> > > > cxl_mem: The memory device's HDM decoders
> > > >
> > > > For port, probe() will figure out that the thing is an upstream port, call
> > > > cxl_probe_component_regs and then call devm_cxl_add_port(). I think that's
> > > > straight forward.
> > >
> > > I was expecting cxl_port_driver.probe() comes *after* port discovery.
> > > Think of it like PCI discovery. Some agent does the hardware topology
> > > scan to add devices, in this case devm_cxl_add_port(), and that
> > > triggers cxl_port_driver to load. So the initial enumeration done by
> > > the cxl_acpi driver will populate the first two levels of the port
> > > hierarchy with port objects and populate their component register
> > > physical base addresses. For any other port deeper in the hierarchy I
> > > was expecting that to be scanned after the discovery of a cxl_memdev
> > > that is not attached to the current hierarchy. So, for example imagine
> > > a config like:
> > >
> > > Platform --> Host Bridge --> Switch --> Endpoint
> > >
> > > ...where in sysfs that's modeled as:
> > >
> > > root0 --> port1 --> port2 --> port3
> > >
> > > Where port3 is assuming that the CXL core models the device's
> > > connection to the topology as yet another cxl_port. At the beginning
> > > of time after cxl_acpi has loaded but before cxl_pci has discovered
> > > the endpoint the topology is:
> > >
> > > root0 --> port1
> > >
> > > Upon the detection of the endpoint the CXL core can assume that all
> > > intermediary switches between the root and this device have been
> > > registered as PCI devices. So, it follows that endpoint device arrival
> > > triggers "cxl_bus_rescan()" that goes and enumerates all the CXL
> > > resources in the topology to produce:
> > >
> > > root0 --> port1 --> port2 --> port3
> > >
> >
> > Ah, I had written about scan/rescan in an earlier version of my email but
> > dropped it. I was actually going to suggest it being a sysfs attr, but I'm fine
> > with it being implicit so long as...
> >
> > How do we assert that cxl_pci doesn't run before cxl_acpi has done anything?
>
> I don't think we need to, or it's broken if the driver load order
> matters. The nvdimm enabling code is an example of how to handle this.
> The cxl_nvdimm object can be registered before the cxl_nvdimm_bridge,
> or after, does not matter. If the cxl_nvdimm comes first it will
> trigger the cxl_nvdimm_driver to load. The cxl_nvdimm_driver.probe()
> routine finds no bridge present and probe() returns with a failure.
> When the bridge arrives it does a rescan of the cxl_bus_type device
> list and if it finds a cxl_nvdimm it re-triggers
> cxl_nvdimm_driver.probe(). This time through cxl_nvdimm_driver.probe()
> finds the bridge and registers the real nvdimm on the nvdimm_bus.
>
> > I
> > like the idea that the endpoint device can simply ask cxl_acpi to rescan, I just
> > don't see how it works. I suppose we can queue up the requests to rescan in
> > cxl_acpi if the ordering can't be guaranteed.
>
> I think this means that the devm_cxl_add_port() would be triggered by
> cxl_memdev_driver.probe() if and only if the parent pci_device of the
> CXL endpoint is listed as a dport. If the cxl_memdev is registered
> first the search it will search for the CXL root port on the
> cxl_bus_type device list. If that fails then cxl_memdev_driver.probe()
> fails. If that succeeds it asks the root to scan to the CXL endpoint
> parent pci_device and return the confirmation that it is registered as
> a dport. If that fails then the device is plugged into a pure PCIe
> slot.
>
> When cxl_acpi loads it retriggers all cxl_memdev_driver.probe() to
> reconsider all cxl_memdev instances that failed to probe previously.
>
> >
> > > > For the memory device we've already probed the thing via class code so there is
> > > > no need to use this driver registration, however, I think it would be nice to do
> > > > so. Is there a clean way to do that?
> > >
> > > The PCI device associated with the endpoint is already probed, but the
> > > cxl_memdev itself can have a driver on the CXL bus. So I think the
> > > cxl_memdev driver should try to register a cxl_port after telling
> > > cxl_acpi to rescan. If a check like "is_cxl_dport(pdev->dev.parent)"
> > > for the endpoint returns false it means that the cxl_bus_rescan()
> > > failed to enumerate the CXL topology to this endpoint and this
> > > endpoint is limited to only CXL.io operation.
> >
> > What is going to invoke the memdev driver's probe? That is where we're talking
> > about putting that is_cxl_dport(...) right? That is the part that tripped me up
> > and inspired the original email FWIW.
>
> I *think* I worked that out above, but yes please do poke at it to see
> if it holds up.
I think it works. I have some concerns around synchronization of memdev probing
and cxl_acpi enumerating, but I believe it's workable.
Thanks for the thought.