Re: [PATCH 1/5] cxl/core: Add cxl-bus driver infrastructure

From: Ben Widawsky
Date: Fri Jun 11 2021 - 15:28:37 EST


On 21-06-11 11:55:39, Dan Williams wrote:
> On Fri, Jun 11, 2021 at 10:47 AM Ben Widawsky <ben.widawsky@xxxxxxxxx> wrote:
> >
> > On 21-06-10 15:26:03, Dan Williams wrote:
> > > Enable devices on the 'cxl' bus to be attached to drivers. The initial
> > > user of this functionality is a driver for an 'nvdimm-bridge' device
> > > that anchors a libnvdimm hierarchy attached to CXL persistent memory
> > > resources. Other device types that will leverage this include:
> > >
> > > cxl_port: map and use component register functionality (HDM Decoders)
> >
> > Since I'm looking at this now, perhaps I can open the discussion here. Have you
> > thought about how this works yet? Right now I'm thinking there are two "drivers":
> > cxl_port: Switches (and ACPI0016)
> > cxl_mem: The memory device's HDM decoders
> >
> > For port, probe() will figure out that the thing is an upstream port, call
> > cxl_probe_component_regs and then call devm_cxl_add_port(). I think that's
> > straight forward.
>
> I was expecting cxl_port_driver.probe() comes *after* port discovery.
> Think of it like PCI discovery. Some agent does the hardware topology
> scan to add devices, in this case devm_cxl_add_port(), and that
> triggers cxl_port_driver to load. So the initial enumeration done by
> the cxl_acpi driver will populate the first two levels of the port
> hierarchy with port objects and populate their component register
> physical base addresses. For any other port deeper in the hierarchy I
> was expecting that to be scanned after the discovery of a cxl_memdev
> that is not attached to the current hierarchy. So, for example imagine
> a config like:
>
> Platform --> Host Bridge --> Switch --> Endpoint
>
> ...where in sysfs that's modeled as:
>
> root0 --> port1 --> port2 --> port3
>
> Where port3 is assuming that the CXL core models the device's
> connection to the topology as yet another cxl_port. At the beginning
> of time after cxl_acpi has loaded but before cxl_pci has discovered
> the endpoint the topology is:
>
> root0 --> port1
>
> Upon the detection of the endpoint the CXL core can assume that all
> intermediary switches between the root and this device have been
> registered as PCI devices. So, it follows that endpoint device arrival
> triggers "cxl_bus_rescan()" that goes and enumerates all the CXL
> resources in the topology to produce:
>
> root0 --> port1 --> port2 --> port3
>

Ah, I had written about scan/rescan in an earlier version of my email but
dropped it. I was actually going to suggest it being a sysfs attr, but I'm fine
with it being implicit so long as...

How do we assert that cxl_pci doesn't run before cxl_acpi has done anything? I
like the idea that the endpoint device can simply ask cxl_acpi to rescan, I just
don't see how it works. I suppose we can queue up the requests to rescan in
cxl_acpi if the ordering can't be guaranteed.

> > For the memory device we've already probed the thing via class code so there is
> > no need to use this driver registration, however, I think it would be nice to do
> > so. Is there a clean way to do that?
>
> The PCI device associated with the endpoint is already probed, but the
> cxl_memdev itself can have a driver on the CXL bus. So I think the
> cxl_memdev driver should try to register a cxl_port after telling
> cxl_acpi to rescan. If a check like "is_cxl_dport(pdev->dev.parent)"
> for the endpoint returns false it means that the cxl_bus_rescan()
> failed to enumerate the CXL topology to this endpoint and this
> endpoint is limited to only CXL.io operation.

What is going to invoke the memdev driver's probe? That is where we're talking
about putting that is_cxl_dport(...) right? That is the part that tripped me up
and inspired the original email FWIW.

>
> > Also, I'd like to make sure we're on the same page about struct cxl_decoder.
> > Right now they are only created for active HDM decoders.
>
> No, I was expecting they are also created for inactive ones. I am
> thinking that all decoders ultimately belong to the cxl_acpi driver,
> or whatever driver is acting as the root on a non-ACPI system. All
> decoder programming is driven by region activation stimulus that asks
> the root driver to try to establish a decode chain through the
> hieararchy per a given region.
>
> > Going forward, we can
> > either maintain a count of unused decoders on the given CXL component, or we can
> > instantiate a struct cxl_decoder that isn't active, ie. no interleave ways
> > granularit, base, etc. What's your thinking there?
>
> All resources are enumerated, just like PCI. Decode setup belongs to
> the core, just like PCI MMIO resource setup. The difference is that
> port drivers are needed to map component registers and service
> requests from cxl_acpi to reconfigure, but other than that
> cxl_decoders themselves don't have drivers and just reflect the
> current state of what cxl_acpi / cxl_core have established.

Okay.