Re: [PATCH V13 6/9] cxl/port: Read CDAT table

From: Ira Weiny
Date: Thu Jul 14 2022 - 21:10:18 EST


On Thu, Jul 14, 2022 at 01:19:03PM -0700, Dan Williams wrote:
> Ira Weiny wrote:
> > On Thu, Jul 14, 2022 at 08:38:29AM -0700, Dan Williams wrote:
> > > ira.weiny@ wrote:
> > > > From: Ira Weiny <ira.weiny@xxxxxxxxx>
> >
> > [snip]
> >
> > > > ---
> > > > drivers/cxl/cdat.h | 100 ++++++++++++++++++++++++
> > > > drivers/cxl/core/pci.c | 170 +++++++++++++++++++++++++++++++++++++++++
> > > > drivers/cxl/cxl.h | 5 ++
> > > > drivers/cxl/cxlpci.h | 1 +
> > > > drivers/cxl/port.c | 51 +++++++++++++
> > > > 5 files changed, 327 insertions(+)
> > > > create mode 100644 drivers/cxl/cdat.h
> > >
> > > The bin_attr additions to CXL sysfs need to be added to
> > > Documentation/ABI/testing/sysfs-bus-cxl.
> >
> > Done. But with a very simple text:
> >
> > What: /sys/bus/cxl/devices/endpointX/CDAT/cdat
> > Date: July, 2022
> > KernelVersion: v5.19
> > Contact: linux-cxl@xxxxxxxxxxxxxxx
> > Description:
> > (RO) Raw binary CDAT data.
> >
> > Is this enough?
>
> It should talk about what it means when this is absent, present and
> 0-sized, present and non-zero sized.

ok.

[snip]

> >
> > >
> > > > +{
> > > > + struct cxl_memdev *cxlmd;
> > > > + struct cxl_dev_state *cxlds;
> > > > + unsigned long index;
> > > > + void *entry;
> > > > +
> > > > + if (!is_cxl_memdev(uport))
> > > > + return NULL;
> > >
> > > If this only supports endpoints just call this from the part of
> > > cxl_port_probe() that does:
> > >
> > > if (is_cxl_endpoint(port)) {
> > > struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport);
> > > ...
> > >
> > > ...and let the person who adds CDAT support for switches refactor it
> > > later. I.e. find_cdat_doe() can just take a @cxlmd arg directly.
> > >
> >
> > I was trying to make this generic for all ports originally. But in the end we
> > punted and said end points only since we don't have the plumbing for PCI port
> > driver yet. Won't I have to move this back when that comes online?
>
> Yes, but:
>
> if (!is_cxl_memdev(uport))
> return NULL;
>
> ...in this path has no reason to exist in v5.19. The need to consider
> switch-port-CDAT is at least a full kernel dev-cycle away.

Ok removed this check and called read_cdat_data() within is_cxl_memdev() check
in cxl_port_probe().

>
> > > It's
> > > enough that this patch located the cdat cached buffer on 'struct
> > > cxl_port' and the 'CDAT' sysfs attribute under
> > > /sys/bus/cxl/device/cxl_portX.
> >
> > I'm not parsing this sentence?
> >
> > .../cxl_portX/...
> >
> > does not exist?
>
> No, I'm saying that the sysfs location is fine, and find_cdat_doe() need
> not consider switch-port-CDAT yet.

ok.

>
> >
> > root fedora /sys/bus/cxl/drivers
> > 12:46:39 > ll
> > total 0
> > drwxr-xr-x 2 root root 0 Jul 14 12:44 cxl_mem
> > drwxr-xr-x 2 root root 0 Jul 14 12:44 cxl_nvdimm
> > drwxr-xr-x 2 root root 0 Jul 14 12:44 cxl_nvdimm_bridge
> > drwxr-xr-x 2 root root 0 Jul 14 12:44 cxl_port
> >
> > endpointX points back to the same place as /sys/bus/cxl/devices/endpointX
> >
> > root fedora /sys/bus/cxl/drivers/cxl_port
> > 12:47:03 > ll
> > total 0
> > ...
> > lrwxrwxrwx 1 root root 0 Jul 14 12:44 endpoint3 -> ../../../../devices/platform/ACPI0017:00/root0/port1/endpoint3
> > ...
> >
> > root fedora /sys/bus/cxl/devices
> > 12:47:59 > ll
> > total 0
> > ...
> > lrwxrwxrwx 1 root root 0 Jul 14 12:44 endpoint3 -> ../../../devices/platform/ACPI0017:00/root0/port1/endpoint3
> >
> > So what is different from what I'm doing?
>
> Nothing wrong here, sysfs CDAT location is fine.
>
> >
> > >
> > > > +
> > > > + cxlmd = to_cxl_memdev(uport);
> > > > + cxlds = cxlmd->cxlds;
> > > > +
> > > > + xa_for_each(&cxlds->doe_mbs, index, entry) {
> > > > + struct pci_doe_mb *cur = entry;
> > > > +
> > > > + if (pci_doe_supports_prot(cur, PCI_DVSEC_VENDOR_ID_CXL,
> > > > + CXL_DOE_PROTOCOL_TABLE_ACCESS))
> > > > + return cur;
> > >
> > > This has me thinking the doe_mbs xarray is overkill if all the other
> > > DOEs are just enumerated and then ignored. When / if more DOEs need to
> > > be handled then we can think about instantiating all of them and keeping
> > > them aruond. Otherwise, just do this pci_doe_supports_prot() check in
> > > cxl_pci and throw away the rest. Then cxlds would just carry a single
> > > @cdat_doe attribute for this first use of DOE capabilities in the
> > > kernel.
> >
> > I was hoping that code could be lifted to the PCI side later. For now CXL
> > could handle all DOE's.
>
> It's only a couple of code to fetch the CDAT DOE out of all the DOEs.
>
> > At this point CXL devices are the only one using DOE in the kernel. I know
> > there are other PCI uses coming but for any 1 device only one mailbox object
> > should be created or we are going to have the kernel drivers conflicting with
> > each other.
>
> In what scenario would a PCI driver conflict with another PCI driver for
> the same device?

If the ownership of struct pci_doe_mb is not clear each could iterates the DOE
offsets find that the same mailbox supports 2 different protocols that each
needs and they both create a struct pci_doe_mb for that offset.

And bad things will happen. :-(

I know that we have cautioned implementations not to make the protocol support
on mailboxes too complex but the spec does not preclude this. You have
explained in the past that the CXL drivers are parallel drivers to the PCI
layer. In this case PCI should eventually control the mailbox objects and

>
> In the CXL case cxl_pci finds the one DOE it cares about and throws away
> the rest and nothing else in the kernel will ever conflict with that.
>
> Don't try to solve future problems until they become present problems.
>
> > So I'd rather leave this series with CXL controlling the DOE mailboxes.
>
> Sounds good, I'm not asking you to change that, just asking for this few
> lines of filtering to move to cxl_pci.

Ok.

>
> > I think later we will need to lift the mailbox objects to PCI and CXL
> > drivers will need to query into that side as needed. There was a
> > version like this before (Probably Jonathan's) and I think the aux bus
> > moved it all over here. <sigh> Sorry getting pretty confused myself
> > at this point.
>
> Sure cross that bridge when we get to it.

ok.

>
> > The code change you suggest is easy to do but I think it would be better to
> > land this and then lift all of the mailboxes to PCI in a follow on series.
>
> That follow on move is not prevented by just the small move of this
> filter to cxl_pci, right?

Yes I'm starting to rethink all this and I think this will simplify this patch
set and get this support in.

>
> > > > +#define CDAT_DOE_REQ(entry_handle) \
> > > > + (FIELD_PREP(CXL_DOE_TABLE_ACCESS_REQ_CODE, \
> > > > + CXL_DOE_TABLE_ACCESS_REQ_CODE_READ) | \
> > > > + FIELD_PREP(CXL_DOE_TABLE_ACCESS_TABLE_TYPE, \
> > > > + CXL_DOE_TABLE_ACCESS_TABLE_TYPE_CDATA) | \
> > > > + FIELD_PREP(CXL_DOE_TABLE_ACCESS_ENTRY_HANDLE, (entry_handle)))
> > > > +
> > > > +static void cxl_doe_task_complete(struct pci_doe_task *task)
> > > > +{
> > > > + complete(task->private);
> > > > +}
> > > > +
> > > > +static int cxl_cdat_get_length(struct device *dev,
> > > > + struct pci_doe_mb *cdat_mb,
> > > > + size_t *length)
> > > > +{
> > > > + u32 cdat_request_pl = CDAT_DOE_REQ(0);
> > > > + u32 cdat_response_pl[32];
> > > > + DECLARE_COMPLETION_ONSTACK(c);
> > > > + struct pci_doe_task task = {
> > > > + .prot.vid = PCI_DVSEC_VENDOR_ID_CXL,
> > > > + .prot.type = CXL_DOE_PROTOCOL_TABLE_ACCESS,
> > > > + .request_pl = &cdat_request_pl,
> > > > + .request_pl_sz = sizeof(cdat_request_pl),
> > > > + .response_pl = cdat_response_pl,
> > > > + .response_pl_sz = sizeof(cdat_response_pl),
> > > > + .complete = cxl_doe_task_complete,
> > > > + .private = &c,
> > > > + };
> > >
> > > I think this wants to be macro'ized to something like:
> > >
> > > #define DECLARE_PCI_DOE_TASK(vid, type, req, rsp, priv)
> >
> > I was thinking the same thing after you wanted the task init here too.
> >
> > Can I follow on with that work? I think the following members need to be
> > wrapped and hidden from the caller API.
>
> Since this patch already needs a respin you could just do the simple
> DECLARE_PCI_DOE_TASK() now and figure out the hiding of @work later.
> I.e. keep the INIT_WORK() in submit for now.

Never mind what I said. I was thinking you wanted a generic macro to declare a
task in pci-doe.h. But I see what you are saying now.

Ira