Re: [PATCH v4 7/7] cxl/memdev: Register for and process CPER events

From: Ira Weiny
Date: Tue Dec 19 2023 - 18:27:33 EST


Jonathan Cameron wrote:
> On Fri, 15 Dec 2023 15:26:33 -0800
> Ira Weiny <ira.weiny@xxxxxxxxx> wrote:
>
> > If the firmware has configured CXL event support to be firmware first
> > the OS can process those events through CPER records. The CXL layer has
> > unique DPA to HPA knowledge and standard event trace parsing in place.
> >
> > CPER records contain Bus, Device, Function information which can be used
> > to identify the PCI device which is sending the event.
> >
> > Change pci driver registration to include registration for a CXL CPER
> > notifier to process the events through the trace subsystem.
> >
> > Define and use scoped based management to simplify the handling of the
> > pci device object.
> >
> > Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> > Signed-off-by: Ira Weiny <ira.weiny@xxxxxxxxx>
>
> I'd break out the pci guard stuff as a precursor patch. That's likely
> to be used elsewhere so it would help for backporting for other users
> if it wasn't buried in a patch doing other stuff.

That is good. I've done that.

>
> Not to mention that has a different set of likely reviewers to the rest
> of this patch.

Yep.

>
> More generally maybe we should just hardcode the UUID in the tracepoint
> definitions? I think for everything other than the generic one we
> only ever call trace_cxl_memory_module(... &mem_mod_event_uuid..)
> etc.
>
> It's a little ugly to match on the UUID to call a function where it
> hard coded, but less so than inserting the UUID like this does.
> Better I think to make it obvious that this isn't actually a variable
> (for the ones we understand).

I thought about that a bit. But I built the tracing code with generic
macros which contained the UUID. That complicated my efforts.

I've reworked it again and it took a bit of time but I got it to work. It
was not that hard but there is a caveat in the generic macros, which I
made a note of.

[snip]

> >
> > +#define CXL_EVENT_HDR_FLAGS_REC_SEVERITY GENMASK(1, 0)
> > +static void cxl_cper_event_call(enum cxl_event_type ev_type,
> > + struct cxl_cper_event_rec *rec)
> > +{
> > + struct cper_cxl_event_devid *device_id = &rec->hdr.device_id;
> > + struct pci_dev *pdev __free(pci_dev_put) = NULL;
> > + struct cxl_dev_state *cxlds = NULL;
> > + enum cxl_event_log_type log_type;
> > + unsigned int devfn;
> > + u32 hdr_flags;
> > +
> > + devfn = PCI_DEVFN(device_id->device_num, device_id->func_num);
> > + pdev = pci_get_domain_bus_and_slot(device_id->segment_num,
> > + device_id->bus_num, devfn);
> > + if (!pdev)
> > + return;
> > +
> > + guard(device)(&pdev->dev);
> > + if (pdev->driver == &cxl_pci_driver)
> > + cxlds = pci_get_drvdata(pdev);
> > + if (!cxlds)
> > + return;
>
> This is handling two conditions. I'd find it more readable split like:
>
> if (pdev->driver != &cxl_pci_driver)
> return;
>
> cxlds = pci_get_drvdata(pdev);
> if (!cxlds)
> return;
>
> and drop the = NULL above.

Done.

Ira