Re: [PATCH RFC v2 3/3] cxl/memdev: Register for and process CPER events

From: Ira Weiny
Date: Tue Oct 31 2023 - 23:36:09 EST


Smita Koralahalli wrote:
> >

[snip]

> > +#define CXL_EVENT_HDR_FLAGS_REC_SEVERITY GENMASK(1, 0)
> > +int cxl_cper_event_call(struct notifier_block *nb, unsigned long action, void *data)
> > +{
> > + struct cxl_cper_notifier_data *nd = data;
> > + struct cxl_event_record_raw record = (struct cxl_event_record_raw) {
> > + .hdr.id = UUID_INIT(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
> > + };
> > + enum cxl_event_log_type log_type;
> > + struct cxl_memdev_state *mds;
> > + u32 hdr_flags;
> > +
> > + mds = container_of(nb, struct cxl_memdev_state, cxl_cper_nb);
> > +
> > + /* Need serial number for device identification */
> > + if (!(nd->rec->hdr.validation_bits & CPER_CXL_DEVICE_SN_VALID))
> > + return NOTIFY_DONE;
>
> For all the event records that I tested so far, this has never been
> true. That is CPER_CXL_DEVICE_SN_VALID is never set which might not log
> the records at all. Should we be bit more lenient here and include
> validating device_id (bdf) instead and check if cxlds exist?
>
> pci_get_domain_bus_and_slot() and pci_get_drvdata()..

Checking BDF is reasonable. Not sure what you mean by 'check if cxlds
exists'?

This will be called on each memdev so the cxlds should be valid. Do you
think we need some locking? I think the unregister will block device
removal if this is running.

>
> > +
> > + /* FIXME endianess and bytes of serial number need verification */
> > + /* FIXME Should other values be checked? */
> > + if (memcmp(&mds->cxlds.serial, &nd->rec->hdr.dev_serial_num,
> > + sizeof(mds->cxlds.serial)))
> > + return NOTIFY_DONE;
> > +
> > + /* ensure record can always handle the full CPER provided data */
> > + BUILD_BUG_ON(sizeof(record) <
> > + (CPER_CXL_COMP_EVENT_LOG_SIZE + sizeof(record.hdr.id)));
> > +
> > + /*
> > + * UEFI v2.10 defines N.2.14 defines the CXL CPER record as not
> > + * including the uuid field.
> > + */
> > + memcpy(&record.hdr.length, &nd->rec->comp_event_log,
> > + CPER_CXL_REC_LEN(nd->rec));
>
> I'm doubtful this will do the job.

I'm not sure why but, see below...

> I think we should copy into each
> field of struct cxl_event_record_hdr individually starting from length
> by pointer arithmetic (which is definitely bad, but I cannot think of a
> better way to do this) and then do memcpy for data field in struct
> cxl_event_record_raw..
>
> Any other suggestions would be helpful as well.

Based on Dan's suggestion to share the structures this memcpy can be
avoided altogether. Let's see how that works.

>
> I can make these changes and validate it on my end if that works..?

Any testing would be welcome. I don't have a test setup readily
available.

Ira

[snip]