Re: [PATCH v16 01/10] PCI/AER: Introduce AER-CXL Kfifo

From: Dan Williams

Date: Fri Mar 27 2026 - 20:29:32 EST


Terry Bowman wrote:
> CXL virtual hierarchy (VH) RAS handling for CXL Port devices will be added
> soon. This requires a notification mechanism for the AER driver to share
> the AER interrupt with the CXL driver. The notification will be used as an
> indication for the CXL drivers to handle and log the CXL RAS errors.
>
> Note, 'CXL protocol error' terminology will refer to CXL VH and not
> CXL RCH errors unless specifically noted going forward.
>
> Introduce a new file in the AER driver to handle the CXL protocol errors
> named pci/pcie/aer_cxl_vh.c.
>
> Add a kfifo work queue to be used by the AER and CXL drivers. The AER
> driver will be the sole kfifo producer adding work and the cxl_core will be
> the sole kfifo consumer removing work. Add the boilerplate kfifo support.
> Encapsulate the kfifo, RW semaphore, and work pointer in a single structure.
>
> Add CXL work queue handler registration functions in the AER driver. Export
> the functions allowing CXL driver to access. Implement registration
> functions for the CXL driver to assign or clear the work handler function.
>
> Introduce 'struct cxl_proto_err_work_data' to serve as the kfifo work data.
> This will contain a reference to the PCI error source device and the error
> severity. This will be used when the work is dequeued by the cxl_core driver.
>
> Introduce cxl_forward_error() to take a given CXL protocol error and add it
> to a work structure before pushing onto the AER-CXL kfifo. This function
> takes a reference count increment of the PCI device. The kfifo consumer is
> responsible for reference decrementing. If there is an error on adding the
> work then this function must decrement the reference count.
>
> Synchronize accesses to the work function pointer during registration,
> deregistration, enqueue, and dequeue. Further synchronization fixes will
> be added in the following patch.
>
> Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@xxxxxxxxxx>
> Reviewed-by: Dave Jiang <dave.jiang@xxxxxxxxx>
> Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx>
>
> ---
>
> Changes in v15->v16:
> - Add pci_dev_put() and comment in pci_dev_get() (Dan)
> - /rw_sema/rwsema/ (Dan)

To be clear I asked for s/rw_sema/rwsem/, easy enough to fix up.

[..]
> drivers/pci/pcie/aer_cxl_vh.c | 87 +++++++++++++++++++++++++++++++++++
> create mode 100644 drivers/pci/pcie/aer_cxl_vh.c

Bjorn, do you want this additional burden to fall on PCI core reviewers,
or perhaps make this change to the "COMPUTE EXPRESS LINK (CXL)" entry?

diff --git a/MAINTAINERS b/MAINTAINERS
index 61bf550fd37c..1b46ac52839d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6438,6 +6438,8 @@ S: Maintained
F: Documentation/driver-api/cxl
F: Documentation/userspace-api/fwctl/fwctl-cxl.rst
F: drivers/cxl/
+F: drivers/pci/pcie/aer_cxl_rch.c
+F: drivers/pci/pcie/aer_cxl_vh.c
F: include/cxl/
F: include/uapi/linux/cxl_mem.h
F: tools/testing/cxl/

[..]
> diff --git a/drivers/pci/pcie/aer_cxl_vh.c b/drivers/pci/pcie/aer_cxl_vh.c
> new file mode 100644
> index 000000000000..7e2bc1894395
> --- /dev/null
> +++ b/drivers/pci/pcie/aer_cxl_vh.c
[..]
> +void cxl_register_proto_err_work(struct work_struct *work)
> +{
> + guard(rwsem_write)(&cxl_proto_err_kfifo.rwsema);
> + cxl_proto_err_kfifo.work = work;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_register_proto_err_work, "CXL");

Probably more appropriate to make all of these exports:

EXPORT_SYMBOL_FOR_MODULES(..., "cxl_core");

> +void cxl_unregister_proto_err_work(void)
> +{
> + guard(rwsem_write)(&cxl_proto_err_kfifo.rwsema);
> + cxl_proto_err_kfifo.work = NULL;

Where is the work cancellation for this?

Oh, patch2 has it, that should really be in this patch. I would fold in
that fix... but it needs a bit more, see below.

> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_unregister_proto_err_work, "CXL");
> +
> +int cxl_proto_err_kfifo_get(struct cxl_proto_err_work_data *wd)
> +{
> + guard(rwsem_read)(&cxl_proto_err_kfifo.rwsema);
> + return kfifo_get(&cxl_proto_err_kfifo.fifo, wd);

I just realized that this reacquires the semaphore on every invocation,
and leaks stranded references.

How about something like below which I think also address Jonathan's
concern about the awkwardness of the consumer needing to manage the
producer's refcounting. With this change all the refcounting is internal
to the producer, and it is properly cleaned up as to not leave PCI
devices with dangling refcounts if someone unloads the CXL core. The
rule is errors are dropped on the floor when the work handler is
missing, and any errors that race unregistration are also dropped on the
floor.

diff --git a/drivers/pci/pcie/aer_cxl_vh.c b/drivers/pci/pcie/aer_cxl_vh.c
index 348374859ee4..cc4e443511d0 100644
--- a/drivers/pci/pcie/aer_cxl_vh.c
+++ b/drivers/pci/pcie/aer_cxl_vh.c
@@ -72,16 +72,43 @@ void cxl_register_proto_err_work(struct work_struct *work)
}
EXPORT_SYMBOL_FOR_MODULES(cxl_register_proto_err_work, "cxl_core");

-void cxl_unregister_proto_err_work(void)
+static struct work_struct *cancel_cxl_proto_err(void)
{
+ struct work_struct *work;
+ struct cxl_proto_err_work_data wd;
+
guard(rwsem_write)(&cxl_proto_err_kfifo.rwsem);
+ work = cxl_proto_err_kfifo.work;
cxl_proto_err_kfifo.work = NULL;
+ while (kfifo_get(&cxl_proto_err_kfifo.fifo, &wd)) {
+ dev_err_ratelimited(&wd.pdev->dev,
+ "AER-CXL error report canceled\n");
+ pci_dev_put(wd.pdev);
+ }
+ return work;
+}
+
+void cxl_unregister_proto_err_work(void)
+{
+ struct work_struct *work = cancel_cxl_proto_err();
+
+ if (work)
+ cancel_work_sync(work);
}
EXPORT_SYMBOL_FOR_MODULES(cxl_unregister_proto_err_work, "cxl_core");

-int cxl_proto_err_kfifo_get(struct cxl_proto_err_work_data *wd)
+int for_each_cxl_proto_err(struct cxl_proto_err_work_data *wd, int (*fn)(struct cxl_proto_err_work_data *))
{
+ int rc;
+
guard(rwsem_read)(&cxl_proto_err_kfifo.rwsem);
- return kfifo_get(&cxl_proto_err_kfifo.fifo, wd);
+ while (kfifo_get(&cxl_proto_err_kfifo.fifo, wd)) {
+ rc = fn(wd);
+ pci_dev_put(wd->pdev);
+ if (rc)
+ return rc;
+ }
+
+ return 0;
}
-EXPORT_SYMBOL_FOR_MODULES(cxl_proto_err_kfifo_get, "cxl_core");
+EXPORT_SYMBOL_FOR_MODULES(for_each_cxl_proto_err, "cxl_core");
diff --git a/include/linux/aer.h b/include/linux/aer.h
index f351e41dd979..8d60fd97ed67 100644
--- a/include/linux/aer.h
+++ b/include/linux/aer.h
@@ -79,12 +79,19 @@ static inline void pci_aer_unmask_internal_errors(struct pci_dev *dev) { }
struct work_struct;

#ifdef CONFIG_CXL_RAS
-int cxl_proto_err_kfifo_get(struct cxl_proto_err_work_data *wd);
void cxl_register_proto_err_work(struct work_struct *work);
+int for_each_cxl_proto_err(struct cxl_proto_err_work_data *wd,
+ int (*fn)(struct cxl_proto_err_work_data *));
void cxl_unregister_proto_err_work(void);
#else
static inline int cxl_proto_err_kfifo_get(struct cxl_proto_err_work_data *wd) { return 0; }
static inline void cxl_register_proto_err_work(struct work_struct *work) { }
+static inline int
+for_each_cxl_proto_err(struct cxl_proto_err_work_data *wd,
+ int (*fn)(struct cxl_proto_err_work_data *))
+{
+ return 0;
+}
static inline void cxl_unregister_proto_err_work(void) { }
#endif