Re: [PATCH v16 05/10] PCI: Establish common CXL Port protocol error flow
From: Dan Williams
Date: Sun Mar 29 2026 - 20:09:08 EST
Terry Bowman wrote:
> Introduce CXL Port protocol error handling callbacks to unify detection,
> logging, and recovery across CXL Ports and Endpoints. Establish a consistent
> flow for correctable and uncorrectable CXL protocol errors. Support for RCH
> Downstream Port error handling will be added in a future patch.
>
> Provide the solution by adding cxl_port_cor_error_detected() and
> cxl_port_error_detected() to handle correctable and uncorrectable handling
> through CXL RAS helpers, coordinating uncorrectable recovery in
> cxl_do_recovery(), and panicking when the handler returns PCI_ERS_RESULT_PANIC
> to preserve fatal cachemem behavior. Gate Endpoint handling on the Endpoint
> driver being bound to avoid processing errors on disabled devices.
>
> Centralize the RAS base lookup in cxl_get_ras_base(), selecting the
> downstream-port dport->regs.ras for Root/Downstream Ports and port->regs.ras
> for Upstream Ports/Endpoints.
>
> Export pcie_clear_device_status() and pci_aer_clear_fatal_status() to enable
> cxl_core to clear PCIe/AER state in these flows.
>
> Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx>
> Acked-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx>
> Reviewed-by: Dave Jiang <dave.jiang@xxxxxxxxx>
[..]
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 5051800882c5..0eb2e28bb2c2 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -208,6 +208,9 @@ static inline void devm_cxl_dport_ras_setup(struct cxl_dport *dport) { }
> #endif /* CONFIG_CXL_RAS */
>
> int cxl_gpf_port_setup(struct cxl_dport *dport);
> +struct cxl_port *find_cxl_port(struct device *dport_dev,
> + struct cxl_dport **dport);
> +struct cxl_port *find_cxl_port_by_uport(struct device *uport_dev);
>
> struct cxl_hdm;
> int cxl_hdm_decode_init(struct cxl_dev_state *cxlds, struct cxl_hdm *cxlhdm,
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 0c5957d1d329..27271402915f 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -1386,8 +1386,8 @@ static struct cxl_port *__find_cxl_port(struct cxl_find_port_ctx *ctx)
> return NULL;
> }
>
> -static struct cxl_port *find_cxl_port(struct device *dport_dev,
> - struct cxl_dport **dport)
> +struct cxl_port *find_cxl_port(struct device *dport_dev,
> + struct cxl_dport **dport)
Now that these is becoming a public function, and the fact that the
"find" namespace is getting crowded, it deserves more description. It
also turns out that the process of describing this appears to generate a
better name / calling convention for get_cxl_port(), see below.
[..]
> @@ -185,6 +175,117 @@ void devm_cxl_port_ras_setup(struct cxl_port *port)
> }
> EXPORT_SYMBOL_NS_GPL(devm_cxl_port_ras_setup, "CXL");
>
> +/*
> + * get_cxl_port - Return the parent CXL Port of a PCI device
> + * @pdev: PCI device whose parent CXL Port is being queried
> + *
> + * Looks up and returns the parent CXL Port associated with @pdev. On
> + * success, the returned port has its reference count incremented and must
> + * be released by the caller. Returns NULL if no associated CXL port is
> + * found.
> + *
> + * Return: Pointer to the parent &struct cxl_port or NULL on failure
> + */
No, 'struct cxl_port' instances are not "parented" by PCI devices. A
'struct cxl_port' is a "companion" device. This is similar in spirit to
how PCI devices can have ACPI / OF device companions.
This is also clearly another "find_cxl_port" routine, but in this case
it is "find_cxl_port_by_dev" where it is ambiguous whether the @dev is a
dport or uport.
It is also unfortunate that this code is duplicated inside
cxl_get_ras_base().
> +static struct cxl_port *get_cxl_port(struct pci_dev *pdev)
> +{
> + switch (pci_pcie_type(pdev)) {
> + case PCI_EXP_TYPE_ROOT_PORT:
> + case PCI_EXP_TYPE_DOWNSTREAM: {
> + struct cxl_dport *dport;
> + struct cxl_port *port = find_cxl_port(&pdev->dev, &dport);
> +
> + if (!port) {
> + pci_err(pdev, "Failed to find the CXL device");
This is a "find" function, why is it reporting errors? Only the caller
might know if this an error condition. Indeed callers do have their own:
cxl_proto_err_work_fn():
+ pr_err_ratelimited("%s: Failed to find parent port device in CXL topology\n",
+ pci_name(pdev));
cxl_do_recovery()
+ pci_err(pdev, "Failed to find the CXL device\n");
Why is an uncoditional error print followed by a rate-limited print of
the same information?
Push the error logging to the location that has the proper context.
> + return NULL;
> + }
> + return port;
> + }
> + case PCI_EXP_TYPE_UPSTREAM:
> + case PCI_EXP_TYPE_ENDPOINT:
> + case PCI_EXP_TYPE_RC_END: {
> + struct cxl_port *port = find_cxl_port_by_uport(&pdev->dev);
> +
> + if (!port) {
> + pci_err(pdev, "Failed to find the CXL device");
> + return NULL;
> + }
> + return port;
> + }
> + }
> +
> + pr_err_ratelimited("%s: Error - Unsupported device type (%#x)",
> + pci_name(pdev), pci_pcie_type(pdev));
Set aside that this find function probably should not be tasked with
reporting errors, I notice that pci_notice_ratelimited() and
pci_info_ratelimited() exist. This seems to be asking for
pci_err_ratelimited().
> + return NULL;
> +}
> +
> +static u64 cxl_serial_number(struct device *dev)
> +{
> + struct pci_dev *pdev = to_pci_dev(dev);
> + struct cxl_port *port __free(put_cxl_port) = get_cxl_port(pdev);
> + struct device *port_dev = port ? port->uport_dev : NULL;
> + struct cxl_memdev *cxlmd;
> +
> + if (!port_dev || !is_cxl_memdev(dev))
> + return 0;
> +
> + cxlmd = to_cxl_memdev(port_dev);
> + return cxlmd->cxlds->serial;
This is quite a bit of work just get a serial number. Given we are
already committed to reading some device registers, why not a few more?
I.e. just call pci_get_dsn() as needed. That will also happen to work
for switches that have serial numbers.
> +}
> +
> +static void __iomem *cxl_get_ras_base(struct device *dev)
> +{
> + struct pci_dev *pdev = to_pci_dev(dev);
> +
> + switch (pci_pcie_type(pdev)) {
> + case PCI_EXP_TYPE_ROOT_PORT:
> + case PCI_EXP_TYPE_DOWNSTREAM: {
> + struct cxl_dport *dport = NULL;
> + struct cxl_port *port __free(put_cxl_port) = find_cxl_port(&pdev->dev, &dport);
> +
> + if (!dport) {
> + pci_err(pdev, "Failed to find the CXL device");
> + return NULL;
> + }
> + return dport->regs.ras;
> + }
> + case PCI_EXP_TYPE_UPSTREAM:
> + case PCI_EXP_TYPE_ENDPOINT:
> + case PCI_EXP_TYPE_RC_END: {
> + struct cxl_port *port __free(put_cxl_port) = find_cxl_port_by_uport(&pdev->dev);
> +
> + if (!port) {
> + pci_err(pdev, "Failed to find the CXL device");
> + return NULL;
> + }
> + return port->regs.ras;
> + }
> + }
> + dev_warn_once(dev, "Error: Unsupported device type (%#x)", pci_pcie_type(pdev));
> + return NULL;
> +}
Here is a cleanup patch I will send incrementally to change the
get_cxl_port() calling convention to find_cxl_port_by_dev() and reuse
the result rather than duplicating the lookup and the implementation.
---
drivers/cxl/core/ras.c | 116 +++++++++++++----------------------------
1 file changed, 35 insertions(+), 81 deletions(-)
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index 1d4be2d78469..3c503ae870df 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -176,97 +176,47 @@ void devm_cxl_port_ras_setup(struct cxl_port *port)
EXPORT_SYMBOL_NS_GPL(devm_cxl_port_ras_setup, "CXL");
/*
- * get_cxl_port - Return the parent CXL Port of a PCI device
- * @pdev: PCI device whose parent CXL Port is being queried
+ * find_cxl_port_by_dev - Use @dev as hint to do a _by_dport or _by_uport lookup
+ * @dev: generic device that may either by a companion of port or target dport
+ * @dport: mandatory output parameter that is set when @dev is determined to be
+ * a companion of a dport.
*
- * Looks up and returns the parent CXL Port associated with @pdev. On
- * success, the returned port has its reference count incremented and must
- * be released by the caller. Returns NULL if no associated CXL port is
- * found.
- *
- * Return: Pointer to the parent &struct cxl_port or NULL on failure
+ * Return a 'struct cxl_port' with an elevated reference if found. Use
+ * __free(put_cxl_port) to release.
*/
-static struct cxl_port *get_cxl_port(struct pci_dev *pdev)
+static struct cxl_port *find_cxl_port_by_dev(struct device *dev, struct cxl_dport **dport)
{
+ struct pci_dev *pdev;
+
+ *dport = NULL;
+ if (!dev_is_pci(dev))
+ return NULL;
+
+ pdev = to_pci_dev(dev);
switch (pci_pcie_type(pdev)) {
case PCI_EXP_TYPE_ROOT_PORT:
- case PCI_EXP_TYPE_DOWNSTREAM: {
- struct cxl_dport *dport;
- struct cxl_port *port = find_cxl_port(&pdev->dev, &dport);
-
- if (!port) {
- pci_err(pdev, "Failed to find the CXL device");
- return NULL;
- }
- return port;
- }
+ case PCI_EXP_TYPE_DOWNSTREAM:
+ return find_cxl_port_by_dport(dev, dport);
case PCI_EXP_TYPE_UPSTREAM:
case PCI_EXP_TYPE_ENDPOINT:
- case PCI_EXP_TYPE_RC_END: {
- struct cxl_port *port = find_cxl_port_by_uport(&pdev->dev);
-
- if (!port) {
- pci_err(pdev, "Failed to find the CXL device");
- return NULL;
- }
- return port;
- }
+ case PCI_EXP_TYPE_RC_END:
+ return find_cxl_port_by_uport(dev);
}
- pr_err_ratelimited("%s: Error - Unsupported device type (%#x)",
- pci_name(pdev), pci_pcie_type(pdev));
return NULL;
}
-static u64 cxl_serial_number(struct device *dev)
+static void __iomem *to_ras_base(struct cxl_port *port, struct cxl_dport *dport)
{
- struct pci_dev *pdev = to_pci_dev(dev);
- struct cxl_port *port __free(put_cxl_port) = get_cxl_port(pdev);
- struct device *port_dev = port ? port->uport_dev : NULL;
- struct cxl_memdev *cxlmd;
-
- if (!port_dev || !is_cxl_memdev(dev))
- return 0;
-
- cxlmd = to_cxl_memdev(port_dev);
- return cxlmd->cxlds->serial;
-}
-
-static void __iomem *cxl_get_ras_base(struct device *dev)
-{
- struct pci_dev *pdev = to_pci_dev(dev);
-
- switch (pci_pcie_type(pdev)) {
- case PCI_EXP_TYPE_ROOT_PORT:
- case PCI_EXP_TYPE_DOWNSTREAM: {
- struct cxl_dport *dport = NULL;
- struct cxl_port *port __free(put_cxl_port) = find_cxl_port(&pdev->dev, &dport);
-
- if (!dport) {
- pci_err(pdev, "Failed to find the CXL device");
- return NULL;
- }
+ if (!port)
+ return NULL;
+ if (dport)
return dport->regs.ras;
- }
- case PCI_EXP_TYPE_UPSTREAM:
- case PCI_EXP_TYPE_ENDPOINT:
- case PCI_EXP_TYPE_RC_END: {
- struct cxl_port *port __free(put_cxl_port) = find_cxl_port_by_uport(&pdev->dev);
-
- if (!port) {
- pci_err(pdev, "Failed to find the CXL device");
- return NULL;
- }
- return port->regs.ras;
- }
- }
- dev_warn_once(dev, "Error: Unsupported device type (%#x)", pci_pcie_type(pdev));
- return NULL;
+ return port->regs.ras;
}
-static void cxl_do_recovery(struct pci_dev *pdev)
+static void cxl_do_recovery(struct pci_dev *pdev, struct cxl_port *port, struct cxl_dport *dport)
{
- struct cxl_port *port __free(put_cxl_port) = get_cxl_port(pdev);
struct device *dev = &pdev->dev;
pci_ers_result_t status;
@@ -275,7 +225,8 @@ static void cxl_do_recovery(struct pci_dev *pdev)
return;
}
- status = cxl_handle_ras(dev, cxl_serial_number(dev), cxl_get_ras_base(dev));
+ status = cxl_handle_ras(dev, pci_get_dsn(pdev),
+ to_ras_base(port, dport));
if (status == PCI_ERS_RESULT_PANIC)
panic("CXL cachemem error.");
@@ -429,7 +380,8 @@ pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
}
EXPORT_SYMBOL_NS_GPL(cxl_error_detected, "CXL");
-static void cxl_handle_proto_error(struct pci_dev *pdev, int severity)
+static void cxl_handle_proto_error(struct pci_dev *pdev, struct cxl_port *port,
+ struct cxl_dport *dport, int severity)
{
if (severity == AER_CORRECTABLE) {
struct device *dev = &pdev->dev;
@@ -442,11 +394,11 @@ static void cxl_handle_proto_error(struct pci_dev *pdev, int severity)
pdev->aer_cap + PCI_ERR_COR_STATUS,
0, PCI_ERR_COR_INTERNAL);
- cxl_handle_cor_ras(dev, cxl_serial_number(dev),
- cxl_get_ras_base(dev));
+ cxl_handle_cor_ras(dev, pci_get_dsn(pdev),
+ to_ras_base(port, dport));
pcie_clear_device_status(pdev);
} else {
- cxl_do_recovery(pdev);
+ cxl_do_recovery(pdev, port, dport);
}
}
@@ -460,7 +412,9 @@ static void cxl_proto_err_work_fn(struct work_struct *work)
*/
while (cxl_proto_err_kfifo_get(&wd)) {
struct pci_dev *pdev __free(pci_dev_put) = wd.pdev;
- struct cxl_port *port __free(put_cxl_port) = get_cxl_port(pdev);
+ struct cxl_dport *dport;
+ struct cxl_port *port __free(put_cxl_port) =
+ find_cxl_port_by_dev(&pdev->dev, &dport);
if (!port) {
pr_err_ratelimited("%s: Failed to find parent port device in CXL topology\n",
@@ -475,7 +429,7 @@ static void cxl_proto_err_work_fn(struct work_struct *work)
continue;
}
- cxl_handle_proto_error(pdev, wd.severity);
+ cxl_handle_proto_error(pdev, port, dport, wd.severity);
}
}
--
2.53.0
[..]
> +int cxl_ras_init(void)
> +{
> + if (cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work))
> + pr_err("Failed to initialize CXL RAS CPER\n");
> +
> + cxl_register_proto_err_work(&cxl_proto_err_work);
> +
> + return 0;
> +}
> +
> +void cxl_ras_exit(void)
> +{
> + cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
> + cxl_unregister_proto_err_work();
> +}
The asymmetry in the above is unwanted. It arises from trying to make
cxl_cper_[un]register_prot_err_work() needlessly generic for any
potential consumer.
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 8479c2e1f74f..2c4bad5ad2b1 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -2246,6 +2246,7 @@ void pcie_clear_device_status(struct pci_dev *dev)
> pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
> pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
> }
> +EXPORT_SYMBOL_NS_GPL(pcie_clear_device_status, "CXL");
> #endif
Similarly, there is no need to tolerate any consumer but the cxl_core for
this export.
---
include/cxl/event.h | 10 ++++------
drivers/acpi/apei/ghes.c | 23 +++++++++--------------
drivers/cxl/core/ras.c | 6 ++----
drivers/pci/pci.c | 2 +-
4 files changed, 16 insertions(+), 25 deletions(-)
diff --git a/include/cxl/event.h b/include/cxl/event.h
index ff97fea718d2..86f9b0b5b635 100644
--- a/include/cxl/event.h
+++ b/include/cxl/event.h
@@ -289,8 +289,8 @@ struct cxl_cper_prot_err_work_data {
int cxl_cper_register_work(struct work_struct *work);
int cxl_cper_unregister_work(struct work_struct *work);
int cxl_cper_kfifo_get(struct cxl_cper_work_data *wd);
-int cxl_cper_register_prot_err_work(struct work_struct *work);
-int cxl_cper_unregister_prot_err_work(struct work_struct *work);
+void cxl_cper_register_prot_err_work(struct work_struct *work);
+void cxl_cper_unregister_prot_err_work(void);
int cxl_cper_prot_err_kfifo_get(struct cxl_cper_prot_err_work_data *wd);
#else
static inline int cxl_cper_register_work(struct work_struct *work)
@@ -310,13 +310,11 @@ static inline int cxl_cper_register_prot_err_work(struct work_struct *work)
{
return 0;
}
-static inline int cxl_cper_unregister_prot_err_work(struct work_struct *work)
+static inline void cxl_cper_unregister_prot_err_work(struct work_struct *work)
{
- return 0;
}
-static inline int cxl_cper_prot_err_kfifo_get(struct cxl_cper_prot_err_work_data *wd)
+static inline void cxl_cper_prot_err_kfifo_get(void)
{
- return 0;
}
#endif
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index de935e0e1dcf..11432374b364 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -760,37 +760,32 @@ static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
#endif
}
-int cxl_cper_register_prot_err_work(struct work_struct *work)
+void cxl_cper_register_prot_err_work(struct work_struct *work)
{
- if (cxl_cper_prot_err_work)
- return -EINVAL;
-
guard(spinlock)(&cxl_cper_prot_err_work_lock);
cxl_cper_prot_err_work = work;
- return 0;
}
-EXPORT_SYMBOL_NS_GPL(cxl_cper_register_prot_err_work, "CXL");
+EXPORT_SYMBOL_FOR_MODULES(cxl_cper_register_prot_err_work, "cxl_core");
-int cxl_cper_unregister_prot_err_work(struct work_struct *work)
+void cxl_cper_unregister_prot_err_work(void)
{
- if (cxl_cper_prot_err_work != work)
- return -EINVAL;
+ struct work_struct *work;
spin_lock(&cxl_cper_prot_err_work_lock);
+ work = cxl_cper_prot_err_work;
cxl_cper_prot_err_work = NULL;
spin_unlock(&cxl_cper_prot_err_work_lock);
- cancel_work_sync(work);
-
- return 0;
+ if (work)
+ cancel_work_sync(work);
}
-EXPORT_SYMBOL_NS_GPL(cxl_cper_unregister_prot_err_work, "CXL");
+EXPORT_SYMBOL_FOR_MODULES(cxl_cper_unregister_prot_err_work, "cxl_core");
int cxl_cper_prot_err_kfifo_get(struct cxl_cper_prot_err_work_data *wd)
{
return kfifo_get(&cxl_cper_prot_err_fifo, wd);
}
-EXPORT_SYMBOL_NS_GPL(cxl_cper_prot_err_kfifo_get, "CXL");
+EXPORT_SYMBOL_FOR_MODULES(cxl_cper_prot_err_kfifo_get, "cxl_core");
/* Room for 8 entries for each of the 4 event log queues */
#define CXL_CPER_FIFO_DEPTH 32
diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
index 3c503ae870df..b6fb6e2925f2 100644
--- a/drivers/cxl/core/ras.c
+++ b/drivers/cxl/core/ras.c
@@ -437,9 +437,7 @@ static DECLARE_WORK(cxl_proto_err_work, cxl_proto_err_work_fn);
int cxl_ras_init(void)
{
- if (cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work))
- pr_err("Failed to initialize CXL RAS CPER\n");
-
+ cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work);
cxl_register_proto_err_work(&cxl_proto_err_work);
return 0;
@@ -447,6 +445,6 @@ int cxl_ras_init(void)
void cxl_ras_exit(void)
{
- cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
+ cxl_cper_unregister_prot_err_work();
cxl_unregister_proto_err_work();
}
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 2c4bad5ad2b1..1bb59b0f96e8 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2246,7 +2246,7 @@ void pcie_clear_device_status(struct pci_dev *dev)
pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
}
-EXPORT_SYMBOL_NS_GPL(pcie_clear_device_status, "CXL");
+EXPORT_SYMBOL_FOR_MODULES(pcie_clear_device_status, "cxl_core");
#endif
/**
--
2.53.0
[..]
> diff --git a/drivers/pci/pcie/aer_cxl_vh.c b/drivers/pci/pcie/aer_cxl_vh.c
> index ebca1112652a..818ec0d0a012 100644
> --- a/drivers/pci/pcie/aer_cxl_vh.c
> +++ b/drivers/pci/pcie/aer_cxl_vh.c
> @@ -33,7 +33,10 @@ bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info)
> if (!info || !info->is_cxl)
> return false;
>
> - if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ENDPOINT)
> + if ((pci_pcie_type(pdev) != PCI_EXP_TYPE_ENDPOINT) &&
> + (pci_pcie_type(pdev) != PCI_EXP_TYPE_ROOT_PORT) &&
> + (pci_pcie_type(pdev) != PCI_EXP_TYPE_UPSTREAM) &&
> + (pci_pcie_type(pdev) != PCI_EXP_TYPE_DOWNSTREAM))
Is this really necessary? If for some reason someone is able to get
info->is_cxl set on a PCIe device of type PCI_EXP_TYPE_PCI_BRIDGE, what
breaks? For example, find_cxl_port_by_dev() fails gracefully.