Re: [PATCH 7/7] cxl/port: Reset cxlmd->endpoint to -ENXIO by default
From: Li Ming
Date: Wed Mar 11 2026 - 08:15:24 EST
在 2026/3/11 03:29, Dan Williams 写道:
Li Ming wrote:Will change it in V2.
cxlmd->endpoint is set to -ENXIO by default, This intentional invalidSo I disagree that this is a fix. It may be a consistency change to make
value ensures that any unintended access to the endpoint will be
detectable. But CXL driver didn't reset it to the default value when the
endpoint is released in delete_endpoint().
Besides, cxlmd->endpoint is updated to point to an valid endpoint in
cxl_port_add(), but if the device_add() in cxl_port_add() fails, the
endpoint will be released, but cxlmd->endpoint remains pointing to the
released endpoint, it may introduce a potential use-after-free issue.
Fixes: 3d8be8b398e3 ("cxl: Set cxlmd->endpoint before adding port device")
Fixes: 29317f8dc6ed ("cxl/mem: Introduce cxl_memdev_attach for CXL-dependent operation")
sure that all invalid ->endpoint accesses have the same result, but it
is not a fix for any discernable end user problem.
Signed-off-by: Li Ming <ming.li@xxxxxxxxxxxx>I see no need for this. All of the upper levels paths ensure that all
---
drivers/cxl/core/port.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 0c5957d1d329..ec3cb62b44b7 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -834,11 +834,14 @@ static int cxl_port_add(struct cxl_port *port,
struct cxl_dport *parent_dport)
{
struct device *dev __free(put_device) = &port->dev;
+ struct cxl_memdev *cxlmd = NULL;
int rc;
if (is_cxl_memdev(port->uport_dev)) {
- struct cxl_memdev *cxlmd = to_cxl_memdev(port->uport_dev);
- struct cxl_dev_state *cxlds = cxlmd->cxlds;
+ struct cxl_dev_state *cxlds;
+
+ cxlmd = to_cxl_memdev(port->uport_dev);
+ cxlds = cxlmd->cxlds;
rc = dev_set_name(dev, "endpoint%d", port->id);
if (rc)
@@ -865,8 +868,11 @@ static int cxl_port_add(struct cxl_port *port,
}
rc = device_add(dev);
- if (rc)
+ if (rc) {
+ if (cxlmd)
+ cxlmd->endpoint = ERR_PTR(-ENXIO);
the results of setup are torn down on failure. Consider that if
device_add() succeeds and any of the follow on functions in
__devm_cxl_add_port() then this cleanup will also not happen.
So you must rely on upper level cleanup. This is similar to the
semantics of dev_set_drvdata(). That is only reset up after _probe()
finishes failing.
Now, if you want to do some separate cleanups in this space I would redo
the anti-patterns of scope-based cleanup in cxl_port_alloc() and
cxl_port_add(). cxl_port_alloc() does the "__free = NULL" anti-pattern.
It appears to be doing that to avoid adding a DEFINE_FREE() for ida
cleanup if the port allocation fails.
Also, if the cxl_port_alloc() call became something like:
struct cxl_port *port __free(put_cxl_port) = cxl_port_alloc(...)
...then I do not think cxl_port_add() would need to do the odd
__free(put_device()) initialization without a corresponding
get_device(). Nor the odd "dev = NULL" at the end.
To make sure that I didn't misunderstand your comment, I draft the following changes.
The changes also fix an issue that the unregister_port() was not invoked to release port if devm_cxl_link_uport() or devm_cxl_link_parent_dport() failed in __devm_cxl_add_port() .
If the changes are OK, I think they should be in a separate patchset(including reset cxlmd->endpoint to -ENXIO if adding endpoint failed)
diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
index 0c5957d1d329..c1975b28b570 100644
--- a/drivers/cxl/core/port.c
+++ b/drivers/cxl/core/port.c
@@ -552,10 +552,13 @@ static void cxl_port_release(struct device *dev)
xa_destroy(&port->dports);
xa_destroy(&port->regions);
ida_free(&cxl_port_ida, port->id);
- if (is_cxl_root(port))
+ if (is_cxl_root(port)) {
kfree(to_cxl_root(port));
- else
+ } else {
+ if (is_cxl_endpoint(port))
+ to_cxl_memdev(&port->dev)->endpoint = ERR_PTR(-ENXIO);
kfree(port);
+ }
}
static ssize_t decoders_committed_show(struct device *dev,
@@ -833,7 +836,7 @@ static int cxl_port_add(struct cxl_port *port,
resource_size_t component_reg_phys,
struct cxl_dport *parent_dport)
{
- struct device *dev __free(put_device) = &port->dev;
+ struct device *dev = &port->dev;
int rc;
if (is_cxl_memdev(port->uport_dev)) {
@@ -864,47 +867,53 @@ static int cxl_port_add(struct cxl_port *port,
return rc;
}
- rc = device_add(dev);
- if (rc)
- return rc;
-
- /* Inhibit the cleanup function invoked */
- dev = NULL;
- return 0;
+ return device_add(dev);
}
+DEFINE_FREE(cxl_port_host_release_dr_group, struct cxl_port *,
+ if (_T) devres_release_group(port_to_host(_T), _T))
+
static struct cxl_port *__devm_cxl_add_port(struct device *host,
struct device *uport_dev,
resource_size_t component_reg_phys,
struct cxl_dport *parent_dport)
{
- struct cxl_port *port;
+ struct cxl_port *_port;
int rc;
- port = cxl_port_alloc(uport_dev, parent_dport);
+ struct cxl_port *port __free(put_cxl_port) =
+ cxl_port_alloc(uport_dev, parent_dport);
if (IS_ERR(port))
return port;
+ void *port_dr_group __free(cxl_port_host_release_dr_group) =
+ devres_open_group(host, port, GFP_KERNEL);
+ if (!port_dr_group)
+ return ERR_PTR(-ENOMEM);
+
rc = cxl_port_add(port, component_reg_phys, parent_dport);
if (rc)
return ERR_PTR(rc);
- rc = devm_add_action_or_reset(host, unregister_port, port);
+ _port = no_free_ptr(port);
+ rc = devm_add_action_or_reset(host, unregister_port, _port);
if (rc)
return ERR_PTR(rc);
- rc = devm_cxl_link_uport(host, port);
+ rc = devm_cxl_link_uport(host, _port);
if (rc)
return ERR_PTR(rc);
- rc = devm_cxl_link_parent_dport(host, port, parent_dport);
+ rc = devm_cxl_link_parent_dport(host, _port, parent_dport);
if (rc)
return ERR_PTR(rc);
+ devres_remove_group(host, no_free_ptr(port_dr_group));
+
if (parent_dport && dev_is_pci(uport_dev))
- port->pci_latency = cxl_pci_get_latency(to_pci_dev(uport_dev));
+ _port->pci_latency = cxl_pci_get_latency(to_pci_dev(uport_dev));
- return port;
+ return _port;
}
OK, will remove it in v2.return rc;I also see no need for this. Either dereferencing NULL or ERR_PTR() will
+ }
/* Inhibit the cleanup function invoked */
dev = NULL;
@@ -1425,7 +1431,7 @@ static void delete_endpoint(void *data)
devm_release_action(host, cxl_unlink_uport, endpoint);
devm_release_action(host, unregister_port, endpoint);
}
- cxlmd->endpoint = NULL;
+ cxlmd->endpoint = ERR_PTR(-ENXIO);
result in what is needed, a kernel crash. How the use after-free crash
is signalled does not really matter. The only expectation that other
code has is that "->endpoint != NULL" does not mean ->endpoint is valid.
Nothing should care about that though because cxlmd->dev.driver is the
authoritative gate for ->endpoint validity.