Re: [PATCH] cxl/region: Hold memdev lock during region poison injection/clear
From: Dan Williams
Date: Thu Mar 19 2026 - 22:30:41 EST
Li Ming wrote:
> cxl_dpa_to_region() expects cxlmd->endpoint to remain valid for the
> duration of the call. When userspace performs poison injection or
> clearing on a region via debugfs, holding cxl_rwsem.region and
> cxl_rwsem.dpa alone is insufficient: these locks do not prevent the
> retrieved CXL memdev from being destroyed, nor do they protect against
> concurrent driver detachment. Therefore, hold the CXL memdev lock in
> the debugfs callbacks to ensure cxlmd->dev.driver remains stable for
> the entire execution of the callback functions.
I took another look at this and there may be a simpler way to ensure
this safety.
In the case where we already have a region, holding the proper rwsems
guarantees that the device is not going to detach from the region.
Maybe that is what Alison was referring to about it being "ok"?
However, the problem is that cxl_dpa_to_region() is sometimes called
from paths where the memdev has no known association to a region, like
cxl_{inject,clear}_poison().
There are two ways to solve that problem: hold the cxlmd lock, or move
the registration of debugfs *after* creation of cxlmd->endpoint. That
ordering would ensure that debugfs is only usable in the interval where
->endpoint is known to be stable.
If that ends up simpler, we should go that route instead.
> To keep the lock sequence (cxlmd.dev -> cxl_rwsem.region ->
> cxl_rwsem.dpa) and avoid deadlock, the interfaces have to find the
> correct CXL memdev first, take the locks in that sequence, and then
> check whether the DPA data changed before the locks were held.
>
> Suggested-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> Signed-off-by: Li Ming <ming.li@xxxxxxxxxxxx>
> ---
> drivers/cxl/core/region.c | 112 ++++++++++++++++++++++++++++++++++++----------
> 1 file changed, 88 insertions(+), 24 deletions(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index f24b7e754727..1a509acc52a3 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -4101,12 +4101,70 @@ static int validate_region_offset(struct cxl_region *cxlr, u64 offset)
> return 0;
> }
>
> +static int __cxl_region_poison_lookup(struct cxl_region *cxlr, u64 offset,
> + struct dpa_result *res)
> +{
> + int rc;
> +
> + *res = (struct dpa_result){ .dpa = ULLONG_MAX, .cxlmd = NULL };
> +
> + if (validate_region_offset(cxlr, offset))
> + return -EINVAL;
> +
> + offset -= cxlr->params.cache_size;
> + rc = region_offset_to_dpa_result(cxlr, offset, res);
> + if (rc || !res->cxlmd || res->dpa == ULLONG_MAX) {
> + dev_dbg(&cxlr->dev,
> + "Failed to resolve DPA for region offset %#llx rc %d\n",
> + offset, rc);
> +
> + return rc ? rc : -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +static int cxl_region_poison_lookup(struct cxl_region *cxlr, u64 offset,
> + struct dpa_result *res)
> +{
> + int rc;
> +
> + ACQUIRE(rwsem_read_intr, region_rwsem)(&cxl_rwsem.region);
> + if ((rc = ACQUIRE_ERR(rwsem_read_intr, &region_rwsem)))
> + return rc;
> +
> + ACQUIRE(rwsem_read_intr, dpa_rwsem)(&cxl_rwsem.dpa);
> + if ((rc = ACQUIRE_ERR(rwsem_read_intr, &dpa_rwsem)))
> + return rc;
> +
> + rc = __cxl_region_poison_lookup(cxlr, offset, res);
> + if (rc)
> + return rc;
> +
> + /*
> + * Hold a device reference in case the device
> + * is destroyed after the locks are dropped.
> + */
> + get_device(&res->cxlmd->dev);
This is what made me take another look at alternate approaches because
this reference count is messy.
A region always holds an implicit reference on attached memdevs by
holding their endpoint decoders. The only value of an extra reference is
guaranteeing that, if devices are removed while the lock is dropped, no
new memdev can alias the allocation of the old memdev. The pointer does
not need to point at a live device for that.
> + return 0;
> +}
> +
> static int cxl_region_debugfs_poison_inject(void *data, u64 offset)
> {
> - struct dpa_result result = { .dpa = ULLONG_MAX, .cxlmd = NULL };
> struct cxl_region *cxlr = data;
> + struct dpa_result res1, res2;
> int rc;
>
> + /* To retrieve the correct memdev */
> + rc = cxl_region_poison_lookup(cxlr, offset, &res1);
> + if (rc)
> + return rc;
> +
> + struct device *dev __free(put_device) = &res1.cxlmd->dev;
I do not like magic mid-function scopes that are not generated by actual
allocation functions.
> + ACQUIRE(device_intr, devlock)(dev);
> + if ((rc = ACQUIRE_ERR(device_intr, &devlock)))
> + return rc;
> +
> ACQUIRE(rwsem_read_intr, region_rwsem)(&cxl_rwsem.region);
> if ((rc = ACQUIRE_ERR(rwsem_read_intr, &region_rwsem)))
> return rc;
> @@ -4115,20 +4173,18 @@ static int cxl_region_debugfs_poison_inject(void *data, u64 offset)
> if ((rc = ACQUIRE_ERR(rwsem_read_intr, &dpa_rwsem)))
> return rc;
>
> - if (validate_region_offset(cxlr, offset))
> - return -EINVAL;
> -
> - offset -= cxlr->params.cache_size;
> - rc = region_offset_to_dpa_result(cxlr, offset, &result);
> - if (rc || !result.cxlmd || result.dpa == ULLONG_MAX) {
> + /*
> + * Retrieve the memdev and DPA data again in case
> + * they changed before the locks were held.
> + */
> + rc = __cxl_region_poison_lookup(cxlr, offset, &res2);
> + if (rc || res2.cxlmd != res1.cxlmd || res2.dpa != res1.dpa) {
Nothing does a put_device() on res2.cxlmd->dev.
So, it seems it would indeed be cleaner to eliminate the
cxlmd->endpoint validity checking altogether by reordering
initialization and teardown.