Re: [PATCH 2/7] cxl/memdev: Hold memdev lock during memdev poison injection/clear

From: Li Ming

Date: Wed Mar 11 2026 - 06:57:59 EST



On 2026/3/11 05:34, Alison Schofield wrote:
On Tue, Mar 10, 2026 at 11:57:54PM +0800, Li Ming wrote:
The CXL memdev poison injection/clearing debugfs interfaces are visible
before CXL memdev endpoint initialization. If a user accesses the
interfaces before cxlmd->endpoint is updated, it is possible to access an
invalid endpoint in cxl_dpa_to_region().

Hold the CXL memdev lock at the beginning of the interfaces; this blocks
the interfaces until CXL memdev probing has completed.

The following patch will check the validity of the given endpoint in
cxl_dpa_to_region().

Suggested-by: Dan Williams <dan.j.williams@xxxxxxxxx>
Signed-off-by: Li Ming <ming.li@xxxxxxxxxxxx>
---
drivers/cxl/core/memdev.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
index 273c22118d3d..8ebaf9e96035 100644
--- a/drivers/cxl/core/memdev.c
+++ b/drivers/cxl/core/memdev.c
@@ -295,6 +295,7 @@ int cxl_inject_poison_locked(struct cxl_memdev *cxlmd, u64 dpa)
 	if (!IS_ENABLED(CONFIG_DEBUG_FS))
 		return 0;
+	device_lock_assert(&cxlmd->dev);
 	lockdep_assert_held(&cxl_rwsem.dpa);
 	lockdep_assert_held(&cxl_rwsem.region);
I'm having second thoughts about this since this call site is not
the 'beginning of the interfaces' as the commit msg suggests.

What about taking the device lock in the debugfs func, i.e.
mem.c : cxl_inject_poison? If the goal is to avoid using the debugfs
interface before probe completes, that does it.

At this callsite, we make sure nothing changes out from under us,
no endpoints attach or detach during the work.

Thanks for taking the time to dive into this issue.

But I don't quite understand your comment. Do you mean that we don't need the above device_lock_assert() in cxl_inject_poison_locked()/cxl_clear_poison_locked()?

You mentioned taking the device lock in cxl_inject_poison() to ensure the endpoint cannot change while the debugfs interfaces are being called.

That is right, and it is what this patch already does, so I am a little confused.
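
To make the split I have in mind concrete, here is a minimal userspace sketch of the pattern, not the kernel code. The outer entry point takes the lock, and the *_locked helper only asserts that it is held. All names here (toy_memdev, toy_probe(), toy_inject(), toy_inject_locked()) are hypothetical, a pthread mutex stands in for the device lock, and a plain flag stands in for device_lock_assert().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cxl_memdev; not the kernel type. */
struct toy_memdev {
	pthread_mutex_t lock;	/* plays the role of the device lock */
	bool lock_held;		/* lets the helper assert, like device_lock_assert() */
	const int *endpoint;	/* NULL until "probe" has completed */
};

static const int toy_endpoint = 42;

/* Inner helper: the caller must already hold md->lock. */
static int toy_inject_locked(struct toy_memdev *md)
{
	assert(md->lock_held);
	if (!md->endpoint)	/* endpoint validity check */
		return -1;
	return *md->endpoint;
}

/* Outer entry point (the "debugfs" side): takes the lock first. */
static int toy_inject(struct toy_memdev *md)
{
	int rc;

	pthread_mutex_lock(&md->lock);
	md->lock_held = true;
	rc = toy_inject_locked(md);
	md->lock_held = false;
	pthread_mutex_unlock(&md->lock);
	return rc;
}

/* "Probe" publishes the endpoint while holding the same lock. */
static void toy_probe(struct toy_memdev *md)
{
	pthread_mutex_lock(&md->lock);
	md->lock_held = true;
	md->endpoint = &toy_endpoint;
	md->lock_held = false;
	pthread_mutex_unlock(&md->lock);
}
```

Because toy_inject() and toy_probe() serialize on the same lock, the entry point can never observe a half-initialized endpoint. It either runs entirely before probe and sees NULL, or entirely after and sees the valid pointer.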


Ming

@@ -331,6 +332,10 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
 {
 	int rc;
+	ACQUIRE(device_intr, devlock)(&cxlmd->dev);
+	if ((rc = ACQUIRE_ERR(device_intr, &devlock)))
+		return rc;
+
 	ACQUIRE(rwsem_read_intr, region_rwsem)(&cxl_rwsem.region);
 	if ((rc = ACQUIRE_ERR(rwsem_read_intr, &region_rwsem)))
 		return rc;
@@ -355,6 +360,7 @@ int cxl_clear_poison_locked(struct cxl_memdev *cxlmd, u64 dpa)
 	if (!IS_ENABLED(CONFIG_DEBUG_FS))
 		return 0;
+	device_lock_assert(&cxlmd->dev);
 	lockdep_assert_held(&cxl_rwsem.dpa);
 	lockdep_assert_held(&cxl_rwsem.region);
@@ -400,6 +406,10 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
 {
 	int rc;
+	ACQUIRE(device_intr, devlock)(&cxlmd->dev);
+	if ((rc = ACQUIRE_ERR(device_intr, &devlock)))
+		return rc;
+
 	ACQUIRE(rwsem_read_intr, region_rwsem)(&cxl_rwsem.region);
 	if ((rc = ACQUIRE_ERR(rwsem_read_intr, &region_rwsem)))
 		return rc;

--
2.43.0