Re: [PATCH 2/7] cxl/memdev: Hold memdev lock during memdev poison injection/clear

From: Alison Schofield

Date: Thu Mar 12 2026 - 00:05:46 EST

On Wed, Mar 11, 2026 at 06:53:26PM +0800, Li Ming wrote:
>
> On 2026/3/11 05:34, Alison Schofield wrote:
> > On Tue, Mar 10, 2026 at 11:57:54PM +0800, Li Ming wrote:
> > > CXL memdev poison injection/clearing debugfs interfaces are visible
> > > before the CXL memdev endpoint initialization. If a user accesses the
> > > interfaces before cxlmd->endpoint is updated, it is possible to access an
> > > invalid endpoint in cxl_dpa_to_region().
> > >
> > > Hold the CXL memdev lock at the beginning of the interfaces. This blocks the
> > > interfaces until CXL memdev probing has completed.
> > >
> > > The following patch will check the given endpoint validity in
> > > cxl_dpa_to_region().
> > >
> > > Suggested-by: Dan Williams <dan.j.williams@xxxxxxxxx>
> > > Signed-off-by: Li Ming <ming.li@xxxxxxxxxxxx>
> > > ---
> > > drivers/cxl/core/memdev.c | 10 ++++++++++
> > > 1 file changed, 10 insertions(+)
> > >
> > > diff --git a/drivers/cxl/core/memdev.c b/drivers/cxl/core/memdev.c
> > > index 273c22118d3d..8ebaf9e96035 100644
> > > --- a/drivers/cxl/core/memdev.c
> > > +++ b/drivers/cxl/core/memdev.c
> > > @@ -295,6 +295,7 @@ int cxl_inject_poison_locked(struct cxl_memdev *cxlmd, u64 dpa)
> > > if (!IS_ENABLED(CONFIG_DEBUG_FS))
> > > return 0;
> > > + device_lock_assert(&cxlmd->dev);
> > > lockdep_assert_held(&cxl_rwsem.dpa);
> > > lockdep_assert_held(&cxl_rwsem.region);
> > I'm having second thoughts about this since this call site is not
> > the 'beginning of the interfaces' as the commit msg suggests.
> >
> > What about taking the device lock in the debugfs function, i.e.
> > mem.c: cxl_inject_poison? If the goal is to avoid using the debugfs
> > interface before probe completes, that does it.
> >
> > At this callsite, we make sure nothing changes out from under us,
> > no endpoints attach or detach during the work.
> >
> Thanks for taking the time to dive into this issue.
>
> But I don't quite understand your comment. Do you mean that we don't need
> the device_lock_assert() above in cxl_inject/clear_poison_locked()?
>
> You mentioned taking the device lock in cxl_inject_poison() to ensure the
> endpoint won't change while the debugfs interfaces are being called.
>
> That is right, and that is what this patch does, so I am a little bit
> confused.

I was only thinking of moving the ACQUIRE one level up, to here:
drivers/cxl/mem.c: cxl_debugfs_poison_inject|clear ()

That would mean dropping the assert in clear_poison_locked().
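
To illustrate the shape I have in mind, here is a rough userspace sketch,
not kernel code. A pthread mutex stands in for the device lock, and every
name in it (fake_memdev, fake_probe, fake_debugfs_inject,
fake_inject_poison_locked) is hypothetical. It only shows the lock being
taken at the outermost entry point, so a caller that arrives before probe
has set the endpoint sees a clean error instead of a stale pointer.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cxl_memdev; not the kernel type. */
struct fake_memdev {
	pthread_mutex_t lock;	/* models the device lock */
	bool lock_held;		/* models what a lock assert would check */
	void *endpoint;		/* NULL until "probe" has completed */
	uint64_t last_dpa;	/* last address accepted for injection */
};

/* Inner helper: the caller must already hold md->lock. */
static int fake_inject_poison_locked(struct fake_memdev *md, uint64_t dpa)
{
	if (!md->lock_held)
		return -EPERM;	/* called without the lock */
	if (!md->endpoint)
		return -ENXIO;	/* probe has not published an endpoint */
	md->last_dpa = dpa;
	return 0;
}

/* Outermost entry point (models the debugfs handler): lock taken here. */
static int fake_debugfs_inject(struct fake_memdev *md, uint64_t dpa)
{
	int rc;

	pthread_mutex_lock(&md->lock);
	md->lock_held = true;
	rc = fake_inject_poison_locked(md, dpa);
	md->lock_held = false;
	pthread_mutex_unlock(&md->lock);
	return rc;
}

/* Models probe: the endpoint is published under the same lock. */
static void fake_probe(struct fake_memdev *md, void *endpoint)
{
	assert(endpoint != NULL);
	pthread_mutex_lock(&md->lock);
	md->endpoint = endpoint;
	pthread_mutex_unlock(&md->lock);
}
```

In this model the inner helper keeps its own check and returns an error
rather than asserting, purely so the sketch can be exercised from a test.
Whether the real _locked() helpers should keep their asserts is the open
question above.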

>
>
> Ming
>
> > > @@ -331,6 +332,10 @@ int cxl_inject_poison(struct cxl_memdev *cxlmd, u64 dpa)
> > > {
> > > int rc;
> > > + ACQUIRE(device_intr, devlock)(&cxlmd->dev);
> > > + if ((rc = ACQUIRE_ERR(device_intr, &devlock)))
> > > + return rc;
> > > +
> > > ACQUIRE(rwsem_read_intr, region_rwsem)(&cxl_rwsem.region);
> > > if ((rc = ACQUIRE_ERR(rwsem_read_intr, &region_rwsem)))
> > > return rc;
> > > @@ -355,6 +360,7 @@ int cxl_clear_poison_locked(struct cxl_memdev *cxlmd, u64 dpa)
> > > if (!IS_ENABLED(CONFIG_DEBUG_FS))
> > > return 0;
> > > + device_lock_assert(&cxlmd->dev);
> > > lockdep_assert_held(&cxl_rwsem.dpa);
> > > lockdep_assert_held(&cxl_rwsem.region);
> > > @@ -400,6 +406,10 @@ int cxl_clear_poison(struct cxl_memdev *cxlmd, u64 dpa)
> > > {
> > > int rc;
> > > + ACQUIRE(device_intr, devlock)(&cxlmd->dev);
> > > + if ((rc = ACQUIRE_ERR(device_intr, &devlock)))
> > > + return rc;
> > > +
> > > ACQUIRE(rwsem_read_intr, region_rwsem)(&cxl_rwsem.region);
> > > if ((rc = ACQUIRE_ERR(rwsem_read_intr, &region_rwsem)))
> > > return rc;
> > >
> > > --
> > > 2.43.0
> > >