Re: [PATCH v2 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations

From: Jason Gunthorpe
Date: Tue Mar 30 2021 - 11:48:00 EST


On Tue, Mar 30, 2021 at 08:37:19AM -0700, Dan Williams wrote:
> On Tue, Mar 30, 2021 at 4:16 AM Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> >
> > On Mon, Mar 29, 2021 at 07:47:49PM -0700, Dan Williams wrote:
> >
> > > @@ -1155,21 +1175,12 @@ static void cxlmdev_unregister(void *_cxlmd)
> > > struct cxl_memdev *cxlmd = _cxlmd;
> > > struct device *dev = &cxlmd->dev;
> > >
> > > - percpu_ref_kill(&cxlmd->ops_active);
> > > cdev_device_del(&cxlmd->cdev, dev);
> > > - wait_for_completion(&cxlmd->ops_dead);
> > > + synchronize_srcu(&cxl_memdev_srcu);
> >
> > This needs some kind of rcu protected pointer for SRCU to
> > work. The write side has to null the pointer and the read side has to
> > copy the pointer to the stack and check for NULL.
> >
> > Otherwise the read side can't detect when the write side is shutting
> > down.
> >
> > Basically you must use rcu_dereference(), rcu_assign_pointer(), etc
> > when working with RCU.
>
> If the shutdown path was not using synchronize_rcu() then I would
> agree with you. This usage of srcu is also used to protect dax device
> shutdown after talking through rwsem vs srcu in this thread with Jan
> and Paul [1]. The synchronize_rcu() guarantees that all read-side
> critical sections have had at least one chance to quiesce.
>
> So this could either use rcu pointer accessors and call_srcu to free
> the object in a quiescent state, or it can use synchronize_srcu()
> relative to a condition that aborts usage of the pointer.

synchronize_rcu doesn't stop the read side from running. It only
guarantees that all running or future read sides will see the *write*
performed prior to synchronize_rcu.

If you can't clearly point to the *data* under RCU protection it is
being used wrong.
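
To make "point to the data" concrete, here is a minimal userspace sketch of
the pattern, not the cxl code and not kernel API. Everything in it is an
assumption made for illustration: struct memdev, struct dev_state and the
toy_* helpers are hypothetical names, with a seq_cst reader counter standing
in for srcu_read_lock()/srcu_read_unlock()/synchronize_srcu() and C11 atomics
standing in for rcu_dereference()/rcu_assign_pointer().

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: the *data* that the read side is allowed to touch. */
struct dev_state {
    int payload_size;
};

/* Hypothetical device: 'state' is the one protected pointer. */
struct memdev {
    _Atomic(struct dev_state *) state;
};

/* Toy stand-in for an SRCU domain: counts readers inside a section. */
static atomic_int toy_readers;

static void toy_read_lock(void)   { atomic_fetch_add(&toy_readers, 1); }
static void toy_read_unlock(void) { atomic_fetch_sub(&toy_readers, 1); }

/* Toy grace period: wait until no reader is inside a section. */
static void toy_synchronize(void)
{
    while (atomic_load(&toy_readers) != 0)
        ; /* spin; a real grace period would sleep */
}

static struct memdev *memdev_register(int payload_size)
{
    struct memdev *md = malloc(sizeof(*md));
    struct dev_state *st = malloc(sizeof(*st));

    if (!md || !st) {
        free(md);
        free(st);
        return NULL;
    }
    st->payload_size = payload_size;
    atomic_init(&md->state, st); /* publish the data */
    return md;
}

/* Read side: copy the pointer to the stack, then check it for NULL. */
static int memdev_ioctl(struct memdev *md, int *out)
{
    int rc = -ENODEV;

    assert(md && out);
    toy_read_lock();
    struct dev_state *st = atomic_load(&md->state);
    if (st) {
        *out = st->payload_size;
        rc = 0;
    }
    toy_read_unlock();
    return rc;
}

/* Write side: null the pointer first, wait for readers, then free. */
static void memdev_unregister(struct memdev *md)
{
    struct dev_state *old =
        atomic_exchange(&md->state, (struct dev_state *)NULL);

    toy_synchronize(); /* readers that saw 'old' have now finished */
    free(old);
}
```

In this sketch a call to memdev_ioctl() before memdev_unregister() succeeds,
and one made afterwards returns -ENODEV because the reader sees the NULL
store. Without the NULL store the wait in toy_synchronize() would still
complete, but a later reader would have no way to notice the shutdown.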

Same as if you can't point to the *data* being protected by a rwsem it
is probably being used wrong.

We are not locking code.

Jason