Re: [PATCH v2 2/4] cxl/mem: Fix synchronization mechanism for device removal vs ioctl operations

From: Dan Williams
Date: Tue Mar 30 2021 - 11:39:01 EST


On Tue, Mar 30, 2021 at 4:16 AM Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> On Mon, Mar 29, 2021 at 07:47:49PM -0700, Dan Williams wrote:
>
> > @@ -1155,21 +1175,12 @@ static void cxlmdev_unregister(void *_cxlmd)
> > struct cxl_memdev *cxlmd = _cxlmd;
> > struct device *dev = &cxlmd->dev;
> >
> > - percpu_ref_kill(&cxlmd->ops_active);
> > cdev_device_del(&cxlmd->cdev, dev);
> > - wait_for_completion(&cxlmd->ops_dead);
> > + synchronize_srcu(&cxl_memdev_srcu);
>
> This needs some kind of RCU-protected pointer for SRCU to
> work. The write side has to NULL the pointer and the read side has to
> copy the pointer to the stack and check for NULL.
>
> Otherwise the read side can't detect when the write side is shutting
> down.
>
> Basically you must use rcu_dereference(), rcu_assign_pointer(), etc.
> when working with RCU.

If the shutdown path were not using synchronize_srcu(), I would
agree with you. The same SRCU scheme is used to protect dax device
shutdown, after talking through rwsem vs. SRCU in this thread with Jan
and Paul [1]. synchronize_srcu() guarantees that every SRCU read-side
critical section already in progress when it is called has completed
before it returns.

So this could either use RCU pointer accessors and call_srcu() to free
the object once no reader can still reference it, or use
synchronize_srcu() together with a condition that readers check, inside
their read-side critical section, to abort use of the pointer.
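A minimal userspace sketch of the two options, assuming C11 atomics. It is a toy model, not kernel code: toy_read_lock(), toy_read_unlock(), toy_synchronize(), the ioctl_*/unregister_* helpers and the -ENXIO return value are hypothetical stand-ins for srcu_read_lock(), srcu_read_unlock(), synchronize_srcu() and the driver paths, and toy_synchronize() conservatively waits for all active readers rather than only pre-existing ones. In both options the reader checks shared state (the pointer or the flag) inside its read-side section; that check is what lets a reader notice removal.

```c
/*
 * Userspace toy model, NOT the kernel SRCU API. toy_read_lock(),
 * toy_read_unlock() and toy_synchronize() are hypothetical stand-ins
 * for srcu_read_lock(), srcu_read_unlock() and synchronize_srcu().
 * toy_synchronize() waits for *all* active readers, which is stronger
 * (and slower) than SRCU, which only waits for pre-existing readers.
 */
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

static atomic_int toy_readers;

static void toy_read_lock(void)
{
	atomic_fetch_add(&toy_readers, 1);
}

static void toy_read_unlock(void)
{
	int prev = atomic_fetch_sub(&toy_readers, 1);

	assert(prev > 0); /* catch unbalanced unlocks */
}

static void toy_synchronize(void)
{
	while (atomic_load(&toy_readers) != 0)
		sched_yield();
}

struct toy_dev {
	int value;
};

/* Option 1: RCU-style published pointer that the writer sets to NULL. */
static _Atomic(struct toy_dev *) published_dev;

static bool register_ptr(int value)
{
	struct toy_dev *d = malloc(sizeof(*d));

	if (!d)
		return false;
	d->value = value;
	atomic_store(&published_dev, d);
	return true;
}

static int ioctl_ptr(int *out)
{
	int rc = -ENXIO;

	toy_read_lock();
	struct toy_dev *d = atomic_load(&published_dev); /* copy to stack */
	if (d) {
		*out = d->value;
		rc = 0;
	}
	toy_read_unlock();
	return rc;
}

static void unregister_ptr(void)
{
	struct toy_dev *d = atomic_exchange(&published_dev, NULL);

	toy_synchronize(); /* no reader can still hold d after this */
	free(d);           /* kernel code could defer this via call_srcu() */
}

/* Option 2: a "dead" flag that readers check inside the read section. */
static atomic_bool dev_dead;
static struct toy_dev static_dev = { .value = 7 };

static int ioctl_flag(int *out)
{
	int rc = -ENXIO;

	toy_read_lock();
	if (!atomic_load(&dev_dead)) {
		*out = static_dev.value;
		rc = 0;
	}
	toy_read_unlock();
	return rc;
}

static void unregister_flag(void)
{
	atomic_store(&dev_dead, true); /* set the abort condition first */
	toy_synchronize();             /* then wait out in-flight readers */
	/* tearing down static_dev's resources is now safe */
}
```

Both writers rely on sequentially consistent ordering: the writer publishes NULL (or the flag) and then reads the reader count, while a reader bumps the count and then reads the pointer (or the flag), so either the writer sees the reader and waits, or the reader sees the removal and backs off. In kernel code, option 1 could hand the free to call_srcu() instead of blocking in synchronize_srcu().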

[1]: https://lore.kernel.org/lkml/20180408031113.GO3948@xxxxxxxxxxxxxxxxxx/