Re: [PATCH] uacce: fix concurrency of fops_open and uacce_remove

From: Jean-Philippe Brucker
Date: Thu Jun 16 2022 - 04:15:17 EST


On Thu, Jun 16, 2022 at 12:10:18PM +0800, Zhangfei Gao wrote:
> > > diff --git a/drivers/misc/uacce/uacce.c b/drivers/misc/uacce/uacce.c
> > > index 281c54003edc..b6219c6bfb48 100644
> > > --- a/drivers/misc/uacce/uacce.c
> > > +++ b/drivers/misc/uacce/uacce.c
> > > @@ -136,9 +136,16 @@ static int uacce_fops_open(struct inode *inode, struct file *filep)
> > > if (!q)
> > > return -ENOMEM;
> > > + mutex_lock(&uacce->queues_lock);
> > > +
> > > + if (!uacce->parent->driver) {
> > I don't think this is useful, because the core clears parent->driver after
> > having run uacce_remove():
> >
> > rmmod hisi_zip open()
> > ... uacce_fops_open()
> > __device_release_driver() ...
> > pci_device_remove()
> > hisi_zip_remove()
> > hisi_qm_uninit()
> > uacce_remove()
> > ... ...
> > mutex_lock(uacce->queues_lock)
> > ... if (!uacce->parent->driver)
> > device_unbind_cleanup() /* driver still valid, proceed */
> > dev->driver = NULL
>
> The check  if (!uacce->parent->driver) is required, otherwise NULL pointer
> may happen.

I agree we need something, what I mean is that this check is not
sufficient.

> iommu_sva_bind_device
> const struct iommu_ops *ops = dev_iommu_ops(dev);  ->
> dev->iommu->iommu_dev->ops
>
> rmmod has no issue, but remove parent pci device has the issue.

Ah right, relying on the return value of bind() wouldn't be enough even if
we mandated SVA.

[...]
> >
> > I think we need the global uacce_mutex to serialize uacce_remove() and
> > uacce_fops_open(). uacce_remove() would do everything, including
> > xa_erase(), while holding that mutex. And uacce_fops_open() would try to
> > obtain the uacce object from the xarray while holding the mutex, which
> > fails if the uacce object is being removed.
>
> Since fops_open get char device refcount, uacce_release will not happen
> until open returns.

The refcount only ensures that the uacce_device object is not freed as
long as there are open fds. But uacce_remove() can run while there are
open fds, or fds in the process of being opened. And atfer uacce_remove()
runs, the uacce_device object still exists but is mostly unusable. For
example once the module is freed, uacce->ops is not valid anymore. But
currently uacce_fops_open() may dereference the ops in this case:

uacce_fops_open()
if (!uacce->parent->driver)
/* Still valid, keep going */
... rmmod
uacce_remove()
... free_module()
uacce->ops->get_queue() /* BUG */

Accessing uacce->ops after free_module() is a use-after-free. We need all
the fops to synchronize with uacce_remove() to ensure they don't use any
resource of the parent after it's been freed.

I see uacce_fops_poll() may have the same problem, and should be inside
uacce_mutex.

Thanks,
Jean