Re: [PATCH] uacce: fix concurrency of fops_open and uacce_remove

From: Zhangfei Gao
Date: Tue Jun 21 2022 - 03:39:47 EST




On 2022/6/20 下午9:36, Greg Kroah-Hartman wrote:
On Mon, Jun 20, 2022 at 02:24:31PM +0100, Jean-Philippe Brucker wrote:
On Fri, Jun 17, 2022 at 02:05:21PM +0800, Zhangfei Gao wrote:
The refcount only ensures that the uacce_device object is not freed as
long as there are open fds. But uacce_remove() can run while there are
open fds, or fds in the process of being opened. And atfer uacce_remove()
runs, the uacce_device object still exists but is mostly unusable. For
example once the module is freed, uacce->ops is not valid anymore. But
currently uacce_fops_open() may dereference the ops in this case:

uacce_fops_open()
if (!uacce->parent->driver)
/* Still valid, keep going */
... rmmod
uacce_remove()
... free_module()
uacce->ops->get_queue() /* BUG */
uacce_remove should wait for uacce->queues_lock, until fops_open release the
lock.
If open happen just after the uacce_remove: unlock, uacce_bind_queue in open
should fail.
Ah yes sorry, I lost sight of what this patch was adding. But we could
have the same issue with the patch, just in a different order, no?

uacce_fops_open()
uacce = xa_load()
... rmmod
Um, how is rmmod called if the file descriptor is open?

That should not be possible if the owner of the file descriptor is
properly set. Please fix that up.
Thanks Greg

Set cdev owner or use module_get/put can block rmmod once fops_open.
-       uacce->cdev->owner = THIS_MODULE;
+       uacce->cdev->owner = uacce->parent->driver->owner;

However, still not find good method to block removing parent pci device.

$ echo 1 > /sys/bus/pci/devices/0000:00:02.0/remove &

[   32.563350]  uacce_remove+0x6c/0x148
[   32.563353]  hisi_qm_uninit+0x12c/0x178
[   32.563356]  hisi_zip_remove+0xa0/0xd0 [hisi_zip]
[   32.563361]  pci_device_remove+0x44/0xd8
[   32.563364]  device_remove+0x54/0x88
[   32.563367]  device_release_driver_internal+0xec/0x1a0
[   32.563370]  device_release_driver+0x20/0x30
[   32.563372]  pci_stop_bus_device+0x8c/0xe0
[   32.563375]  pci_stop_and_remove_bus_device_locked+0x28/0x60
[   32.563378]  remove_store+0x9c/0xb0
[   32.563379]  dev_attr_store+0x20/0x38

mutex_lock(&dev->device_lock) can be used, which used in device_release_driver_internal.
Or use internal mutex.

Thanks