RE: [Patch v2 2/3] Drivers: hv: add Azure Blob driver
From: Long Li
Date: Fri Jul 02 2021 - 19:59:51 EST
> Subject: RE: [Patch v2 2/3] Drivers: hv: add Azure Blob driver
>
> From: Long Li <longli@xxxxxxxxxxxxx> Sent: Wednesday, June 30, 2021 11:51
> PM
>
> [snip]
>
> > > > +static void az_blob_remove_device(struct az_blob_device *dev) {
> > > > + wait_event(dev->file_wait, list_empty(&dev->file_list));
> > > > + misc_deregister(&az_blob_misc_device);
> > > > +#ifdef CONFIG_DEBUG_FS
> > > > + debugfs_remove_recursive(az_blob_debugfs_root);
> > > > +#endif
> > > > + /* At this point, we won't get any requests from user-mode */ }
> > > > +
> > > > +static int az_blob_create_device(struct az_blob_device *dev) {
> > > > + int rc;
> > > > + struct dentry *d;
> > > > +
> > > > + rc = misc_register(&az_blob_misc_device);
> > > > + if (rc) {
> > > > + az_blob_err("misc_register failed rc %d\n", rc);
> > > > + return rc;
> > > > + }
> > > > +
> > > > +#ifdef CONFIG_DEBUG_FS
> > > > + az_blob_debugfs_root = debugfs_create_dir("az_blob", NULL);
> > > > + if (!IS_ERR_OR_NULL(az_blob_debugfs_root)) {
> > > > + d = debugfs_create_file("pending_requests", 0400,
> > > > + az_blob_debugfs_root, NULL,
> > > > + &az_blob_debugfs_fops);
> > > > + if (IS_ERR_OR_NULL(d)) {
> > > > + az_blob_warn("failed to create debugfs file\n");
> > > > + debugfs_remove_recursive(az_blob_debugfs_root);
> > > > + az_blob_debugfs_root = NULL;
> > > > + }
> > > > + } else
> > > > + az_blob_warn("failed to create debugfs root\n"); #endif
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > +static int az_blob_connect_to_vsp(struct hv_device *device, u32
> > > > +ring_size) {
> > > > + int ret;
> > > > +
> > > > + spin_lock_init(&az_blob_dev.file_lock);
> > >
> > > I'd argue that the spin lock should not be re-initialized here.
> > > Here's the sequence where things go wrong:
> > >
> > > 1) The driver is unbound, so az_blob_remove() is called.
> > > 2) az_blob_remove() sets the "removing" flag to true, and calls
> > > az_blob_remove_device().
> > > 3) az_blob_remove_device() waits for the file_list to become empty.
> > > 4) After the file_list becomes empty, but before misc_deregister()
> > > is called, a separate thread opens the device again.
> > > 5) In the separate thread, az_blob_fop_open() obtains the file_lock spin
> lock.
> > > 6) Before az_blob_fop_open() releases the spin lock,
> > > az_blob_remove_device() completes, along with az_blob_remove().
> > > 7) Then the device gets rebound, and az_blob_connect_to_vsp() gets
> > > called, all while az_blob_fop_open() still holds the spin lock. So
> > > the spin lock get re- initialized while it is held.
> > >
> > > This is admittedly a far-fetched scenario, but stranger things have
> > > happened. :-) The issue is that you are counting on the az_blob_dev
> > > structure to persist and have a valid file_lock, even while the
> > > device is unbound. So any initialization should only happen in
> az_blob_drv_init().
> >
> > I'm not sure if az_blob_probe() and az_blob_remove() can be called at
> > the same time, as az_blob_remove_vmbus() is called the last in
> az_blob_remove().
> > Is it possible for vmbus asking the driver to probe a new channel
> > before the old channel is closed? I expect the vmbus provide guarantee
> > that those calls are made in sequence.
>
> In my scenario above, az_blob_remove_vmbus() and az_blob_remove() run
> to completion in Step #6, all while some other thread is still in the middle of
> an
> open() call and holding the file_lock spin lock. Then in Step #7
> az_blob_probe() runs. So az_blob_remove() and az_blob_probe() execute
> sequentially, not at the same time.
>
> Michael
I think it's a valid scenario. The return value of devtmpfs_delete_node() is not checked in misc_deregister(). It decreases the refcount on inodes but it's not guaranteed that someone else is still using it (in the middle of opening a file).
However, this works fine for "rmmod" that causes device to be removed. Before file is opened the refcount on the module is increased so it can't be removed when file is being opened. The scenario you described can't happen.
But during VMBUS rescind, it can happen. It's possible that the driver is using the spinlock that has been re-initialized, when the next VMBUS offer on the same channel comes before all the attempting open file calls exit.
This is a very rare. I agree things happen that we should make sure the driver can handle this. I'll update the driver.
Long
>
> >
> > >
> > > > + INIT_LIST_HEAD(&az_blob_dev.file_list);
> > > > + init_waitqueue_head(&az_blob_dev.file_wait);
> > > > +
> > > > + az_blob_dev.removing = false;
> > > > +
> > > > + az_blob_dev.device = device;
> > > > + device->channel->rqstor_size = device_queue_depth;
> > > > +
> > > > + ret = vmbus_open(device->channel, ring_size, ring_size, NULL, 0,
> > > > + az_blob_on_channel_callback, device->channel);
> > > > +
> > > > + if (ret) {
> > > > + az_blob_err("failed to connect to VSP ret %d\n", ret);
> > > > + return ret;
> > > > + }
> > > > +
> > > > + hv_set_drvdata(device, &az_blob_dev);
> > > > +
> > > > + return ret;
> > > > +}
> > > > +
> > > > +static void az_blob_remove_vmbus(struct hv_device *device) {
> > > > + /* At this point, no VSC/VSP traffic is possible over vmbus */
> > > > + hv_set_drvdata(device, NULL);
> > > > + vmbus_close(device->channel);
> > > > +}
> > > > +
> > > > +static int az_blob_probe(struct hv_device *device,
> > > > + const struct hv_vmbus_device_id *dev_id) {
> > > > + int rc;
> > > > +
> > > > + az_blob_dbg("probing device\n");
> > > > +
> > > > + rc = az_blob_connect_to_vsp(device, az_blob_ringbuffer_size);
> > > > + if (rc) {
> > > > + az_blob_err("error connecting to VSP rc %d\n", rc);
> > > > + return rc;
> > > > + }
> > > > +
> > > > + // create user-mode client library facing device
> > > > + rc = az_blob_create_device(&az_blob_dev);
> > > > + if (rc) {
> > > > + az_blob_remove_vmbus(device);
> > > > + return rc;
> > > > + }
> > > > +
> > > > + az_blob_dbg("successfully probed device\n");
> > > > + return 0;
> > > > +}
> > > > +
> > > > +static int az_blob_remove(struct hv_device *dev) {
> > > > + struct az_blob_device *device = hv_get_drvdata(dev);
> > > > + unsigned long flags;
> > > > +
> > > > + spin_lock_irqsave(&device->file_lock, flags);
> > > > + device->removing = true;
> > > > + spin_unlock_irqrestore(&device->file_lock, flags);
> > > > +
> > > > + az_blob_remove_device(device);
> > > > + az_blob_remove_vmbus(dev);
> > > > + return 0;
> > > > +}
> > > > +
> > > > +static struct hv_driver az_blob_drv = {
> > > > + .name = KBUILD_MODNAME,
> > > > + .id_table = id_table,
> > > > + .probe = az_blob_probe,
> > > > + .remove = az_blob_remove,
> > > > + .driver = {
> > > > + .probe_type = PROBE_PREFER_ASYNCHRONOUS,
> > > > + },
> > > > +};
> > > > +
> > > > +static int __init az_blob_drv_init(void) {
> > > > + int ret;
> > > > +
> > > > + ret = vmbus_driver_register(&az_blob_drv);
> > > > + return ret;
> > > > +}
> > > > +
> > > > +static void __exit az_blob_drv_exit(void) {
> > > > + vmbus_driver_unregister(&az_blob_drv);
> > > > +}