Re: [PATCH v2 4/5]nbd: make nbd device wait for its users.

From: Markus Pargmann
Date: Wed Jun 15 2016 - 02:32:42 EST


Hi Pranay,

On Tuesday 14 June 2016 15:03:40 Pranay Srivastava wrote:
> Hi Markus,
>
> On Tue, Jun 14, 2016 at 2:29 PM, Markus Pargmann <mpa@xxxxxxxxxxxxxx> wrote:
> >
> > On Thursday 02 June 2016 13:25:00 Pranay Kr. Srivastava wrote:
> > > When a timeout occurs or a recv fails, then
> > > instead of abruptly killing the nbd block device,
> > > wait for its users to finish.
> > >
> > > This is more required when filesystem(s) like
> > > ext2 or ext3 don't expect their buffer heads to
> > > disappear while the filesystem is mounted.
> > >
> > > Each open of a nbd device is refcounted, while
> > > the userland program [nbd-client] doing the
> > > NBD_DO_IT ioctl would now wait for any other users
> > > of this device before invalidating the nbd device.
> > >
> > > Signed-off-by: Pranay Kr. Srivastava <pranjas@xxxxxxxxx>
> > > ---
> > > drivers/block/nbd.c | 58 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> > > 1 file changed, 58 insertions(+)
> > >
> > > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> > > index d1d898d..4da40dc 100644
> > > --- a/drivers/block/nbd.c
> > > +++ b/drivers/block/nbd.c
> > > @@ -70,10 +70,13 @@ struct nbd_device {
> > > #if IS_ENABLED(CONFIG_DEBUG_FS)
> > > struct dentry *dbg_dir;
> > > #endif
> > > + atomic_t inuse;
> > > /*
> > > *This is specifically for calling sock_shutdown, for now.
> > > */
> > > struct work_struct ws_shutdown;
> > > + struct kref users;
> > > + struct completion user_completion;
> > > };
> > >
> > > #if IS_ENABLED(CONFIG_DEBUG_FS)
> > > @@ -104,6 +107,7 @@ static DEFINE_SPINLOCK(nbd_lock);
> > > * Shutdown function for nbd_dev work struct.
> > > */
> > > static void nbd_ws_func_shutdown(struct work_struct *);
> > > +static void nbd_kref_release(struct kref *);
> > >
> > > static inline struct device *nbd_to_dev(struct nbd_device *nbd)
> > > {
> > > @@ -682,6 +686,8 @@ static void nbd_reset(struct nbd_device *nbd)
> > > nbd->flags = 0;
> > > nbd->xmit_timeout = 0;
> > > INIT_WORK(&nbd->ws_shutdown, nbd_ws_func_shutdown);
> > > + init_completion(&nbd->user_completion);
> > > + kref_init(&nbd->users);
> > > queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, nbd->disk->queue);
> > > del_timer_sync(&nbd->timeout_timer);
> > > }
> > > @@ -815,6 +821,14 @@ static int __nbd_ioctl(struct block_device *bdev, struct nbd_device *nbd,
> > > kthread_stop(thread);
> > >
> > > sock_shutdown(nbd);
> > > + /*
> > > + * kref_init initializes with ref count as 1,
> > > + * nbd_client, or the user-land program executing
> > > + * this ioctl will make the refcount to 2[at least]
> > > + * so subtracting 2 from refcount.
> > > + */
> > > + kref_sub(&nbd->users, 2, nbd_kref_release);
> >
> > Why don't you use a kref_put?
>
> Ok, so I'll try to explain as I've understood the problem.
>
> When the module is loaded the kref is initialized to 1.
>
> Suppose now, someone has started nbd-client [nbdC-1] , then this
> nbd-client will increase the ref count to 2. So far so good...
>
> Now let's say this device is being shutdown via nbd-client[nbdC-2].
>
> nbdC-1 will subtract two from the refcount. It has to do this in NBD_DO_IT,
> since the device file will not be closed until after the ioctl is over,
> and then it'll wait_for_completion.
>
> nbdC-2 now closes its use of the device file. This brings the refcount to
> zero, and the completion is triggered, which lets nbdC-1 finish.
>
> Now we don't want to trigger kref_put when nbdC-1 closes the device file,
> so kref_put needs to be conditional in this regard; that is what inuse is
> used for.
>
>
> >
> > > + wait_for_completion(&nbd->user_completion);
> > > mutex_lock(&nbd->tx_lock);
> > > nbd_clear_que(nbd);
> > > kill_bdev(bdev);
> > > @@ -865,13 +879,56 @@ static int nbd_ioctl(struct block_device *bdev, fmode_t mode,
> > >
> > > return error;
> > > }
> > > +static void nbd_kref_release(struct kref *kref_users)
> > > +{
> > > + struct nbd_device *nbd = container_of(kref_users, struct nbd_device,
> > > + users);
> >
> > Not indented to opening bracket.
> >
> > > + pr_debug("Releasing kref [%s]\n", __func__);
> > > + atomic_set(&nbd->inuse, 0);
> > > + complete(&nbd->user_completion);
> > > +
> > > +}
> > > +
> > > +static int nbd_open(struct block_device *bdev, fmode_t mode)
> > > +{
> > > + struct nbd_device *nbd_dev = bdev->bd_disk->private_data;
> > > +
> > > + if (kref_get_unless_zero(&nbd_dev->users))
> > > + atomic_set(&nbd_dev->inuse, 1);
> > > +
> > > + pr_debug("Opening nbd_dev %s. Active users = %u\n",
> > > + bdev->bd_disk->disk_name,
> > > + atomic_read(&nbd_dev->users.refcount) - 1);
> >
> > Indent to opening bracket.
> >
> > > + return 0;
> > > +}
> > > +
> > > +static void nbd_release(struct gendisk *disk, fmode_t mode)
> > > +{
> > > + struct nbd_device *nbd_dev = disk->private_data;
> > > + /*
> > > + *kref_init initializes ref count to 1, so we
> > > + *check for refcount to be 2 for a final put.
> > > + *
> > > + *kref needs to be re-initialized just here as the
> > > + *other process holding it must see the ref count as 2.
> > > + */
> > > + if (atomic_read(&nbd_dev->inuse))
> > > + kref_put(&nbd_dev->users, nbd_kref_release);
> >
>
> > What is this inuse atomic for? Everyone that releases the nbd device
> > will need to execute a kref_put().
>
> To do away with inuse, perhaps we can do a kref_get just before leaving
> NBD_DO_IT, so that everyone does a kref_put when the device file is
> closed? However, there's a small race window while the kref is being
> initialized and another process [not just nbd-client] is trying to open
> the device.
>
> Do you think it's better to do this by introducing a spin_lock instead
> of the atomic?
>
> Let me know if my understanding is correct.

Thanks for the explanations. I think my understanding was off by one ;).
I didn't realize that the DO_IT thread from userspace has the block
device open as well.
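
For reference, the counting as explained above can be modelled in plain
userspace C. This is only a sketch of my reading of the thread, not kernel
code: struct nbd_model and the model_* helpers are hypothetical stand-ins
for the kref, the inuse flag and the completion that the patch adds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the fields added by the patch. */
struct nbd_model {
	int users;      /* models "struct kref users" */
	bool inuse;     /* models "atomic_t inuse" */
	bool completed; /* models "struct completion user_completion" */
};

/* Models nbd_kref_release(): clear inuse and signal the completion. */
static void model_release(struct nbd_model *m)
{
	m->inuse = false;
	m->completed = true;
}

/* Models nbd_reset(): kref_init() starts the count at 1. */
static void model_init(struct nbd_model *m)
{
	m->users = 1;
	m->inuse = false;
	m->completed = false;
}

/* Models nbd_open(): take a reference only if the count is not zero. */
static void model_open(struct nbd_model *m)
{
	if (m->users > 0) {
		m->users++;
		m->inuse = true;
	}
}

/* Models kref_sub()/kref_put(): drop n references, release at zero. */
static void model_sub(struct nbd_model *m, int n)
{
	assert(m->users >= n);
	m->users -= n;
	if (m->users == 0)
		model_release(m);
}

/* Models nbd_release(): the put is conditional on inuse. */
static void model_close(struct nbd_model *m)
{
	if (m->inuse)
		model_sub(m, 1);
}

/* Replays the nbdC-1 / nbdC-2 shutdown sequence; returns the final count. */
static int model_shutdown_sequence(struct nbd_model *m)
{
	model_init(m);   /* count = 1 */
	model_open(m);   /* nbdC-1 opens the device: count = 2 */
	model_open(m);   /* nbdC-2 opens it to disconnect: count = 3 */
	model_sub(m, 2); /* nbdC-1 in NBD_DO_IT: count = 1, then it waits */
	model_close(m);  /* nbdC-2 closes: count = 0, completion fires */
	model_close(m);  /* nbdC-1 closes: inuse is clear, so no put */
	return m->users;
}
```

The last close is the step that needs the inuse check: without it, nbdC-1
would do a put on a count that is already zero.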

I thought a bit about this. Does it make sense to delay the essential
cleanup steps until all open file handles have really been closed? Then,
even if the DO_IT thread exits, the block device is still there, and
everything is cleaned up only when the last file handle is closed. Maybe
this makes the code simpler, and we can use krefs directly without any
strange constructs. What do you think?
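
To make the idea a bit more concrete, here is a rough userspace sketch,
assuming one reference per open file handle and the cleanup done in the
final put. The names (struct nbd_sketch and the sketch_* helpers) are made
up for illustration and are not existing nbd code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an nbd device; not the real struct nbd_device. */
struct nbd_sketch {
	int open_handles;    /* one reference per open file handle */
	bool sock_connected; /* whether a socket is currently set up */
	bool cleaned_up;     /* set once the deferred cleanup has run */
};

/* Every open takes a reference, unconditionally. */
static void sketch_open(struct nbd_sketch *d)
{
	d->open_handles++;
}

/* Models the client handing a socket to the device. */
static void sketch_set_sock(struct nbd_sketch *d)
{
	d->sock_connected = true;
}

/* The DO_IT thread exits: the socket goes away, the device stays. */
static void sketch_do_it_exit(struct nbd_sketch *d)
{
	d->sock_connected = false;
}

/* Every close drops a reference; only the last one cleans up. */
static void sketch_close(struct nbd_sketch *d)
{
	assert(d->open_handles > 0);
	if (--d->open_handles == 0)
		d->cleaned_up = true;
}

/*
 * Replays a timeout followed by a reconnect on the same file handle.
 * Returns true if the device survived the exit of the DO_IT thread.
 */
static bool sketch_reconnect_scenario(struct nbd_sketch *d)
{
	bool survived;

	d->open_handles = 0;
	d->sock_connected = false;
	d->cleaned_up = false;

	sketch_open(d);       /* nbd-client opens the device */
	sketch_set_sock(d);
	sketch_open(d);       /* another user, e.g. a mounted filesystem */
	sketch_do_it_exit(d); /* timeout or recv failure */
	survived = !d->cleaned_up;

	sketch_set_sock(d);   /* client sets up a new socket, same handle */
	sketch_close(d);      /* one user closes: still one handle open */
	sketch_close(d);      /* last close: cleanup runs now */
	return survived;
}
```

With this shape, open and release are symmetric, so no inuse flag and no
kref_sub by two would be needed.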

This would also allow the client to set up a new socket as long as it
does not close the nbd file handle.

Could this behavior be problematic for any client implementation? Does
it solve our other issue with setting up a new socket for an existing
nbd block device?

Cc Wouter

Best Regards,

Markus

>
>
> >
> > Best Regards,
> >
> > Markus
> >
> > > +
> > > + pr_debug("Closing nbd_dev %s. Active users = %u\n",
> > > + disk->disk_name,
> > > + atomic_read(&nbd_dev->users.refcount) - 1);
> > > +}
> > >
> > > static const struct block_device_operations nbd_fops = {
> > > .owner = THIS_MODULE,
> > > .ioctl = nbd_ioctl,
> > > .compat_ioctl = nbd_ioctl,
> > > + .open = nbd_open,
> > > + .release = nbd_release
> > > };
> > >
> > > +
> > > static void nbd_ws_func_shutdown(struct work_struct *ws_nbd)
> > > {
> > > struct nbd_device *nbd_dev = container_of(ws_nbd, struct nbd_device,
> > > @@ -1107,6 +1164,7 @@ static int __init nbd_init(void)
> > > disk->fops = &nbd_fops;
> > > disk->private_data = &nbd_dev[i];
> > > sprintf(disk->disk_name, "nbd%d", i);
> > > + atomic_set(&nbd_dev[i].inuse, 0);
> > > nbd_reset(&nbd_dev[i]);
> > > add_disk(disk);
> > > }
> > >
> >
> > --
> > Pengutronix e.K. | |
> > Industrial Linux Solutions | http://www.pengutronix.de/ |
> > Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
> > Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |

--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
