Re: [PATCH] NBD: replace kill_bdev() with __invalidate_device()
From: Markus Pargmann
Date: Thu Apr 28 2016 - 05:00:34 EST
Hi,
On Saturday 23 April 2016 07:47:21 Ratna Manoj wrote:
> Thanks for the review.
>
> Atleast for ext4 this crash happens on a sys_umount() call, timing of
> which is not in control of block driver. Block driver cannot force the
> filesystems to be unmounted, and the file system does not expect
> buffers to get unmapped under it.
Yes the block driver can't force a clean umount.
>
> Ext4 can be fixed with the this patch:
> http://www.spinics.net/lists/linux-ext4/msg51112.html
> It did not make to the kernel. It checks the state of the buffer head
> before committing.
>
> When we consider diskett/CD as user space thread that called NBD_DO_IT,
> this problem is analogous to changing disk with another or the same
> disk suddenly when the file system is still mounted.
>
> If we completely kill the block device we would loss some writes when
> same thread is reconnected.
I am not so sure about your exact use-case here.
If the NBD_DO_IT thread returns I am considering the connection and
block device as dead and disconnected. Securing any data afterwards with
a new connection is potentially dangerous as it may be a different
server.
>
> if we do not completely kill or if we only invalidate clean buffers,
> we will have inconsistency on re-attach with a different thread
> (analogous to replacing disk with different disk suddenly).
Yes exactly. That's why I suggested that NBD_DO_IT waits until all
blockdevice users are gone. This would avoid any issues with
writing/reading data to a wrong server.
Best Regards,
Markus
>
> Ratna.
>
>
> On Wed, Apr 20, 2016 at 4:36 PM, Markus Pargmann <mpa@xxxxxxxxxxxxxx> wrote:
>
> > Hi,
> >
> > On Thursday 24 March 2016 07:04:10 Ratna Manoj wrote:
> > > From: Ratna Manoj Bolla <manoj.br@xxxxxxxxx>
> > >
> > > When a filesystem is mounted on a nbd device and on a disconnect, because
> > > of kill_bdev(), and resetting bdev size to zero, buffer_head mappings are
> > > getting destroyed under mounted filesystem.
> > >
> > > After a bdev size reset(i.e bdev->bd_inode->i_size = 0) on a disconnect,
> > > followed by a sys_umount(),
> > > generic_shutdown_super()->...
> > > ->__sync_blockdev()->...
> > > -> blkdev_writepages()->...
> > > ->do_invalidatepage()->...
> > > -> discard_buffer() is discarding superblock buffer_head
> > assumed
> > > to be in mapped state by ext4_commit_super().
> > >
> > >
> > >
> > > Signed-off-by: Ratna Manoj Bolla <manoj.br@xxxxxxxxx>
> > > ---
> > > This script reproduces both the kernel panic scenarios:
> > >
> > > $ qemu-img create -f qcow2 f.img 1G
> > > $ mkfs.ext4 f.img
> > > $ qemu-nbd -c /dev/nbd0 f.img
> > > $ mount /dev/nbd0 dir
> > > $ killall -KILL qemu-nbd
> > > $ sleep 1
> > > $ ls dir
> > > $ umount dir
> > >
> > > Bug reports:
> > > http://www.kernelhub.org/?p=2&msg=361407
> > >
> > https://www.mail-archive.com/nbd-general@xxxxxxxxxxxxxxxxxxxxx/msg02388.html
> >
> > Thanks, please CC nbd-general@xxxxxxxxxxxxxxxxxxxxx,
> > linux-kernel@xxxxxxxxxxxxxxx as well.
> >
> > So this patch simply does not cleanup the blockdevice to avoid any
> > errors on the filesystem side. The userspace thread that called
> > NBD_DO_IT will exit immediately before the filesystem decided to release
> > the blockdevice. The nbd driver assumes that the shutdown was done and
> > accepts new clients setting up sockets and so on. Couldn't this lead to
> > a lot of problems?
> >
> > Currently NBD_DO_IT returns when it is save to use the NBD device again.
> > This patch changes this as the blockdevice may still be in use when
> > NBD_DO_IT returns. I think it would be better to delay NBD_DO_IT until
> > everything is cleaned up and all filesystems are closed.
> >
> > Best Regards,
> >
> > Markus
> >
> > >
> > > diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> > > index f6b51d7..6e77b3a 100644
> > > --- a/drivers/block/nbd.c
> > > +++ b/drivers/block/nbd.c
> > > @@ -119,7 +119,8 @@ static const char *nbdcmd_to_ascii(int cmd)
> > >
> > > static int nbd_size_clear(struct nbd_device *nbd, struct block_device
> > *bdev)
> > > {
> > > - bdev->bd_inode->i_size = 0;
> > > + if (bdev->bd_openers <= 1)
> > > + bdev->bd_inode->i_size = 0;
> > > set_capacity(nbd->disk, 0);
> > > kobject_uevent(&nbd_to_dev(nbd)->kobj, KOBJ_CHANGE);
> > >
> > > @@ -678,6 +679,9 @@ static void nbd_reset(struct nbd_device *nbd)
> > >
> > > static void nbd_bdev_reset(struct block_device *bdev)
> > > {
> > > + if (bdev->bd_openers > 1)
> > > + return;
> > > +
> > > set_device_ro(bdev, false);
> > > bdev->bd_inode->i_size = 0;
> > > if (max_part > 0) {
> > > @@ -735,7 +739,7 @@ static int __nbd_ioctl(struct block_device *bdev,
> > struct nbd_device *nbd,
> > > nbd_clear_que(nbd);
> > > BUG_ON(!list_empty(&nbd->queue_head));
> > > BUG_ON(!list_empty(&nbd->waiting_queue));
> > > - kill_bdev(bdev);
> > > + __invalidate_device(bdev, true);
> > > return 0;
> > >
> > > case NBD_SET_SOCK: {
> > > @@ -809,7 +813,7 @@ static int __nbd_ioctl(struct block_device *bdev,
> > struct nbd_device *nbd,
> > >
> > > sock_shutdown(nbd);
> > > nbd_clear_que(nbd);
> > > - kill_bdev(bdev);
> > > + __invalidate_device(bdev, true);
> > > nbd_bdev_reset(bdev);
> > >
> > > if (nbd->disconnect) /* user requested, ignore socket
> > errors */
> > >
> > >
> > >
> >
> > --
> > Pengutronix e.K. | |
> > Industrial Linux Solutions | http://www.pengutronix.de/ |
> > Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
> > Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
> >
>
>
>
--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
Attachment:
signature.asc
Description: This is a digitally signed message part.