Re: [Xen-devel] [PATCH 1/1] xen-blkback: stop blkback thread of every queue in xen_blkif_disconnect

From: Roger Pau Monné
Date: Tue Aug 22 2017 - 03:32:41 EST


On Fri, Aug 18, 2017 at 06:24:06PM +0100, Roger Pau Monné wrote:
> On Fri, Aug 18, 2017 at 10:29:15AM -0400, annie li wrote:
> >
> > On 8/18/2017 5:14 AM, Roger Pau Monné wrote:
> > > On Thu, Aug 17, 2017 at 06:43:46PM -0400, Annie Li wrote:
> > > > If there is inflight I/O in any non-last queue, blkback returns -EBUSY
> > > > directly, and never stops the threads of the remaining queues or processes
> > > > them. When removing a vbd device under heavy disk I/O load, some queues
> > > > with inflight I/O still have a blkback thread running even though the
> > > > corresponding vbd device or guest is gone.
> > > > This can cause problems. For example, if the backend device type is file,
> > > > some loop devices and blkback threads linger forever after the guest is
> > > > destroyed, and unmounting repositories fails unless dom0 is rebooted. So
> > > > stop all threads properly and return -EBUSY if any queue has inflight I/O.
> > > >
> > > > Signed-off-by: Annie Li <annie.li@xxxxxxxxxx>
> > > > Reviewed-by: Herbert van den Bergh <herbert.van.den.bergh@xxxxxxxxxx>
> > > > Reviewed-by: Bhavesh Davda <bhavesh.davda@xxxxxxxxxx>
> > > > Reviewed-by: Adnan Misherfi <adnan.misherfi@xxxxxxxxxx>
> > > > ---
> > > > drivers/block/xen-blkback/xenbus.c | 10 ++++++++--
> > > > 1 file changed, 8 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> > > > index 792da68..2adb859 100644
> > > > --- a/drivers/block/xen-blkback/xenbus.c
> > > > +++ b/drivers/block/xen-blkback/xenbus.c
> > > > @@ -244,6 +244,7 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
> > > >  {
> > > >  	struct pending_req *req, *n;
> > > >  	unsigned int j, r;
> > > > +	bool busy = false;
> > > >  
> > > >  	for (r = 0; r < blkif->nr_rings; r++) {
> > > >  		struct xen_blkif_ring *ring = &blkif->rings[r];
> > > > @@ -261,8 +262,10 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
> > > >  		 * don't have any discard_io or other_io requests. So, checking
> > > >  		 * for inflight IO is enough.
> > > >  		 */
> > > > -		if (atomic_read(&ring->inflight) > 0)
> > > > -			return -EBUSY;
> > > > +		if (atomic_read(&ring->inflight) > 0) {
> > > > +			busy = true;
> > > > +			continue;
> > > > +		}
> > > I guess I'm missing something, but I don't see how this is solving the
> > > problem described in the description.
> > >
> > > If the problem is that xen_blkif_disconnect returns without cleaning
> > > all the queues, this patch keeps the current behavior, just that it
> > > will try to remove more queues before returning, as opposed to
> > > returning when finding the first busy queue.
> > > Before checking inflight, the following code stops the blkback thread:
> > >     if (ring->xenblkd) {
> > >         kthread_stop(ring->xenblkd);
> > >         wake_up(&ring->shutdown_wq);
> > >     }
> > > This patch gives the thread of every queue the chance to get stopped.
> > > Otherwise, only the threads of the queues up to and including the first
> > > busy one get stopped; the threads of the remaining queues keep running, and
> > > these blkback threads and the corresponding loop devices linger forever,
> > > even after the guest is destroyed.
>
> Thanks for the explanation:
>
> Acked-by: Roger Pau Monné <roger.pau@xxxxxxxxxx>

Forgot to add, this needs to be backported to stable branches, so:

Cc: stable@xxxxxxxxxxxxxxx

Roger.
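
For readers following the thread, the control-flow change discussed above
can be sketched as a small user-space model. This is only an illustration
under stated assumptions, not the kernel code: struct ring_model,
disconnect_old(), disconnect_new() and count_running() are hypothetical
names, clearing thread_running stands in for the kthread_stop() call, and
the per-ring cleanup that follows the inflight check is omitted.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct xen_blkif_ring; only the two
 * properties relevant to this thread are modelled. */
struct ring_model {
	bool thread_running;	/* models ring->xenblkd being alive */
	int inflight;		/* models atomic_read(&ring->inflight) */
};

/* Behaviour before the patch: bail out at the first busy ring, so the
 * threads of all later rings are never stopped. */
int disconnect_old(struct ring_model *rings, size_t nr_rings)
{
	for (size_t r = 0; r < nr_rings; r++) {
		rings[r].thread_running = false;	/* models kthread_stop() */
		if (rings[r].inflight > 0)
			return -EBUSY;
		/* per-ring cleanup would happen here */
	}
	return 0;
}

/* Behaviour with the patch: remember that a ring was busy, skip its
 * cleanup, but keep iterating so every thread gets stopped. */
int disconnect_new(struct ring_model *rings, size_t nr_rings)
{
	bool busy = false;

	for (size_t r = 0; r < nr_rings; r++) {
		rings[r].thread_running = false;	/* models kthread_stop() */
		if (rings[r].inflight > 0) {
			busy = true;
			continue;
		}
		/* per-ring cleanup would happen here */
	}
	return busy ? -EBUSY : 0;
}

/* Helper for inspection: how many modelled threads are still running. */
size_t count_running(const struct ring_model *rings, size_t nr_rings)
{
	size_t n = 0;

	for (size_t r = 0; r < nr_rings; r++)
		if (rings[r].thread_running)
			n++;
	return n;
}
```

With three rings where only the middle one has inflight I/O, both variants
report -EBUSY, but the old loop leaves the last ring's thread running while
the patched loop stops all of them.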