Re: [PATCH 3/3] xen-blkfront: dynamic configuration of per-vbd resources
From: Roger Pau Monné
Date: Mon Jul 25 2016 - 08:11:48 EST
On Mon, Jul 25, 2016 at 07:08:36PM +0800, Bob Liu wrote:
>
> On 07/25/2016 06:53 PM, Roger Pau Monné wrote:
> ..[snip..]
> >>>> * We get the device lock and blk_mq_freeze_queue() in dynamic_reconfig_device(),
> >>>> and then have to release in blkif_recover() asynchronously which makes the code more difficult to readable.
> >>>
> >>> I'm not sure I'm following here, do you mean that you will pick the lock in
> >>> dynamic_reconfig_device and release it in blkif_recover? Why wouldn't you
> >>
> >> Yes.
> >>
> >>> release the lock in dynamic_reconfig_device after doing whatever is needed?
> >>>
> >>
> >> Both 'dynamic configuration' and migration:xenbus_dev_resume() use { blkfront_resume(); blkif_recover() } and depends on the change of xbdev->state.
> >> If they happen simultaneously, the State machine of xbdev->state is going to be a mess and very difficult to track.
> >>
> >> The lock(actually a mutex) is like a big lock to make sure no race would happen at all.
> >
> > So the only thing that you should do is set the frontend state to closed and
> > wait for the backend to also switch to state closed, and then switch the
> > frontend state to init again in order to trigger a reconnection.
> >
>
> But if migration:xenbus_dev_resume() is triggered at the same time, any state be set might be changed.
> =====
> E.g
> Dynamic_reconfig_device: Migration:xenbus_dev_resume()
>
> Set the frontend state to closed
>
> ^^^^
> frontend state will be reset to XenbusStateInitialising by xenbus_dev_resume()
>
> Wait for the backend to also switch to state closed
This is not really a race, the backend will not switch to state closed, and
instead will appear at state init again, which is what we wanted anyway in
order to reconnect, so I don't see an issue with this flow.
> =====
> Similar situation may happen at any other place regarding set the state.
Right, but I don't see how your proposed patch also avoids this issues. I
see that you pick the device lock in dynamic_reconfig_device, but I don't
see xenbus_dev_resume picking the lock at all, and in any case I don't think
you should prevent the VM from migrating.
Depending on the toolstack it might decide to kill the VM because it has not
been able to migrate it, in which case the result is not better.
Roger.