Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed
From: Roger Pau Monné
Date: Mon Dec 09 2019 - 10:13:48 EST
On Mon, Dec 09, 2019 at 02:41:40PM +0000, Durrant, Paul wrote:
> > -----Original Message-----
> > From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > Sent: 09 December 2019 14:29
> > To: Durrant, Paul <pdurrant@xxxxxxxxxx>
> > Cc: linux-kernel@xxxxxxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxxx;
> > Juergen Gross <jgross@xxxxxxxx>; Stefano Stabellini <sstabellini@xxxxxxxxxx>;
> > Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
> > Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed
> >
> > On Mon, Dec 09, 2019 at 12:40:47PM +0000, Durrant, Paul wrote:
> > > > -----Original Message-----
> > > > From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > > > Sent: 09 December 2019 12:26
> > > > To: Durrant, Paul <pdurrant@xxxxxxxxxx>
> > > > Cc: linux-kernel@xxxxxxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxxx;
> > > > Juergen Gross <jgross@xxxxxxxx>; Stefano Stabellini <sstabellini@xxxxxxxxxx>;
> > > > Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
> > > > Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed
> > > >
> > > > On Mon, Dec 09, 2019 at 12:01:38PM +0000, Durrant, Paul wrote:
> > > > > > -----Original Message-----
> > > > > > From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > > > > > Sent: 09 December 2019 11:39
> > > > > > To: Durrant, Paul <pdurrant@xxxxxxxxxx>
> > > > > > Cc: linux-kernel@xxxxxxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxxx;
> > > > > > Juergen Gross <jgross@xxxxxxxx>; Stefano Stabellini <sstabellini@xxxxxxxxxx>;
> > > > > > Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
> > > > > > Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed
> > > > > >
> > > > > > On Thu, Dec 05, 2019 at 02:01:21PM +0000, Paul Durrant wrote:
> > > > > > > Only force state to closed in the case when the toolstack may need to
> > > > > > > clean up. This can be detected by checking whether the state in xenstore
> > > > > > > has been set to closing prior to device removal.
> > > > > >
> > > > > > I'm not sure I see the point of this; I would expect that a failure
> > > > > > to probe or the removal of the device would leave the xenbus state
> > > > > > as closed, which is consistent with the actual driver state.
> > > > > >
> > > > > > Can you explain the benefit of leaving a device without a driver in
> > > > > > such an unknown state?
> > > > > >
> > > > >
> > > > > If probe fails then I think it should leave the state alone. If the
> > > > > state is moved to closed then basically you just killed that
> > > > > connection to the guest (as the frontend will normally close down
> > > > > when it sees this change) so, if the probe failure was due to a bug
> > > > > in blkback or, e.g., a transient resource issue then it's game over
> > > > > as far as that guest goes.
> > > >
> > > > But the connection can be restarted by switching the backend to the
> > > > init state again.
> > >
> > > Too late. The frontend saw closed and you already lost.
> > >
> > > >
> > > > > The ultimate goal here is PV backend re-load that is completely
> > > > > transparent to the guest. Modifying anything in xenstore compromises
> > > > > that, so we need to be careful.
> > > >
> > > > That's a fine goal, but not switching to closed state in
> > > > xenbus_dev_remove seems wrong, as you have actually left the frontend
> > > > without a matching backend and with the state not set to closed.
> > > >
> > >
> > > Why is this a problem? With this series fully applied a (block) backend
> > > can come and go without needing to change the state. Relying on guests to
> > > DTRT is not a sustainable option for a cloud deployment.
> > >
> > > > I.e.: that would be fine if you explicitly state this is some kind of
> > > > internal blkback reload, but not for the general case where blkback
> > > > has been unbound. I think we need some way to differentiate a blkback
> > > > reload from an unbind.
> > > >
> > >
> > > Why do we need that though? Why is it advantageous for a backend to go
> > > to closed? No PV backends cope with an unbind as-is, and a
> > > toolstack-initiated unplug will always set state to 5 (Closing) anyway.
> > > So TBH any state transition done directly in the xenbus code looks
> > > wrong to me (but appears to be a necessary evil to keep the toolstack
> > > working in the event it spawns a backend where there is actually no
> > > driver present, or it doesn't come online).
> >
> > IMO the normal flow for unbind would be to attempt to close open
> > connections and then remove the driver: leaving frontends connected
> > without any attached backends is not correct, and will just block the
> > guest frontend until requests start timing out.
> >
> > I can see the reasoning for doing that for the purpose of updating a
> > blkback module without guests noticing, but I would prefer that
> > leaving connections open was an option that could be given when
> > unbinding (or maybe a driver option in sysfs?), so that the default
> > behaviour would be to try to close everything when unbinding if
> > possible.
>
> Well, unbind is pretty useless now IMO, since bind doesn't work, and a transition straight to closed is just plain wrong anyway.
Why do you claim that a straight transition into the closed state is
wrong?
I don't see any such mention in blkif.h, which also doesn't contain
any guidelines regarding closing state transitions, so, unless stated
otherwise somewhere else, transitions into closed can happen from any
state IMO.
> But we could have a flag, set by the backend driver to say that it supports transparent re-bind, that gates this code. Would that make you feel more comfortable?
Having an option to leave state untouched when unbinding would be fine
with me; otherwise state should be set to closed when unbinding. I
don't think there's anything else that needs to be done in this
regard: the cleanup should be exactly the same, the only difference
being that all the active backends are set to the closed state.
> If you want unbind to actually do a proper unplug then that's extra work and not really something I want to tackle (and re-bind would still need to be toolstack-initiated, as something would have to re-create the xenstore area).
Why do you say the xenstore area would need to be recreated?
Setting state to closed shouldn't cause any cleanup of the xenstore
area. Such a transition already happens, for example, when using
pvgrub: grub itself disconnects, causing a transition to closed, and
the guest kernel re-attaches afterwards.
Thanks, Roger.