RE: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

From: Durrant, Paul
Date: Mon Dec 09 2019 - 09:41:47 EST


> -----Original Message-----
> From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> Sent: 09 December 2019 14:29
> To: Durrant, Paul <pdurrant@xxxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxxx; Juergen
> Gross <jgross@xxxxxxxx>; Stefano Stabellini <sstabellini@xxxxxxxxxx>;
> Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
> Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> closed
>
> On Mon, Dec 09, 2019 at 12:40:47PM +0000, Durrant, Paul wrote:
> > > -----Original Message-----
> > > From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > > Sent: 09 December 2019 12:26
> > > To: Durrant, Paul <pdurrant@xxxxxxxxxx>
> > > Cc: linux-kernel@xxxxxxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxxx;
> > > Juergen Gross <jgross@xxxxxxxx>; Stefano Stabellini <sstabellini@xxxxxxxxxx>;
> > > Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
> > > Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> > > closed
> > >
> > > On Mon, Dec 09, 2019 at 12:01:38PM +0000, Durrant, Paul wrote:
> > > > > -----Original Message-----
> > > > > From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > > > > Sent: 09 December 2019 11:39
> > > > > To: Durrant, Paul <pdurrant@xxxxxxxxxx>
> > > > > Cc: linux-kernel@xxxxxxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxxx;
> > > > > Juergen Gross <jgross@xxxxxxxx>; Stefano Stabellini
> > > > > <sstabellini@xxxxxxxxxx>; Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
> > > > > Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> > > > > forced to closed
> > > > >
> > > > > On Thu, Dec 05, 2019 at 02:01:21PM +0000, Paul Durrant wrote:
> > > > > > Only force state to closed in the case when the toolstack may need
> > > > > > to clean up. This can be detected by checking whether the state in
> > > > > > xenstore has been set to closing prior to device removal.
> > > > >
> > > > > I'm not sure I see the point of this. I would expect that a failure
> > > > > to probe or the removal of the device would leave the xenbus state
> > > > > as closed, which is consistent with the actual driver state.
> > > > >
> > > > > Can you explain what the benefit is of leaving a device without a
> > > > > driver in such an unknown state?
> > > > >
> > > >
> > > > If probe fails then I think it should leave the state alone. If the
> > > > state is moved to closed then you have basically just killed that
> > > > connection to the guest (as the frontend will normally close down
> > > > when it sees this change) so, if the probe failure was due to a bug
> > > > in blkback or, e.g., a transient resource issue, then it's game over
> > > > as far as that guest goes.
> > >
> > > But the connection can be restarted by switching the backend to the
> > > init state again.
> >
> > Too late. The frontend saw closed and you already lost.
> >
> > >
> > > > The ultimate goal here is PV backend re-load that is completely
> > > > transparent to the guest. Modifying anything in xenstore compromises
> > > > that, so we need to be careful.
> > >
> > > That's a fine goal, but not switching to closed state in
> > > xenbus_dev_remove seems wrong, as you have actually left the frontend
> > > without a matching backend and with the state not set to closed.
> > >
> >
> > Why is this a problem? With this series fully applied, a (block) backend
> > can come and go without needing to change the state. Relying on guests
> > to DTRT (do the right thing) is not a sustainable option for a cloud
> > deployment.
> >
> > > I.e. that would be fine if you explicitly stated this is some kind of
> > > internal blkback reload, but not for the general case where blkback
> > > has been unbound. I think we need some way to differentiate a blkback
> > > reload from an unbind.
> > >
> >
> > Why do we need that though? Why is it advantageous for a backend to go
> > to closed? No PV backends cope with an unbind as-is, and a
> > toolstack-initiated unplug will always set the state to 5
> > (XenbusStateClosing) anyway. So TBH any state transition done directly
> > in the xenbus code looks wrong to me anyway (but it appears to be a
> > necessary evil to keep the toolstack working in the event that it
> > spawns a backend where there is actually no driver present, or the
> > driver doesn't come online).
>
> IMO the normal flow for unbind would be to attempt to close open
> connections and then remove the driver: leaving frontends connected
> without any attached backends is not correct, and will just block the
> guest frontend until requests start timing out.
>
> I can see the reasoning for doing that for the purpose of updating a
> blkback module without guests noticing, but I would prefer that
> leaving connections open was an option that could be given when
> unbinding (or maybe a driver option in sysfs?), so that the default
> behaviour would be to try to close everything when unbinding if
> possible.

Well, unbind is pretty useless now IMO since bind doesn't work, and a transition straight to closed is just plain wrong anyway. But we could have a flag, set by the backend driver to say that it supports transparent re-bind, that gates this code. Would that make you feel more comfortable?
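
A minimal user-space sketch of the gating proposed above, under stated assumptions: the per-driver opt-in flag is called `supports_rebind` here and lives in a hypothetical `model_backend_driver` struct; neither name is an existing kernel field or API. The function models only the remove-path decision: drivers that have not opted in keep forcing the state to Closed, while opted-in drivers do so only when the toolstack has already set the xenstore state to Closing (5). The enum values follow Xen's public `io/xenbus.h`.

```c
#include <stdbool.h>

/* Xenbus states, numbered as in Xen's public io/xenbus.h. */
enum xenbus_state {
	XenbusStateUnknown       = 0,
	XenbusStateInitialising  = 1,
	XenbusStateInitWait      = 2,
	XenbusStateInitialised   = 3,
	XenbusStateConnected     = 4,
	XenbusStateClosing       = 5,
	XenbusStateClosed        = 6,
	XenbusStateReconfiguring = 7,
	XenbusStateReconfigured  = 8,
};

/*
 * Hypothetical model of a backend driver. 'supports_rebind' stands in for
 * the opt-in flag suggested in the thread; it is NOT a real kernel field.
 */
struct model_backend_driver {
	bool supports_rebind;
};

/*
 * Decide whether the remove path should force the backend state to Closed.
 * 'xenstore_state' is the state read from xenstore just before removal.
 */
bool should_force_closed(const struct model_backend_driver *drv,
			 enum xenbus_state xenstore_state)
{
	/* Drivers that have not opted in keep the old behaviour: always close. */
	if (!drv->supports_rebind)
		return true;

	/*
	 * Opted-in drivers only close when the toolstack has already started an
	 * unplug by writing Closing (5); otherwise the state is left alone so a
	 * re-loaded backend can pick the connection up again.
	 */
	return xenstore_state == XenbusStateClosing;
}
```

With the flag clear, the function always returns true, matching the current unconditional transition to Closed. With it set, a backend that is unbound while the connection is still Connected (4) leaves the state untouched, which is the transparent re-load case.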

If you want unbind to actually do a proper unplug then that's extra work and not really something I want to tackle (and re-bind would still need to be toolstack-initiated, as something would have to re-create the xenstore area).

Paul

>
> Thanks, Roger.