Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed
From: Roger Pau Monné
Date: Mon Dec 09 2019 - 09:29:01 EST
On Mon, Dec 09, 2019 at 12:40:47PM +0000, Durrant, Paul wrote:
> > -----Original Message-----
> > From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > Sent: 09 December 2019 12:26
> > To: Durrant, Paul <pdurrant@xxxxxxxxxx>
> > Cc: linux-kernel@xxxxxxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxxx; Juergen
> > Gross <jgross@xxxxxxxx>; Stefano Stabellini <sstabellini@xxxxxxxxxx>;
> > Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
> > Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
> > closed
> >
> > On Mon, Dec 09, 2019 at 12:01:38PM +0000, Durrant, Paul wrote:
> > > > -----Original Message-----
> > > > From: Roger Pau Monné <roger.pau@xxxxxxxxxx>
> > > > Sent: 09 December 2019 11:39
> > > > To: Durrant, Paul <pdurrant@xxxxxxxxxx>
> > > > Cc: linux-kernel@xxxxxxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxxx;
> > Juergen
> > > > Gross <jgross@xxxxxxxx>; Stefano Stabellini <sstabellini@xxxxxxxxxx>;
> > > > Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
> > > > Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
> > forced to
> > > > closed
> > > >
> > > > On Thu, Dec 05, 2019 at 02:01:21PM +0000, Paul Durrant wrote:
> > > > > Only force state to closed in the case when the toolstack may need
> > to
> > > > > clean up. This can be detected by checking whether the state in
> > xenstore
> > > > > has been set to closing prior to device removal.
> > > >
> > > > I'm not sure I see the point of this, I would expect that a failure to
> > > > probe or the removal of the device would leave the xenbus state as
> > > > closed, which is consistent with the actual driver state.
> > > >
> > > > Can you explain what's the benefit of leaving a device without a
> > > > driver in such unknown state?
> > > >
> > >
> > > If probe fails then I think it should leave the state alone. If the
> > > state is moved to closed then basically you just killed that
> > > connection to the guest (as the frontend will normally close down
> > > when it sees this change) so, if the probe failure was due to a bug
> > > in blkback or, e.g., a transient resource issue then it's game over
> > > as far as that guest goes.
> >
> > But the connection can be restarted by switching the backend to the
> > init state again.
>
> Too late. The frontend saw closed and you already lost.
>
> >
> > > The ultimate goal here is PV backend re-load that is completely
> > transparent to the guest. Modifying anything in xenstore compromises that
> > so we need to be careful.
> >
> > That's a fine goal, but not switching to closed state in
> > xenbus_dev_remove seems wrong, as you have actually left the frontend
> > without a matching backend and with the state not set to closed.
> >
>
> Why is this a problem? With this series fully applied a (block) backend can come and go without needing to change the state. Relying on guests to DTRT is not a sustainable option for a cloud deployment.
>
> > Ie: that would be fine if you explicitly state this is some kind of
> > internal blkback reload, but not for the general case where blkback
> > has been unbound. I think we need someway to difference a blkback
> > reload vs a unbound.
> >
>
> Why do we need that though? Why is it advantageous for a backend to go to closed. No PV backends cope with an unbind as-is, and a toolstack initiated unplug will always set state to 5 anyway. So TBH any state transition done directly in the xenbus code looks wrong to me anyway (but appears to be a necessary evil to keep the toolstack working in the event it spawns a backend where there is actually to driver present, or it doesn't come online).
IMO the normal flow for unbind would be to attempt to close open
connections and then remove the driver: leaving frontends connected
without any attached backends is not correct, and will just block the
guest frontend until requests start timing out.
I can see the reasoning for doing that for the purpose of updating a
blkback module without guests noticing, but I would prefer that
leaving connections open was an option that could be given when
unbinding (or maybe a driver option in sysfs?), so that the default
behaviour would be to try to close everything when unbinding if
possible.
Thanks, Roger.