Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to closed

From: JÃrgen GroÃ
Date: Mon Dec 09 2019 - 09:41:18 EST


On 09.12.19 15:23, Durrant, Paul wrote:
-----Original Message-----
From: JÃrgen Groà <jgross@xxxxxxxx>
Sent: 09 December 2019 14:10
To: Durrant, Paul <pdurrant@xxxxxxxxxx>; Roger Pau MonnÃ
<roger.pau@xxxxxxxxxx>
Cc: linux-kernel@xxxxxxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxxx; Stefano
Stabellini <sstabellini@xxxxxxxxxx>; Boris Ostrovsky
<boris.ostrovsky@xxxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced to
closed

On 09.12.19 15:06, Durrant, Paul wrote:
-----Original Message-----
From: JÃrgen Groà <jgross@xxxxxxxx>
Sent: 09 December 2019 13:39
To: Durrant, Paul <pdurrant@xxxxxxxxxx>; Roger Pau MonnÃ
<roger.pau@xxxxxxxxxx>
Cc: linux-kernel@xxxxxxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxxx;
Stefano
Stabellini <sstabellini@xxxxxxxxxx>; Boris Ostrovsky
<boris.ostrovsky@xxxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is forced
to
closed

On 09.12.19 13:19, Durrant, Paul wrote:
-----Original Message-----
From: JÃrgen Groà <jgross@xxxxxxxx>
Sent: 09 December 2019 12:09
To: Durrant, Paul <pdurrant@xxxxxxxxxx>; Roger Pau MonnÃ
<roger.pau@xxxxxxxxxx>
Cc: linux-kernel@xxxxxxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxxx;
Stefano
Stabellini <sstabellini@xxxxxxxxxx>; Boris Ostrovsky
<boris.ostrovsky@xxxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
forced
to
closed

On 09.12.19 13:03, Durrant, Paul wrote:
-----Original Message-----
From: JÃrgen Groà <jgross@xxxxxxxx>
Sent: 09 December 2019 11:55
To: Roger Pau Monnà <roger.pau@xxxxxxxxxx>; Durrant, Paul
<pdurrant@xxxxxxxxxx>
Cc: linux-kernel@xxxxxxxxxxxxxxx; xen-devel@xxxxxxxxxxxxxxxxxxxx;
Stefano
Stabellini <sstabellini@xxxxxxxxxx>; Boris Ostrovsky
<boris.ostrovsky@xxxxxxxxxx>
Subject: Re: [Xen-devel] [PATCH 2/4] xenbus: limit when state is
forced
to
closed

On 09.12.19 12:39, Roger Pau Monnà wrote:
On Thu, Dec 05, 2019 at 02:01:21PM +0000, Paul Durrant wrote:
Only force state to closed in the case when the toolstack may
need
to
clean up. This can be detected by checking whether the state in
xenstore
has been set to closing prior to device removal.

I'm not sure I see the point of this, I would expect that a
failure
to
probe or the removal of the device would leave the xenbus state as
closed, which is consistent with the actual driver state.

Can you explain what's the benefit of leaving a device without a
driver in such unknown state?

And more concerning: did you check that no frontend/backend is
relying on the closed state to be visible without closing having
been
set before?

Blkfront doesn't seem to mind and I believe the Windows PV drivers
cope,
but I don't really understand the comment since this patch is
actually
removing a case where the backend transitions directly to closed.

I'm not speaking of blkfront/blkback only, but of net, tpm, scsi,
pvcall
etc. frontends/backends. After all you are modifying a function
common
to all PV driver pairs.

You are removing a state switc to "closed" in case the state was
_not_
"closing" before.

Yes, which AFAIK is against the intention of the generic PV protocol
such that it ever existed anyway.

While this might be the case we should _not_ break any guests
running now. So this kind of reasoning is dangerous.


So any PV driver reacting to "closed" of the other end
in case the previous state might not have been "closing" before is at
risk to misbehave with your patch.

Well, they will see nothing now. If the state was not closing, it gets
left alone, so the frontend shouldn't do anything. The only risk that I
can see is that some frontend/backend pair needed a direct 4 -> 6
transition to support 'unbind' before but AFAIK nothing has ever
supported
that, and blk and net crash'n'burn if you try that on upstream as it
stands. A clean unplug would always set state to 5 first, since that's
part of the unplug protocol.

That was my question: are you sure all current and previous
guest frontends and backends are handling unplug this way?

Not "should handle", but "do handle".

That depends on the toolstack. IIUC the only 'supported' toolstack is
xl/libxl, which will set 'state' to 5 and 'online' to 0 to initiate an
unplug.

I guess libvirt/libxl is doing the same?


The unplug mechansism is all in libxl AFAICT, so it should be identical.

At least at SUSE we still have some customers running xend based
Xen installations with recent Linux or Windows guests.


Is that something the upstream code can/should support though? I'd be surprised if xend is actually doing anything different to libxl since I've been coding the Windows PV drivers to trigger off the combined closing/online transition for as long as I can remember.

I'd rather not have to carry a private patch for new Linux kernel to be
able to run on those hosts.

AFAIK you at Amazon have some quite old Xen installations, too. How are
you handling that (assuming the customer is updating the kernel to a
recent version in his guest)?


Juergen