Re: [Xen-devel] [PATCH 4/4] xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart.

From: Konrad Rzeszutek Wilk
Date: Mon Mar 31 2014 - 16:33:32 EST


On Mon, Dec 02, 2013 at 11:41:57AM +0000, David Vrabel wrote:
> On 26/11/13 16:50, Konrad Rzeszutek Wilk wrote:
> > On Thu, Nov 21, 2013 at 05:52:28PM +0000, David Vrabel wrote:
> >> On 08/11/13 17:38, Konrad Rzeszutek Wilk wrote:
> >>> The 'read_reply' works with 'process_msg' to read of a reply in XenBus.
> >>> 'process_msg' is running from within the 'xenbus' thread. Whenever
> >>> a message shows up in XenBus it is put on a xs_state.reply_list list
> >>> and 'read_reply' picks it up.
> >>>
> >>> The problem is if the backend domain or the xenstored process is killed.
> >>> In which case 'xenbus' is still awaiting - and 'read_reply' if called -
> >>> stuck forever waiting for the reply_list to have some contents.
> >>>
> >>> This is normally not a problem - as the backend domain can come back
> >>> or the xenstored process can be restarted. However if the domain
> >>> is in process of being powered off/restarted/halted - there is no
> >>> point of waiting on it coming back - as we are effectively being
> >>> terminated and should not impede the progress.
> >>>
> >>> This patch solves this problem by checking the 'system_state' value
> >>> to see if we are in heading towards death. We also make the wait
> >>> mechanism a bit more asynchronous.
> >>
> >> This seems to be checking the wrong thing conceptually. We should abort
> >> the wait if xenstored is dead not if our domain is dying.
> >>
> >> I think you can consider xenstored as dead if:
> >>
> >> a) it's local and we're dying.
> >
> > OK. Not sure exactly how to do that but that should be possible.
>
> xen_store_domain_type == XS_LOCAL and looking at system_state?
>
> >> b) it's remote and the remote domain is dead.
> >
> > OK, any idea how to do that? As in check if a remote domain is dead?
>
> Let someone who cares about xenstore domains fix this -- this is not the
> most common use case.
>
> I'd be happy to have some thing like:
>
> bool xenbus_ok(void)
> {
> switch (xen_store_domain_type) {
> case XS_LOCAL:
> return system_state != dying;
> case XS_PV:
> case XS_HVM;
> /* FIXME: could check remote domain is alive, but it's
> normally dom0. */
> return true;
> // ...
> default:
> return true;
> }
> }