Introduce xenwatch multithreading (mtwatch)

From: Dongli Zhang
Date: Fri Sep 14 2018 - 03:34:32 EST


Hi,

This patch set introduces xenwatch multithreading (mtwatch), based on the
following Xen Summit 2018 design session notes:

https://lists.xenproject.org/archives/html/xen-devel/2018-07/msg00017.html


xenwatch_thread is a single kernel thread that processes the callback
functions for subscribed xenwatch events one after another. xenwatch stalls
in 'D' state if any callback function stalls and is uninterruptible.
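
To illustrate the stall, here is a minimal user-space model in C. It is
not code from this patch set: the watch_cb type, cb_ok(), cb_stalled() and
single_thread_process() are hypothetical names, and a callback that would
block forever is modeled by a nonzero return value instead of a real
uninterruptible sleep.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a callback returns 0 when it completes and
 * nonzero when it would block forever (the 'D' state case). */
typedef int (*watch_cb)(void);

static int cb_ok(void)      { return 0; }
static int cb_stalled(void) { return 1; }

/* Process events strictly in order, as a single thread would.
 * Returns how many callbacks completed before the thread got stuck. */
static size_t single_thread_process(const watch_cb *events, size_t n)
{
    size_t done = 0;

    assert(events != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (events[i]() != 0)
            break; /* stuck here: all later events are never handled */
        done++;
    }
    return done;
}
```

With a queue of { cb_ok, cb_stalled, cb_ok, cb_ok }, only the first
callback completes; the two events queued behind the stalled one are never
handled, even though they may belong to unrelated domains.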

DomU create/destroy fails if xenwatch is stalled in 'D' state, because
the paravirtual driver init/uninit cannot complete. Usually, the only
option is to reboot the dom0 server, unless there is a solution/workaround
to move forward and complete the stalled xenwatch event callback function.
Below is the output of 'xl create' when xenwatch is stalled (the issue is
reproduced on purpose by hooking netif_receive_skb() to intercept an
sk_buff sent out from vifX.Y on dom0, with the patch at
https://github.com/finallyjustice/patchset/blob/master/xenwatch-stall-by-vif.patch):

# xl create pv.cfg
Parsing config from pv.cfg
libxl: error: libxl_device.c:1080:device_backend_callback: Domain 2:unable to add device with path /local/domain/0/backend/vbd/2/51712
libxl: error: libxl_create.c:1278:domcreate_launch_dm: Domain 2:unable to add disk devices
libxl: error: libxl_device.c:1080:device_backend_callback: Domain 2:unable to remove device with path /local/domain/0/backend/vbd/2/51712
libxl: error: libxl_domain.c:1073:devices_destroy_cb: Domain 2:libxl__devices_destroy failed
libxl: error: libxl_domain.c:1000:libxl__destroy_domid: Domain 2:Non-existant domain
libxl: error: libxl_domain.c:959:domain_destroy_callback: Domain 2:Unable to destroy guest
libxl: error: libxl_domain.c:886:domain_destroy_cb: Domain 2:Destruction of domain failed


The idea of this patch set is to create a per-domU xenwatch thread for each
domid. The per-domid thread is created when the first pv backend device
(for this domid and with xenwatch multithreading enabled) is created, and
it is destroyed when the last pv backend device (for this domid and with
xenwatch multithreading enabled) is removed. A per-domid xs_watch_event is
never put on the default event list; it is put on the per-domid event list
directly.
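
The lifecycle and dispatch rule above can be sketched as a small
user-space model in C. This is an illustration under stated assumptions,
not the implementation in the patches: struct domid_ctx,
backend_device_added(), backend_device_removed(), dispatch_event() and
MAX_DOMID are hypothetical names, the thread is reduced to a boolean flag,
and the event lists are reduced to counters.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DOMID 16 /* arbitrary bound chosen for this sketch */

/* Hypothetical per-domid state. A real implementation would hold a
 * kernel thread and an event list here instead of a flag and a count. */
struct domid_ctx {
    bool thread_alive;   /* stands in for the per-domid xenwatch thread */
    unsigned int refcnt; /* pv backend devices with mtwatch enabled */
    unsigned int queued; /* events on the per-domid event list */
};

static struct domid_ctx domains[MAX_DOMID];
static unsigned int default_queued; /* events on the default event list */

/* A pv backend device (with mtwatch enabled) was created for domid. */
static void backend_device_added(unsigned int domid)
{
    assert(domid < MAX_DOMID);
    if (domains[domid].refcnt++ == 0)
        domains[domid].thread_alive = true; /* first device: create */
}

/* A pv backend device (with mtwatch enabled) was removed for domid. */
static void backend_device_removed(unsigned int domid)
{
    assert(domid < MAX_DOMID && domains[domid].refcnt > 0);
    if (--domains[domid].refcnt == 0) {
        domains[domid].thread_alive = false; /* last device: destroy */
        domains[domid].queued = 0;
    }
}

/* Queue one watch event. Returns true if it went to the per-domid
 * list, false if it fell back to the default list. */
static bool dispatch_event(unsigned int domid)
{
    if (domid < MAX_DOMID && domains[domid].thread_alive) {
        domains[domid].queued++; /* never touches the default list */
        return true;
    }
    default_queued++;
    return false;
}
```

In this model, an event for a domid goes to the default list only while no
per-domid thread exists for it. Once the first backend device is added,
every event for that domid bypasses the default list, so a stalled
callback for one domid cannot hold up events for another.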


For more details, please refer to the Xen Summit 2018 design session notes
and presentation slides:

https://lists.xenproject.org/archives/html/xen-devel/2018-07/msg00017.html
http://www.donglizhang.org/xenwatch_multithreading.pdf

----------------------------------------------------------------

Dongli Zhang (6):
xenbus: prepare data structures and parameter for xenwatch multithreading
xenbus: implement the xenwatch multithreading framework
xenbus: dispatch per-domU watch event to per-domU xenwatch thread
xenbus: process otherend_watch event at 'state' entry in xenwatch multithreading
xenbus: process be_watch events in xenwatch multithreading
drivers: enable xenwatch multithreading for xen-netback and xen-blkback driver

 Documentation/admin-guide/kernel-parameters.txt |   3 +
 drivers/block/xen-blkback/xenbus.c              |   3 +-
 drivers/net/xen-netback/xenbus.c                |   1 +
 drivers/xen/xenbus/xenbus_probe.c               |  24 +-
 drivers/xen/xenbus/xenbus_probe_backend.c       |  32 +++
 drivers/xen/xenbus/xenbus_xs.c                  | 357 +++++++++++++++++++++++-
 include/xen/xenbus.h                            |  70 +++++
 7 files changed, 484 insertions(+), 6 deletions(-)

Thank you very much!

Dongli Zhang