Re: [PATCH v2] xen: avoid deadlock in xenbus driver

From: Andre Przywara
Date: Mon Jun 12 2017 - 12:30:14 EST


Hi,

On 08/06/17 15:03, Juergen Gross wrote:
> There has been a report about a deadlock in the xenbus driver:
>
> [ 247.979498] ======================================================
> [ 247.985688] WARNING: possible circular locking dependency detected
> [ 247.991882] 4.12.0-rc4-00022-gc4b25c0 #575 Not tainted
> [ 247.997040] ------------------------------------------------------
> [ 248.003232] xenbus/91 is trying to acquire lock:
> [ 248.007875] (&u->msgbuffer_mutex){+.+.+.}, at: [<ffff00000863e904>]
> xenbus_dev_queue_reply+0x3c/0x230
> [ 248.017163]
> [ 248.017163] but task is already holding lock:
> [ 248.023096] (xb_write_mutex){+.+...}, at: [<ffff00000863a940>]
> xenbus_thread+0x5f0/0x798
> [ 248.031267]
> [ 248.031267] which lock already depends on the new lock.
> [ 248.031267]
> [ 248.039615]
> [ 248.039615] the existing dependency chain (in reverse order) is:
> [ 248.047176]
> [ 248.047176] -> #1 (xb_write_mutex){+.+...}:
> [ 248.052943] __lock_acquire+0x1728/0x1778
> [ 248.057498] lock_acquire+0xc4/0x288
> [ 248.061630] __mutex_lock+0x84/0x868
> [ 248.065755] mutex_lock_nested+0x3c/0x50
> [ 248.070227] xs_send+0x164/0x1f8
> [ 248.074015] xenbus_dev_request_and_reply+0x6c/0x88
> [ 248.079427] xenbus_file_write+0x260/0x420
> [ 248.084073] __vfs_write+0x48/0x138
> [ 248.088113] vfs_write+0xa8/0x1b8
> [ 248.091983] SyS_write+0x54/0xb0
> [ 248.095768] el0_svc_naked+0x24/0x28
> [ 248.099897]
> [ 248.099897] -> #0 (&u->msgbuffer_mutex){+.+.+.}:
> [ 248.106088] print_circular_bug+0x80/0x2e0
> [ 248.110730] __lock_acquire+0x1768/0x1778
> [ 248.115288] lock_acquire+0xc4/0x288
> [ 248.119417] __mutex_lock+0x84/0x868
> [ 248.123545] mutex_lock_nested+0x3c/0x50
> [ 248.128016] xenbus_dev_queue_reply+0x3c/0x230
> [ 248.133005] xenbus_thread+0x788/0x798
> [ 248.137306] kthread+0x110/0x140
> [ 248.141087] ret_from_fork+0x10/0x40
>
> It is rather easy to avoid by dropping xb_write_mutex before calling
> xenbus_dev_queue_reply().
>
> Fixes: fd8aa9095a95c02dcc35540a263267c29b8fda9d ("xen: optimize xenbus
> driver for multiple concurrent xenstore accesses").
>
> Cc: <stable@xxxxxxxxxxxxxxx> # 4.11
> Reported-by: Andre Przywara <andre.przywara@xxxxxxx>

I managed to find a reliable (though weird) reproducer and can confirm
that this patch fixes the issue.
So many thanks for the quick work!

Tested-by: Andre Przywara <andre.przywara@xxxxxxx>

Cheers,
Andre.

> Signed-off-by: Juergen Gross <jgross@xxxxxxxx>
> ---
> drivers/xen/xenbus/xenbus_comms.c | 21 ++++++++++-----------
> 1 file changed, 10 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/xen/xenbus/xenbus_comms.c b/drivers/xen/xenbus/xenbus_comms.c
> index 856ada5d39c9..5b081a01779d 100644
> --- a/drivers/xen/xenbus/xenbus_comms.c
> +++ b/drivers/xen/xenbus/xenbus_comms.c
> @@ -299,17 +299,7 @@ static int process_msg(void)
>  		mutex_lock(&xb_write_mutex);
>  		list_for_each_entry(req, &xs_reply_list, list) {
>  			if (req->msg.req_id == state.msg.req_id) {
> -				if (req->state == xb_req_state_wait_reply) {
> -					req->msg.type = state.msg.type;
> -					req->msg.len = state.msg.len;
> -					req->body = state.body;
> -					req->state = xb_req_state_got_reply;
> -					list_del(&req->list);
> -					req->cb(req);
> -				} else {
> -					list_del(&req->list);
> -					kfree(req);
> -				}
> +				list_del(&req->list);
>  				err = 0;
>  				break;
>  			}
> @@ -317,6 +307,15 @@ static int process_msg(void)
>  		mutex_unlock(&xb_write_mutex);
>  		if (err)
>  			goto out;
> +
> +		if (req->state == xb_req_state_wait_reply) {
> +			req->msg.type = state.msg.type;
> +			req->msg.len = state.msg.len;
> +			req->body = state.body;
> +			req->state = xb_req_state_got_reply;
> +			req->cb(req);
> +		} else
> +			kfree(req);
>  	}
>
>  	mutex_unlock(&xs_response_mutex);
>
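
For anyone who wants to see the pattern outside the kernel, here is a minimal user-space sketch of what the patch does: unlink the request while holding the list lock, then run the callback only after that lock has been dropped. It uses pthread mutexes rather than kernel mutexes, and every name in it (list_lock, reply_lock, struct request, submit_request, process_reply, demo_run) is a hypothetical stand-in, not the xenbus code. list_lock plays the role of xb_write_mutex and reply_lock the role of u->msgbuffer_mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for xb_write_mutex and u->msgbuffer_mutex. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t reply_lock = PTHREAD_MUTEX_INITIALIZER;

struct request {
	struct request *next;
	unsigned int id;
	int waiting;			/* nonzero while a caller waits for the reply */
	void (*cb)(struct request *req);
};

static struct request *pending;	/* protected by list_lock */
static int delivered;			/* protected by reply_lock */

/* Reply callback: takes reply_lock, so it must not run under list_lock. */
static void queue_reply(struct request *req)
{
	pthread_mutex_lock(&reply_lock);
	delivered++;
	pthread_mutex_unlock(&reply_lock);
	free(req);
}

/* Writer path: lock order is reply_lock first, then list_lock. */
static int submit_request(unsigned int id, int waiting)
{
	struct request *req = malloc(sizeof(*req));

	if (!req)
		return -1;
	req->id = id;
	req->waiting = waiting;
	req->cb = queue_reply;

	pthread_mutex_lock(&reply_lock);
	pthread_mutex_lock(&list_lock);
	req->next = pending;
	pending = req;
	pthread_mutex_unlock(&list_lock);
	pthread_mutex_unlock(&reply_lock);
	return 0;
}

/* Reader path: unlink under list_lock, call back only after unlocking. */
static int process_reply(unsigned int id)
{
	struct request **pp, *req = NULL;

	pthread_mutex_lock(&list_lock);
	for (pp = &pending; *pp; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			req = *pp;
			*pp = req->next;	/* unlink while the list lock is held */
			break;
		}
	}
	pthread_mutex_unlock(&list_lock);

	if (!req)
		return -1;

	/* list_lock is dropped, so the callback may take reply_lock. */
	if (req->waiting)
		req->cb(req);
	else
		free(req);
	return 0;
}

/* Single-threaded walk through both paths; returns replies delivered. */
static int demo_run(void)
{
	int before, after;

	pthread_mutex_lock(&reply_lock);
	before = delivered;
	pthread_mutex_unlock(&reply_lock);

	assert(submit_request(1, 1) == 0);
	assert(submit_request(2, 0) == 0);
	assert(process_reply(1) == 0);	/* waiting: callback runs */
	assert(process_reply(2) == 0);	/* not waiting: just freed */
	assert(process_reply(3) == -1);	/* unknown id */

	pthread_mutex_lock(&reply_lock);
	after = delivered;
	pthread_mutex_unlock(&reply_lock);
	return after - before;
}
```

Calling demo_run() from a main() and building with -pthread exercises both paths. If process_reply() invoked req->cb() while still holding list_lock, the reader would take the two locks in the opposite order to submit_request(), which is the inversion lockdep complained about above.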