Re: [PROBLEM] nbd requests become stuck when devices watched by inotify emit udev uevent changes

From: yukuai (C)
Date: Fri May 13 2022 - 23:39:36 EST


On 2022/05/13 21:13, Josef Bacik wrote:
On Fri, May 13, 2022 at 02:56:18PM +1200, Matthew Ruffell wrote:
Hi Josef,

Just a friendly ping, I am more than happy to test a patch, if you send it
inline in the email, since the pastebin you used expired after 1 day, and I
couldn't access it.

I came across and tested Yu Kuai's patches [1][2] which are for the same issue,
and they indeed fix the hang. Thank you Yu.

[1] nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
https://lists.debian.org/nbd/2022/04/msg00212.html

[2] nbd: fix io hung while disconnecting device
https://lists.debian.org/nbd/2022/04/msg00207.html

I am also happy to test any patches to fix the I/O errors.


Sorry, you caught me on vacation before and I forgot to reply. Here's part one
of the patch I wanted you to try which fixes the io hung part. Thanks,

Josef

From 0a6123520380cb84de8ccefcccc5f112bce5efb6 Mon Sep 17 00:00:00 2001
Message-Id: <0a6123520380cb84de8ccefcccc5f112bce5efb6.1652447517.git.josef@xxxxxxxxxxxxxx>
From: Josef Bacik <josef@xxxxxxxxxxxxxx>
Date: Sat, 23 Apr 2022 23:51:23 -0400
Subject: [PATCH] timeout thing

---
drivers/block/nbd.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 526389351784..ab365c0e9c04 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1314,7 +1314,10 @@ static void nbd_config_put(struct nbd_device *nbd)
kfree(nbd->config);
nbd->config = NULL;
- nbd->tag_set.timeout = 0;
+ /* Reset our timeout to something sane. */
+ nbd->tag_set.timeout = 30 * HZ;
+ blk_queue_rq_timeout(nbd->disk->queue, 30 * HZ);
+
nbd->disk->queue->limits.discard_granularity = 0;
nbd->disk->queue->limits.discard_alignment = 0;
blk_queue_max_discard_sectors(nbd->disk->queue, 0);

Hi, Josef

This seems to try to fix the same problem that I described here:

nbd: fix io hung while disconnecting device
https://lists.debian.org/nbd/2022/04/msg00207.html

Some I/O requests are still stuck, which means the device is
probably still open. In that case nbd_config_put() never drops the
last reference, so it can't reach the code changed here.
I'm afraid this patch can't fix the I/O hang.
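
To illustrate the point, here is a minimal userspace sketch, not kernel
code. It assumes the cleanup in nbd_config_put() only runs once the config
reference count drops to zero. The names fake_nbd and fake_config_put are
hypothetical stand-ins, and the timeout is modelled in plain seconds rather
than jiffies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of an nbd device; not the kernel struct. */
struct fake_nbd {
	int config_refs;      /* stands in for nbd->config_refs */
	unsigned int timeout; /* stands in for nbd->tag_set.timeout (seconds) */
};

/*
 * Drop one config reference. Returns true only if this was the last
 * reference and the cleanup path (including the timeout reset) ran.
 */
static bool fake_config_put(struct fake_nbd *nbd)
{
	assert(nbd->config_refs > 0);

	if (--nbd->config_refs > 0)
		return false; /* device still open: cleanup is skipped */

	/* Last reference dropped: this is where the patch resets the timeout. */
	nbd->timeout = 30;
	return true;
}
```

With two references held (for example, the device is still open), the first
fake_config_put() call returns false and leaves the timeout at its old
value. Only the second call reaches the reset.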

Matthew, can you test this patch together with my patch below
to confirm my suspicion?

nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
https://lists.debian.org/nbd/2022/04/msg00212.html

Thanks,
Kuai