[PATCH] nvme: unquiesce the queue before cleaning it up

From: Jianchao Wang
Date: Thu Apr 19 2018 - 04:29:10 EST


There is a race between nvme_remove and nvme_reset_work that can
lead to an I/O hang.

nvme_remove                      nvme_reset_work
-> change state to DELETING
                                 -> fail to change state to LIVE
                                 -> nvme_remove_dead_ctrl
                                   -> nvme_dev_disable
                                     -> quiesce request_queue
                                   -> queue remove_work
-> cancel_work_sync reset_work
-> nvme_remove_namespaces
  -> splice ctrl->namespaces
                                 nvme_remove_dead_ctrl_work
                                 -> nvme_kill_queues
  -> nvme_ns_remove                 do nothing
    -> blk_cleanup_queue
      -> blk_freeze_queue
In the end, the request_queue is still quiesced when blk_freeze_queue
waits for the freeze to complete, so the I/O hangs there.
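
The hang described above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: struct toy_queue,
toy_dispatch_one and toy_freeze_wait are hypothetical names invented for
illustration, not kernel APIs, and the wait is bounded here so that the
example terminates (the real freeze wait has no such bound).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a request_queue; NOT the kernel structure. */
struct toy_queue {
	bool quiesced;	/* dispatch is blocked while this is true */
	int pending;	/* requests that entered the queue but have not completed */
};

/* Dispatch one pending request if allowed; returns true on progress. */
static bool toy_dispatch_one(struct toy_queue *q)
{
	assert(q != NULL);
	if (q->quiesced || q->pending == 0)
		return false;
	q->pending--;
	return true;
}

/*
 * Bounded model of "wait for freeze": run up to max_steps dispatch
 * rounds and report whether the queue drained. With a quiesced queue
 * no round makes progress, so the queue never drains.
 */
static bool toy_freeze_wait(struct toy_queue *q, int max_steps)
{
	for (int i = 0; i < max_steps && q->pending > 0; i++)
		toy_dispatch_one(q);
	return q->pending == 0;
}
```

In this model, a queue that is quiesced with requests pending never
drains no matter how many rounds are tried, while clearing the quiesced
flag first lets the same wait finish.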

To fix it, unquiesce each request_queue directly before calling
nvme_ns_remove. The namespaces have already been spliced off
ctrl->namespaces, so nobody else can reach them and quiesce their
queues any more.
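
The reasoning above can be sketched as a userspace model. It assumes
that the "do nothing" step in the diagram means the walk over
ctrl->namespaces finds no entries once the list has been spliced away.
All names (toy_ns, toy_list, toy_splice_init, toy_kill_queues,
toy_remove_all) are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_NS 8

/* Toy namespace: only tracks whether its queue is quiesced. */
struct toy_ns {
	bool quiesced;
};

/* Toy list of namespaces, standing in for a list_head chain. */
struct toy_list {
	struct toy_ns *items[TOY_MAX_NS];
	size_t count;
};

/* Move every entry from src to dst and leave src empty. */
static void toy_splice_init(struct toy_list *src, struct toy_list *dst)
{
	assert(src->count + dst->count <= TOY_MAX_NS);
	for (size_t i = 0; i < src->count; i++)
		dst->items[dst->count++] = src->items[i];
	src->count = 0;
}

/* Unquiesce whatever is still on the list; returns entries touched. */
static size_t toy_kill_queues(struct toy_list *ctrl_list)
{
	for (size_t i = 0; i < ctrl_list->count; i++)
		ctrl_list->items[i]->quiesced = false;
	return ctrl_list->count;
}

/*
 * Remove every entry of the private list. Returns how many entries
 * were still quiesced at removal time, i.e. would hang in this model.
 */
static size_t toy_remove_all(struct toy_list *private_list, bool unquiesce_first)
{
	size_t stuck = 0;

	for (size_t i = 0; i < private_list->count; i++) {
		struct toy_ns *ns = private_list->items[i];

		if (unquiesce_first)
			ns->quiesced = false;
		if (ns->quiesced)
			stuck++;
	}
	private_list->count = 0;
	return stuck;
}
```

Once the entries live only on the private list, the walk over the
controller list touches nothing, so in this model the removal loop is
the only remaining place where the queues can be unquiesced.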

Signed-off-by: Jianchao Wang <jianchao.w.wang@xxxxxxxxxx>
---
drivers/nvme/host/core.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9df4f71..0e95082 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3249,8 +3249,15 @@ void nvme_remove_namespaces(struct nvme_ctrl *ctrl)
 	list_splice_init(&ctrl->namespaces, &ns_list);
 	up_write(&ctrl->namespaces_rwsem);
 
-	list_for_each_entry_safe(ns, next, &ns_list, list)
+	/*
+	 * After splicing the namespaces off ctrl->namespaces, nobody else
+	 * can reach them any more, so forcibly unquiesce each request_queue
+	 * here to avoid an I/O hang.
+	 */
+	list_for_each_entry_safe(ns, next, &ns_list, list) {
+		blk_mq_unquiesce_queue(ns->queue);
 		nvme_ns_remove(ns);
+	}
 }
 EXPORT_SYMBOL_GPL(nvme_remove_namespaces);

--
2.7.4