Re: [PATCH] nvme: fix double blk_mq_complete_request for timeout request with low probability

From: Sagi Grimberg
Date: Fri Apr 07 2023 - 17:34:30 EST



diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 53ef028596c6..c1cc384f4f3e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -450,8 +450,8 @@ bool nvme_cancel_request(struct request *req, void *data)
 	dev_dbg_ratelimited(((struct nvme_ctrl *) data)->device,
 				"Cancelling I/O %d", req->tag);
-	/* don't abort one completed request */
-	if (blk_mq_request_completed(req))
+	/* don't abort one completed or idle request */
+	if (blk_mq_rq_state(req) != MQ_RQ_IN_FLIGHT)
 		return true;
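
For readers following the hunk above, here is a minimal userspace sketch of why the two checks differ. It is not kernel code. The enum values mirror the block layer's `enum mq_rq_state`, and the old check is modelled on `blk_mq_request_completed()` testing for `MQ_RQ_COMPLETE`. `struct toy_request`, `old_check_skips()` and `new_check_skips()` are hypothetical names used only for illustration.

```c
#include <stdbool.h>

/* Mirrors the three request states in the block layer's enum mq_rq_state. */
enum mq_rq_state {
	MQ_RQ_IDLE	= 0,
	MQ_RQ_IN_FLIGHT	= 1,
	MQ_RQ_COMPLETE	= 2,
};

/* Hypothetical stand-in for struct request; only the state is modelled. */
struct toy_request {
	enum mq_rq_state state;
};

/*
 * Old check: cancellation is skipped only for a completed request,
 * so an idle request falls through and would be cancelled.
 */
static bool old_check_skips(const struct toy_request *req)
{
	return req->state == MQ_RQ_COMPLETE;
}

/*
 * New check: cancellation is skipped for anything that is not in
 * flight, which covers both the completed and the idle state.
 */
static bool new_check_skips(const struct toy_request *req)
{
	return req->state != MQ_RQ_IN_FLIGHT;
}
```

In this model the two checks agree for `MQ_RQ_IN_FLIGHT` and `MQ_RQ_COMPLETE` and differ only for `MQ_RQ_IDLE`: `old_check_skips()` returns false there, while `new_check_skips()` returns true.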

I was suspicious about this path too, and had the same change long ago, but
shelved it when I couldn't produce any errors there. But the change makes sense
to me!

Reviewed-by: Keith Busch <kbusch@xxxxxxxxxx>

We need to change nvmf_complete_timed_out_request() too.

Reviewed-by: Sagi Grimberg <sagi@xxxxxxxxxxx>