Re: [PATCH v3 2/3] nvme: trigger reset when keep alive fails

From: Sagi Grimberg
Date: Tue Dec 24 2024 - 05:38:36 EST





On 29/11/2024 11:28, Daniel Wagner wrote:
nvme_keep_alive_work setups a keep alive command and uses
blk_execute_rq_nowait to send out the command in an asynchronously
manner. Eventually, nvme_keep_alive_end_io is called. If the status
argument is 0, a new keep alive is send out. When the status argument is
not 0, only an error is logged. The keep alive machinery does not
trigger the error recovery.

The FC driver is relying on the keep alive machinery to trigger recovery
when an error is detected. Whenever an error happens during the creation
of the association the idea is that the operation is aborted and retried
later. Though there is a window where an error happens and
nvme_fc_create_assocation can't detect the error.

1) nvme nvme10: NVME-FC{10}: create association : ...
2) nvme nvme10: NVME-FC{10}: controller connectivity lost. Awaiting Reconnect
nvme nvme10: queue_size 128 > ctrl maxcmd 32, reducing to maxcmd
3) nvme nvme10: Could not set queue count (880)
nvme nvme10: Failed to configure AEN (cfg 900)
4) nvme nvme10: NVME-FC{10}: controller connect complete
5) nvme nvme10: failed nvme_keep_alive_end_io error=4

A connection attempt starts 1) and the ctrl is in state CONNECTING.
Shortly after the LLDD driver detects a connection lost event and calls
nvme_fc_ctrl_connectivity_loss 2). Because we are still in CONNECTING
state, this event is ignored.

nvme_fc_create_association continues to run in parallel and tries to
communicate with the controller and those commands fail. Though these
errors are filtered out, e.g in 3) setting the I/O queues numbers fails
which leads to an early exit in nvme_fc_create_io_queues. Because the
number of IO queues is 0 at this point, there is nothing left in
nvme_fc_create_association which could detected the connection drop.
Thus the ctrl enters LIVE state 4).

The keep alive timer fires and a keep alive command is send off but
gets rejected by nvme_fc_queue_rq and the rq status is set to
NVME_SC_HOST_PATH_ERROR. The nvme status is then mapped to a block layer
status BLK_STS_TRANSPORT/4 in nvme_end_req. Eventually,
nvme_keep_alive_end_io sees the status != 0 and just logs an error 5).

We should obviously detect the problem in 3) and abort there (will
address this later), but that still leaves a race window open. There is
a race window open in nvme_fc_create_association after starting the IO
queues and setting the ctrl state to LIVE.

Thus trigger a reset from the keep alive handler when an error is
reported.

Signed-off-by: Daniel Wagner <wagi@xxxxxxxxxx>
---
drivers/nvme/host/core.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index bfd71511c85f8b1a9508c6ea062475ff51bf27fe..2a07c2c540b26c8cbe886711abaf6f0afbe6c4df 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1320,6 +1320,12 @@ static enum rq_end_io_ret nvme_keep_alive_end_io(struct request *rq,
dev_err(ctrl->device,
"failed nvme_keep_alive_end_io error=%d\n",
status);
+ /*
+ * The driver reports that we lost the connection,
+ * trigger a recovery.
+ */
+ if (status == BLK_STS_TRANSPORT)
+ nvme_reset_ctrl(ctrl);
return RQ_END_IO_NONE;
}


A lengthy explanation that results in nvme core behavior that assumes a very specific driver
behavior. Isn't the root of the problem that FC is willing to live peacefully with a controller
without any queues/connectivity to it without periodically reconnecting?