Re: [PATCH v2 12/14] nvme-fc: Decouple error recovery from controller reset

From: James Smart

Date: Tue Feb 03 2026 - 17:49:22 EST

On 2/3/2026 11:19 AM, James Smart wrote:

On 1/30/2026 2:34 PM, Mohamed Khalfella wrote:

...

static void
nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
{
@@ -2049,9 +2061,8 @@ nvme_fc_fcpio_done(struct nvmefc_fcp_req *req)
          nvme_fc_complete_rq(rq);
check_error:
-    if (terminate_assoc &&
-        nvme_ctrl_state(&ctrl->ctrl) != NVME_CTRL_RESETTING)
-        queue_work(nvme_reset_wq, &ctrl->ioerr_work);
+    if (terminate_assoc)
+        nvme_fc_start_ioerr_recovery(ctrl, "io error");

This is ok. The ioerr_recovery will bounce the RESETTING state if it's already in that state, so this is a little cleaner.

What is problematic here is that, if the start_ioerr path includes the CONNECTING logic that terminates i/o's, it's running in the LLDD's context that called this iodone routine. Not good. In the existing code, the LLDD context was swapped to the work queue, where error_recovery was called.
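To illustrate the context swap, here is a rough userspace model, not kernel code. Every model_* name is hypothetical, and a plain flag stands in for queue_work(nvme_reset_wq, &ctrl->ioerr_work). The point is only that the completion routine schedules the recovery and returns, while the i/o termination happens later in the worker.

```c
#include <stdbool.h>

/* Userspace model only: hypothetical stand-in for the controller. */
struct model_defer_ctrl {
	bool ioerr_work_pending;	/* models ioerr_work being queued */
	int ios_terminated;		/* times the worker terminated i/o */
};

/*
 * Models the iodone routine, which runs in the LLDD's context.
 * It only marks the work as pending and never terminates i/o itself.
 */
static void model_fcpio_done(struct model_defer_ctrl *ctrl,
			     bool terminate_assoc)
{
	if (terminate_assoc)
		ctrl->ioerr_work_pending = true;
}

/*
 * Models the work item, which runs later in work queue context.
 * This is where terminating outstanding i/o is acceptable.
 */
static void model_ioerr_work(struct model_defer_ctrl *ctrl)
{
	if (!ctrl->ioerr_work_pending)
		return;
	ctrl->ioerr_work_pending = false;
	ctrl->ios_terminated++;
}
```

In this model, calling model_fcpio_done() never changes ios_terminated; only a later model_ioerr_work() call does.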

}
static int
@@ -2495,39 +2506,6 @@ __nvme_fc_abort_outstanding_ios(struct nvme_fc_ctrl *ctrl, bool start_queues)
          nvme_unquiesce_admin_queue(&ctrl->ctrl);
}
-static void
-nvme_fc_error_recovery(struct nvme_fc_ctrl *ctrl, char *errmsg)
-{
-    enum nvme_ctrl_state state = nvme_ctrl_state(&ctrl->ctrl);
-
-    /*
-     * if an error (io timeout, etc) while (re)connecting, the remote
-     * port requested terminating of the association (disconnect_ls)
-     * or an error (timeout or abort) occurred on an io while creating
-     * the controller. Abort any ios on the association and let the
-     * create_association error path resolve things.
-     */
-    if (state == NVME_CTRL_CONNECTING) {
-        __nvme_fc_abort_outstanding_ios(ctrl, true);
-        dev_warn(ctrl->ctrl.device,
-            "NVME-FC{%d}: transport error during (re)connect\n",
-            ctrl->cnum);
-        return;
-    }

This logic needs to be preserved. It's no longer part of nvme_fc_start_ioerr_recovery(). Failures during CONNECTING should not be "fenced". They should fail immediately.
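As a rough sketch of the behavior being asked for, here is a userspace model, not kernel code. The model_* types and helpers are hypothetical, and the body of the real nvme_fc_start_ioerr_recovery() is not shown in the quoted hunks, so the non-CONNECTING branches below are assumptions. The only part taken from the removed code is the early return for CONNECTING.

```c
#include <stdbool.h>

/* Userspace model only: hypothetical stand-ins for controller states. */
enum model_ctrl_state {
	MODEL_CTRL_LIVE,
	MODEL_CTRL_RESETTING,
	MODEL_CTRL_CONNECTING,
};

struct model_ctrl {
	enum model_ctrl_state state;
	int aborts;	/* times outstanding i/o was aborted */
	int recoveries;	/* times fenced recovery was started */
};

/* Stand-in for __nvme_fc_abort_outstanding_ios(ctrl, true). */
static void model_abort_outstanding_ios(struct model_ctrl *ctrl)
{
	ctrl->aborts++;
}

/* Returns true only when fenced recovery was started. */
static bool model_start_ioerr_recovery(struct model_ctrl *ctrl)
{
	/*
	 * Error while (re)connecting: abort the i/o and return, so the
	 * create_association error path resolves things. No fencing.
	 */
	if (ctrl->state == MODEL_CTRL_CONNECTING) {
		model_abort_outstanding_ios(ctrl);
		return false;
	}

	/* Assumed: a request while already RESETTING is bounced. */
	if (ctrl->state == MODEL_CTRL_RESETTING)
		return false;

	/* Assumed: otherwise move to RESETTING and start recovery. */
	ctrl->state = MODEL_CTRL_RESETTING;
	ctrl->recoveries++;
	return true;
}
```

With a controller in MODEL_CTRL_CONNECTING, the call aborts i/o once, leaves the state unchanged, and starts no recovery.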

this logic, if left in start_ioerr_recovery

-- james