Re: [PATCH v4 08/15] nvme: Implement cross-controller reset recovery

From: Randy Jennings

Date: Fri Apr 24 2026 - 19:08:17 EST


On Fri, Mar 27, 2026 at 5:46 PM Mohamed Khalfella
<mkhalfella@xxxxxxxxxxxxxxx> wrote:
> Signed-off-by: Mohamed Khalfella <mkhalfella@xxxxxxxxxxxxxxx>

> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c

> +int nvme_fence_ctrl(struct nvme_ctrl *ictrl)
> +{
> + unsigned long deadline, timeout;
> + struct nvme_ctrl *sctrl;
> + u32 min_cntlid = 0;
> + int ret;
> +
> + timeout = nvme_fence_timeout_ms(ictrl);
> + dev_info(ictrl->device, "attempting CCR, timeout %lums\n", timeout);
> +
> + deadline = jiffies + msecs_to_jiffies(timeout);
> + while (time_is_after_jiffies(deadline)) {
> + sctrl = nvme_find_ctrl_ccr(ictrl, min_cntlid);
> + if (!sctrl) {
> + dev_dbg(ictrl->device,
> + "failed to find source controller\n");
> + return -EIO;
> + }
> +
> + ret = nvme_issue_wait_ccr(sctrl, ictrl, deadline);
> + if (!ret) {
> + dev_info(ictrl->device, "CCR succeeded using %s\n",
> + dev_name(sctrl->device));
> + nvme_put_ctrl_ccr(sctrl);
> + return 0;
> + }
> +
> + min_cntlid = sctrl->cntlid + 1;
> + nvme_put_ctrl_ccr(sctrl);
> +

If we remove this code from here
> + if (ret == -EIO) /* CCR command failed */
> + continue;
> +
> + /* CCR operation failed or timed out */
> + return ret;
to here, failed CCR operations (not just failed CCR commands)
will be retried on the next source controller until we run out
of controllers or time. This matters when a controller cannot
handle a CCR for some other controllers. Sagi, you requested
that we not retry the CCR operation on another controller, and
I told you that this was affecting Igor's and my testing. May
we please remove this code?

> + }
> +
> + dev_info(ictrl->device, "CCR operation timeout\n");
> + return -ETIMEDOUT;
> +}
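
To make the behavior difference concrete, here is a minimal userspace
sketch of the retry loop. It is not kernel code: struct mock_ctrl,
mock_find_ctrl(), mock_fence(), the attempt budget (a stand-in for the
jiffies deadline), and the use of -ETIMEDOUT for a failed CCR operation
are all assumptions made for illustration, since the real return codes
of nvme_issue_wait_ccr() are not shown in this mail.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a source controller (not struct nvme_ctrl). */
struct mock_ctrl {
	unsigned int cntlid;
	int ccr_result;	/* 0, -EIO (CCR command failed), or another errno */
};

/* First controller with cntlid >= min_cntlid; array is sorted by cntlid. */
static const struct mock_ctrl *mock_find_ctrl(const struct mock_ctrl *ctrls,
					      size_t n, unsigned int min_cntlid)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (ctrls[i].cntlid >= min_cntlid)
			return &ctrls[i];
	return NULL;
}

/*
 * Sketch of the fencing loop. retry_op_failures == 0 keeps the early
 * return quoted above; retry_op_failures != 0 models its removal.
 * budget caps the number of attempts and stands in for the deadline.
 */
static int mock_fence(const struct mock_ctrl *ctrls, size_t n,
		      unsigned int budget, int retry_op_failures,
		      unsigned int *attempts)
{
	const struct mock_ctrl *sctrl;
	unsigned int min_cntlid = 0;
	int ret;

	assert(attempts != NULL);
	*attempts = 0;

	while (*attempts < budget) {
		sctrl = mock_find_ctrl(ctrls, n, min_cntlid);
		if (!sctrl)
			return -EIO;	/* ran out of source controllers */

		(*attempts)++;
		ret = sctrl->ccr_result;
		if (!ret)
			return 0;	/* CCR succeeded */

		min_cntlid = sctrl->cntlid + 1;

		/* The early return under discussion. */
		if (!retry_op_failures && ret != -EIO)
			return ret;
	}

	return -ETIMEDOUT;	/* budget (deadline stand-in) exhausted */
}
```

With three mock controllers whose results are -EIO, -ETIMEDOUT, and 0,
the loop with the early return stops at the second controller and
returns -ETIMEDOUT. Without it, the loop moves on to the third
controller and returns 0.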

Sincerely,
Randy Jennings