Re: [PATCH v3 08/21] nvme: Implement cross-controller reset recovery
From: Randy Jennings
Date: Wed Feb 25 2026 - 21:41:56 EST
On Fri, Feb 13, 2026 at 8:28 PM Mohamed Khalfella
<mkhalfella@xxxxxxxxxxxxxxx> wrote:
>
> A host that has more than one path connecting to an nvme subsystem
> typically has an nvme controller associated with every path. This is
> mostly applicable to nvmeof. If one path goes down, inflight IOs on that
> path should not be retried immediately on another path because this
> could lead to data corruption as described in TP4129. TP8028 defines
> cross-controller reset mechanism that can be used by host to terminate
> IOs on the failed path using one of the remaining healthy paths. Only
> after IOs are terminated, or long enough time passes as defined by
> TP4129, inflight IOs should be retried on another path. Implement core
> cross-controller reset shared logic to be used by the transports.
>
> Signed-off-by: Mohamed Khalfella <mkhalfella@xxxxxxxxxxxxxxx>
> +static int nvme_issue_wait_ccr(struct nvme_ctrl *sctrl, struct nvme_ctrl *ictrl)
> +	if (!wait_for_completion_timeout(&ccr.complete, tmo)) {
> +		ret = -ETIMEDOUT;
> +		goto out;
> +	}
The more I look at this, the less I can ignore that this tmo should be
capped at (deadline - now), so that a single wait cannot run past the
fence deadline.
> +unsigned long nvme_fence_ctrl(struct nvme_ctrl *ictrl)
> +	deadline = now + msecs_to_jiffies(timeout);
> +	while (time_before(now, deadline)) {
...
> +		ret = nvme_issue_wait_ccr(sctrl, ictrl);
...
> +	}
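To illustrate the capping suggested above, here is a minimal sketch in
standalone C rather than kernel code. The names ticks_before() and
cap_timeout() are hypothetical and do not appear in the patch;
ticks_before() only mimics the wraparound-safe comparison that
time_before() provides, and plain unsigned long values stand in for
jiffies.

```c
/*
 * Userspace stand-in for the kernel's time_before(): true if tick
 * count a is earlier than b, even if the counter has wrapped.
 * Relies on the usual two's-complement conversion, as the kernel does.
 */
static int ticks_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Hypothetical helper: limit a per-attempt timeout to the time that
 * is left before the overall deadline. Returns 0 once the deadline
 * has been reached.
 */
static unsigned long cap_timeout(unsigned long tmo, unsigned long now,
				 unsigned long deadline)
{
	unsigned long remaining;

	if (!ticks_before(now, deadline))
		return 0;	/* no time left */

	remaining = deadline - now;	/* unsigned math handles wraparound */
	return tmo < remaining ? tmo : remaining;
}
```

With something along these lines, the wait in nvme_issue_wait_ccr()
could presumably use the capped value in place of the raw tmo, and a
result of 0 would be treated as out of time. How the deadline would be
passed down to that function is left open here.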
Sincerely,
Randy Jennings