Re: [PATCH v3 08/21] nvme: Implement cross-controller reset recovery

From: Mohamed Khalfella

Date: Fri Mar 27 2026 - 14:34:10 EST

On Wed 2026-02-25 18:37:44 -0800, Randy Jennings wrote:
> On Fri, Feb 13, 2026 at 8:28 PM Mohamed Khalfella
> <mkhalfella@xxxxxxxxxxxxxxx> wrote:
> >
> > A host that has more than one path connecting to an nvme subsystem
> > typically has an nvme controller associated with every path. This is
> > mostly applicable to nvmeof. If one path goes down, inflight IOs on that
> > path should not be retried immediately on another path because this
> > could lead to data corruption as described in TP4129. TP8028 defines
> > cross-controller reset mechanism that can be used by host to terminate
> > IOs on the failed path using one of the remaining healthy paths. Only
> > after IOs are terminated, or long enough time passes as defined by
> > TP4129, inflight IOs should be retried on another path. Implement core
> > cross-controller reset shared logic to be used by the transports.
> >
> > Signed-off-by: Mohamed Khalfella <mkhalfella@xxxxxxxxxxxxxxx>
> > +static int nvme_issue_wait_ccr(struct nvme_ctrl *sctrl, struct nvme_ctrl *ictrl)
> > + if (!wait_for_completion_timeout(&ccr.complete, tmo)) {
> > + ret = -ETIMEDOUT;
> > + goto out;
> > + }
> The more I look at this, the less I can ignore that this tmo should be
> capped by deadline - now.

I updated nvme_issue_wait_ccr() to do that.
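
For illustration only, the capping could look roughly like the sketch
below. It is a userspace approximation, not the code from the patch:
sketch_time_before() stands in for the kernel's time_before(), and
sketch_cap_timeout() is a hypothetical helper name. It assumes tmo, now
and deadline are all expressed in jiffies.

```c
#include <stdbool.h>

/*
 * Userspace stand-in for the kernel's wrap-safe time_before(a, b):
 * true if a is earlier than b, even across counter wraparound.
 */
static bool sketch_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Hypothetical helper: clamp a per-command timeout to the time left
 * before the fencing deadline. Returns 0 if the deadline has already
 * passed, in which case the caller should not wait at all.
 */
static unsigned long sketch_cap_timeout(unsigned long tmo,
					unsigned long now,
					unsigned long deadline)
{
	unsigned long remaining;

	if (!sketch_time_before(now, deadline))
		return 0;

	/* Unsigned subtraction stays correct across wraparound. */
	remaining = deadline - now;
	return tmo < remaining ? tmo : remaining;
}
```

With something like this, the value passed to
wait_for_completion_timeout() would never extend past the deadline
computed in nvme_fence_ctrl(), and a zero result could be treated as
-ETIMEDOUT without issuing the wait.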

>
> > +unsigned long nvme_fence_ctrl(struct nvme_ctrl *ictrl)
> > + deadline = now + msecs_to_jiffies(timeout);
> > + while (time_before(now, deadline)) {
> ...
> > + ret = nvme_issue_wait_ccr(sctrl, ictrl);
> ...
> > + }
> Sincerely,
> Randy Jennings