Re: [PATCH v3 18/21] nvme: Update CCR completion wait timeout to consider CQT

From: Randy Jennings

Date: Thu Feb 19 2026 - 21:11:26 EST

On Thu, Feb 19, 2026 at 5:22 PM James Smart <jsmart833426@xxxxxxxxx> wrote:
>
> On 2/17/2026 7:35 AM, Mohamed Khalfella wrote:
> > On Tue 2026-02-17 08:09:33 +0100, Hannes Reinecke wrote:
> >> On 2/16/26 19:45, Mohamed Khalfella wrote:
> >>> On Mon 2026-02-16 13:54:18 +0100, Hannes Reinecke wrote:
> >>>> On 2/14/26 05:25, Mohamed Khalfella wrote:
> >>>>> TP8028 Rapid Path Failure Recovery does not define how much time the
> >>>>> host should wait for CCR operation to complete. It is reasonable to
> >>>>> assume that CCR operation can take up to ctrl->cqt. Update wait time for
> >>>>> CCR operation to be max(ctrl->cqt, ctrl->kato).
> >>>>>
> >>>>> Signed-off-by: Mohamed Khalfella <mkhalfella@xxxxxxxxxxxxxxx>
> >>>>> ---
> >>>>> drivers/nvme/host/core.c | 2 +-
> >>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> >>>>> index 0680d05900c1..ff479c0263ab 100644
> >>>>> --- a/drivers/nvme/host/core.c
> >>>>> +++ b/drivers/nvme/host/core.c
> >>>>> @@ -631,7 +631,7 @@ static int nvme_issue_wait_ccr(struct nvme_ctrl *sctrl, struct nvme_ctrl *ictrl)
> >>>>>  	if (result & 0x01) /* Immediate Reset Successful */
> >>>>>  		goto out;
> >>>>>
> >>>>> -	tmo = secs_to_jiffies(ictrl->kato);
> >>>>> +	tmo = msecs_to_jiffies(max(ictrl->cqt, ictrl->kato * 1000));
> >>>>>  	if (!wait_for_completion_timeout(&ccr.complete, tmo)) {
> >>>>>  		ret = -ETIMEDOUT;
> >>>>>  		goto out;
> >>>>
> >>>> That is not my understanding. I was under the impression that CQT is the
> >>>> _additional_ time a controller requires to clear out outstanding
> >>>> commands once it detected a loss of communication (ie _after_ KATO).
> >>>> Which would mean we have to wait for up to
> >>>> (ctrl->kato * 1000) + ctrl->cqt.
> >>>
> >>> At this point the source controller knows about communication loss. We
> >>> do not need kato wait. In theory we should just wait for CQT.
> >>> max(cqt, kato) is a conservative guess I made.
> >>>
> >> Not quite. The source controller (on the host!) knows about the
> >> communication loss. But the target might not, as the keep-alive
> >> command might have arrived at the target _just_ before KATO
> >> triggered on the host. So the target is still good, and will
> >> be waiting for _another_ KATO interval before declaring
> >> a loss of communication.
> >> And only then will the CQT period start at the target.
> >>
> >> Randy, please correct me if I'm wrong ...
> >>
> >
> > wait_for_completion_timeout(&ccr.complete, tmo) waits for CCR operation
> > to complete. The wait starts after CCR command completed successfully.
> > IOW, it starts after the host received a CQE from source controller on
> > the target telling us all is good. If the source controller on the target
> > already knows about loss of communication then there is no need to wait
> > for KATO. We just need to wait for CCR operation to finish because we
> > know it has been started successfully.
> >
> > The spec does not tell us how much time to wait for CCR operation to
> > complete. max(cqt, kato) is an estimate I think reasonable to make.
>
> So, we've sent CCR, received a CQE for the CCR within KATO (timeout in
> nvme_issue_wait_ccr()), then are waiting another max(KATO, CQT) for the
> io to die.
>
> As CQT is the time to wait once the ctrl is killing the io, and as the
> response indicated it certainly passed that point, a minimum of CQT
> should be all that is needed. Why are we bringing KATO into the picture?
Good point.
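
To put numbers on the candidates in this thread, here is a rough
userspace sketch (plain C, not kernel code). The function names are
made up for illustration; the only things taken from the patch are the
units, KATO in seconds and CQT in milliseconds.

```c
#include <stdint.h>

/*
 * Hypothetical helpers, not part of the patch. Units follow the quoted
 * diff: kato_s is in seconds, cqt_ms is in milliseconds. Results are in
 * milliseconds and use 64-bit math so kato_s * 1000 cannot overflow.
 */

/* What the patch does: max(CQT, KATO). */
static uint64_t ccr_wait_max_ms(uint32_t kato_s, uint32_t cqt_ms)
{
	uint64_t kato_ms = (uint64_t)kato_s * 1000;

	return kato_ms > cqt_ms ? kato_ms : cqt_ms;
}

/* Hannes's reading: the target may need a full KATO, then CQT. */
static uint64_t ccr_wait_sum_ms(uint32_t kato_s, uint32_t cqt_ms)
{
	return (uint64_t)kato_s * 1000 + cqt_ms;
}

/* James's point: once CCR was accepted, CQT alone should be enough. */
static uint64_t ccr_wait_cqt_only_ms(uint32_t cqt_ms)
{
	return cqt_ms;
}
```

With a 5 second KATO and a 2 second CQT these give 5000 ms, 7000 ms,
and 2000 ms respectively.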

>
> -- this takes me over to patch 8 and the timeout on CCR response being KATO:
> Why is KATO being used? Nothing about getting the response says it is
> related to the keep alive. Keepalive can move along happily while CCR
> hangs out and really has nothing to do with KATO.
>
> If using the rationale of a keepalive cmd processing - has roundtrip
> time and minimal and prioritized processing, as CCR needs to do more and
> as the spec allows holding on to always return 1, it should be
> KATO+<something>, where <something> is no more than CQT.
Well, CCR was supposed to decide to fail at some time less than CQT
on the controller. But I see your reasoning. Using the normal admin
timeout would probably also work.

> But given that KATO can be really long as its trying to catch
> communication failures, and as our ccr controller should not have comm
> issues, it should be fairly quick. So rather than a 2min KATO, why not
> 10-15s ?
Ugh. 2 minute KATO? Have you seen that in the field? I've
seen 5-30 seconds.

> This gets a little crazy as it takes me down paths of why not
> fire off multiple CCRs (via different ctlrs) to the subsystem at short
> intervals (the timeout) to finally find one that completes quickly and
> then start CQT.
This is an interesting idea. That said, there was concern in the
group that controllers would have a low CCRL (like, 4). I would
also expect some path failures to be correlated (for example, when
connected to an HA pair subsystem).

I was not sure why the expected limit would be low; the
implementation I am considering should have a rather large limit,
so I like your idea.

> And if nothing completes quickly bound the whole thing
> to fencing start+KATO+CQT ?
Well, 2x or 3x KATO.
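
For comparison, a sketch of the two overall bounds mentioned here,
again plain userspace C with made-up names and the same unit
assumptions (KATO in seconds, CQT in milliseconds):

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; results in milliseconds. */

/* Bound suggested in the quoted text: KATO + CQT from fencing start. */
static uint64_t fence_bound_kato_plus_cqt_ms(uint32_t kato_s, uint32_t cqt_ms)
{
	return (uint64_t)kato_s * 1000 + cqt_ms;
}

/* Alternative bound: a small multiple (for example 2 or 3) of KATO. */
static uint64_t fence_bound_n_kato_ms(uint32_t kato_s, uint32_t n)
{
	return (uint64_t)n * kato_s * 1000;
}
```

With a 30 second KATO and a 5 second CQT, the first bound is 35000 ms,
while 2x and 3x KATO are 60000 ms and 90000 ms.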

Sincerely,
Randy Jennings