Re: [PATCH v3 18/21] nvme: Update CCR completion wait timeout to consider CQT

From: Mohamed Khalfella

Date: Mon Feb 16 2026 - 13:45:53 EST

On Mon 2026-02-16 13:54:18 +0100, Hannes Reinecke wrote:
> On 2/14/26 05:25, Mohamed Khalfella wrote:
> > TP8028 Rapid Path Failure Recovery does not define how much time the
> > host should wait for CCR operation to complete. It is reasonable to
> > assume that CCR operation can take up to ctrl->cqt. Update wait time for
> > CCR operation to be max(ctrl->cqt, ctrl->kato).
> >
> > Signed-off-by: Mohamed Khalfella <mkhalfella@xxxxxxxxxxxxxxx>
> > ---
> > drivers/nvme/host/core.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> > index 0680d05900c1..ff479c0263ab 100644
> > --- a/drivers/nvme/host/core.c
> > +++ b/drivers/nvme/host/core.c
> > @@ -631,7 +631,7 @@ static int nvme_issue_wait_ccr(struct nvme_ctrl *sctrl, struct nvme_ctrl *ictrl)
> >  	if (result & 0x01) /* Immediate Reset Successful */
> >  		goto out;
> >
> > -	tmo = secs_to_jiffies(ictrl->kato);
> > +	tmo = msecs_to_jiffies(max(ictrl->cqt, ictrl->kato * 1000));
> >  	if (!wait_for_completion_timeout(&ccr.complete, tmo)) {
> >  		ret = -ETIMEDOUT;
> >  		goto out;
>
> That is not my understanding. I was under the impression that CQT is the
> _additional_ time a controller requires to clear out outstanding
> commands once it detected a loss of communication (ie _after_ KATO).
> Which would mean we have to wait for up to
> (ctrl->kato * 1000) + ctrl->cqt.

At this point the source controller already knows about the
communication loss, so we do not need the KATO wait. In theory we should
wait only for CQT; max(cqt, kato) is a conservative guess I made.
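
To make the three candidates in this thread concrete, here is a minimal
userspace sketch (not kernel code, and not part of the patch). It assumes
kato is in seconds and cqt is in milliseconds, as the diff above implies.
The function names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/*
 * Assumption: kato is expressed in seconds and cqt in milliseconds,
 * matching the "ictrl->kato * 1000" conversion in the patch.
 * All helper names below are hypothetical.
 */

/* Convert KATO to milliseconds; widen first so the multiply cannot wrap. */
uint64_t kato_to_ms(uint32_t kato_secs)
{
	return (uint64_t)kato_secs * 1000;
}

/* What the patch does: wait for the larger of CQT and KATO. */
uint64_t ccr_tmo_max_ms(uint32_t kato_secs, uint32_t cqt_ms)
{
	uint64_t kato_ms = kato_to_ms(kato_secs);

	return kato_ms > cqt_ms ? kato_ms : cqt_ms;
}

/* The "in theory" option: wait for CQT only. */
uint64_t ccr_tmo_cqt_only_ms(uint32_t cqt_ms)
{
	return cqt_ms;
}

/* The reviewer's reading: CQT is additional time after KATO. */
uint64_t ccr_tmo_sum_ms(uint32_t kato_secs, uint32_t cqt_ms)
{
	return kato_to_ms(kato_secs) + cqt_ms;
}
```

With example values kato = 5 s and cqt = 2000 ms (made up for this
illustration), the three options give 5000 ms, 2000 ms and 7000 ms
respectively.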

>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke Kernel Storage Architect
> hare@xxxxxxx +49 911 74053 688
> SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
> HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich