Re: [PATCH v3 18/21] nvme: Update CCR completion wait timeout to consider CQT

From: Hannes Reinecke

Date: Fri Feb 20 2026 - 02:23:45 EST

On 2/20/26 02:22, James Smart wrote:

On 2/17/2026 7:35 AM, Mohamed Khalfella wrote:

On Tue 2026-02-17 08:09:33 +0100, Hannes Reinecke wrote:

On 2/16/26 19:45, Mohamed Khalfella wrote:

On Mon 2026-02-16 13:54:18 +0100, Hannes Reinecke wrote:

On 2/14/26 05:25, Mohamed Khalfella wrote:

TP8028 Rapid Path Failure Recovery does not define how much time the
host should wait for CCR operation to complete. It is reasonable to
assume that CCR operation can take up to ctrl->cqt. Update wait time for
CCR operation to be max(ctrl->cqt, ctrl->kato).

Signed-off-by: Mohamed Khalfella <mkhalfella@xxxxxxxxxxxxxxx>
---
    drivers/nvme/host/core.c | 2 +-
    1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0680d05900c1..ff479c0263ab 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -631,7 +631,7 @@ static int nvme_issue_wait_ccr(struct nvme_ctrl *sctrl, struct nvme_ctrl *ictrl)
        if (result & 0x01) /* Immediate Reset Successful */
            goto out;
-    tmo = secs_to_jiffies(ictrl->kato);
+    tmo = msecs_to_jiffies(max(ictrl->cqt, ictrl->kato * 1000));
        if (!wait_for_completion_timeout(&ccr.complete, tmo)) {
            ret = -ETIMEDOUT;
            goto out;

That is not my understanding. I was under the impression that CQT is the
_additional_ time a controller requires to clear out outstanding
commands once it detected a loss of communication (ie _after_ KATO).
Which would mean we have to wait for up to
(ctrl->kato * 1000) + ctrl->cqt.

At this point the source controller knows about communication loss. We
do not need kato wait. In theory we should just wait for CQT.
max(cqt, kato) is a conservative guess I made.

Not quite. The source controller (on the host!) knows about the
communication loss. But the target might not, as the keep-alive
command might have arrived at the target _just_ before KATO
triggered on the host. So the target is still good, and will
be waiting for _another_ KATO interval before declaring
a loss of communication.
And only then will the CQT period start at the target.
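
To make the timing in that scenario concrete, here is a minimal sketch in plain userspace C (not kernel code). The helper name worst_case_target_quiesce_ms() is hypothetical and not part of the patch. It assumes, as the patch does, that KATO is expressed in seconds and CQT in milliseconds, and it models only the case described above: the target saw a keep-alive just before the host's KATO expired.

```c
/*
 * Hypothetical helper for illustration only; not from the patch.
 *
 * Worst-case time in milliseconds, measured from the host-side KATO
 * expiry, until the target has finished clearing outstanding commands,
 * assuming the target received a keep-alive just before the host gave up.
 */
static inline unsigned long worst_case_target_quiesce_ms(unsigned int kato_s,
							  unsigned int cqt_ms)
{
	/* The target waits one more full KATO interval before declaring loss. */
	unsigned long target_detect_ms = (unsigned long)kato_s * 1000UL;

	/* Only then does the CQT period start at the target. */
	return target_detect_ms + cqt_ms;
}
```

With illustrative values of kato = 5 s and cqt = 3000 ms (made up for this example), the function returns 8000 ms, whereas max(cqt, kato * 1000) would give 5000 ms.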

Randy, please correct me if I'm wrong ...

wait_for_completion_timeout(&ccr.complete, tmo) waits for CCR operation
to complete. The wait starts after CCR command completed successfully.
IOW, it starts after the host received a CQE from source controller on
the target telling us all is good. If the source controller on the target
already knows about loss of communication then there is no need to wait
for KATO. We just need to wait for CCR operation to finish because we
know it has been started successfully.

The spec does not tell us how much time to wait for CCR operation to
complete. max(cqt, kato) is an estimate I think is reasonable to make.

So, we've sent CCR, received a CQE for the CCR within KATO (timeout in nvme_issue_wait_ccr()), then are waiting another max(KATO, CQT) for the io to die.

As CQT is the time to wait once the ctrl is killing the io, and as the response indicated it certainly passed that point, a minimum of CQT should be all that is needed. Why are we bringing KATO into the picture?

Well, a successful CCR completion (without the IRS bit) just indicates
that the controller has started aborting commands.
The host still has to wait for that to finish.
The controller signals command abort completion via an AEN and
corresponding logpage. For which we have to wait for up to CQT.

But as commands are involved (we have to wait for the AEN) the
actual waiting time is max(KATO,CQT).

-- this takes me over to patch 8 and the timeout on CCR response being KATO:
Why is KATO being used? Nothing about getting the response says it is related to the keepalive. Keepalive can move along happily while CCR hangs out and really has nothing to do with KATO.

The keepalive timeout is a measure for connectivity loss.
Or, more generally, the minimal time each side is required to wait before
declaring any command as 'lost' (a bit like R_A_TOV ...).
So sending the CCR command (and waiting for the response) is governed
by KATO.

If using the rationale of a keepalive cmd processing - has roundtrip time and minimal and prioritized processing, as CCR needs to do more and as the spec allows holding on to always return 1, it should be KATO+<something>, where <something> is no more than CQT.

Again, this is not so much about the keepalive command but rather about
the _time_ each side is required to wait for the keepalive response.

Technically you are correct, though, and CCR should be treated just
like any other command. But the problem currently is that the nvme
timeout handler triggers on _command timeout_, not on KATO timeout.
We're trying to change that, but it takes time ...

But given that KATO can be really long as it's trying to catch communication failures, and as our ccr controller should not have comm issues, it should be fairly quick. So rather than a 2min KATO, why not 10-15s? This gets a little crazy as it takes me down paths of why not fire off multiple CCRs (via different ctrls) to the subsystem at short intervals (the timeout) to finally find one that completes quickly and then start CQT. And if nothing completes quickly, bound the whole thing to fencing start+KATO+CQT?

As it currently stands, CCR is only useful if the entire execution time
is significantly shorter than KATO.
In the current model error handling starts once KATO timeout triggers;
then CCR is sent and we're waiting for the AEN for max(CQT,KATO) before
retrying commands.
(I _think_ to be absolutely correct we would have to wait for CQT + KATO, but that's beside the point).
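
For reference, the three candidate bounds discussed in this thread can be put side by side. This is a minimal userspace sketch, not kernel code. The function names are hypothetical, and it assumes the same units as the patch (KATO in seconds, CQT in milliseconds).

```c
/* Hypothetical helpers for illustration only; all results in milliseconds. */

/* CQT alone: enough if the target already knows about the loss. */
static inline unsigned long ccr_wait_cqt_only_ms(unsigned int cqt_ms)
{
	return cqt_ms;
}

/* max(CQT, KATO): what the patch computes before msecs_to_jiffies(). */
static inline unsigned long ccr_wait_max_ms(unsigned int kato_s,
					     unsigned int cqt_ms)
{
	unsigned long kato_ms = (unsigned long)kato_s * 1000UL;

	return kato_ms > cqt_ms ? kato_ms : cqt_ms;
}

/* CQT + KATO: the most conservative bound mentioned above. */
static inline unsigned long ccr_wait_sum_ms(unsigned int kato_s,
					     unsigned int cqt_ms)
{
	return (unsigned long)kato_s * 1000UL + cqt_ms;
}
```

With illustrative values of kato = 5 s and cqt = 3000 ms (made up for this example) the three bounds are 3000 ms, 5000 ms and 8000 ms. With a 2 minute KATO the last two grow to 120000 ms and 123000 ms, which is why a long KATO dominates the wait in both.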

So the main difference from the current error handling is the
additional waiting time for CCR, or max(CQT,KATO) if CCR fails.

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich