Re: [PATCH v3 18/21] nvme: Update CCR completion wait timeout to consider CQT

From: Hannes Reinecke

Date: Mon Mar 02 2026 - 02:35:46 EST


On 2/27/26 04:05, Randy Jennings wrote:
On Thu, Feb 19, 2026 at 11:25 PM Hannes Reinecke <hare@xxxxxxx> wrote:

On 2/20/26 03:01, Randy Jennings wrote:
Hannes,

(ctrl->kato * 1000) + ctrl->cqt
As Mohamed pointed out, we have already received a response from a CCR
command. The CCR, once accepted, communicates the death of the
connection to the impacted controller and starts the cleanup tracked
by CQT. So, no need to wait for the impacted controller to figure out
the connection is down.

The max(cqt, kato) was just to give some wait time that should allow
issuing a CCR again from a different controller (in case of losing
communication with this one). It certainly does not need to be longer
than cqt (and it should be no longer than the remaining duration of
time-based retry; that should get addressed at some point). I cannot
remember why kato (if larger; I expect it would be smaller) made sense
at the time.

Because we have to wait for the AEN, at which point KATO comes into
play yet again.
So max(CQT, KATO) is the appropriate waiting time for that.
I see your point. It could take ~KATO time for the AEN to show up after
the CCR operation finishes. Technically true. However, if responses
are taking KATO time to get back to the host, I think I would rather retry
on a more healthy link.

Sure. But currently we don't have a policy for this; for us the
AEN is just a normal completion, for which we have to wait until
the KATO interval is exhausted.
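
To make the two candidate wait times concrete, here is a minimal sketch,
not the patch itself. The struct and function names are hypothetical, and
it assumes kato is kept in seconds and cqt in milliseconds, as the
"(ctrl->kato * 1000) + ctrl->cqt" expression quoted above suggests.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two controller fields discussed here. */
struct ccr_timing {
	uint32_t kato_s;  /* Keep Alive Timeout, assumed to be in seconds */
	uint32_t cqt_ms;  /* Command Quiesce Time, assumed to be in milliseconds */
};

/* max(CQT, KATO) in milliseconds: covers the AEN taking up to KATO. */
static uint64_t ccr_wait_max_ms(const struct ccr_timing *t)
{
	uint64_t kato_ms = (uint64_t)t->kato_s * 1000;

	return kato_ms > t->cqt_ms ? kato_ms : t->cqt_ms;
}

/* The additive form quoted earlier in the thread: KATO + CQT. */
static uint64_t ccr_wait_sum_ms(const struct ccr_timing *t)
{
	return (uint64_t)t->kato_s * 1000 + t->cqt_ms;
}
```

With kato = 5 s and cqt = 2000 ms, the max form waits 5000 ms and the
additive form 7000 ms; the max form only exceeds KATO when CQT does.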

We really should have a session or BOF about CCR handling at LSF.

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich