Re: [PATCH v2 08/14] nvme: Implement cross-controller reset recovery
From: Hannes Reinecke
Date: Wed Feb 11 2026 - 10:19:30 EST
On 2/11/26 04:44, Randy Jennings wrote:
> On Wed, Feb 4, 2026 at 3:24 PM Mohamed Khalfella
> <mkhalfella@xxxxxxxxxxxxxxx> wrote:
>> On Wed 2026-02-04 02:10:48 +0100, Hannes Reinecke wrote:
>>> On 2/3/26 21:00, Mohamed Khalfella wrote:
>>>> On Tue 2026-02-03 06:19:51 +0100, Hannes Reinecke wrote:
>>>>> On 1/30/26 23:34, Mohamed Khalfella wrote:
[ .. ]
>>>>>> +	timeout = nvme_fence_timeout_ms(ictrl);
>>>>>> +	dev_info(ictrl->device, "attempting CCR, timeout %lums\n", timeout);
>>>>>> +
>>>>>> +	now = jiffies;
>>>>>> +	deadline = now + msecs_to_jiffies(timeout);
>>>>>> +	while (time_before(now, deadline)) {
>>>>>> +		sctrl = nvme_find_ctrl_ccr(ictrl, min_cntlid);
>>>>>> +		if (!sctrl) {
>>>>>> +			/* CCR failed, switch to time-based recovery */
>>>>>> +			return deadline - now;
>>>>>> +		}
>>>>>> +
>>>>>> +		ret = nvme_issue_wait_ccr(sctrl, ictrl);
>>>>>> +		if (!ret) {
>>>>>> +			dev_info(ictrl->device, "CCR succeeded using %s\n",
>>>>>> +				 dev_name(sctrl->device));
>>>>>> +			nvme_put_ctrl_ccr(sctrl);
>>>>>> +			return 0;
>>>>>> +		}
>>>>>> +
>>>>>> +		/* CCR failed, try another path */
>>>>>> +		min_cntlid = sctrl->cntlid + 1;
>>>>>> +		nvme_put_ctrl_ccr(sctrl);
>>>>>> +		now = jiffies;
>>>>>> +	}
>>>>>
>>>>> That will spin until 'deadline' is reached if 'nvme_issue_wait_ccr()'
>>>>> returns an error. _And_ if the CCR itself runs into a timeout we would
>>>>> never have tried another path (which could have succeeded).
>>>>
>>>> True. We can do one thing at a time in the CCR time budget. Either wait
>>>> for CCR to succeed or give up early and try another path. It is a
>>>> trade-off.
>>>
>>> Yes. But I guess my point here is that we should differentiate between
>>> 'CCR failed to be sent' and 'CCR completed with error'.
>>> The logic above treats both the same.
>>
>> Yes.
>>
>>>>> I'd rather rework this loop to open-code 'issue_and_wait()' in the loop,
>>>>> and only switch to the next controller if the submission of CCR failed.
>>>>> Once that is done we can 'just' wait for completion, as a failure there
>>>>> will be after KATO timeout anyway and any subsequent CCR would be
>>>>> pointless.
>>>>
>>>> If I understood this correctly then we will stick with the first sctrl
>>>> that accepts the CCR command. We wait for CCR to complete and give up on
>>>> fencing ictrl if CCR operation fails or times out. Did I get this
>>>> correctly?
>>>
>>> If a CCR could be sent but the controller failed to process it, something
>>> very odd is going on, and it's extremely questionable whether a CCR to
>>> another controller would succeed. That's why I would switch to the
>>> next available controller if we could not _send_ the CCR, but would
>>> rather wait for KATO if CCR processing returned an error.
>>> But the main point is that CCR is a way to _shorten_ the interval
>>> (until KATO timeout) until we can start retrying commands.
>>> If the controller ran into an error during CCR processing chances
>>> are that quite some time has elapsed already, and we might as well
>>> wait for KATO instead of retrying with yet another CCR.
>>
>> Got it. I updated the code to do that.
>
> It is not true that CCR failing means something odd is going on. In a
> tightly-coupled storage HA pair, hopefully, all the NVMe controllers
> will be able to figure out the status of the other NVMe controllers.
> However, I know of multiple systems (one of which I care about) where
> the NVMe controllers may have no way of figuring out the state of some
> other NVMe controllers. In that case, the log page entry indicates
> that the CCR might succeed on some other NVMe controller (and in these
> systems, I expect they would not be able to be particularly specific
> about which one). Very little time will elapse for that to happen.
> It is important for those systems to have a retry on another NVMe
> controller.

Ah, well; that's me being mainly focused on command timeouts.
If we get an NVMe status back indicating we should retry on
another controller then clearly we should be doing that.
The comment above was primarily geared for a CCR command for
which we do _not_ get a result back.
Or, to put it another way: as long as we're within the KATO timeout
range we should retry the CCR command on another path.
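
To make that last rule concrete, here is a rough userspace sketch of the
policy; it is not kernel code and has not been checked against the patch.
All names in it (ccr_path, ccr_recover, the result enum, cost_ms) are made
up for illustration. It assumes that a CCR for which no result comes back
can be modelled as an attempt that takes longer than the remaining budget:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of one CCR attempt; not the names used in the patch. */
enum ccr_result {
	CCR_OK,			/* CCR completed, impacted controller is fenced */
	CCR_SEND_FAILED,	/* command could not be submitted on this path */
	CCR_STATUS_ERROR,	/* a completion with an error status came back */
};

/* One candidate source controller, as seen by this simulation. */
struct ccr_path {
	enum ccr_result result;	/* what the path reports if it answers in time */
	long cost_ms;		/* how long the attempt takes */
};

/*
 * Try the paths in order while the KATO-derived budget lasts.
 * Returns true if a CCR succeeded. Otherwise returns false and stores in
 * *wait_ms how long the caller still has to wait for time-based recovery.
 */
static bool ccr_recover(const struct ccr_path *paths, size_t nr_paths,
			long budget_ms, long *wait_ms)
{
	long remaining = budget_ms;
	size_t i;

	assert(budget_ms >= 0 && wait_ms != NULL);

	for (i = 0; i < nr_paths && remaining > 0; i++) {
		/* No result within the budget: the attempt timed out. */
		if (paths[i].cost_ms > remaining) {
			remaining = 0;
			break;
		}
		remaining -= paths[i].cost_ms;

		if (paths[i].result == CCR_OK) {
			*wait_ms = 0;
			return true;
		}
		/* Send failure or error status: still in budget, try next. */
	}

	*wait_ms = remaining;
	return false;
}
```

In this model a send failure and an error status are handled the same way,
and only the remaining budget decides whether another path is tried. For
example, paths that report a send failure, then an error status after 10ms,
then success after 20ms end in a successful CCR on the third path, while a
single path that never answers within the budget leaves nothing to wait for.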
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@xxxxxxx +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich